Formatting rules for Tibetan text
material collection / work in progress
- 1 Introduction
- 2 Basic formatting rules for Tibetan text
- 3 Advanced formatting
- 4 OpenOffice tools
- 5 Printing
- 6 External References
- 7 General rules for formatting Tibetan texts
- 8 Difficulties using typesetting software (focus: OpenOffice)
Current office programs such as Microsoft Office 2003/2007 and OpenOffice 3.x handle Tibetan script quite well; once they are set up correctly, they handle line breaks correctly in most cases. If your word processor breaks Tibetan syllables in the middle, either update to a newer version or check your Microsoft Office or OpenOffice setup.
The following sections describe the formatting process using this short Tibetan text as an example:
- Download example: Tibetan OpenOffice sample (unformatted)
This example (see image) shows a number of formatting shortcomings:
- Some lines start with a shad །, which is not allowed.
- The headline, the commentary at the end and the main text all have the same font size.
- There is no yig mgo ༄༅ or sbrul shad ༈ marking the start of the text.
- The text is not justified.
The next section shows how to improve the formatting of this example.
Basic formatting rules for Tibetan text
Line breaking rules
- Line breaks must not occur in the middle of a syllable (your word processor should already take care of this).
- Line breaks can occur after the syllable separator dot tsheg ་ (preferably not in the middle of a Tibetan word). Exception: in the sequence nga <tsheg> <shad> ང་། the tsheg is a so-called non-breaking tsheg, so the line must not break there.
- Line breaks are also possible after a shad །, the terma sign gter ma ༔ and a visarga ཿ.
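The break rules above can be sketched as a function that lists the positions where a new line may begin. This is a minimal sketch, assuming Unicode input (tsheg U+0F0B, shad U+0F0D, terma sign U+0F14, visarga U+0F7F). It attaches any spaces that follow a mark to the end of the previous line and never lets a line start with a shad, which also covers ང་།. It knows nothing about word boundaries, so it cannot honour the preference for not breaking inside a word. The function name is hypothetical.

```python
TSHEG = "\u0f0b"       # ་
SHAD = "\u0f0d"        # །
GTER_TSHEG = "\u0f14"  # ༔ terma sign
VISARGA = "\u0f7f"     # ཿ
BREAK_AFTER = {TSHEG, SHAD, GTER_TSHEG, VISARGA}
SPACES = {" ", "\u00a0"}


def break_opportunities(text: str) -> list[int]:
    """Return indices i at which a new line may start (break before text[i])."""
    positions = []
    for i, ch in enumerate(text):
        if ch not in BREAK_AFTER:
            continue
        # Spaces after the mark stay at the end of the current line.
        j = i + 1
        while j < len(text) and text[j] in SPACES:
            j += 1
        if j == len(text):
            continue  # nothing left to move to a new line
        if text[j] == SHAD:
            continue  # a line must never start with a shad (this also covers ང་།)
        positions.append(j)
    return positions
```

For ཀ་ཁ this yields one break opportunity, right after the tsheg; for ང་། it yields none.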
Inter-syllable marker tsheg
- There is never a tsheg after a visarga. Example: oṃ āḥ huṃ (Wylie: oM AHhU~M) is written ཨོཾ་ཨཱཿཧཱུྃ་, with no tsheg after āḥ.
- Tibetan uses only non-breaking spaces, which do not change width under left-right justification.
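A minimal normalization sketch for the two rules above, assuming Unicode Tibetan input: it drops any tsheg that directly follows a visarga, and it turns an ordinary space between two Tibetan characters (block U+0F00 to U+0FFF) into a non-breaking space (U+00A0), leaving spaces in Western text alone. The function name is hypothetical.

```python
import re

VISARGA = "\u0f7f"
TSHEG = "\u0f0b"
NBSP = "\u00a0"

# An ordinary space with a Tibetan character on both sides.
_TIBETAN_SPACE = re.compile(r"(?<=[\u0f00-\u0fff]) (?=[\u0f00-\u0fff])")


def normalize_tsheg_and_spaces(text: str) -> str:
    # Rule 1: there is never a tsheg after a visarga.
    text = text.replace(VISARGA + TSHEG, VISARGA)
    # Rule 2: spaces inside Tibetan text become non-breaking spaces.
    return _TIBETAN_SPACE.sub(NBSP, text)
```

Spaces next to Western words are deliberately left as they are, so mixed Tibetan and Western text still wraps normally.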
Usage of punctuation character shad
- A line must not start with a shad །.
- A shad ། is a Tibetan punctuation mark, similar but not identical to a comma. Verses, headlines and the ends of longer paragraphs end with the sequence <shad> <space> <shad> ། །. Exception: if the last letter before the shad is a ka ཀ or a ga ག, one shad ། is omitted. This also applies if the ka ཀ or ga ག carries a vowel sign. A shad ། is *not* omitted if the ka or ga has a subscript or superscript. Examples:
- Incorrect: གི།, ཀུ། །,
- Correct: གི, ཀུ །, སྐུ།, གྲུ། །.
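The ka/ga exception can be sketched as a single regular-expression substitution. This is a minimal sketch that relies on how Unicode encodes stacks: a ka or ga with a superscript is written as a subjoined letter (U+0F90 / U+0F92) rather than the base letter (U+0F40 / U+0F42), and a subscript (such as U+0FB2) sits between the letter and the shad. So only a bare ka or ga, optionally followed by vowel signs, matches. The function name is hypothetical.

```python
import re

# A bare ka or ga, optionally with vowel signs, directly followed by a shad.
_KA_GA_SHAD = re.compile(r"([\u0f40\u0f42][\u0f71-\u0f7d\u0f80\u0f81]*)\u0f0d")


def drop_shad_after_ka_ga(text: str) -> str:
    """Remove the shad directly after a bare ka/ga: གི། -> གི, ཀུ། ། -> ཀུ །."""
    return _KA_GA_SHAD.sub(r"\1", text)
```

Applied to the examples above, གི། becomes གི and ཀུ། ། becomes ཀུ །, while སྐུ། and གྲུ། ། are left unchanged.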
Rules for replacing shad by rin chen spungs shad
- In Tibetan, especially in pechas, a special case arises if the last syllable of an expression terminated by a shad ། breaks onto a new line. In that case the shad ། or double shad ། ། is replaced by a rin chen spungs shad ༑ or ༑ ༑. This serves as a visual indication that the left-over syllable at the beginning of the line actually belongs to the preceding line.
- An exception is le'u: in a line starting with ལེའུ། །, no rin chen spungs shad is used, since le'u is pronounced as two syllables.
- Variants: some printed books do not use the rin chen spungs shad replacement; however, the majority of books seem to apply the same rules as pechas.
- Sometimes in the sequence ། ། only the first shad is replaced: ༑ །, but this style is considered less beautiful.
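Because this rule depends on where lines actually break, it can only be applied to text that has already been laid out. The sketch below assumes the lines of one paragraph are available as separate strings, which is an assumption about the workflow rather than something a word processor hands you directly. It replaces ། or ། ། after the first syllable of each line with ༑ or ༑ ༑ (the full-replacement style, not the ༑ ། variant), skips the first line and a final shad that ends the paragraph, and honours a hypothetical exception list containing le'u. It does not handle the ka/ga form ག །. All names are hypothetical.

```python
import re

RIN_CHEN_SPUNGS_SHAD = "\u0f11"

# One syllable at the start of a line (no tsheg, shad or space), then a shad,
# optionally followed by <space> <shad>.
_FIRST_SYLLABLE = re.compile(r"^([^\u0f0b\u0f0d \u00a0]+)\u0f0d(?:([ \u00a0])\u0f0d)?")

# Hypothetical exception list: words such as le'u (ལེའུ) read as two syllables.
EXCEPTIONS = {"\u0f63\u0f7a\u0f60\u0f74"}


def mark_leftover_syllable(line: str) -> str:
    """Replace ། or ། ། after the first syllable of a line by ༑ or ༑ ༑."""
    m = _FIRST_SYLLABLE.match(line)
    if m is None or m.group(1) in EXCEPTIONS:
        return line
    replacement = m.group(1) + RIN_CHEN_SPUNGS_SHAD
    if m.group(2) is not None:
        replacement += m.group(2) + RIN_CHEN_SPUNGS_SHAD
    return replacement + line[m.end():]


def apply_to_paragraph(lines: list[str]) -> list[str]:
    """Apply the rule to every line of one paragraph except the first."""
    out = list(lines)
    for k in range(1, len(out)):
        m = _FIRST_SYLLABLE.match(out[k])
        is_last = k == len(out) - 1
        # Keep the final shad of the paragraph unchanged.
        if is_last and m is not None and not out[k][m.end():].strip():
            continue
        out[k] = mark_leftover_syllable(out[k])
    return out
```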
Numbers and special signs
- terma signs
- honorific marks (7, circle)
- yig mgo, sbrul shad
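The special signs listed above are encoded in the Unicode Tibetan block. The sketch below collects the code points for the signs this page names (yig mgo, sbrul shad, terma sign) and converts Western digits to Tibetan digits, which occupy U+0F20 (zero) to U+0F29 (nine). The dictionary and function names are hypothetical. Honorific marks are left out because this page does not say which characters they are.

```python
import unicodedata

# Code points for the special signs named in this section.
SIGNS = {
    "yig mgo": "\u0f04\u0f05",  # ༄༅ initial and closing yig mgo
    "sbrul shad": "\u0f08",     # ༈
    "terma sign": "\u0f14",     # ༔ gter tsheg
}

# Tibetan digits are contiguous, so a translation table is enough.
_TO_TIBETAN_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0F20 + d) for d in range(10))
)


def to_tibetan_digits(s: str) -> str:
    return s.translate(_TO_TIBETAN_DIGITS)


if __name__ == "__main__":
    for name, chars in SIGNS.items():
        print(name, chars, [unicodedata.name(c) for c in chars])
    print(to_tibetan_digits("108"))
```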
Small print yig chung
Mixed Tibetan and Western text
- inline Western text
Image  shows the same text as image  with all formatting rules applied.
- You can download the formatted example: Tibetan OpenOffice sample (simple formatting)
- Download example: Tibetan OpenOffice sample (justified Tibetan)
- left/right justification (platforms, Linux)
- non-breaking spaces
- colorizing text (root-text, honorific text, yigchung)
- Tibetan Formatting tool
- PDF tools
<the following needs to be rewritten>
This style guide defines the formatting, layout and styles used to publish Tibetan booklets, pechas and Tibetan-English practice books. A collection of current Unicode standards concerning the encoding and formatting of Tibetan text can be found here.
General rules for formatting Tibetan texts
Titles and the beginnings of large texts (longer than half a page) start with the sequence ༄༅། །, which begins with the yig mgo ༄༅. The second and following lines of the heading are indented to the position of the second shad །.
Indented text (like verses)
Headlines are set as separate paragraphs and may be colour-coded in dark blue.
If a root text contains few comments, the comments are written in a smaller font than the root text. If the majority of the text is commentary, the root text is instead marked by colour (dark red).
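The choice between the two styling strategies above can be sketched as a small decision function. This is a sketch under stated assumptions: "majority" is measured here by character count, and the returned style names (for example, paragraph styles one might define in OpenOffice) are hypothetical, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    is_root: bool  # True for root text, False for commentary


def choose_styles(segments: list[Segment]) -> dict[str, str]:
    """Pick hypothetical style names for root text and commentary."""
    total = sum(len(s.text) for s in segments)
    comment = sum(len(s.text) for s in segments if not s.is_root)
    if total and comment / total > 0.5:
        # Mostly commentary: mark the root text in dark red instead.
        return {"root": "RootText-DarkRed", "comment": "Commentary"}
    # Few comments: comments go into a smaller font than the root text.
    return {"root": "RootText", "comment": "Commentary-Small"}
```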
A shad ། never starts a new line.
If a shad ། follows the first syllable of a new line, it is replaced by a rin chen spungs shad ༑. Likewise, a ། ། sequence after the first syllable of a line is replaced by ༑ ༑. This rule does not apply to the first line of a paragraph or to the last shad of a paragraph. It does still apply at the end of a section that is followed by yig chung.
ཀ and ག are never directly followed by a shad །. Where two shads follow (verses), write ག ། or ཀ །. This also applies to ཀ and ག with vowel signs, but not if they carry a superscript or subscript, as in སྒ། །.
We developed an OpenOffice.org extension that applies these rules to your document.
Ideally, Tibetan text should be left- and right-justified.
Difficulties using typesetting software (focus: OpenOffice)
We wrote an OpenOffice.org extension to handle the problems listed below.
All spaces within Tibetan text must be non-breaking spaces (U+00A0). To type one from the keyboard, use <Ctrl><Space>. (OOo handled non-breaking spaces incorrectly in earlier versions; this is fixed in version 2.4.)
A shad ། should never be the first character of a new line. None of the current text editors enforce this, so every reformatting of a text can leave a shad illegally placed at the beginning of a line. When you have two shads ། །, make doubly sure to put a non-breaking space between them.
If the first syllable of a new line is followed by a shad །, that shad must be replaced by a rin chen spungs shad (རིན་ཆེན་སྤུངས་ཤད་) ༑. None of the current text editors enforce this either, so every reformatting of a text can leave a shad ། or a rin chen spungs shad ༑ in the wrong place.
Because of these problems it is very difficult to finalize a correctly formatted document. The slightest change, even to a single letter, reflows the following pages and will most likely break the rules above. To handle this, run the formatting macro only on the final document, or rerun it after every change.
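Since every edit can reintroduce these errors, a check over the final laid-out lines is useful before printing. The sketch below is not the extension's code. It assumes the lines of one paragraph are available as strings, and it reports lines that start with a shad, as well as lines (other than the first) whose first syllable is followed by a plain shad instead of ༑. It ignores the le'u exception and the paragraph-final shad. The function name is hypothetical.

```python
import re

SHAD = "\u0f0d"
# First syllable of a line followed directly by a plain shad (should be ༑).
_PLAIN_SHAD_AFTER_FIRST = re.compile(r"^[^\u0f0b\u0f0d \u00a0]+\u0f0d")


def check_lines(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line index, message) for every rule violation found."""
    problems = []
    for n, line in enumerate(lines):
        text = line.lstrip(" \u00a0")
        if text.startswith(SHAD):
            problems.append((n, "line starts with a shad"))
        elif n > 0 and _PLAIN_SHAD_AFTER_FIRST.match(text):
            problems.append((n, "first syllable followed by shad; use rin chen spungs shad"))
    return problems
```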
It is illegal to break a line at a tsheg ་ in the middle of a Tibetan word.
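One way to rule out breaks inside a word is to replace the tshegs within a word by the non-breaking tsheg that Unicode provides, U+0F0C (TIBETAN MARK DELIMITER TSHEG BSTAR), and to keep an ordinary tsheg only at the end of each word. This is a sketch under two assumptions: the text has already been split into words by a separate segmenter (not shown), and the word processor and font both render U+0F0C like a tsheg and honour its non-breaking property, which may not hold for every OpenOffice version. The function names are hypothetical.

```python
TSHEG = "\u0f0b"
NB_TSHEG = "\u0f0c"  # TIBETAN MARK DELIMITER TSHEG BSTAR (non-breaking tsheg)


def protect_word(word: str) -> str:
    """Make every tsheg inside a word non-breaking; keep a word-final tsheg breakable."""
    if word.endswith(TSHEG):
        return word[:-1].replace(TSHEG, NB_TSHEG) + TSHEG
    return word.replace(TSHEG, NB_TSHEG)


def protect_words(words: list[str]) -> str:
    # `words` is assumed to come from a separate word segmenter.
    return "".join(protect_word(w) for w in words)
```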
Left- and right-justification of Tibetan text works only for 'small' font sizes in OpenOffice. This seems to be fixed in the Linux version of OOo 2.4. In the Windows version of OOo, the formatting macro works around the problem by inserting a zero-width space (U+200B) after every tsheg, which seems to work most of the time. However, if the last character of a line is a non-breaking space (NBSP), justification breaks in some cases. To fix this, replace those NBSPs with regular spaces.
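The two workarounds described above can be sketched as follows. This is an illustration, not the macro's actual code. The first function inserts a zero-width space after each tsheg. Unlike the macro as described, it skips a tsheg that is followed by a shad, so that ང་། cannot break, and it removes earlier insertions first so it can be rerun after edits. The second function assumes the laid-out lines are available as strings and replaces non-breaking spaces at the end of a line with ordinary spaces. Function names are hypothetical.

```python
import re

TSHEG = "\u0f0b"
ZWSP = "\u200b"
NBSP = "\u00a0"


def add_zwsp_after_tsheg(text: str) -> str:
    # Remove earlier insertions first so the function can be rerun safely.
    text = text.replace(TSHEG + ZWSP, TSHEG)
    # Add a zero-width space after each tsheg, except before a shad.
    return re.sub("\u0f0b(?!\u0f0d)", TSHEG + ZWSP, text)


def fix_trailing_nbsp(lines: list[str]) -> list[str]:
    """Replace non-breaking spaces at the end of each line with ordinary spaces."""
    out = []
    for line in lines:
        stripped = line.rstrip(NBSP)
        out.append(stripped + " " * (len(line) - len(stripped)))
    return out
```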