Formatting rules for Tibetan text

From Digital Tibetan
Revision as of 11:16, 29 January 2010 by Domschl (Talk | contribs)

material collection / work in progress


[1] A short Tibetan text in OpenOffice (unformatted)

Current office programs such as Microsoft Office 2003/2007 and OpenOffice 3.x handle Tibetan script quite well. Once they are set up correctly, line breaks are handled correctly in most cases. If your word processor breaks Tibetan syllables in the middle, you need to either update to a newer version or check the setup for Microsoft Office or OpenOffice.

The formatting process is described below using a short Tibetan text as an example.

This example (see image [1]) shows a number of formatting shortcomings:

  • There are shads at the beginning of a line, which is forbidden.
  • There is no difference in font size between the headline, the main text and the commentary at the end.
  • There is no yig mgo ༄༅ or sbrul shad marking the start of the text.
  • There is no justification.

The following section shows how to improve the formatting of our example.

Basic formatting rules for Tibetan text

[2] A short Tibetan text in OpenOffice (simple formatting)
  • Line breaks must not occur in the middle of a syllable (your word processor should already take care of that).
  • Line breaks can appear after the syllable separator dot tsheg, preferably not in the middle of a Tibetan word. Exception: in the sequence nga <tsheg> <shad> ང་། the tsheg is a so-called non-breaking tsheg.
  • Additionally, line breaks are possible after a shad, a terma sign gter ma and a visarga ཿ.
  • There is never a tsheg after a visarga. Example: oṃ āḥ huṃ (Wylie: oM AHhU~M), ཨོཾ་ཨཱཿཧཱུྃ་, where no tsheg follows āḥ.
  • Tibetan uses only non-breaking spaces, which do not vary in width under left/right justification.
  • A line must not start with a shad.
  • A shad is used as a Tibetan punctuation mark, similar but not identical to a comma. Verses, headlines and ends of longer paragraphs are closed by the sequence <shad> <space> <shad> ། །. Exception: if the last letter of a line is either a ka or a ga, one shad is omitted. This is also the case if ka or ga carry vowel signs. A shad is *not* omitted if they have a sub- or superscript. Examples:
    • Incorrect: གི།, ཀུ། །,
    • Correct: གི, ཀུ །, སྐུ།, གྲུ། །.
  • Terma signs
  • rin chen spungs shad (book, pecha), one or two
  • numbers
  • honorific marks (7, circle)
  • headlines
  • yig mgo, sbrul shad
  • yigchung
  • inline Western text

Advanced formatting

[3] A short Tibetan text in OpenOffice (formatting with left/right justification)
  • left/right justification (platforms, linux)
    • compress
    • non-breaking spaces
  • colorizing text (root-text, honorific text, yigchung)
  • pechas

OpenOffice tools

  • Tibetan Formatting tool
  • PDF tools


External References


<the following needs to be rewritten>

This style guide defines the formatting, layout and styles used to publish Tibetan booklets, pechas, and Tibetan-English practice books. A collection of current Unicode standards concerning the encoding and formatting of Tibetan texts can be found here.

General rules for formatting Tibetan texts

Titles and beginnings of a large text (longer than half a page) open with a sequence that starts with a yig mgo: ༄༅། །. The second and following lines of the heading are indented to the position of the second shad །.

Indented text (like verses)

Headlines use paragraph breaks. Headlines may be colour coded using dark blue.

If a root text contains few comments, comments are written in a smaller font than the root text. If the majority of the text is comment, then the root text is marked by colour (dark red).

A shad ། never starts a new line.

If a shad ། appears after the first syllable of a new line, it is replaced by a rin chen spungs shad ༑. Likewise, the sequence ། ། after the first syllable of a line is replaced by ༑ ༑. This rule does not apply to the first line of a paragraph or to the last shad of a paragraph. If it is the end of a section that is followed by yig chung, this rule still applies.
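The replacement rule above can be sketched in Python. This is only an illustration, not the extension mentioned on this page: the function name is hypothetical, the paragraph is assumed to be already broken into lines by the layout engine, and "the last shad of a paragraph" is read here as a shad sequence with nothing after it on the final line.

```python
import re

TSHEG = "\u0F0B"                 # syllable separator
SHAD = "\u0F0D"                  # shad
RIN_CHEN_SPUNGS_SHAD = "\u0F11"  # rin chen spungs shad
NBSP = "\u00A0"                  # non-breaking space

# First syllable of a line, an optional tsheg, a shad,
# and optionally one space plus a second shad.
_FIRST_SYLLABLE_SHAD = re.compile(
    f"^([^{TSHEG}{SHAD}\\s]+{TSHEG}?){SHAD}(?:([ {NBSP}]){SHAD})?"
)


def apply_rin_chen_spungs_shad(lines):
    """Return a copy of `lines` (one paragraph, one string per laid-out line)
    with a shad after the first syllable of a line replaced."""
    result = list(lines)
    for i in range(1, len(result)):  # the first line of a paragraph is exempt
        match = _FIRST_SYLLABLE_SHAD.match(result[i])
        if match is None:
            continue
        rest = result[i][match.end():]
        if i == len(result) - 1 and not rest.strip():
            continue  # assumption: this is the closing shad of the paragraph
        replaced = match.group(1) + RIN_CHEN_SPUNGS_SHAD
        if match.group(2) is not None:  # double shad: keep the space as typed
            replaced += match.group(2) + RIN_CHEN_SPUNGS_SHAD
        result[i] = replaced + rest
    return result
```

Because the result depends on where the lines break, such a pass has to be repeated whenever the text reflows.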

ཀ and ག are never directly followed by a shad །. Cases with two shad (verses) are written ག ། or ཀ །. This also applies for ཀ and ག with vowel markers, but does not apply if super- or subscripts exist as in the example of སྒ། །.
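As a rough illustration of the ka/ga rule, the following Python sketch drops a shad that directly follows a bare ཀ or ག, with or without vowel signs. The function name and the chosen vowel-sign range are assumptions made for this example. Stacks with super- or subscripts do not match, because Unicode encodes subjoined letters with separate code points (U+0F90 and above).

```python
import re

SHAD = "\u0F0D"
KA, GA = "\u0F40", "\u0F42"
# Assumption: dependent vowel signs U+0F71..U+0F7D plus U+0F80/U+0F81.
VOWEL_SIGNS = "\u0F71-\u0F7D\u0F80\u0F81"

# A base ka or ga, optional vowel signs, then a shad.
_KA_GA_SHAD = re.compile(f"([{KA}{GA}][{VOWEL_SIGNS}]*){SHAD}")


def drop_shad_after_ka_ga(text):
    """Remove the shad that directly follows a bare ka or ga.

    A single shad disappears; of a double shad only the second one,
    together with the space in front of it, remains.
    """
    return _KA_GA_SHAD.sub(r"\1", text)
```

Applied to the incorrect forms listed earlier on this page, the sketch is intended to produce the corresponding correct forms, while stacks such as སྒ། ། are left unchanged.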

We developed an extension that applies these rules to your document.

Ideally Tibetan text should be left and right justified.

Difficulties using typesetting software (focus: OpenOffice)

We wrote an extension that handles the problems listed below.

All spaces within Tibetan texts must be U+00A0 non-breaking spaces. To type one from the keyboard, use <Ctrl><Space>. (OOo handled non-breaking spaces incorrectly; this was fixed in version 2.4.)

A shad ། should never be the first character of a new line. None of the current text editors enforces this, so each reformatting of a text can leave an illegally positioned shad at the beginning of a line. When you have two shad ། །, make sure to use a non-breaking space between them.
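The space rule can be checked or repaired outside the word processor with a few lines of Python. The sketch below is a hypothetical helper, not part of the extension. It converts ordinary spaces to U+00A0 only when both neighbours lie in the Unicode Tibetan block (U+0F00 to U+0FFF), which also covers the space between two shad. Leaving spaces next to Western text untouched is an assumption of this example.

```python
import re

NBSP = "\u00A0"
TIBETAN = "\u0F00-\u0FFF"  # Unicode Tibetan block

# One or more ordinary spaces with Tibetan characters on both sides.
_SPACES_IN_TIBETAN = re.compile(f"(?<=[{TIBETAN}]) +(?=[{TIBETAN}])")


def use_non_breaking_spaces(text):
    """Replace ordinary spaces between Tibetan characters with NBSP."""
    return _SPACES_IN_TIBETAN.sub(lambda m: NBSP * len(m.group()), text)
```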

If the first syllable of a new line is followed by a shad །, the shad needs to be replaced by a rin chen spungs shad (རིན་ཆེན་སྤུངས་ཤད་) ༑. None of the current text editors handles this either, so each reformatting of a text can leave an illegally positioned shad ། or rin chen spungs shad ༑.

Because of these problems it is very difficult to finalize correctly formatted documents. The slightest change, for example to a single letter, causes pages to reflow, which will most likely break the rules mentioned above. To avoid this, run the formatting macro only on the final document, or rerun it after changes are applied.

It is illegal to break a line at a tsheg ་ in the middle of a Tibetan word.

Left/right justification of Tibetan text works only for 'small' font sizes in OpenOffice. This seems to be solved in the Linux version of OOo 2.4. In the Windows version of OOo, the formatting macro works around the problem by inserting zero-width spaces (U+200B) after every tsheg, which seems to work fine most of the time. However, if the last character of a line is a non-breaking space (NBSP), justification is messed up in some cases. To fix this, replace these non-breaking spaces with regular ones.
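The zero-width-space workaround can be illustrated with a short Python sketch. It is not the macro itself, and the function name is hypothetical. One deliberate difference is an assumption of this example: a tsheg directly followed by a shad is skipped, so that no break opportunity is created in front of a shad. A tsheg that already has a zero-width space after it is also skipped, which keeps repeated runs from stacking up spaces.

```python
import re

TSHEG = "\u0F0B"
SHAD = "\u0F0D"
ZWSP = "\u200B"  # zero-width space

# A tsheg that is not followed by a shad or by an existing ZWSP.
_TSHEG_WITHOUT_BREAK = re.compile(f"{TSHEG}(?![{SHAD}{ZWSP}])")


def add_break_opportunities(text):
    """Insert a zero-width space after each breakable tsheg."""
    return _TSHEG_WITHOUT_BREAK.sub(TSHEG + ZWSP, text)
```

Since the inserted characters are invisible, it may be worth stripping them again (replacing U+200B with nothing) before the text is exported for other uses.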