Formatting rules for Tibetan text

From Digital Tibetan
Revision as of 12:01, 29 January 2010 by Domschl (Talk | contribs) (Basic formatting rules for Tibetan text)

material collection / work in progress


[1] A short Tibetan text in OpenOffice (unformatted)

Current office programs such as Microsoft Office 2003/2007 and OpenOffice 3.x handle Tibetan script quite well. Once they are set up correctly, line breaks are placed correctly in most cases. If your word processor breaks Tibetan syllables in the middle, either update to a newer version or check the setup for Microsoft Office or OpenOffice.

The formatting process is described below using a short Tibetan text as an example.

This example (see image [1]) shows a number of formatting shortcomings:

  • There are shads at the beginning of a line, which is forbidden.
  • There is no difference in font size between the headline, the commentary at the end, and the main text.
  • There is no yig mgo ༄༅ or sbrul shad marking the start of the text.
  • There is no justification.

The following section shows how to improve the formatting of our example.

Basic formatting rules for Tibetan text

[2] A short Tibetan text in OpenOffice (simple formatting)

Line breaking rules

  • Line breaks must not occur in the middle of a syllable (your word processor should already take care of that).
  • Line breaks can appear after the syllable separator dot tsheg ་ (preferably not in the middle of a Tibetan word). Exception: the sequence nga <tsheg> <shad> ང་། Here the tsheg is a so-called non-breaking tsheg.
  • Additionally, line breaks are possible after a shad །, a terma sign (gter ma) ༔, and a visarga ཿ.
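
The line breaking rules above can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular word processor: the function name `break_opportunities` is hypothetical, the code points are the standard Unicode ones (tsheg U+0F0B, shad U+0F0D, gter ma U+0F14, visarga U+0F7F), and the preference for not breaking inside a multi-syllable word is not modelled because it would need a dictionary.

```python
TSHEG = "\u0f0b"      # syllable separator dot
SHAD = "\u0f0d"       # shad
GTER_MA = "\u0f14"    # terma sign
VISARGA = "\u0f7f"    # visarga
SPACES = " \u00a0"    # regular and non-breaking space
BREAK_AFTER = {TSHEG, SHAD, GTER_MA, VISARGA}


def break_opportunities(text):
    """Return the indices at which a new line may start.

    A break is allowed after tsheg, shad, gter ma or visarga. Spaces that
    follow the mark stay on the old line, and a break is rejected if the
    new line would start with a shad (this also covers nga + tsheg + shad).
    """
    positions = []
    for i, ch in enumerate(text):
        if ch not in BREAK_AFTER:
            continue
        j = i + 1
        while j < len(text) and text[j] in SPACES:
            j += 1                      # keep trailing spaces on the old line
        if j >= len(text) or text[j] == SHAD:
            continue                    # nothing follows, or a shad would start the line
        positions.append(j)
    return positions
```

Each returned index is the position of the first character of a possible new line, so a line-filling routine could pick the last index that still fits the line width.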

Inter syllable marker tsheg

  • There is never a tsheg after a visarga. Example: oṃ āḥ huṃ (Wylie: oM AHhU~M), ཨོཾ་ཨཱཿཧཱུྃ་, where there is no tsheg after āḥ.
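
A minimal check for this rule, assuming the text is a Python string in Unicode; the helper name `find_tsheg_after_visarga` is made up for this sketch.

```python
import re

# visarga (U+0F7F) directly followed by a tsheg (U+0F0B) or a
# non-breaking tsheg (U+0F0C)
VISARGA_TSHEG = re.compile("\u0f7f[\u0f0b\u0f0c]")


def find_tsheg_after_visarga(text):
    """Return the index of every visarga that is wrongly followed by a tsheg."""
    return [m.start() for m in VISARGA_TSHEG.finditer(text)]
```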

White spaces

  • Tibetan uses only non-breaking spaces, which do not vary in size under left-right justification.

Usage of punctuation character shad

  • A line must not start with a shad །.
  • A shad is used as Tibetan punctuation, similar but not identical to a comma. Verses, headlines and the ends of longer paragraphs are closed by the sequence <shad> <space> <shad> ། །. Exception: if the last letter before the shad is either a ka or a ga, one shad is omitted. This is also the case if the ka or ga carries a vowel sign. A shad is *not* omitted if the letter has a sub- or superscript. Examples:
    • Incorrect: གི།, ཀུ། །,
    • Correct: གི, ཀུ །, སྐུ།, གྲུ། །.
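
The ka/ga exception can be expressed compactly because of how Unicode encodes Tibetan stacks: a ka or ga below a superscript is stored as a subjoined letter (U+0F90, U+0F92), and a subscript follows the base letter as a subjoined code point. The sketch below relies on that; the function names are hypothetical, and using U+00A0 for the space is an assumption taken from the non-breaking space rule of this guide.

```python
KA, GA = "\u0f40", "\u0f42"
SHAD = "\u0f0d"
NBSP = "\u00a0"
# dependent vowel signs U+0F71..U+0F7D plus reversed i U+0F80
VOWEL_SIGNS = "".join(chr(c) for c in range(0x0F71, 0x0F7E)) + "\u0f80"


def ends_in_bare_ka_or_ga(syllable):
    """True if the syllable ends in ka or ga without sub- or superscript."""
    stem = syllable.rstrip(VOWEL_SIGNS)   # vowel signs do not affect the rule
    return stem[-1:] in (KA, GA)          # subjoined forms never match here


def closing_shad(syllable, double=False):
    """Return the punctuation to append after the syllable."""
    if ends_in_bare_ka_or_ga(syllable):
        # one shad is omitted: nothing for a single shad,
        # space + shad for the double shad
        return NBSP + SHAD if double else ""
    return SHAD + NBSP + SHAD if double else SHAD
```

With these definitions གི receives no shad, ཀུ receives a space and one shad in the double case, while སྐུ and གྲུ keep their full punctuation, matching the examples above.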

Rules for replacing shad by rin chen spungs shad

  • In Tibetan, especially in pechas, it is treated as a special case if the last syllable of an expression that is terminated by a shad breaks to a new line. In that case the shad ། or double shad ། ། is replaced by rin chen spungs shad ༑ or ༑ ༑. This serves as an optical indication that there is a left-over syllable at the beginning of the line that actually belongs to the preceding line.
    • A special case is, for example, le'u: in a line starting with ལེའུ། །, no rin chen spungs shad is used, since le'u is pronounced as two syllables.
    • Variants: some printed books do not use rin chen spungs shad replacements; however, the majority of books seem to apply the same rules as are used with pechas.
    • Sometimes only the first shad in the sequence ། ། is replaced: ༑ །, but this style is considered less beautiful.
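
As a rough sketch of this replacement, assume the text has already been broken into lines and is passed as a list of strings forming one paragraph. The function name and the `exceptions` parameter are hypothetical; the exception set only contains le'u, since words written as one syllable but pronounced as two cannot be detected automatically here. The first line is skipped because it cannot hold a left-over syllable from a preceding line.

```python
import re

SHAD = "\u0f0d"
RIN_CHEN_SPUNGS_SHAD = "\u0f11"
TSHEG = "\u0f0b"
LEU = "\u0f63\u0f7a\u0f60\u0f74"   # le'u

# first syllable (optionally with tsheg, as in nga + tsheg + shad),
# then a shad, then optionally space + second shad
LEFTOVER = re.compile(
    "([^\u0f0b\u0f0d\u0f11 \u00a0]+\u0f0b?)\u0f0d(?:([ \u00a0])\u0f0d)?"
)


def apply_rin_chen_spungs_shad(lines, exceptions=frozenset({LEU})):
    """Replace shad after a left-over first syllable on every line but the first."""
    result = []
    for index, line in enumerate(lines):
        match = LEFTOVER.match(line) if index > 0 else None
        if match and match.group(1).rstrip(TSHEG) not in exceptions:
            new = match.group(1) + RIN_CHEN_SPUNGS_SHAD
            if match.group(2) is not None:       # double shad: replace both
                new += match.group(2) + RIN_CHEN_SPUNGS_SHAD
            line = new + line[match.end():]
        result.append(line)
    return result
```

Both shads of a double shad are replaced, following the style described above as the more beautiful one. The function does not undo earlier replacements, so it should be run on text that still contains plain shads.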

Numbers and special signs

  • Numbers: Usually the space character in Tibetan text is quite wide and occurs only after a shad །, a gter ma ༔, or a visarga ཿ. Exceptions are numbers and embedded Western text (see below). Tibetan numbers are separated from the Tibetan letters to their left and right by smaller spaces (image: Numbers-1.jpg).
  • Terma signs: If a section of text is actually a gter ma, a single terma sign ༔ replaces both shad ། and double shad ། །. Wood-block pechas sometimes simplify the gter ma so that it looks like a visarga ཿ, but digital texts should use the proper terma sign ༔.
  • Honorific marks: an honorific emphasis can be expressed by a special prefix, by colour, or by circles under the syllable, as in the following example:
  • headlines

Head letters, yig mgo and sbrul shad

  • yig mgo, sbrul shad

Small print yig chung

  • yigchung

Mixed Tibetan and Western text

  • inline Western text

Image [2] shows the same text as image [1] with all formatting rules applied.

Advanced formatting

[3] A short Tibetan text in OpenOffice (formatting with left/right justification)
  • left/right justification (platforms, linux)
    • compress
    • non-breaking spaces
  • colorizing text (root-text, honorific text, yigchung)
  • pechas

OpenOffice tools

  • Tibetan Formatting tool
  • PDF tools


External References


<the following needs to be rewritten>

This style guide defines formatting, layout and styles used to publish Tibetan booklets, Pechas, and Tibetan-English practice books. A collection of current Unicode standards concerning encoding and formatting of Tibetan texts should be found here.

General rules for formatting Tibetan texts

Titles and beginnings of a large text (larger than half a page) start with a sequence beginning with yig mgo ༄༅། །. The second and following lines of the heading are indented to the position of the second shad །.

Indented text (like verses)

Headlines use paragraph breaks. Headlines may be colour coded using dark blue.

If a root text contains few comments, comments are written in a smaller font than the root text. If the majority of the text is comment, then the root text is marked by colour (dark red).

A shad ། never starts a new line.

If a shad ། appears after the first syllable of a new line, it is replaced by a rin chen spungs shad ༑. Likewise, the sequence ། ། after the first syllable of a line is replaced by ༑ ༑. This rule does not apply to the first line of a paragraph or to the last shad of a paragraph. If it is the end of a section that is followed by yig chung, the rule still applies.

ཀ and ག are never directly followed by a shad །. Cases with two shad (verses) are written ག ། or ཀ །. This also applies to ཀ and ག with vowel markers, but does not apply if super- or subscripts exist, as in the example of སྒ། །.

We developed an extension that implements these rules in your document.

Ideally Tibetan text should be left and right justified.

Difficulties using typesetting software (focus: OpenOffice)

We wrote an extension that handles the problems listed below.

All spaces within Tibetan text must be U+00A0 non-breaking spaces. To type one from the keyboard, use <Ctrl><Space>. (OOo handled non-breaking spaces incorrectly, but this is solved in version 2.4.)
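
For text that already contains regular spaces, the conversion could look like the following sketch. It is not the code of the extension; the function name is hypothetical, and the choice to touch only spaces that sit between two characters of the Tibetan Unicode block (U+0F00 to U+0FFF) is an assumption made here so that embedded Western text keeps its normal spaces.

```python
import re

NBSP = "\u00a0"
# one or more regular spaces with a Tibetan character on both sides
SPACE_IN_TIBETAN = re.compile("(?<=[\u0f00-\u0fff]) +(?=[\u0f00-\u0fff])")


def use_non_breaking_spaces(text):
    """Replace regular spaces between Tibetan characters by U+00A0."""
    return SPACE_IN_TIBETAN.sub(lambda m: NBSP * len(m.group()), text)
```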

A shad ། should never be the first character of a new line. None of the current text editors respects that. Any reformatting of a text can produce an illegally positioned shad at the beginning of a line. When you have two shad ། །, make doubly sure to use a non-breaking space between them.

If the first syllable of a new line is followed by a shad །, the shad needs to be replaced by a rin chen spungs shad (རིན་ཆེན་སྤུངས་ཤད་) ༑. None of the current text editors respects that either. Any reformatting of a text can produce an illegally positioned shad ། or rin chen spungs shad ༑.

Because of these problems it is very difficult to finalize correctly formatted documents. The slightest change, for example changing a single letter, causes pages to be reformatted, which will most likely break the rules mentioned above. To avoid this, run the formatting macro only on the final document, or rerun it after changes have been applied.
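
Since every reflow can break the rules again, a small check that is rerun on the laid-out lines may help to spot problems before the final document is produced. The following is a sketch under the assumption that the lines of one paragraph are available as a list of strings; `lint_lines` and its messages are invented for illustration and are not part of the extension.

```python
import re

SHAD = "\u0f0d"
RIN_CHEN_SPUNGS_SHAD = "\u0f11"

# first syllable (optionally with tsheg) followed by a plain shad
FIRST_SYLLABLE_SHAD = re.compile("[^\u0f0b\u0f0d\u0f11 \u00a0]+\u0f0b?\u0f0d")
# first syllable followed by one or more rin chen spungs shad
FIRST_SYLLABLE_RCS = re.compile(
    "[^\u0f0b\u0f0d\u0f11 \u00a0]+\u0f0b?(?:\u0f11[ \u00a0]?)+"
)


def lint_lines(lines):
    """Return (line index, message) pairs for rule violations."""
    problems = []
    for i, line in enumerate(lines):
        if line.startswith(SHAD):
            problems.append((i, "line starts with shad"))
        if i > 0 and FIRST_SYLLABLE_SHAD.match(line):
            problems.append((i, "shad after first syllable"))
        # rin chen spungs shad is only expected right after the first
        # syllable of a line that is not the first line of the paragraph
        m = FIRST_SYLLABLE_RCS.match(line) if i > 0 else None
        rest = line[m.end():] if m else line
        if RIN_CHEN_SPUNGS_SHAD in rest:
            problems.append((i, "misplaced rin chen spungs shad"))
    return problems
```

An empty result means that none of the three checks fired; it does not prove that the paragraph follows every rule of this guide, for example the le'u exception is not considered.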

It is illegal to break a line at a tsheg ་ in the middle of a Tibetan word.

Left and right justification of Tibetan text works only for 'small' font sizes in OpenOffice. This seems to be solved in the Linux version of OOo 2.4. In the Windows OOo version, the formatting macro works around the problem by inserting zero-width spaces (U+200B) after every tsheg. This seems to work fine most of the time. However, if the last character of a line is a non-breaking space (NBSP), justification is broken in some cases. To solve this, one can replace these NBSPs with regular spaces.
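
The zero-width space workaround can be illustrated as follows. This is a sketch and not the macro itself: the function name is hypothetical, and unlike "after every tsheg" it skips a tsheg that is followed by a shad or rin chen spungs shad, an assumption made here so that no break opportunity is created in front of a shad.

```python
import re

ZWSP = "\u200b"
# tsheg not followed by shad, rin chen spungs shad, or an existing ZWSP
TSHEG_NEEDING_HINT = re.compile("\u0f0b(?![\u0f0d\u0f11\u200b])")


def add_break_hints(text):
    """Insert a zero-width space (U+200B) after each eligible tsheg."""
    return TSHEG_NEEDING_HINT.sub("\u0f0b" + ZWSP, text)
```

Because a tsheg that already has a zero-width space after it is skipped, the function can be rerun on the same text without inserting duplicates.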