# Last edited on 2012-02-02 01:49:17 by stolfilocal

FORMAT OF THE ".src" FILES

  The file contains a mix of "text" and "directives". A "stand-alone"
  directive must appear on a line by itself. An "embedded" directive
  may be inserted in text or in #-comments.

  Comments:

    (blank line)  = treated like a comment
    # ...         = #-comment (stand-alone)
    {...}         = {}-comment (embedded)

    A #-comment should use constructs @{TEXT} to mark parts of the
    comment that are in the target language. In this way, encoding
    changes can be applied to those parts of comments, too.

  Sectioning directives (stand-alone):

    @begin {TAG}         = start of sub-section with tag TAG
    @end {TAG}           = end of sub-section with tag TAG
    @section LEV {TAG}   = start of level-LEV section with tag TAG

    The TAG must not include blanks or braces. Nested sections must
    have distinct TAGs. An "@end" may be omitted if it comes before
    another "@end".

    The directive "@section LEV {TAG}" is equivalent to (1) "@end" all
    open sections with level greater than or equal to LEV, then
    (2) "@begin {TAG}".

  Include directives (stand-alone):

    @include {FILE}      = insert contents of FILE here

  Charset specs (stand-alone):

    @chars alpha {CHARS}    = a word is `alpha' iff it uses these chars only.
    @chars symbol {CHARS}   = any of these chars turns a word into a `symbol'.
    @chars punct {CHARS}    = each of these chars is a word by itself.
    @chars blank {CHARS}    = these chars are word separators.
    @chars null {CHARS}     = these chars should be deleted.
    @chars invalid {CHARS}  = these chars are not allowed (default).

    The CHARS must be ASCII SP or printable ISO-Latin-1 chars. ASCII SP
    is always implicitly included in the "blank" chars. The characters
    "#@{}" cannot be included in any chars.

    Characters "*", "÷" and "=" should be reserved for invalid (or
    unreadable) characters, significant line breaks in the original
    text (e.g. verse separators), and paragraph-like breaks (e.g.
    stanza separators), respectively.
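
  The "@section" equivalence stated under the sectioning directives can
  be illustrated with a small stack-based sketch. It is not taken from
  any actual ".src" tool: the function name expand_section and the
  (level, tag) stack are hypothetical, and the sketch assumes that only
  sections opened by "@section" are tracked and that their levels
  increase with nesting depth.

```python
def expand_section(open_sections, lev, tag):
    """Expand '@section LEV {TAG}' into equivalent @end/@begin directives.

    open_sections is a list of (level, tag) pairs, outermost first.
    It is updated in place.  (Hypothetical helper, for illustration only.)
    """
    out = []
    # (1) "@end" all open sections with level greater than or equal to LEV.
    # Assumption: levels increase with nesting, so those sit on top of the stack.
    while open_sections and open_sections[-1][0] >= lev:
        _, old_tag = open_sections.pop()
        out.append("@end {%s}" % old_tag)
    # (2) "@begin {TAG}".
    open_sections.append((lev, tag))
    out.append("@begin {%s}" % tag)
    return out
```

  For instance, with sections of levels 1, 2 and 3 open, a new level-2
  section would first close the level-3 and level-2 sections, innermost
  first, and then open the new one.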

  Word mapping directives (match /^[ ]*[@]/):

    @wordmap alpha {FILE}    = words listed in FILE are "alpha"
    @wordmap symbol {FILE}   = words listed in FILE are "symbol"
    @wordmap punct {FILE}    = words listed in FILE are "punct"
    @wordmap blank {FILE}    = words listed in FILE should be deleted
    @wordmap null {FILE}     = words listed in FILE should be deleted
    @wordmap invalid {FILE}  = words listed in FILE are not allowed

    The FILE must contain a list of words or word pairs, one per line.

  Word type directives (may be embedded):

    @TYPE{TEXT}

    where TYPE is one of "a" ("alpha"), "s" ("symbol"), "p" ("punct"),
    "b" ("blank"), or "n" ("null"). The TEXT cannot contain any of the
    characters "#@{}". In the TEXT, SP is the only special character
    (word separator) and the @wordmap directives do not apply.

  Every non-blank line that is neither a stand-alone #-comment nor an
  @-directive is parsed from left to right into a sequence of zero or
  more colored words, delimited by spaces, as follows:

    0. Let W, the current word, be empty. Insert spaces at both ends
       of the line. Repeat 1--9 below until the line is exhausted.

    1. If the next character belongs to the "null" chars, delete it.

    2. If the next thing on the line is an {}-comment, delete it.

    3. If it is an @n{...} directive, delete it.

    4. If it is an "alpha" or "symbol" character, append it to the
       current word W.

    5. If W is not empty, color it according to the "@chars"
       directives and the word tables, output it, and reset W to
       empty.

    6. If the next thing is a @b{...} directive, delete it.

    7. If it is a "punct" character, make it into a word by itself,
       color it "p", map it through the word tables, and output it.

    8. If it is a @p{}, @s{} or @a{} directive, output the
       blank-delimited words in the argument, all colored "p", "s" or
       "a", respectively.

    9. If it is none of the above, the input is invalid.

  In step 5, the word W is colored "alpha" iff it uses only characters
  from the "alpha" chars.
  Otherwise it must consist of one or more "symbol" characters,
  possibly mixed with "alpha" characters, and the whole word is
  colored "symbol".

  In steps 5 and 7, before the word W is written out it is looked up in
  the word tables. If the "alpha" table contains a pair "W X" then W is
  replaced by X and re-colored "alpha". Similarly for the other tables.
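
  The parsing steps 0--9 above can be sketched roughly as follows. This
  is an illustrative reading of the rules, not the actual parser: the
  names parse_line, chars and tables are hypothetical, and the sketch
  assumes that {}-comments do not nest, that "blank" characters are
  simply consumed as separators once W has been flushed, that words
  mapped to "blank" or "null" are dropped, and that when a word occurs
  in several tables the first table listed wins.

```python
# Hypothetical sketch of steps 0-9; colors are spelled out in full.
DIRECTIVE_COLORS = {"a": "alpha", "s": "symbol", "p": "punct"}

def parse_line(line, chars, tables=None):
    """Split one text line into (word, color) pairs.

    chars  maps "alpha", "symbol", "punct", "blank", "null" to strings of chars.
    tables maps a color name to a dict {W: X} (assumed layout of the word tables).
    """
    tables = tables or {}
    out = []

    def emit(word, color):
        # Steps 5 and 7: look the word up in the word tables before output.
        for table_color, table in tables.items():
            if word in table:
                word, color = table[word], table_color
                break
        if color == "invalid":
            raise ValueError("word not allowed: %r" % word)
        if color not in ("blank", "null"):   # assumption: such words are dropped
            out.append((word, color))

    def flush(w):
        # Step 5: "alpha" iff every character is an "alpha" char, else "symbol".
        if w:
            is_alpha = all(c in chars.get("alpha", "") for c in w)
            emit(w, "alpha" if is_alpha else "symbol")
        return ""

    s = " " + line + " "                     # step 0
    w = ""
    i = 0
    while i < len(s):
        c = s[i]
        if c in chars.get("null", ""):       # step 1
            i += 1
        elif c == "{" or s.startswith("@n{", i):   # steps 2 and 3
            i = s.index("}", i) + 1          # ValueError if the brace is unclosed
        elif c in chars.get("alpha", "") or c in chars.get("symbol", ""):
            w += c                           # step 4
            i += 1
        else:
            w = flush(w)                     # step 5
            if c == " " or c in chars.get("blank", ""):
                i += 1                       # assumption: separators are consumed
            elif s.startswith("@b{", i):     # step 6
                i = s.index("}", i) + 1
            elif c in chars.get("punct", ""):    # step 7
                emit(c, "punct")
                i += 1
            elif (c == "@" and s[i + 1:i + 2] in DIRECTIVE_COLORS
                  and s[i + 2:i + 3] == "{"):    # step 8: no table lookup here
                j = s.index("}", i)
                color = DIRECTIVE_COLORS[s[i + 1]]
                out.extend((word, color) for word in s[i + 3:j].split(" ") if word)
                i = j + 1
            else:                            # step 9
                raise ValueError("invalid input at column %d: %r" % (i - 1, c))
    return out
```

  With lowercase letters as "alpha", digits as "symbol", ".," as "punct"
  and "-" as "null", the line "the ca-t {note} x2, @s{foo bar} end."
  would yield the words the, cat, x2, ",", foo, bar, end, "." colored
  alpha, alpha, symbol, punct, symbol, symbol, alpha, punct.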