Hacking at the Voynich manuscript
Notebook - volume 5

Warning: these notebooks aren't strictly chronological logs.
Sometimes I go back and redo things, clarify comments,
delete garbage, etc.

Summary of previous notebooks
=============================

  On 97-07-05 I obtained Landini's interlinear transcription of the
  VMs, version 1.6 (landini-interln16.evt) from
    http://sun1.bham.ac.uk/G.Landini/evmt/intrln16.zip

  I manually extracted from it a homogeneous, full-text sample
  bio-m-evt.evt, consisting of pages 147-166 (f75r--f84v) of the
  "biological" section, in Currier's Language B, hand 2.

  This section includes Currier's and Friedman's transcriptions.
  Currier's seems to be the more complete of the two.

  The two versions have many differences (affecting 5-10% of the
  words), and often disagree even in the grouping of symbols: where
  one sees two words the other sees a single word, what is [A] for
  one may be [CI] for the other, and so on.

  So I decided to break all characters down to individual "logical"
  strokes, and use one (computer) character to encode each stroke.
  I called this new encoding "jsa" (Jorge's Super-Analytic).

  After mapping to jsa, I generated a "consensus" version of the
  biological section:

    cat bio-m-evt.evt \
      | fsg2jsa \
      > bio-m-jsa.evt

    cat bio-m-jsa.evt \
      | make-consensus-interlin \
      > bio-x-jsa.evt

    cat bio-x-jsa.evt \
      | egrep '^<.*;J> ' \
      | sed \
          -e 's/{[^}]*}//g' \
      > bio-j-jsa.evt

    extract-words-from-interlin \
      -chars "qocilgysxju" \
      bio-j-jsa.evt \
      bio-j-jsa

     lines   words     bytes file
    ------ ------- --------- ------------
      7054    7054     62690 bio-j-jsa.wds
      2132    2132     24925 bio-j-jsa.dic
      4661    4661     40897 bio-j-jsa-gut.wds
       992     992      9720 bio-j-jsa-gut.dic
       840     840      2445 bio-j-jsa-fun.wds
         2       2         5 bio-j-jsa-fun.dic
      1553    1553     19348 bio-j-jsa-bad.wds
      1138    1138     15200 bio-j-jsa-bad.dic

  Digraph counts (row = first symbol, column = second symbol,
  "." = word boundary):

                 .     q     o     c     i     l     g     y     s     x     j     u   TOT
             ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
        .           1398   965  1877   361    60     .     .     .     .     .     .  4661
        q        1     .  1229    18     .     1   154     .     .     .   700     .  2103
        o       21   486     1    63  1087  1071     .     .     .     .     .     .  2729
        c        4   167   176  6137  1209   232  2114  2921  1019     .     .     . 13979
        i        4     1     1     8  1997     2     .     .   560  1616    37   457  4683
        l        .     .     .     .     .     .    16     .     .     .  1566     .  1582
        g       52     .    74  2150     4     4     .     .     .     .     .     .  2284
        y     2790    26     2    47    13    43     .     .     .     .     .     .  2921
        s      463     1    99  1013     1     2     .     .     .     .     .     .  1579
        x      827    24   105   488     5   167     .     .     .     .     .     .  1616
        j       46     .    76  2175     6     .     .     .     .     .     .     .  2303
        u      453     .     1     3     .     .     .     .     .     .     .     .   457
             ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      TOT     4661  2103  2729 13979  4683  1582  2284  2921  1579  1616  2303   457 40897

  Some conclusions we get from this and other data:

    The valid \i/ sequences are \ij/ \is/ \iis/ \iiu/ \iiiu/ \ix/;
    the others are likely to be scription or transcription errors.

    \ci/ and \o/ are lexically similar but distinct glyphs.

    The suffixes \ij/, \iis/, \iiu/, and \iiiu/ are preceded almost
    exclusively by \ci/ and are strictly word-final.

    It seems plausible that these are errors:

      \oij/     (4 occurrences) should be \ciij/   ( 32 occurrences)
      \oiiu/    (2 occurrences) should be \ciiiu/  (109 occurrences)
      \ciiu/    (4 occurrences) should be \ciiiu/  (109 occurrences)
      \oiiiu/   (9 occurrences) should be \ciiiiu/ (329 occurrences)
      \ciiiiiu/ (4 occurrences) should be \ciiiiu/ (329 occurrences)
      \ciiix/   (2 occurrences) should be \ciix/   (403 occurrences)

    \ciiis/ (19 occurrences) may also be a misreading
    of \ciis/ (291 occurrences).

    \cg/ is always a glyph.

    \qo/ is a combination that occurs only in word-initial position.
    \qc/ is likely to be a misreading/miswriting of \qo/.

    \cy/ is always a glyph, almost certainly a final form of \ci/.

    \qj/, \lj/, \qg/, \lg/ are glyphs.

    \cs/ is a glyph closely related to (but distinct from) \c/.

    \ccg/ is almost always followed by \ci/ or \cy/.

  Here "glyph" means a group of strokes that can be treated as a
  single symbol for analysis; it may actually be part of a larger,
  still unrecognized symbol.

  Summarizing again:

    \iiiu/, \iiu/, \iis/, \ij/
      The ziggies: strictly final, preceded always by \ci/ or,
      more rarely, by \o/.

    \ix/
      Usually initial or preceded by \ci/ or \o/; followed by
      any letter except ziggies and \qo/, \ix/, \is/.

    \is/
      Similar to \ix/ except that it cannot be followed by capitals
      or \cg/, either.

    \cy/
      Almost always final, but occasionally followed by other letters.
      Preceded by about the same letters as \ci/; indeed, it is
      probably the final form of \ci/.

    \cg/
      May be followed by many letters, most often \cy/ and \ci/.
      Almost always preceded by \c/, or initial; rarely by \ix/ or \o/.

    \cs/
      Most often followed by \c/, somewhat less often by \o/, \ci/,
      or word break. Most often initial, but also preceded by \ix/,
      gallows, \c/, \cy/, \cg/, \is/.

    \lj/, \qj/
      The H-gallows: Very similar to each other, different from the
      rest, but somewhat similar to the P-gallows. They probably
      combine with \c/ on both sides to make glyphs.
      It is very likely that \l/ and \q/ are exactly equivalent.

    \lg/, \qg/
      The P-gallows: Very similar to each other, different from the
      rest, but somewhat similar to the H-gallows. They probably
      combine with \c/ on both sides to make glyphs.
      It is very likely that \l/ and \q/ are exactly equivalent.
      They may be merely ornate forms of some letter, or several
      letters (\cg/, perhaps), used mainly in the first line
      of each paragraph (and perhaps of each page?)

    \qo/
      Strictly initial, almost always followed by a capital.
      Sometimes misread as \qc/?

    \ci/
      May be followed only by the ziggies, \ix/, or \ir/.
      Often follows a capital, but also \cg/, \cs/, \c/, \ix/, \is/,
      or word break.

    \o/
      Similar to \a/, but is very often word-initial.

  Other conclusions:

    * The manuscript does not appear to use any hyphenation mark.
      Either words are not broken across lines, which would be
      unusual, or they are broken without any extra marks.
      Such word breaks may result in statistical anomalies at the
      beginning and end of lines. Could this explain Currier's
      claim that lines are "functional units"?

    * Note that parsing sequences like \cij/, \ciis/, and \ciiis/
      requires some care: the right parsings are c+ij, c+iis, ci+iis.

    * The parsing of \ciis/ is ambiguous: ci+is or c+iis.
      Declaring \ciiis/ to be a misreading of \ciis/ would
      remove the ambiguity.

    * The parsing of \ciiiu/ is ambiguous, too; but since the \iu/
      series does not seem to follow a bare \c/, it seems safe to
      parse it as ci+iiu.

    * The gallows characters \qj/ and \lj/ appear to be closely
      related: for every common word with \lj/, there appears to be
      a word with \qj/ that occurs with about 1/4 the frequency.

    * There seems to be a kinship between the glyphs \cs/ (when not
      attached to the following \c/s), \ir/, and the gallows \lj/ and
      \qj/ (also when unattached).

    * The same phenomenon can be noted with respect to prefixes
      containing \cc/ and \csc/: for every word beginning with \cc/,
      there is a word where the first \cc/ is replaced by \csc/,
      with practically the same frequency.

    * There appears to be much confusion between the suffixes \iiu/
      and \iiiu/. They are almost surely distinct letters, but in
      about one half of the cases, Currier sees \iiu/ where Friedman
      has \iiiu/.

    * There appears to be much confusion between \o/ and \ci/.

  The strings of \c/, \cs/, \lj/, \qj/, \lg/, \qg/ must be treated
  together, after collapsing the glyphs listed above, since there
  seem to be glyphs consisting of gallows preceded and followed by
  \c/ or \cc/. When this is taken into account, we can see that a
  single \c/ is not a glyph, but \cs/ is.

  In fact, after shrinking \ci/ to `a', \cs/ to `z', the gallows to
  `H' or `P', the only possible glyphs of the form [czHP]* with
  length at most 3 are

      freq glyph
      ---- -----
       795 H
        52 P
       152 z
       138 cc
        70 zc
       482 Hc
       484 ccc
       439 zcc ?
       493 Hcc ?
        19 cHc
         4 cPc

  The ones marked `?' may be composite, z+cc and H+cc, but this
  hypothesis does not seem very likely (perhaps they are *sometimes*
  composite?)
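
  The parsing test implied by the glyph list above can be sketched as a
  small sh/awk filter. This is only an illustration, not one of the tools
  used in these notes: the name parse_glyphs is made up, and the input is
  assumed to be already collapsed to the letters c, z, H, P, one string
  per line. It tries the longest listed glyph first and backtracks, so
  for an ambiguous string it reports just one of the possible parsings.

```sh
# Hypothetical helper (not part of the original tool set).
# Reads one [czHP] string per line; prints the string followed by one
# split into the glyphs listed above, or UNPARSED if no split exists.
parse_glyphs() {
  awk '
    BEGIN {
      n = split("H P z cc zc Hc ccc zcc Hcc cHc cPc", g, " ")
      for (i = 1; i <= n; i++) glyph[g[i]] = 1
    }
    # Returns a "+"-joined parsing of s, or "" if there is none.
    # len and rest are local variables.
    function parse(s,    len, rest) {
      for (len = 3; len >= 1; len--) {
        if (length(s) < len) continue
        if (!(substr(s, 1, len) in glyph)) continue
        if (length(s) == len) return s
        rest = parse(substr(s, len + 1))
        if (rest != "") return substr(s, 1, len) "+" rest
      }
      return ""
    }
    {
      r = parse($1)
      print $1, (r == "" ? "UNPARSED" : r)
    }
  '
}

# Made-up sample strings.
printf '%s\n' zccHc ccccc cHcc | parse_glyphs
```

  With these samples the filter prints "zccHc zcc+Hc", "ccccc ccc+cc" and
  "cHcc UNPARSED": the last string has no split into listed glyphs,
  because a bare c is not in the list.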

  The significant strings of length 4 that cannot be parsed into
  the glyphs above are

        20 cHcc
         4 cPcc

  Strings with 4 or more [czHP]'s tend to be quite ambiguous.

97-08-14 stolfi
===============

  Hacked make-consensus-interlin to preserve word spaces (".")
  that appear in one text but not in the other. Also changed the
  handling of paired but mismatched characters by not advancing a
  string if the matching had a low score and the string is
  significantly shorter than the other.

  Looking at the raw texts, it seems that the main source of "?"s is
  the confusion between "M" and "N" by Currier and/or Friedman.
  So I decided to (1) map both [N] and [M] (and other lookalikes)
  to "m", and (2) build the consensus with this encoding, rather
  than JSA. I christened the new encoding "hop":

    --- jsa2hop ------------------------
    #! /n/gnu/bin/sed -f
    # Yet another stroke-level, error-resistant encoding
    s/[ql]j/H/g
    s/[ql]g/P/g
    s/cs/z/g
    s/ij/k/g
    s/ix/e/g
    s/is/r/g
    s/iiu/n/g
    s/y/i/g
    s/ci/a/g
    s/cg/8/g
    s/ir/w/g
    s/i*n/m/g
    ------------------------------------

    --- fsg2hop ------------------------
    #! /n/gnu/bin/gawk -f

    # Recoding an interlinear file from the FSG alphabet to
    # my Lossy Ad-hoc Semi-Analytic Fault-Tolerant encoding

    BEGIN {
      print "# Output of fsg2hop - Stolfi's Semi-Analytic Fault-Tolerant alphabet"
    }

    /^ *$/ { print; next }

    /^ *#/ { print; next }

    /^<[^>.;]*>/ { print; next }

    /^<[^>]*\.[^>]*;[A-Z]> / {
      curtxt = substr($0,20)
      # We discard "%" and "!" since the conversion
      # will destroy synchronism anyway.
      gsub(/[%!]/, "", curtxt);
      # First, the conversion from FSG to JSA (Stolfi's super-analytic)
      gsub(/IIIK/, "iiiij", curtxt);
      gsub(/IIIL/, "iiiiu", curtxt);
      gsub(/IIIR/, "iiiis", curtxt);
      gsub(/IIIE/, "iiiix", curtxt);
      gsub(/IIE/, "iiix", curtxt);
      gsub(/IIR/, "iiis", curtxt);
      gsub(/IIK/, "iiij", curtxt);
      gsub(/HZ/, "cqjc", curtxt);
      gsub(/PZ/, "cqgc", curtxt);
      gsub(/DZ/, "cljc", curtxt);
      gsub(/FZ/, "clgc", curtxt);
      gsub(/IE/, "iix", curtxt);
      gsub(/IR/, "iis", curtxt);
      gsub(/IK/, "iij", curtxt);
      gsub(/2/, "cs", curtxt);
      gsub(/4/, "q", curtxt);
      gsub(/6/, "cj", curtxt);
      gsub(/7/, "ig", curtxt);
      gsub(/8/, "cg", curtxt);
      gsub(/A/, "ci", curtxt);
      gsub(/C/, "c", curtxt);
      gsub(/D/, "lj", curtxt);
      gsub(/E/, "ix", curtxt);
      gsub(/F/, "lg", curtxt);
      gsub(/G/, "cy", curtxt);
      gsub(/H/, "qj", curtxt);
      gsub(/I/, "i", curtxt);
      gsub(/K/, "ij", curtxt);
      gsub(/L/, "iu", curtxt);
      gsub(/M/, "iiiu", curtxt);
      gsub(/N/, "iiu", curtxt);
      gsub(/O/, "o", curtxt);
      gsub(/P/, "qg", curtxt);
      gsub(/R/, "is", curtxt);
      gsub(/S/, "csc", curtxt);
      gsub(/T/, "cc", curtxt);
      gsub(/V/, "?", curtxt);
      gsub(/Y/, "?", curtxt);
      # Now, the conversion from JSA to HOP:
      gsub(/[ql]j/, "H", curtxt);
      gsub(/[ql]g/, "P", curtxt);
      gsub(/cs/, "z", curtxt);
      gsub(/ij/, "k", curtxt);
      gsub(/ix/, "e", curtxt);
      gsub(/is/, "r", curtxt);
      gsub(/iiu/, "n", curtxt);
      gsub(/y/, "i", curtxt);
      gsub(/ci/, "a", curtxt);
      gsub(/cg/, "8", curtxt);
      gsub(/ir/, "w", curtxt);
      gsub(/i*n/, "m", curtxt);
      print (substr($0,1,19) curtxt);
      next
    }
    ------------------------------------

  Built the new encoded interlinear:

    cat bio-m-evt.evt \
      | fsg2hop \
      > bio-m-hop.evt

  Reran the consensus:

    cat bio-m-hop.evt \
      | make-consensus-interlin \
      > bio-x-hop.evt

    cat bio-x-hop.evt \
      | egrep '^<.*;J> ' \
      | sed \
          -e 's/{[^}]*}//g' \
      > bio-j-hop.evt

  The result was still not very good; an inserted word can still
  cause the program to lose sync for the rest of the line. But it
  seems to be better than the previous text.
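
  The order of the JSA-to-HOP rules matters (for instance, \y/ must
  become \i/ before \ci/ is collapsed to "a"), and a quick way to see
  the effect is to push a few JSA strings through the same substitutions.
  The sketch below merely wraps the jsa2hop rules in a shell function;
  the name jsa2hop_demo and the sample strings are made up for
  illustration.

```sh
# Hypothetical helper: the jsa2hop rules, in the same order, as a filter.
jsa2hop_demo() {
  sed \
    -e 's/[ql]j/H/g' \
    -e 's/[ql]g/P/g' \
    -e 's/cs/z/g' \
    -e 's/ij/k/g' \
    -e 's/ix/e/g' \
    -e 's/is/r/g' \
    -e 's/iiu/n/g' \
    -e 's/y/i/g' \
    -e 's/ci/a/g' \
    -e 's/cg/8/g' \
    -e 's/ir/w/g' \
    -e 's/i*n/m/g'
}

# Made-up sample strings in JSA, one per line.
printf '%s\n' ciiiu ciiiiu ciiis qoqjccgcy | jsa2hop_demo
```

  This prints "am", "am", "aw" and "qoHc8a". The first two samples are
  the JSA forms of FSG [AN] and [AM], which collapse to the same HOP
  word, as intended by the mapping of both [N] and [M] to "m".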

  I created by hand a file bio-j-hop.evj, which is like bio-j-hop.evt
  except that it has " " instead of "." as word-space, " //" instead
  of "-" for end-of-line, and " =" instead of "=" for
  end-of-paragraph.

  Extracted the text files:

    extract-words-from-interlin \
      -chars "aocz8HPerqkmw" \
      bio-j-hop.evt \
      bio-j-hop

     lines   words     bytes file
    ------ ------- --------- ------------
      7670    7670     41815 bio-j-hop.wds
      1510    1510      9982 bio-j-hop.dic
      5894    5894     33804 bio-j-hop-gut.wds
       949     949      6236 bio-j-hop-gut.dic
       843     843      2464 bio-j-hop-fun.wds
         5       5        24 bio-j-hop-fun.dic
       933     933      5547 bio-j-hop-bad.wds
       556     556      3722 bio-j-hop-bad.dic

  Compared to the previous version of the unencoded stroke-level
  (jsa) files:

    The number of words increased from 7054 to 7670.
    The number of good words increased from 4661 to 5894, with a
    consequent decrease in bad words, from 1553 to 933.

    The number of distinct words decreased from 2132 to 1510.
    The number of good distinct words decreased from 992 to 949,
    and that of bad distinct words from 1138 to 556.

  Digraph counts (row = first symbol, column = second symbol,
  "." = word boundary):

                 .     a     o     c     z     8     H     P     e     r     q     k     m     w    TT
             ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
        .            251  1235   757   912   472   276    86   313   103  1489     .     .     .  5894
        a     3196     2     4    19    26    14    78     2   491   345     5    39   802    23  5046
        o       28     5     1    39     6    21  1776    68  1173   240     6     5    19     1  3388
        c       10  1059   226  4047    44  1865   408    33    15     4     .     .     5     .  7716
        z       58   109    90   957    10     3     4     1     1     .     .     .     .     .  1233
        8       64  2245    50    45    32     1     5     .     5     1     .     .     .     .  2448
        H       12  1125    98  1479    47     5     .     .     9     .     .     .     1     .  2776
        P        2    20    43   116    17     3     .     .     .     .     .     .     .     .   201
        e     1121   130   117   216   122    61   227    10     4     2     1     .     .     .  2011
        r      514    90    48    24    15     3     1     .     .     .     .     .     .     .   695
        q        1     5  1474    17     2     .     1     1     .     .     .     .     .     .  1501
        k       43     .     1     .     .     .     .     .     .     .     .     .     .     .    44
        m      822     4     1     .     .     .     .     .     .     .     .     .     .     .   827
        w       23     1     .     .     .     .     .     .     .     .     .     .     .     .    24
             ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
      TOT     5894  5046  3388  7716  1233  2448  2776   201  2011   695  1501    44   827    24 33804

  Next-symbol probability (× 99):

              .  a  o  c  z  8  H  P  e  r  q  k  m  w TT
             -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
        .        4 21 13 15  8  5  1  5  2 25  .  .  . 99
        a    63  .  .  .  1  .  2  . 10  7  .  1 16  . 99
        o     1  .  .  1  .  1 52  2 34  7  .  .  1  . 99
        c     . 14  3 52  1 24  5  .  .  .  .  .  .  . 99
        z     5  9  7 77  1  .  .  .  .  .  .  .  .  . 99
        8     3 91  2  2  1  .  .  .  .  .  .  .  .  . 99
        H     . 40  3 53  2  .  .  .  .  .  .  .  .  . 99
        P     1 10 21 57  8  1  .  .  .  .  .  .  .  . 99
        e    55  6  6 11  6  3 11  .  .  .  .  .  .  . 99
        r    73 13  7  3  2  .  .  .  .  .  .  .  .  . 99
        q     .  . 97  1  .  .  .  .  .  .  .  .  .  . 99
        k    97  .  2  .  .  .  .  .  .  .  .  .  .  . 99
        m    98  .  .  .  .  .  .  .  .  .  .  .  .  . 99
        w    95  4  .  .  .  .  .  .  .  .  .  .  .  . 99
             -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      TOT    17 15 10 23  4  7  8  1  6  2  4  0  2  0 99

  Previous-symbol probability (× 99):

              .  a  o  c  z  8  H  P  e  r  q  k  m  w TT
             -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
        .        5 36 10 73 19 10 42 15 15 98  .  .  . 17
        a    54  .  .  .  2  1  3  1 24 49  . 88 96 95 15
        o     .  .  .  1  .  1 63 33 58 34  . 11  2  4 10
        c     . 21  7 52  4 75 15 16  1  1  .  .  1  . 23
        z     1  2  3 12  1  .  .  .  .  .  .  .  .  .  4
        8     1 44  1  1  3  .  .  .  .  .  .  .  .  .  7
        H     . 22  3 19  4  .  .  .  .  .  .  .  .  .  8
        P     .  .  1  1  1  .  .  .  .  .  .  .  .  .  1
        e    19  3  3  3 10  2  8  5  .  .  .  .  .  .  6
        r     9  2  1  .  1  .  .  .  .  .  .  .  .  .  2
        q     .  . 43  .  .  .  .  .  .  .  .  .  .  .  4
        k     1  .  .  .  .  .  .  .  .  .  .  .  .  .  0
        m    14  .  .  .  .  .  .  .  .  .  .  .  .  .  2
        w     .  .  .  .  .  .  .  .  .  .  .  .  .  .  0
             -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      TOT    99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

  Rebuilt .fix.wds, .fix.dic:

    cat bio-j-hop.wds \
      | sed -e '/?/s/^.*$/???/g' \
      > .fix.wds

    cat .fix.wds \
      | sort | uniq \
      > .fix.dic

    cat .fix.wds \
      | wfreq \
      > .fix.frq

     lines   words     bytes file
    ------ ------- --------- ------------
       955     955      6264 .fix.dic
       957    2871     17757 .fix.frq
      7670    7670     40000 .fix.wds

  (Note the coincidence in the number of bytes of .fix.wds!)

  To do next: redo all statistics...
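
  Tables like the digraph counts and the next-symbol probabilities above
  can be derived from a word list roughly as follows. The program that
  actually produced them is not shown in these notes, so this is only a
  guess at the method: the name digraph_probs is made up, the input is
  assumed to hold one word per line (as in the .wds files), and "."
  stands for the word boundary. Scaling by 99 with rounding to nearest
  is likewise an assumption; it agrees with entries such as q followed
  by o (1474/1501 × 99 ≈ 97) and c followed by c (4047/7716 × 99 ≈ 52).

```sh
# Hypothetical helper (not part of the original tool set).
# Reads one word per line; prints "first second count scaled" for each
# digraph seen, where scaled = 99 * count / (total count for first).
digraph_probs() {
  awk '
    NF == 0 { next }                 # skip empty lines
    {
      w = "." $1 "."                 # add word-boundary markers
      for (i = 1; i < length(w); i++) {
        a = substr(w, i, 1)
        b = substr(w, i + 1, 1)
        count[a, b]++
        rowtot[a]++
      }
    }
    END {
      for (k in count) {
        split(k, p, SUBSEP)
        printf "%s %s %d %d\n", p[1], p[2], count[k], \
          int(99 * count[k] / rowtot[p[1]] + 0.5)
      }
    }
  ' | LC_ALL=C sort
}

# Made-up three-word sample.
printf '%s\n' am am ar | digraph_probs
```

  On this sample the output is ". a 3 99", "a m 2 66", "a r 1 33",
  "m . 2 99", "r . 1 99". The previous-symbol table would be obtained the
  same way, dividing by the total for the second symbol instead.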

  Word location maps:

    cat bio-j-hop-gut.wds \
      | enum-words-in-blocks -v WPB=100 \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -v CTWD=1 -v PERCENT=1 -v NBLOCKS=59 \
      > .baz

    cat bio-j-hop-gut.wds \
      | enum-words-in-blocks -v WPB=590 \
      | sort +1 -2 +0 -1n \
      | make-word-location-map -v CTWD=3 -v PERCENT=1 -v NBLOCKS=10 \
      > .bar

  Here is the coarse word location map of all the words with 10 or
  more occurrences:

TOTAL   AVG   DEV WORD      ABS FREQ BY BLOCK             REL FREQ BY BLOCK             WORD
----- ----- ----- --------- ----------------------------- ----------------------------- ---------
   64   5.2   3.5 8ar       13  5  2  8  3  4  3  7  3 16 20  8  3 12  5  6  5 11  5 25 8ar
   63   4.6   2.9 8ae        8  5  6 17  3  1  3  8  9  3 13  8  9 27  5  2  5 13 14  5 8ae
  120   5.2   3.1 8am       10 15 13 15  6  6  8 16 12 19  8 12 11 12  5  5  7 13 10 16 8am
   53   4.6   3.0 8a        10  5  3  7  4  2  6 10  2  4 19  9  6 13  7  4 11 19  4  7 8a
   26   6.1   2.5 8oe        .  1  2  5  2  2  3  4  2  5  .  4  8 19  8  8 11 15  8 19 8oe
   40   3.4   2.8 oHar      11  8  1  4  4  4  3  2  .  3 27 20  2 10 10 10  7  5  .  7 oHar
   43   5.5   2.7 oHae       3  4  .  5  7  4  7  4  2  7  7  9  . 12 16  9 16  9  5 16 oHae
  105   5.1   2.5 oHam       5 14  6  4 13 28 10  9 10  6  5 13  6  4 12 26  9  8  9  6 oHam
   31   4.4   2.6 oHa        2  7  2  4  1  6  3  4  1  1  6 22  6 13  3 19 10 13  3  3 oHa
   11   5.8   2.5 oHoe       .  1  1  .  2  3  1  .  1  2  .  9  9  . 18 27  9  .  9 18 oHoe
   53   4.9   3.0 qoHar     10  4  2  .  9  9  3  6  6  4 19  7  4  . 17 17  6 11 11  7 qoHar
  128   5.6   2.6 qoHae      9  7  5 22  8 15 14 20 19  9  7  5  4 17  6 12 11 15 15  7 qoHae
  265   4.8   2.7 qoHam     32 11 34 30 25 41 28 21 36  7 12  4 13 11  9 15 10  8 13  3 qoHam
   84   5.3   2.9 qoHa      12  4  6  5  6 12  9 13  9  8 14  5  7  6  7 14 11 15 11  9 qoHa
   24   5.5   2.3 qoHoe      .  .  5  3  2  2  7  1  2  2  .  . 21 12  8  8 29  4  8  8 qoHoe
    7   5.5   2.4 oeHar      1  .  .  .  1  1  3  .  1  . 14  .  .  . 14 14 42  . 14  . oeHar
    6   6.2   1.8 oeHae      .  .  .  .  2  1  2  .  .  1  .  .  .  . 33 17 33  .  . 17 oeHae
   31   5.8   1.7 oeHam      1  .  .  4  3  7 11  2  3  .  3  .  . 13 10 22 35  6 10  . oeHam
   12   5.3   2.6 oeHa       1  1  .  1  2  2  3  .  .  2  8  8  .  8 17 17 25  .  . 17 oeHa
   11   6.7   1.8 Har        .  .  .  1  1  3  .  3  2  1  .  .  .  9  9 27  . 27 18  9 Har
   19   5.4   2.7 Hae        2  2  .  .  3  3  5  .  2  2 10 10  .  . 16 16 26  . 10 10 Hae
   39   5.4   2.3 Ham        3  1  4  3  .  8 11  5  4  .  8  3 10  8  . 20 28 13 10  . Ham
    6   3.5   2.8 Ha         2  1  .  .  .  2  .  1  .  . 33 17  .  .  . 33  . 17  .  . Ha
   14   4.8   3.0 Hoe        3  .  2  1  .  1  4  1  1  1 21  . 14  7  .  7 28  7  7  7 Hoe
   11   6.2   1.9 Poe        .  .  1  .  2  1  4  1  1  1  .  .  9  . 18  9 36  9  9  9 Poe
   19   4.8   2.5 ar         1  2  3  2  1  4  3  .  2  1  5 10 16 10  5 21 16  . 10  5 ar
   20   3.6   2.5 ae         .  9  3  .  2  .  5  .  .  1  . 45 15  . 10  . 25  .  .  5 ae
   53   4.6   2.7 am         2  8 10  6  4  6  6  3  4  4  4 15 19 11  7 11 11  6  7  7 am
   13   4.3   2.2 a          1  2  .  2  3  2  2  .  1  .  8 15  . 15 23 15 15  .  8  . a
  213   5.3   2.6 oe        22  6  6 25 38 26 41 16  9 24 10  3  3 12 18 12 19  7  4 11 oe
   66   4.9   2.8 or         7  7  1  8 15  6  9  2  1 10 11 11  2 12 23  9 14  3  2 15 or
   12   4.2   2.8 zar        1  2  4  .  .  .  3  .  2  .  8 17 33  .  .  . 25  . 17  . zar
   19   4.1   2.9 zae        3  3  2  3  2  1  .  2  2  1 16 16 10 16 10  5  . 10 10  5 zae
   57   4.4   2.9 zam        5 10 10  6  7  2  1  4  8  4  9 17 17 10 12  3  2  7 14  7 zam
   40   5.5   2.9 zoe        4  2  4  3  5  4  3  2 10  3 10  5 10  7 12 10  7  5 25  7 zoe
   13   5.0   2.9 zor        2  1  .  2  1  2  2  .  2  1 15  8  . 15  8 15 15  . 15  8 zor
    5   6.7   2.2 rar        .  .  .  1  .  1  1  .  1  1  .  .  . 20  . 20 20  . 20 20 rar
    6   5.3   2.5 rae        .  1  .  1  .  2  1  .  .  1  . 17  . 17  . 33 17  .  . 17 rae
   17   4.7   2.7 ram        .  2  5  3  .  .  1  4  1  1  . 12 29 17  .  .  6 23  6  6 ram
    5   5.7   3.1 ra         1  .  .  .  1  .  1  1  .  1 20  .  .  . 20  . 20 20  . 20 ra
   10   5.9   2.2 roe        1  .  .  1  .  2  1  5  .  . 10  .  . 10  . 20 10 50  .  . roe
   13   3.3   3.0 z          2  4  3  1  .  1  .  .  .  2 15 30 23  8  .  8  .  .  . 15 z
   11   3.6   2.5 r          1  2  4  .  1  1  .  1  1  .  9 18 36  .  9  9  .  9  9  . r
----- ----- ----- --------- ----------------------------- ----------------------------- ---------
   46   5.7   3.1 Hc8a       5  2  1 10  3  2  3  4  5 11 11  4  2 22  6  4  6  9 11 24 Hc8a
   29   4.5   2.9 Hcc8a      4  2  4  5  3  2  4  .  .  5 14  7 14 17 10  7 14  .  . 17 Hcc8a
   11   4.3   2.4 Hcca       1  1  2  .  2  4  .  .  .  1  9  9 18  . 18 36  .  .  .  9 Hcca
   13   6.4   2.9 aHc8a      1  .  .  3  1  .  .  4  .  4  8  .  . 23  8  .  . 30  . 30 aHc8a
   12   6.2   2.6 aHcc8a     .  1  .  2  1  2  1  2  .  3  .  8  . 17  8 17  8 17  . 25 aHcc8a
   10   4.4   2.0 aHcca      1  1  .  1  2  4  .  1  .  . 10 10  . 10 20 40  . 10  .  . aHcca
   88   5.3   3.5 oHc8a     11 13  8 12  2  2  3  5  5 27 12 15  9 14  2  2  3  6  6 30 oHc8a
   59   4.7   3.0 oHcc8a     5 12  3  5  7  9  3  3  3  9  8 20  5  8 12 15  5  5  5 15 oHcc8a
   35   5.1   2.8 oHcca      3  6  1  .  4  7  4  5  2  3  8 17  3  . 11 20 11 14  6  8 oHcca
   21   4.9   3.2 oeHc8a     2  4  1  3  1  .  5  .  1  4  9 19  5 14  5  . 24  .  5 19 oeHc8a
   15   7.0   2.3 oeHcc8a    1  .  .  .  1  2  2  3  4  2  7  .  .  .  7 13 13 20 26 13 oeHcc8a
   20   5.3   2.6 oeHcca     1  2  1  2  .  8  2  .  1  3  5 10  5 10  . 40 10  .  5 15 oeHcca
  204   5.1   3.1 qoHc8a    24 10 27 39 10 10 11 23 18 32 12  5 13 19  5  5  5 11  9 16 qoHc8a
  193   4.6   2.9 qoHcc8a   23 15 43 15 14 18 11 22 16 16 12  8 22  8  7  9  6 11  8  8 qoHcc8a
   90   5.4   2.8 qoHcca     5 10  9  7  5 15  6 13  8 12  6 11 10  8  6 17  7 14  9 13 qoHcca
    5   3.3   3.1 ecc8a      2  .  1  .  1  .  .  .  1  . 40  . 20  . 20  .  .  . 20  . ecc8a
   53   4.5   2.7 eccc8a     3  7 11  7  5  .  4  8  8  .  6 13 21 13  9  .  7 15 15  . eccc8a
   17   5.4   2.5 eccca      1  1  2  .  3  2  2  4  1  1  6  6 12  . 17 12 12 23  6  6 eccca
    3   7.5   3.0 oecc8a     .  .  .  1  .  .  .  .  .  2  .  .  . 33  .  .  .  .  . 66 oecc8a
   24   4.5   3.3 oeccc8a    5  3  2  3  1  1  1  2  3  3 21 12  8 12  4  4  4  8 12 12 oeccc8a
   11   4.6   3.6 oeccca     3  .  2  1  1  .  .  1  .  3 27  . 18  9  9  .  .  9  . 27 oeccca
   24   5.9   3.1 cc8a       1  3  2  3  1  .  1  6  2  5  4 12  8 12  4  .  4 25  8 21 cc8a
  211   4.9   3.0 ccc8a     19 25 29 25 18  9 18 23 22 23  9 12 14 12  8  4  8 11 10 11 ccc8a
   89   5.2   3.0 ccca       8 15  8  2  4  9 10 12 16  5  9 17  9  2  4 10 11 13 18  6 ccca
    6   3.8   3.3 zc8a       .  4  .  .  .  .  .  .  2  .  . 66  .  .  .  .  .  . 33  . zc8a
  233   5.3   3.1 zcc8a     18 29 24 26 17 14 19 18 34 34  8 12 10 11  7  6  8  8 14 14 zcc8a
   86   4.7   3.1 zcca      16 10  4  5  9 10  9  3 10 10 18 12  5  6 10 12 10  3 12 12 zcca
  233   5.3   3.1 zcc8a     18 29 24 26 17 14 19 18 34 34  8 12 10 11  7  6  8  8 14 14 zcc8a
   42   4.4   2.7 zccc8a     5  4  6  5  7  2  5  2  4  2 12  9 14 12 17  5 12  5  9  5 zccc8a
   34   4.1   2.3 zccca      2  5  5  3  9  5  1  2  .  2  6 15 15  9 26 15  3  6  .  6 zccca
  211   4.9   3.0 ccc8a     19 25 29 25 18  9 18 23 22 23  9 12 14 12  8  4  8 11 10 11 ccc8a
   23   3.8   2.3 cccc8a     1  3  9  1  4  .  1  2  2  .  4 13 39  4 17  .  4  9  9  . cccc8a
   35   5.2   2.6 cccca      3  1  5  3  4  4  3  8  3  1  8  3 14  8 11 11  8 23  8  3 cccca
----- ----- ----- --------- ----------------------------- ----------------------------- ---------
   55   5.1   2.9 cccHca     4  8  4  3  6  7  7  6  3  7  7 14  7  5 11 13 13 11  5 13 cccHca
   39   5.4   3.2 zccHca     3  8  1  1  4  4  4  2  4  8  8 20  3  3 10 10 10  5 10 20 zccHca
   36   5.3   2.4 ccccHca    1  2  5  4  1 12  2  2  4  3  3  6 14 11  3 33  6  6 11  8 ccccHca
   33   5.0   2.6 zcccHca    4  2  3  2  . 10  3  5  3  1 12  6  9  6  . 30  9 15  9  3 zcccHca
   14   5.3   2.8 cccHa      2  .  .  4  .  2  1  2  2  1 14  .  . 28  . 14  7 14 14  7 cccHa
   17   5.0   2.1 zccHa      1  1  2  .  1  9  1  1  .  1  6  6 12  .  6 52  6  6  .  6 zccHa
   12   4.2   2.4 ccccHa     1  .  3  4  .  1  .  2  1  .  8  . 25 33  .  8  . 17  8  . ccccHa
   13   3.5   2.9 zcccHa     4  1  2  1  1  1  1  .  2  . 30  8 15  8  8  8  8  . 15  . zcccHa
----- ----- ----- --------- ----------------------------- ----------------------------- ---------
   10   4.6   2.9 8ccc8a     2  1  1  .  .  1  2  3  .  . 20 10 10  .  . 10 20 30  .  . 8ccc8a
   17   4.3   2.9 8zcc8a     3  2  1  3  1  1  1  4  .  1 17 12  6 17  6  6  6 23  .  6 8zcc8a
   18   4.4   2.7 Pccc8a     3  2  1  1  4  .  4  2  .  1 17 11  6  6 22  . 22 11  .  6 Pccc8a
   18   4.8   2.7 Hccc8a     2  1  2  3  2  2  .  3  3  . 11  6 11 17 11 11  . 17 17  . Hccc8a
   16   4.2   2.3 oPccc8a    2  1  1  2  5  3  .  .  2  . 12  6  6 12 31 19  .  . 12  . oPccc8a
   14   4.9   3.3 oezcc8a    3  1  .  1  3  .  2  1  .  3 21  7  .  7 21  . 14  7  . 21 oezcc8a
   22   5.0   3.0 ezcc8a     2  2  4  2  1  1  2  4  1  3  9  9 18  9  5  5  9 18  5 14 ezcc8a
   11   6.2   3.0 qoHccc8a   1  .  1  2  .  .  1  1  3  2  9  .  9 18  .  .  9  9 27 18 qoHccc8a
   10   4.7   3.8 oe8a       4  .  .  1  .  .  .  2  2  1 40  .  . 10  .  .  . 20 20 10 oe8a
   28   6.2   2.4 cccoe      .  1  4  2  1  2  6  3  7  2  .  4 14  7  4  7 21 11 25  7 cccoe
   13   5.6   2.5 ccoe       .  1  2  .  3  1  2  1  2  1  .  8 15  . 23  8 15  8 15  8 ccoe
   23   5.1   3.1 oHca       2  4  1  2  2  2  4  1  .  5  9 17  4  9  9  9 17  4  . 22 oHca
   44   4.6   3.0 qoHca      5  9  2  3  5  3  6  5  .  6 11 20  5  7 11  7 14 11  . 14 qoHca
----- ----- ----- --------- ----------------------------- ----------------------------- ---------
   12   3.9   2.6 eHam       1  3  1  2  1  2  .  1  .  1  8 25  8 17  8 17  .  8  .  8 eHam
   20   3.4   2.5 eoe        5  1  5  1  2  3  2  .  .  1 25  5 25  5 10 15 10  .  .  5 eoe
   10   5.0   3.1 eor        2  1  .  .  1  2  1  1  1  1 20 10  .  . 10 20 10 10 10 10 eor
   26   4.8   2.9 oea        4  3  .  1  6  3  4  1  1  3 15 11  .  4 23 11 15  4  4 11 oea
   16   5.5   2.5 oeor       .  3  1  .  1  4  2  2  2  1  . 19  6  .  6 25 12 12 12  6 oeor
  124   4.9   2.8 qoe       16  6 11 12 22  8 15 13 13  8 13  5  9 10 18  6 12 10 10  6 qoe
   14   4.5   2.9 zca        3  2  .  .  1  2  3  2  1  . 21 14  .  .  7 14 21 14  7  . zca
   11   4.4   2.6 zccor      1  2  .  2  1  2  2  .  .  1  9 18  . 18  9 18 18  .  .  9 zccor
   24   5.1   2.8 zccoe      3  .  2  3  6  1  1  3  3  2 12  .  8 12 25  4  4 12 12  8 zccoe
   15   5.5   2.2 zcoe       .  2  .  .  6  .  3  2  1  1  . 13  .  . 40  . 20 13  7  7 zcoe

97-08-16 stolfi
===============

  I found this in Jim Reeds's e-mail archives:

    As for tricky pages: I suppose in the end we just have to make a
    diagram and wherever V text appears (be it a word, a line, or a
    para), define a ``locus'', with locus identifier entered on the
    diagram, and tag the transcribed text with page/locus/line-num.
    Thus, on the page shown on Kahn p865, in addition to the usual
    locus for lines (viz, the main body of text) we could define 8
    more loci, call them N1, N2, N3, N4, N5, W1, W2, and E1, and have
    lines in the transcription like:

      152 N1 1 OFAN/AFOE ; ladies with hands in tubing
      152 N2 1 OPOE/ZC89 ; under N1
      152 N3 1 OEFS8OE   ; center top
      152 N4 1 OPOEOR
      152 N5 1 ORSC8AE   ; under N4
      152 W1 1 2ORORAE   ; above lady's head
      152 W2 1 OECOC8N   ; on her vascular boat's hull
      152 E1 1 OFA