Hacking at the Voynich manuscript - Side notes
056 OKOKOKO word component statistics

Last edited on 2000-01-30 06:36:38 by stolfi

INTRODUCTION

The Voynichese words generally have a `multiple layer' structure, with certain letters confined to specific layers. This note investigates the statistics and structure of each layer.

THE DATA

The source data will be the majority edition derived from interlinear release 1.6e6, already chopped into pages and sections:

    ln -s ../045/subsecs-m text-subsecs
    ln -s ../019/unit-to-type.tbl
    ln -s ../../select-units
    ln -s ../../words-from-evt
    ln -s ../../compute-freqs
    ln -s ../../compute-cum-freqs
    ln -s ../../combine-counts
    ln -s ../../tabulate-frequencies
    ln -s ../../compare-counts
    ln -s ../../count-diword-freqs

Let's extract the frequencies of word "features", defined as follows:

Feature "cec": Maximal substrings of "e" letters, plus the two adjacent letters (treating "ch", "sh", and platform gallows as single letters). Leading "q" letters are deleted. Uses "_" if there is no following/previous letter. Each "e" substring is written as a separate line.

Feature "coec": Maximal substrings of [aoyei] letters, plus the two adjacent letters (treating "ch", "sh", and platform gallows as single letters). Leading "q" letters are deleted. Uses "_" if there is no following/previous letter. Each such substring is written as a separate line.

Feature "dex": Mapped all gallows (with or without platforms) to "X"; mapped all letters { d l r s m n g j x } to "D"; deleted any initial "q" and any { a o y i }, leaving the "e"s unchanged.

Feature "stolfi": Parsed all words into Stolfi-style elements, but associating the "e"s and "i"s with the character on the right. Then separated all elements.

Feature "grove": Parsed all words into Grove-style elements, but associating the "e"s and "i"s with the character on the right. Then separated all elements.

Feature "o": The pattern of [aoy] letters, with intervening non-empty strings replaced by "-".
Leading "q" letters are deleted. Prefixed and suffixed "-"s are omitted. Words without any [aoy] letter are indicated as "-". Thus "keody" becomes "o-y", "daiin" becomes "a", and "chcthr" becomes "-".

Feature "cococ": The pattern of [aoy] letters, with intervening non-empty strings replaced by the respective class letters (M for gallows, X for tables, R for dealers+finals, "o" for rounds). Prefixed "q"s are deleted. Letters "e" and "i" are merged with the preceding M/X and following R, respectively. Thus "keody" becomes "MoRo", "daiin" becomes "RoR", and "chcthr" becomes "XMR".

Feature "hoh": The [aoy] strings with one non-[aoy] element on each side. Gallows and tables are mapped to H. Dealers and finals are mapped to R. Prefixed "q"s are deleted; "e" and "i" are absorbed into adjacent elements whenever possible.

Feature "core": The `gallows elements' of each word: essentially those factors of the OKOKOKO model that contain a gallows letter [ktpf]. The feature consists of the gallows letters, the surrounding "c-h" and "i-h" `platforms' (as in "ikh" or "cthh"), the `half-platforms' (as in "ct" or "ik"), and any single "e" following those letters/groups (as in "ke" or "ckhe"). Multiple consecutive "e"s are not included. Any letters that are not preserved are replaced by "-". Multiple consecutive "-"s are condensed to one "-". Prefixed and suffixed "-"s are deleted. Words without gallows letters are omitted.

Feature "mantle": The `mantle' of each word: essentially the "gallows" and "tables" factors of the OKOKOKO model. The mantle consists of the "gallows" letters [ktpf], the letters [ceh], the letter "s" when followed by "h", and the letter "i" when followed by "h" or a gallows letter. All gallows are replaced by "M". The letters [aoy] are replaced by ".". All other letters which aren't preserved are mapped to "-". Strings of consecutive "-"s, including any "."s (resp. only "."s) are replaced by a single "-" (resp. a single ".").
Any prefixed or suffixed "-"s and "."s are discarded. Empty words are discarded. Thus "okcheorchdy" would become "Mche-ch", "ikhhechdeoir" would become "iMhhech-e", and "daiin" would be discarded.

Feature "crust": This feature is what remains of the word when the core and mantle are removed. Any maximal string of core letters is replaced by "-". Prefixed and suffixed "-"s are retained, as well as words that get reduced to a single "-". The letters [aoy] are mapped to ".".

Feature tags:

    set features = ( cec coec dex stolfi grove o cococ hoh core mantle crust )

Testing the feature extraction scripts:

    echo "text-subsecs/*.evt -> .elem.txt"
    cat text-subsecs/{hea.1,heb.1,bio.1}.evt \
      | words-from-evt \
      | head -2000 \
      | factor-words -f factor-text.gawk \
      > .elem.txt

    foreach ftag ( ${features} )
      echo "text-subsecs/*.evt -> .${ftag}.txt"
      cat text-subsecs/{hea.1,heb.1,bio.1}.evt \
        | words-from-evt \
        | head -2000 \
        | extract-${ftag}-strings -f factor-text.gawk \
        > .${ftag}.txt
    end

OK, let's do it:

    mkdir stats-subsecs
    foreach ftag ( ${features} )
      analyze-features ${ftag}
      classify-features ${ftag}
    end

The `mantle' and `crust' are best analyzed in matrix form. First, let's collect the valid prefixes and suffixes:

    foreach ftag ( mantle crust )
      extract-affixes ${ftag}
    end

Edited {mantle,crust}-{prefs,suffs}{,-len}.dic by hand. Now tabulate the pairs:

    foreach ftag ( mantle crust )
      tabulate-affix-pairs ${ftag} '='
    end

DISCUSSION OF CORE-FEATURE STATISTICS

Here are the counts and frequencies of the non-empty cores (the part of the word consisting entirely of gallows, including occasional platforms and "e" suffixes).

    count    freq  core structure
    ----- -------  ------------------------
    17439 0.98165  One gallows element
       18 0.00101  Two gallows elements, adjacent
      129 0.00726  Two gallows elements, separated by [aoyi] only
      176 0.00991  Two gallows elements, separated by other stuff
        3 0.00017  Three gallows elements

Cores consisting of a single gallows element:

    count    freq  string
    ----- -------  ------------------------
     5246 0.30082  k
     3509 0.20122  t
     1836 0.10528  kee
     1594 0.09140  ke
     1160 0.06652  p
      919 0.05270  te
      684 0.03922  cth
      658 0.03773  tee
      611 0.03504  ckh
      299 0.01715  f
      222 0.01273  ckhe
      178 0.01021  cthe
      154 0.00883  keee
      123 0.00705  cph
       58 0.00333  cphe
       42 0.00241  teee
       41 0.00235  cfh
       22 0.00126  ckhee
       18 0.00103  cthee
       15 0.00086  cfhe
        8 0.00046  ct
        6 0.00034  ck
        6 0.00034  cphee
        4 0.00023  ik
        3 0.00017  it
        3 0.00017  keeee
        2 0.00011  feee
        2 0.00011  ikh
        2 0.00011  ith
        2 0.00011  pe
        1 0.00006  cfhee
        1 0.00006  cke
        1 0.00006  ckheee
        1 0.00006  ckhh
        1 0.00006  cp
        1 0.00006  cphhe
        1 0.00006  cte
        1 0.00006  ctheee
        1 0.00006  fee
        1 0.00006  ike
        1 0.00006  ikee
        1 0.00006  ip
    ----- -------  ------------------------
    17439 0.98165  One gallows element

Ignoring low-frequency element types, like "iik/iit", we get the following table:

    5246 k | 1594 ke | 1836 kee | 154 keee | 611 ckh | 222 ckhe | 22 ckhee
     299 f |    . fe |    1 fee |   2 feee |  41 cfh |  15 cfhe |  1 cfhee
           |         |          |          |         |          |
    3509 t |  919 te |  658 tee |  42 teee | 684 cth | 178 cthe | 18 cthee
    1160 p |    2 pe |    . pee |   . peee | 123 cph |  58 cphe |  6 cphee

Note that although there is a good parallel between [kt] elements and [fp] elements, the combinations "pe" and "fe" are anomalously rare. I suspect that the hooked-arm versions of [pf] are actually "pe" and "fe". (Cf. the two "f"s on page f66r).
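
The size of the "pe"/"fe" anomaly can be checked directly from the single-element counts. The following Python sketch is not part of the original scripts; the dictionary and the function name are illustrative, and the "." entry for "fe" is assumed to mean a count of zero. It computes, for each plain gallows, the ratio of cores with a single "e" to cores with the bare letter:

```python
# Counts copied from the single-gallows table.  "fe" does not occur in
# that list, so it is assumed here to have count zero.
counts = {
    "k": 5246, "ke": 1594,
    "t": 3509, "te": 919,
    "p": 1160, "pe": 2,
    "f": 299,  "fe": 0,
}

def single_e_ratio(gallows):
    """Ratio of '<gallows>e' cores to bare '<gallows>' cores."""
    return counts[gallows + "e"] / counts[gallows]

for g in "ktpf":
    print(g, round(single_e_ratio(g), 4))
```

For "k" and "t" the ratio comes out near 0.30 and 0.26, while for "p" it is below 0.002 and for "f" it is zero, which is the imbalance noted in the text.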

Now for cores that consist of two or more gallows elements, with no insertions other than "e" and half-gallows:

    count    freq  string
    ----- -------  ------------------------
        2 0.11111  keef
        1 0.05556  cfhek
        1 0.05556  ckheckh
        1 0.05556  ckhef
        1 0.05556  cpheckh
        1 0.05556  cpheeckh
        1 0.05556  cphk
        1 0.05556  ctheep
        1 0.05556  ctheet
        1 0.05556  ctht
        1 0.05556  ikhckh
        1 0.05556  kecthe
        1 0.05556  keek
        1 0.05556  keep
        1 0.05556  pk
        1 0.05556  tcfh
        1 0.05556  tk
    ----- -------  ------------------------
       18 0.00101  Two gallows elements, adjacent

Now words that have two core elements, with [aoyi] letters only in between:

    count    freq  string
    ----- -------  ------------------------
       15 0.11538  t.k
       12 0.09231  k.k
        8 0.06154  t.t
        5 0.03846  k.t
        5 0.03846  te.te
        4 0.03077  kee.k
        4 0.03077  p.k
        4 0.03077  t.p
        3 0.02308  k.kee
        3 0.02308  ke.k
        3 0.02308  ke.ke
        3 0.02308  p.ke
        3 0.02308  t.ckh
        3 0.02308  te.k
        2 0.01538  k.cfh
        2 0.01538  k.ip
        2 0.01538  ke.ckhe
        2 0.01538  ke.t
        2 0.01538  ke.tee
        2 0.01538  p.kee
        2 0.01538  t.cph
        2 0.01538  t.cth
        2 0.01538  t.cthe
        2 0.01538  te.t
        1 0.00769  ckhe.ke
        1 0.00769  cph.cth
        1 0.00769  cph.k
        1 0.00769  cphe.cth
        1 0.00769  cthe.t
        1 0.00769  f.ckh
        1 0.00769  k.ckh
        1 0.00769  k.cth
        1 0.00769  k.f
        1 0.00769  k.if
        1 0.00769  k.ik
        1 0.00769  k.keee
        1 0.00769  k.p
        1 0.00769  ke.cth
        1 0.00769  ke.p
        1 0.00769  kee.cth
        1 0.00769  kee.ke
        1 0.00769  kee.kee
        1 0.00769  p.f
        1 0.00769  p.p
        1 0.00769  p.t
        1 0.00769  p.te
        1 0.00769  t.cphee
        1 0.00769  t.ik
        1 0.00769  t.ip
        1 0.00769  t.ke
        1 0.00769  t.kee
        1 0.00769  t.keee
        1 0.00769  te.f
        1 0.00769  te.kee
        1 0.00769  te.p
        1 0.00769  te.te.t
        1 0.00769  te.tee
        1 0.00769  tee.k
        1 0.00769  tee.kee
    ----- -------  ------------------------
      129 0.00726  Two gallows elements, separated by [aoyi] only

Now words that have two core elements, with something else in between:

    count    freq  string
    ----- -------  ------------------------
       25 0.14205  t-k
       19 0.10795  k-k
       18 0.10227  p-k
       13 0.07386  t-t
        8 0.04545  p-t
        7 0.03977  k-t
        7 0.03977  p-f
        6 0.03409  p-kee
        6 0.03409  p-p
        5 0.02841  f-k
        5 0.02841  t-ke
        5 0.02841  t-p
        4 0.02273  k-p
        3 0.01705  ke-ke
        3 0.01705  p-cth
        2 0.01136  cth-cth
        2 0.01136  f-cfh
        2 0.01136  f-f
        2 0.01136  f-p
        2 0.01136  f-t
        2 0.01136  k-cth
        2 0.01136  k-f
        2 0.01136  k-ke
        2 0.01136  ke-ckh
        2 0.01136  p-cfh
        2 0.01136  t-kee
        1 0.00568  cph-cth
        1 0.00568  f-cph
        1 0.00568  f-cth
        1 0.00568  f-cthe
        1 0.00568  k-te
        1 0.00568  k-teee
        1 0.00568  ke-f
        1 0.00568  ke-k
        1 0.00568  kee-k
        1 0.00568  kee-ke
        1 0.00568  p-ckh
        1 0.00568  p-cph
        1 0.00568  p-ct
        1 0.00568  p-ke
        1 0.00568  p-te
        1 0.00568  t-cth
        1 0.00568  t-keee
        1 0.00568  te-ke
        1 0.00568  te-p
        1 0.00568  tee-kee
    ----- -------  ------------------------
      176 0.00991  Two gallows elements, separated by other stuff

Now the rest:

    count    freq  string
    ----- -------  ------------------------
        1 0.33333  p.t-k
        1 0.33333  t-k-cfh
        1 0.33333  te.te.t
    ----- -------  ------------------------
        3 0.00017  Three gallows elements

DISCUSSION OF CORE-MANTLE STATISTICS

The distribution of `mantle' (actually core+mantle) features is quite broad, which presumably implies that they are composed of multiple letters.

The following table summarizes the statistics of words that have a core or a mantle (i.e. excluding words like "daiin", "ar", etc.)

                                 number of gallows letters
                   -------------------------------------------------------
    intrusions     zero          one           two           three         TOTAL
    -------------  -----------   -----------   -----------   -----------   -----------
    none            8247 31.6%   16243 62.3%      48  0.2%       0  0.0%   24685 93.6%
    [aoy] only        24  0.1%     929  3.6%     193  0.7%       2  0.0%    1179  4.5%
    other letters     44  0.2%     267  1.0%      82  0.3%       1  0.0%     492  1.9%
    -------------  -----------   -----------   -----------   -----------   -----------
    TOTAL           8315 31.9%   17439 66.9%     323  1.2%       3  0.0%   26356 99.9%

Keep in mind that words without any core-mantle letters (such as "daiin") were not counted. Even so, 97.6% of the counted words fit the paradigm: a single cluster of core-mantle+[aoy] letters, containing at most one gallows letter. In fact, 93.9% of these words have a monolithic core-mantle with at most one gallows.
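
The two percentages just quoted can be reproduced from the individual cells of the table. The Python sketch below is only a cross-check and is not one of the original scripts; the variable names are illustrative. It assumes that the denominator is the sum of the twelve cell counts (26080), which is also the cumulative count reached in the list of core-mantle patterns, rather than the 26356 printed in the TOTAL column:

```python
# Cell counts from the table (rows: kind of intrusion; columns: 0, 1, 2, 3
# gallows letters).  The printed TOTAL column is deliberately not used.
cells = {
    "none":          [8247, 16243, 48, 0],
    "[aoy] only":    [24, 929, 193, 2],
    "other letters": [44, 267, 82, 1],
}

total = sum(sum(row) for row in cells.values())

# Paradigm: a single core-mantle cluster, possibly with [aoy] inside,
# containing at most one gallows letter.
fits_paradigm = sum(cells[k][g] for k in ("none", "[aoy] only") for g in (0, 1))

# Monolithic: no intrusions at all, and at most one gallows letter.
monolithic = cells["none"][0] + cells["none"][1]

print(total)
print(f"{100 * fits_paradigm / total:.1f}%")
print(f"{100 * monolithic / total:.1f}%")
```

With these inputs the sketch prints 26080, 97.6% and 93.9%.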

These are the 50 most common core-mantles:

     count freqcy  cumct  cumfr mantle pattern
    ------ ------ ------ ------ -------------------
      5872 0.2252   5872 0.2252 M
      2362 0.0906   8234 0.3157 che
      2333 0.0895  10567 0.4052 Mee
      2259 0.0866  12826 0.4918 ch
      2177 0.0835  15003 0.5753 Me
      1498 0.0574  16501 0.6327 she
      1390 0.0533  17891 0.6860 Mch
      1046 0.0401  18937 0.7261 sh
       905 0.0347  19842 0.7608 Mche
       745 0.0286  20587 0.7894 cMh
       510 0.0196  21097 0.8089 chee
       399 0.0153  21496 0.8242 shee
       320 0.0123  21816 0.8365 chcMh
       297 0.0114  22113 0.8479 cMhe
       260 0.0100  22373 0.8579 ch.M
       198 0.0076  22571 0.8655 Msh
       189 0.0072  22760 0.8727 cheM
       187 0.0072  22947 0.8799 Meee
       181 0.0069  23128 0.8868 Mshe
       156 0.0060  23284 0.8928 chM
       107 0.0041  23391 0.8969 shcMh
        93 0.0036  23484 0.9005 checMh
        86 0.0033  23570 0.9038 Mchee
        83 0.0032  23653 0.9069 sheM
        75 0.0029  23728 0.9098 ch.Mch
        74 0.0028  23802 0.9127 chcMhe
        67 0.0026  23869 0.9152 Mech
        63 0.0024  23932 0.9176 shecMh
        60 0.0023  23992 0.9199 che.M
        58 0.0022  24050 0.9222 ch.cMh
        56 0.0021  24106 0.9243 M-ch
        55 0.0021  24161 0.9264 ch.Me
        54 0.0021  24215 0.9285 sh.M
        43 0.0016  24258 0.9301 cMhee
        43 0.0016  24301 0.9318 cheeM
        40 0.0015  24341 0.9333 shM
        39 0.0015  24380 0.9348 sheeM
        38 0.0015  24418 0.9363 M.M
        33 0.0013  24451 0.9375 ch.Mee
        32 0.0012  24483 0.9388 M-M
        32 0.0012  24515 0.9400 ee
        31 0.0012  24546 0.9412 M-che
        31 0.0012  24577 0.9424 ch-M
        31 0.0012  24608 0.9436 cheMe
        30 0.0012  24638 0.9447 M-sh
        28 0.0011  24666 0.9458 chMch
        28 0.0011  24694 0.9469 eee
        28 0.0011  24722 0.9479 sh.cMh
        28 0.0011  24750 0.9490 shcMhe
        27 0.0010  24777 0.9500 Meche
        .. ......  ..... ...... .......
         1 0.0000  26080 1.0000 shshe

Sad to say, these counts do not quite support my previous assumption that single "e" is inherently different from "ee", "eee", etc. It is remarkable, though, that the mantles "ee" and "eee" are each at least twice as common as "e" alone; and that "Mee" is more common than "Me". But, of course, these anomalies can mean anything.

Plotting the frequencies above, we can see a single peak of 22% at the beginning ("M" meaning that the mantle is just a single gallows), then a large drop to about 9% followed by four almost equal entries ("che", "Mee", "ch", "Me"), then another large drop to 5.7%, followed by a tapering tail.

So here is another `instant hypothesis' for target practice: "ch" is akin to "Me", "che" is akin to "Mee", etc. In other words, "ch" is actually a "c" modified by "e", and "c" is like a gallows except that it may occur more than once in the mantle.

It may be instructive to look at the mantles that contain no gallows and no non-mantle letters:

     count  freqcy mantle pattern
    ------ ------- -------------------
      2362 0.09057 che
      2259 0.08662 ch
      1498 0.05744 she
      1046 0.04011 sh
       510 0.01956 chee
       399 0.01530 shee
        32 0.00123 ee
        28 0.00107 eee
        17 0.00065 cheee
        14 0.00054 e
         7 0.00027 se
         6 0.00023 c
         5 0.00019 chch
         5 0.00019 chsh
         5 0.00019 shch
         5 0.00019 shche
         5 0.00019 sheee
         4 0.00015 chech
         4 0.00015 shech
         4 0.00015 shese
         3 0.00012 chse
         3 0.00012 seee
         2 0.00008 cheche
         2 0.00008 eese
         2 0.00008 ese
         2 0.00008 see
         2 0.00008 shsh
         1 0.00004 chc
         1 0.00004 chchee
         1 0.00004 cheech
         1 0.00004 cheesee
         1 0.00004 chese
         1 0.00004 chesh
         1 0.00004 chsee
         1 0.00004 cse
         1 0.00004 eche
         1 0.00004 esech
         1 0.00004 h
         1 0.00004 sechee
         1 0.00004 shesh
         1 0.00004 shh
         1 0.00004 shse
         1 0.00004 shshe

Note that "chee" and "shee" are quite common but "chch" and "shsh" are extremely rare. These numbers argue against my assumption that "chee" is "ch"+"ee"; perhaps "chee" is a single letter, and the all-"e" mantles are errors.
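
To make the "M", "." and "-" notation of the mantle patterns concrete, here is a minimal Python sketch of the "mantle" feature, written only from the prose definition in the data section. It is not the extract-mantle-strings script used for the tables (that script is not shown in this note), and the function name `mantle` is hypothetical. It does reproduce the worked examples given with the definition.

```python
import re

GALLOWS = "ktpf"

def mantle(word):
    """Return the mantle pattern of an EVA word, or None if it is empty."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in GALLOWS:
            out.append("M")            # every gallows letter becomes "M"
        elif ch in "ceh":
            out.append(ch)             # [ceh] are kept as they are
        elif ch == "s" and nxt == "h":
            out.append("s")            # "s" is kept only before "h"
        elif ch == "i" and nxt and (nxt == "h" or nxt in GALLOWS):
            out.append("i")            # "i" is kept only before "h" or a gallows
        elif ch in "aoy":
            out.append(".")            # circles become "."
        else:
            out.append("-")            # everything else becomes "-"
    s = "".join(out)
    s = re.sub(r"[-.]*-[-.]*", "-", s)  # runs containing a "-" collapse to "-"
    s = re.sub(r"\.+", ".", s)          # remaining runs of "." collapse to "."
    s = s.strip("-.")                   # drop leading and trailing "-" and "."
    return s or None

for w in ("okcheorchdy", "ikhhechdeoir", "chotaiin", "daiin"):
    print(w, mantle(w))
```

A pattern such as "ch.M" therefore means a mantle interrupted only by circle letters, while "M-ch" means that at least one other non-mantle letter intervenes.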

Of the mantles that are split only by [aoy] letters (4.5% of all mantles), these are the most common:

     count  freqcy mantle pattern
    ------ ------- -------------------
       260 0.00997 ch.M
        75 0.00288 ch.Mch
        60 0.00230 che.M
        58 0.00222 ch.cMh
        55 0.00211 ch.Me
        54 0.00207 sh.M
        38 0.00146 M.M
        33 0.00127 ch.Mee
        28 0.00107 sh.cMh
        25 0.00096 M.ch
        20 0.00077 she.M
        19 0.00073 Mch.M
        19 0.00073 sh.Mch
        18 0.00069 che.cMh
        17 0.00065 ch.Mche
        16 0.00061 ch.cMhe
        14 0.00054 M.che
        14 0.00054 che.Me
        11 0.00042 Me.M
        11 0.00042 sh.Me
        10 0.00038 M.cMh
        10 0.00038 M.she
         9 0.00035 ch.ch
        .. ....... .......

The pattern "ch.M" fits "archoky" and "chotaiin", for example. Mantles split by [aoy] (in particular, mantles beginning with "cho") seem to be more common in language A; they are almost absent in the "bio" section.

Among the mantles that are split by non-mantle, non-[aoy] letters (1.9% of all mantles), these are the most common:

     count  freqcy mantle pattern
    ------ ------- -------------------
        56 0.00215 M-ch
        32 0.00123 M-M
        31 0.00119 M-che
        31 0.00119 ch-M
        30 0.00115 M-sh
        17 0.00065 ch-ch
        14 0.00054 M-she
         9 0.00035 che-M
         9 0.00035 sh-M
         7 0.00027 M-Mch
         6 0.00023 Mch-M
         5 0.00019 che-ch
         5 0.00019 M-Me
         5 0.00019 M-Mee
         5 0.00019 che-Me
         4 0.00015 ch-che
       ... ....... ......

DISCUSSION OF "EC" AND "CE" FEATURE STATISTICS

Here are the counts and frequencies of "e" strings classified by length and by left/right context.

First, isolated "e"s:

    ec/tot.frq               ce/tot.frq
    count    freq string     count    freq string
    ----- ------- ------     ----- ------- ------
     3452 0.25084 ed          6521 0.47384 he
     2448 0.17788 eo          1640 0.11917 ke
     2052 0.14911 ey           946 0.06874 te
      329 0.02391 ea           132 0.00959 oe
      325 0.02362 ec            71 0.00516 _e
      323 0.02347 ek            33 0.00240 de
      165 0.01199 es            32 0.00233 se
      139 0.01010 et            10 0.00073 le
       58 0.00421 ep             5 0.00036 re
       57 0.00414 e_             4 0.00029 ae
       32 0.00233 ef             3 0.00022 ye
       11 0.00080 er             2 0.00015 ce
        5 0.00036 eg             2 0.00015 pe
        3 0.00022 el
        1 0.00007 ei
        1 0.00007 em

Note that the left contexts are significantly more restricted than the right contexts. Note the large drop from "te" to "oe" and "_e". This observation seems consistent with the theory that a single "e" is a modifier for the preceding letter.

The "oe"s are an unexplained phenomenon; there are too many to ignore (about 1% of all "e" strings). If the "(q)o"s are deleted, most of the "oe"s apparently become "_e".

Contexts of "ee":

    count    freq string     count    freq string
    ----- ------- ------     ----- ------- ------
     1635 0.11881 eey         1868 0.13574 kee
     1294 0.09403 eed         1217 0.08843 hee
      628 0.04563 eeo          664 0.04825 tee
      180 0.01308 ees          140 0.01017 oee
       79 0.00574 eea           56 0.00407 dee
       78 0.00567 eek           37 0.00269 _ee
       39 0.00283 eec           24 0.00174 lee
       33 0.00240 eet            6 0.00044 see
        9 0.00065 eef            3 0.00022 ree
        7 0.00051 eep            2 0.00015 iee
        4 0.00029 een            2 0.00015 yee
        3 0.00022 eer            1 0.00007 aee
        2 0.00015 eeg            1 0.00007 fee
        2 0.00015 eei
        2 0.00015 eel
        2 0.00015 eem

This time the two contexts are roughly similar in spread. There are no large steps in frequency. These numbers can be interpreted as saying that "ee" is a complete letter.

Contexts of "eee":

    count    freq string     count    freq string
    ----- ------- ------     ----- ------- ------
      164 0.01192 eeey         157 0.01141 keee
       76 0.00552 eeed          43 0.00312 teee
       47 0.00342 eees          42 0.00305 oeee
       28 0.00203 eeeo          32 0.00233 heee
        5 0.00036 eeen          28 0.00203 _eee
        4 0.00029 eeek          12 0.00087 deee
        4 0.00029 eeet          10 0.00073 leee
        2 0.00015 eee_           4 0.00029 reee
        2 0.00015 eeea           3 0.00022 seee
        1 0.00007 eeeg           2 0.00015 feee
        1 0.00007 eeem           1 0.00007 yeee

Here the right contexts seem a bit more restricted than the left contexts.

Contexts of "eeee":

    count    freq string     count    freq string
    ----- ------- ------     ----- ------- ------
        3 0.00022 eeees          3 0.00022 keeee
        2 0.00015 eeeey          2 0.00015 oeeee
        1 0.00007 eeeed          1 0.00007 deeee

These are too rare to analyze.

Let's compare the left contexts of the four strings "e", "ee", "eee", and "eeee" (blank cells mean that the combination does not occur):

    1640 0.17445 ke   1868 0.46456 kee   157 0.47006 keee   3 0.50000 keeee
    6521 0.69365 he   1217 0.30266 hee    32 0.09581 heee
     946 0.10063 te    664 0.16513 tee    43 0.12874 teee
     132 0.01404 oe    140 0.03482 oee    42 0.12575 oeee   2 0.33333 oeeee
      71 0.00755 _e     37 0.00920 _ee    28 0.08383 _eee
      33 0.00351 de     56 0.01393 dee    12 0.03593 deee   1 0.16667 deeee
      10 0.00106 le     24 0.00597 lee    10 0.02994 leee
      32 0.00340 se      6 0.00149 see     3 0.00898 seee
       5 0.00053 re      3 0.00075 ree     4 0.01198 reee
       3 0.00032 ye      2 0.00050 yee     1 0.00299 yeee
                         1 0.00025 fee     2 0.00599 feee
       2 0.00021 pe
       4 0.00043 ae      1 0.00025 aee
                         2 0.00050 iee
       2 0.00021 ce

We see that they are remarkably similar, except that "he" is anomalous: far more common, proportionally, than "hee", "heee", and "heeee". The excess "he" seems to have been realized chiefly at the cost of "ke".

DISCUSSION OF CRUST STATISTICS

There are 8772 words that have no mantle letters, and 492 that have a mantle interrupted by non-mantle, non-[aoy] letters. Let's ignore them for the time being, and concentrate on the 25864 words with a compact, non-empty mantle.

Among the latter, the prefixes have a very narrow distribution.
Here are the most common ones:

     count freqcy  cumct  cumfr prefix
    ------ ------ ------ ------ -------------------
     23040 0.8908  23040 0.8908 -
      1732 0.0670  24772 0.9578 l-
       581 0.0225  25353 0.9802 d-
       173 0.0067  25526 0.9869 s-
       138 0.0053  25664 0.9923 r-
        56 0.0022  25720 0.9944 dl-
        47 0.0018  25767 0.9962 sl-
        17 0.0007  25784 0.9969 q-
        14 0.0005  25798 0.9974 ll-
        11 0.0004  25809 0.9979 rl-
        10 0.0004  25819 0.9983 dr-
         4 0.0002  25823 0.9984 i-
         3 0.0001  25826 0.9985 di-
         3 0.0001  25829 0.9986 ii-
         3 0.0001  25832 0.9988 ld-
         3 0.0001  25835 0.9989 rr-
    ...... ......  ..... ...... .....

It can be seen that the empty and single-letter prefixes account for 99.2% of all counted words.

The suffixes have a broader distribution. With some manual classification, we get this list:

     count freqcy  cumct  cumfr suffix
    ------ ------ ------ ------ -------------------
      8049 0.3112   8049 0.3112 -
      6028 0.2331  14077 0.5443 -d
      3581 0.1385  17658 0.6827 -l
       668 0.0258  18326 0.7086 -s
      2678 0.1035  21004 0.8121 -r
       155 0.0060  21159 0.8181 -ir
        32 0.0012  21191 0.8193 -iir
        72 0.0028  21263 0.8221 -n
       711 0.0275  21974 0.8496 -in
      1453 0.0562  23427 0.9058 -iin
       393 0.0152  23820 0.9210 -m
    # 23820 TOTAL

        22 0.0009  23842 0.9218 -dd
       288 0.0111  24130 0.9330 -dl
        15 0.0006  24145 0.9335 -ds
       325 0.0126  24470 0.9461 -dr
        38 0.0015  24508 0.9476 -dir
         6 0.0002  24514 0.9478 -diir
         8 0.0003  24522 0.9481 -dn
        99 0.0038  24621 0.9519 -din
       337 0.0130  24958 0.9650 -diin
        63 0.0024  25021 0.9674 -dm
    # 1201 TOTAL

       153 0.0059  25174 0.9733 -ld
        53 0.0020  25227 0.9754 -ll
        45 0.0017  25272 0.9771 -ls
        59 0.0023  25331 0.9794 -lr
         7 0.0003  25338 0.9797 -lir
         1 0.0000  25339 0.9797 -liir
         1 0.0000  25340 0.9797 -ln
         6 0.0002  25346 0.9800 -lin
        32 0.0012  25378 0.9812 -liin
        15 0.0006  25393 0.9818 -lm
    # 372 TOTAL

        18 0.0007  25411 0.9825 -sd
        10 0.0004  25421 0.9829 -sl
         2 0.0001  25423 0.9829 -ss
         7 0.0003  25430 0.9832 -sr
         1 0.0000  25431 0.9833 -sir
         0 0.0000  25431 0.9833 -siir
         1 0.0000  25432 0.9833 -sn
         0 0.0000  25432 0.9833 -sin
        18 0.0007  25450 0.9840 -siin
         2 0.0001  25452 0.9841 -sm
    # 59 TOTAL

        21 0.0008  25473 0.9849 -rd
        34 0.0013  25507 0.9862 -rl
         2 0.0001  25509 0.9863 -rs
        36 0.0014  25545 0.9877 -rr
         5 0.0002  25550 0.9879 -rir
         0 0.0000  25550 0.9879 -riir
         2 0.0001  25552 0.9879 -rn
        10 0.0004  25562 0.9883 -rin
        37 0.0014  25599 0.9898 -riin
        21 0.0008  25620 0.9906 -rm
    # 168 TOTAL

Note that these suffixes (with zero, one, and two dealers) comprise 99% of all counted words.

Of the mantleless words, at least 97% seem to be merely the concatenation of a prefix and a suffix from the above lists.

DISCUSSION OF AOY STATISTICS

The following table shows the occurrences of O-strings classified by their immediate non-O contexts. Here M means a gallows element (with any [ic] prefixes and [he] suffixes), X a table element (with any [e] suffix), R a dealer or final (with any [i] prefix or [e] suffix). The three O-letters have been mapped to "o". Word boundaries are denoted "#" and empty O-strings by "_".

    Context     _     o    oo   ooo other
    ------- ----- ----- ----- ----- -----
    #*X      8899   577     9     .     1
    M*X      6186    95     .     .     1
    X*X      1294    37     .     .     .
    R*X      1755    53     .     .     .

    #*#     -N/A-   240    18     .     3

    #*M      3635 10212    25     .    40
    X*M      1633   894     5     .    29
    R*M      1237   159     5     .     7
    M*R      1402  7058    64     1     6
    M*M        11   114     .     .     6

    M*#       189  2896    32     1     3
    X*#       110  4942    51     .     3
    R*#     19274  7311    13     .     2

    #*R      6402  4748   174     1    33
    X*R      5086  4749    69     .     2
    R*R       838  6592    34     .     6

    other     166   142     2     .     .

    TOTAL   58207 50819   501     3   142

The "other" counts are letter groups such as "oe", "shh", "ich", detached [ice], etc. which cannot be parsed into the standard set of elements.

There are about 51972 O-letters in the sample text, and about 109672 possible O-slots between elements. The bottom line of the table shows that almost half of the slots are empty, and half are occupied with a single "o".

Note that, if there were no restriction about the sequence of O's and K's, we should have 62% empty slots, 30% slots with one "o", 14% slots with "oo", etc.
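
The slot and letter totals quoted for the context table can be recomputed from its TOTAL row. The following Python sketch is an illustrative cross-check, not one of the original scripts. Counting each entry of the "other" column as a single O-letter is an assumption made here; it was chosen only because it reproduces the figure of 51972 given in the text.

```python
# TOTAL row of the context table: number of O-slots by content.
slots = {"_": 58207, "o": 50819, "oo": 501, "ooo": 3, "other": 142}

n_slots = sum(slots.values())
empty_frac = slots["_"] / n_slots
single_frac = slots["o"] / n_slots

# O-letter tally.  Treating each "other" entry as one letter is an
# assumption; it happens to match the total quoted in the text.
n_letters = slots["o"] + 2 * slots["oo"] + 3 * slots["ooo"] + slots["other"]

print(n_slots, n_letters)
print(f"empty: {100 * empty_frac:.1f}%  single o: {100 * single_frac:.1f}%")
```

This gives 109672 slots and 51972 letters, with about 53.1% of the slots empty and 46.3% holding a single "o".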

If the placement of the "o"s were independent of the context, we should expect each pair of non-O elements to be separated by "_" and "o" strings in approximately 1:1 ratio. Comparing the "_" and "o" entries in each row, however, we see that the contexts M*X, X*X, R*X, #*X strongly repel O-strings (ratios 65:1, 35:1, 33:1, 15:1, respectively), while X*#, M*#, and R*R strongly attract them (ratios 1:45, 1:15, and 1:8, respectively).

One way of explaining these numbers is to say that the O-letters are either modifiers for the following R or M letter, or word-finals. These contexts account for 49675 of the 50819 "o"-strings (97.7%), and 492 of the 501 "oo" strings (98.2%).

Here is the same data, without collapsing the [aoy] but collapsing M and X into H:

    Context     _     a     o     y    ay    oy    yy    oo    yo    aa    ao    oa   oao    ya   aoy   yoa
    ------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    #*#         .     2    95   143     1    16     1     .     .     .     .     .     .     .     .     .
    #*H     12534    33  9324  1432     .    15     1     5    13     .     .     .     .     .     .     .
    #*R      6402  1830  2833    85     .     3     .    28    10     1     2   118     1    12     .     .
    H*#       299    19   883  6936     5    74     .     2     1     .     1     .     .     .     1     .
    H*H      9124    29   998   113     .     2     .     2     1     .     .     .     .     .     .     .
    H*R      6488  5382  6333    92     1     2     .    21     1     1     3   102     .     2     .     1
    R*#     19274    23   112  7176     1    10     .     .     1     .     1     .     .     .     .     .
    R*H      2992    24   116    72     3     .     .     1     1     .     .     .     .     .     .     .
    R*R       838  5228  1314    50     .     1     .     1     2     1     1    24     .     4     .     .

In the following tables, we used the following mappings: gallows (incl. platforms) -> "M", tables -> "X", dealers and finals -> "R". The initial "q"s were deleted, and the "e"s and "i"s were merged into the adjacent "M", "X", or "R" as appropriate. Then words with a core-mantle were split into prefix, core-mantle, suffix, delimited by "«" and "»".

Let's say that a word is `hard' if it has a non-empty core and/or mantle, and `soft' otherwise. In a hard word we can isolate a maximal `prefix' and a maximal `suffix' consisting of non-core, non-mantle letters, namely dealers, finals, circles, and any [ie] modifiers.
Thus, for example, the hard word "orckhocheody" can be split into prefix "or", suffix "ody", and core-mantle "ckhoche".

Note that a prefix, suffix, or soft word with N non-circle elements has N+1 slots where circles could be inserted, while a core-mantle with N non-circle elements has N-1 such slots. The following table shows the counts of empty and occupied circle slots in soft words and in the three parts of hard words.

    soft words:    22435 O-slots,  9952 occupied (44%)
    prefixes:      29078 O-slots, 12082 occupied (42%)
    suffixes:      46322 O-slots, 27572 occupied (60%)
    core-mantles:  11133 O-slots,  1534 occupied (14%)

Thus we see that the O-letters strongly avoid the interior of the core-mantle. In fact, if we look closely, we find that most of the filled O-slots in core-mantles occur after an X-letter that precedes the core, as in "chokedy" or "shchotchy"; or in `invalid' core-mantles (with more than one M and/or R intrusions):

    valid core-mantles without XO:  8634 O-slots,  89 occupied  (1%)
    valid core-mantles with XO:     1076 O-slots, 858 occupied (80%)
    invalid core-mantles:            911 O-slots, 360 occupied (40%)

In short, the circles are found mostly in the `crust' of words, except for some "cho" and "sho" sequences in the first half of the mantle.

Let's look more closely at the soft words (without core-mantle). Of the 35128 word instances in the text, 8870 (25.3%) fall into this category. Of these 8870 words, the vast majority (8754) consists of R's and O's. (Recall that any "i" preceding an F letter was lumped with the latter, so that "daiin" got mapped into "RaR".)

The 116 word instances without core-mantle that do not consist of R's and O's contain unattached "e", "c", "aii", or non-initial "q", or other weird groups like "shh" and "de".

Here are the counts of the RO-only words, according to the number of R and O letters in the word:

                           O-letters in word
                            0     1     2     3     4   ALL
    --------------------- ----- ----- ----- ----- ----- -----
    0 R-letters               -   240    18     .     .   258
    1 R-letters             475  3000   387     5     .
3867 2 R-letters 62 3113 936 63 3 4177 3 R-letters 7 63 283 55 7 415 4 R-letters 1 4 24 6 1 36 5 R-letters . . . 1 . 1 Totals 545 6420 1648 130 11 8754 Rel. percent 6.2% 73.3% 18.8% 1.5% 0.1% 100.0% Abs. percent 1.6% 18.3% 4.7% 0.4% 0.0% 24.9% Note that there is some correlation between the two counts: the average number nO of O-letters is roughly 0.5 + 0.5*nR where nR is the number of R-letters. Here are the R-O words, with R split into D (dealers) and F (finals): tot pha.2 hea.1 cos.2 zod.1 heb.1 str.2 bio.1 feature ------ ------ ------ ------ ------ ------ ------ ------ ------- 1553 64 531 37 11 116 314 231 DaF 1143 54 115 23 6 83 221 436 oD 963 28 148 81 21 92 181 208 DaD 737 10 23 71 58 66 334 51 aD 661 21 34 26 33 77 268 66 aF 456 15 140 17 17 31 46 44 D 446 36 112 11 1 15 75 150 DoD 342 16 125 20 10 38 18 71 Dy 299 20 46 5 5 34 98 30 oDaF 147 9 16 5 3 20 18 49 oDy 143 . 34 12 7 14 21 24 y 116 5 17 4 1 10 39 14 oDaD 95 3 14 9 8 2 19 14 o 86 . 18 7 2 12 17 16 DaDy 82 3 15 2 . 6 14 32 oDoD 66 3 19 1 . 6 25 1 oaF 64 8 29 . . . 9 14 DoF 59 2 11 3 . 3 19 10 oF 56 . 8 3 . 5 17 12 DD 53 1 1 5 5 5 20 11 aDy 51 . . . 3 6 20 12 aDoD 48 2 . 2 1 6 27 . aDaF 46 . 8 2 . 1 18 2 oaD 44 1 14 1 . 1 13 7 Do 41 1 3 4 2 4 10 3 DaDaD 41 3 13 2 . 1 9 9 DoDy 41 1 . 6 5 1 18 1 aDaD 36 1 1 4 . 4 18 1 DaDaF 36 3 5 1 . 4 12 6 oDD 34 2 12 3 1 4 5 2 yDaF 31 1 5 1 . 3 7 12 DDy 30 . 5 2 . 4 4 5 DaDoD 28 2 7 2 . 4 8 2 DoDaF 28 3 3 1 . 6 2 11 oDDy 26 2 4 5 . . 6 2 DoDaD 22 . 5 1 . 4 2 4 DaDDy 22 3 6 . . 5 . . oDDaF 21 1 . 1 2 2 11 1 aDoDy 20 . 9 . . 3 1 1 F 19 5 3 . . 3 3 1 ooD 18 . 2 2 2 1 5 2 aDDy 16 . 6 . . 2 6 1 oy 13 2 5 1 1 1 1 . yDaD 12 1 . . 1 1 5 1 DDaD 12 1 5 . 1 1 1 . DaDD 12 . . 1 . . 7 2 aDoF 12 . 4 . . 4 1 . oDDaD 12 1 1 1 1 2 2 . oDaDy 11 . . 1 2 3 3 . aDD 11 . 4 . . . 1 4 oDoF 10 . . . . . 6 2 DDaF 10 1 3 . . 2 1 2 Da 10 1 . . . . 4 5 DoDoD 10 1 5 . . 1 . 2 yDy 9 1 2 . . . 3 3 DDoD 9 . 2 1 . 2 1 . DaDDaF 9 . 2 1 . 1 1 1 DaDoDy 9 1 3 . . . 5 . DoaF 9 . 3 . . . 2 3 oDo 8 . 1 1 . 
. 1 2 DaDo 8 . 1 . . 1 4 2 DaDoF 8 . 4 . . 3 . 1 DoDD 7 . 1 1 . 1 3 . DF 6 . . 1 . 1 2 1 DDD 6 . 1 . . 1 1 3 DoDDy 6 . 1 3 . . 2 . DoaD 6 . . 2 . . 1 2 aFy 6 . 3 . . . . 3 oDoDy 6 . 3 . . 1 1 . yD 6 2 . . . 1 2 . yaF 5 1 2 . . 1 . . DaDF 5 . . 1 1 . . 1 DaDaDy 5 . 1 1 . . 1 . Doy 5 . . . . 2 2 . aDDaD 5 . 1 2 . . . . aDDaF 5 . . . . 1 3 . aDoDD 5 . 2 . . . 2 1 yaD 4 . . . . 1 1 . DaDDaD 4 . 3 . . . . 1 DoDoF 4 . . . 1 . 2 . aDo 4 . 3 . . . . 1 yoD 3 . . . . . 1 1 DDaDy 3 . 2 . . . . . DaFD 3 . 2 . . . . 1 DaFo 3 . 1 . 1 . . 1 DyD 3 . . . . . 1 1 DyaD 3 . . . . . 1 . aDaDy 3 . . . . . 3 . aFaD 3 . . . . 1 2 . aFaF 3 1 1 . . . . . oDaDDy 3 . 1 . . . 1 . oDoDD 3 . 1 . . . 1 1 yDoD 2 . . . . . 2 . DDDy 2 . . . . . . . DaDoDD 2 . . . . . 2 . DaFDy 2 . 2 . . . . . DaFF 2 . 2 . . . . . DaFoD 2 . 1 . . 1 . . DaFy 2 . 2 . . . . . DoDDaF 2 1 1 . . . . . DoDaDy 2 . 2 . . . . . DoDoDy 2 1 . . . 1 . . DyDy 2 . . . . . . . a 2 . 2 . . . . . aDF 2 . . . . . 2 . aDaDD 2 . . . . 1 . 1 aDaDDy 2 . . . . . 2 . aDaDaD 2 . . . . . 1 . aDaDaDy 2 . . . . 1 1 . aDaDoD 2 . . . . 1 1 . aDoDDy 2 . . . . . 1 . aDoDaF 2 . . . 1 . 1 . aDoy 2 1 . . . 1 . . aFF 2 . 1 . . . . 1 aoD 2 . 2 . . . . . oDDoD 2 . 1 . . . . . oDF 2 . 2 . . . . . oDaDD 2 1 . . . . . . oDaDaD 2 . . . . . . . oDaDaDy 2 . . . . . . . oDaDoD 2 1 . . . . . . oDaDoDy 2 . 1 . . 1 . . oDoDDy 2 1 . . . . 1 . oDoDaF 2 . 1 . . . 1 . oDyD 2 . . . . . 1 . oaDaD 2 . 1 . . . . . ooF 2 . 1 . . . . . oyD 2 . 1 . . 1 . . yoDaF 2 . . . . . 1 . yoF 1 . . . . . 1 . DDDD 1 . . . . . . 1 DDDaD 1 . . . . . 1 . DDF 1 . 1 . . . . . DDaDD 1 . . . . . . 1 DDaDoD 1 . . . . . 1 . DDaFaD 1 . . . 1 . . . DDay 1 . . . . . 1 . DDo 1 . . . . . 1 . DDoDy 1 . . . . . 1 . DDyD 1 . . . . . 1 . DaDDD 1 . 1 . . . . . DaDDaDoD 1 . . . . . 1 . DaDDyD 1 . 1 . . . . . DaDa 1 . . . . . 1 . DaDaDD 1 . . . . . . . DaDaDDy 1 . . . . . . . DaDaDa 1 . . 1 . . . . DaDaDaD 1 . . . 1 . . . DaDaDoDy 1 . . . . . . . DaDaFF 1 . . . . . . . DaDoDF 1 . . . . 1 . . DaDoDaF 1 . . . 
. . 1 . DaDoDoD 1 . . . . . 1 . DaDoaD 1 . . . . . . 1 DaDyDD 1 . 1 . . . . . DaDyDaD 1 . . . . . . 1 DaDyDy 1 . . . 1 . . . DaFaFDy 1 . 1 . . . . . Dao 1 1 . . . . . . DaoD 1 . 1 . . . . . DoDDD 1 . 1 . . . . . DoDo 1 . . . . . 1 . DoDoDD 1 1 . . . . . . DoDoDaFF 1 . . . . 1 . . DoDyD 1 . . . . . . . DoFy 1 . . . . . 1 . DoaDy 1 . 1 . . . . . DoyDy 1 . . . . . 1 . DyDaF 1 . . . . . . . DyDyD 1 . . . . 1 . . DyDyDy 1 . . . . . . 1 DyF 1 . . . . . 1 . DyaF 1 . 1 . . . . . FaF 1 . 1 . . . . . FoD 1 1 . . . . . . aDDo 1 . . . . . 1 . aDDoD 1 . . . . 1 . . aDDyDy 1 . . . . . . . aDaDaF 1 1 . . . . . . aDaDoy 1 . . 1 . . . . aDaFy 1 . . 1 . . . . aDoDDD 1 . . . . . 1 . aDoDaD 1 . . . . . 1 . aDoDo 1 . . . . . . 1 aDoDoD 1 . . . . . 1 . aDoaD 1 1 . . . . . . aDoaDy 1 . . . . 1 . . aDyD 1 . . . . . . 1 aDyDy 1 . . . . . 1 . aFD 1 . . . . . 1 . aFoF 1 . . . . . 1 . aaD 1 . . 1 . . . . ay 1 . . . . . 1 . oDDDy 1 . . . . . . 1 oDDaDy 1 . . 1 . . . . oDFy 1 1 . . . . . . oDaDF 1 . . . 1 . . . oDaDaF 1 . . . . . 1 . oDaDo 1 . . . . . 1 . oDaFaD 1 . 1 . . . . . oDaFo 1 . . . . . . 1 oDaFy 1 . 1 . . . . . oDoDo 1 . . . . . . 1 oDoDoD 1 . . . . 1 . . oDoaF 1 . . . . 1 . . oDoy 1 . . . . . . 1 oDyDy 1 . . . . . . . oFaF 1 . . . . . 1 . oFoD 1 . . . . . 1 . oaDDaD 1 . 1 . . . . . oaDDaDy 1 . . 1 . . . . oaDaF 1 . . . 1 . . . oaFD 1 . 1 . . . . . oaoDaD 1 1 . . . . . . ooDDaD 1 . 1 . . . . . ooDDy 1 1 . . . . . . ooDaD 1 . . . . . 1 . ooDaDD 1 . . . . . . . ooDy 1 . . . . . . . oyDaD 1 . 1 . . . . . yDD 1 . 1 . . . . . yDaDD 1 . . . . . . 1 yDaDaD 1 . . 1 . . . . yDoDy 1 . . . . . . . yDoF 1 . . . 1 . . . yaDy 1 . . . . . 1 . yoDD 1 . . . . . . . yy ------ ------ ------ ------ ------ ------ ------ ------ ------- 8775 357 1681 407 220 752 2106 1631 Total Here is a more condensed table, with D and F merged into R: (The slight discrepancy in the counts due to exclusion of some rare elements like "de" from R). 
tot pha.2 hea.1 cos.2 zod.1 heb.1 str.2 bio.1 pattern ------ ------ ------ ------ ------ ------ ------ ------ ------- 2516 92 679 118 32 208 495 439 RaR 1398 31 57 97 91 143 602 117 aR 1201 56 126 26 6 86 239 446 oR 510 44 141 11 1 15 84 164 RoR 475 15 149 17 17 34 46 45 R 416 25 63 9 6 44 137 44 oRaR 341 16 124 20 10 38 18 71 Ry 145 7 16 5 3 20 18 49 oRy 143 . 34 12 7 14 21 24 y 112 3 27 3 . 7 43 3 oaR 95 3 . 8 6 8 50 1 aRaR 95 3 14 9 8 2 19 14 o 93 2 19 2 . 6 16 36 oRoR 88 . 19 7 2 13 17 16 RaRy 77 2 4 8 2 8 28 4 RaRaR 64 . . 1 3 6 28 14 aRoR 62 . 8 4 . 6 20 12 RR 59 1 1 7 5 5 21 13 aRy 53 4 10 7 . 4 14 4 RoRaR 47 4 17 4 2 5 6 2 yRaR 44 1 14 1 . 1 13 7 Ro 41 2 13 2 . 1 9 9 RoRy 40 . 8 2 . 5 8 7 RaRoR 37 3 6 1 . 4 12 6 oRR 34 3 10 . . 9 1 . oRRaR 29 3 3 2 . 6 2 11 oRRy 28 . 5 1 . 3 7 10 RRy 24 . 5 1 . 4 4 4 RaRRy 22 2 11 . 1 2 1 . RaRR 21 1 . . 1 1 10 3 RRaR 21 1 . 1 2 2 11 1 aRoRy 21 5 4 . . 3 3 1 ooR 18 . 2 2 2 1 5 2 aRRy 16 1 2 1 2 4 4 . aRR 16 . 6 . . 2 6 1 oy 15 1 4 3 . . 7 . RoaR 14 1 3 . . . 4 6 RoRoR 13 1 1 1 1 2 2 1 oRaRy 12 . 2 1 . 2 2 . RaRRaR 11 . 3 1 . . 1 3 RaRo 11 2 2 . . 1 4 1 yaR 10 1 3 . . 2 1 2 Ra 10 . 1 2 . 2 2 . aRRaR 9 1 2 . . . 3 3 RRoR 9 . 2 1 . 1 1 1 RaRoRy 9 . 3 . . . 2 3 oRo 9 1 4 . . 1 . 2 yRy 8 . 4 . . 3 . 1 RoRR 7 . . 1 . 1 3 1 RRR 6 . 1 . . 1 1 3 RoRRy 6 . 3 . . . . 3 oRoRy 6 . 3 . . 1 1 . yR 6 . 3 . . . 1 1 yoR 5 . . 1 1 . . 1 RaRaRy 5 . 1 1 . . 1 . Roy 5 . . . . 1 3 . aRoRR 4 . 1 . 1 . . 2 RyR 4 . . . . . 2 1 RyaR 4 . . 1 . . 1 . aRaRy 4 . . . 1 . 2 . aRo 4 1 . . 1 . 1 . oRaRaR 4 . 1 . . . 1 1 yRoR 3 . . . . . 1 1 RRaRy 3 . . . . . . . RaRoRR 3 . . . . . 2 . aRaRaR 3 . . . . . 2 . aRoRaR 3 1 2 . . . . . oRaRR 3 1 1 . . . . . oRaRRy 3 . 1 . . . 1 . oRoRR 3 . . 1 . . 1 . oaRaR 2 . . . . . 2 . RRRy 2 . . . . . 1 . RaRaRR 2 . . . 1 . . . RaRaRRy 2 . 2 . . . . . RoRRaR 2 1 1 . . . . . RoRaRy 2 . 2 . . . . . RoRoRy 2 1 . . . 1 . . RyRy 2 . . . . . . . a 2 . . . . . 2 . aRaRR 2 . . . . 1 . 1 aRaRRy 2 . . . . . 1 . aRaRaRy 2 . . . . 1 1 . 
aRaRoR 2 . . . . 1 1 . aRoRRy 2 . . . 1 . 1 . aRoy 2 . 1 . . . . 1 aoR 2 . 2 . . . . . oRRoR 2 . . . . . . . oRaRaRy 2 . 1 . . . 1 . oRaRo 2 . . . . . . . oRaRoR 2 1 . . . . . . oRaRoRy 2 . 1 . . 1 . . oRoRRy 2 1 . . . . 1 . oRoRaR 2 . 1 . . . 1 . oRyR 2 . 1 . . . . . oyR 2 . 1 . . 1 . . yoRaR 1 . . . . . 1 . RRRR 1 . . . . . . 1 RRRaR 1 . 1 . . . . . RRaRR 1 . . . . . 1 . RRaRaR 1 . . . . . . 1 RRaRoR 1 . . . 1 . . . RRay 1 . . . . . 1 . RRo 1 . . . . . 1 . RRoRy 1 . . . . . 1 . RRyR 1 . . . . . 1 . RaRRR 1 . 1 . . . . . RaRRaRoR 1 . . . . . 1 . RaRRyR 1 . 1 . . . . . RaRa 1 . . . . . . . RaRaRa 1 . . 1 . . . . RaRaRaR 1 . . . 1 . . . RaRaRoRy 1 . . . . 1 . . RaRoRaR 1 . . . . . 1 . RaRoRoR 1 . . . . . 1 . RaRoaR 1 . . . . . . 1 RaRyRR 1 . 1 . . . . . RaRyRaR 1 . . . . . . 1 RaRyRy 1 . 1 . . . . . Rao 1 1 . . . . . . RaoR 1 . 1 . . . . . RoRRR 1 . 1 . . . . . RoRo 1 . . . . . 1 . RoRoRR 1 . . . . 1 . . RoRyR 1 . . . . . 1 . RoaRy 1 . 1 . . . . . RoyRy 1 . . . . . 1 . RyRaR 1 . . . . . . . RyRyR 1 . . . . 1 . . RyRyRy 1 . . . . . 1 . aRRoR 1 . . . . 1 . . aRRyRy 1 1 . . . . . . aRaRoy 1 . . . . . 1 . aRoRo 1 . . . . . . 1 aRoRoR 1 . . . . . 1 . aRoaR 1 1 . . . . . . aRoaRy 1 . . . . 1 . . aRyR 1 . . . . . . 1 aRyRy 1 . . . . . 1 . aaR 1 . . 1 . . . . ay 1 . . . . . 1 . oRRRy 1 . . . . . . 1 oRRaRy 1 . 1 . . . . . oRoRo 1 . . . . . . 1 oRoRoR 1 . . . . 1 . . oRoaR 1 . . . . 1 . . oRoy 1 . . . . . . 1 oRyRy 1 . . . 1 . . . oaRR 1 . . . . . 1 . oaRRaR 1 . 1 . . . . . oaRRaRy 1 . 1 . . . . . oaoRaR 1 1 . . . . . . ooRRaR 1 . 1 . . . . . ooRRy 1 1 . . . . . . ooRaR 1 . . . . . 1 . ooRaRR 1 . . . . . . . ooRy 1 . . . . . . . oyRaR 1 . 1 . . . . . yRR 1 . 1 . . . . . yRaRR 1 . . . . . . 1 yRaRaR 1 . . 1 . . . . yRoRy 1 . . . 1 . . . yaRy 1 . . . . . 1 . yoRR 1 . . . . . . . 
                                                         yy
 ------ ------ ------ ------ ------ ------ ------ ------ -------
   8754    350   1675    406    220    751   2103   1629 Total
  24.9%  26.7%  24.9%  29.8%  31.3%  26.6%  20.8%  24.8% Percent

Here is the same table, with [aoy] mapped to "o":

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
   3030    136    821    129     34    223    579    605 RoR
   2605     87    186    123     97    230    842    563 oR
    722     34    101     24     17     70    239     98 oRoR
    475     15    149     17     17     34     46     45 R
    395     18    141     21     10     41     32     80 Ro
    240      3     48     21     15     16     40     38 o
    226      9     24     12      9     26     43     67 oRo
    187      7     25     17      2     18     55     21 RoRoR
    155     10     38      3      .     11     52      7 ooR
    144      3     37     10      2     15     27     28 RoRo
     62      .      8      4      .      6     20     12 RR
     54      4      9      2      2      8     16      6 oRR
     51      2      6      4      3      4     16      7 oRoRo
     47      3      5      4      2      7      7     13 oRRo
     47      3     13      2      .     11      4      . oRRoR
     31      2      2      .      1      1     14      6 RRoR
     30      2     15      .      1      5      1      1 RoRR
     30      .      6      1      .      5      5      7 RoRRo
     29      .      5      1      .      3      8     10 RRo
     21      1      5      2      1      2      1      3 RoRoRo
     20      2      4      3      .      .      9      1 RooR
     19      2      .      .      1      1      7      3 oRoRoR
     18      .      6      1      .      2      6      1 oo
     15      .      4      1      .      2      3      . RoRRoR
     14      1      4      .      .      1      6      . oRoRR
      9      1      2      .      .      3      1      1 oRoRRo
      7      .      .      1      .      1      3      1 RRR
      7      .      .      .      .      .      2      1 RoRoRR
      7      1      1      1      .      1      1      . ooRoR
      6      .      2      1      .      .      1      . Roo
      6      1      .      .      .      .      1      . oRoRoRo
      4      .      .      .      .      .      2      1 RRoRo
      4      .      1      1      .      1      1      . RoRoRoR
      3      .      .      .      1      1      1      . oRoo
      2      .      .      .      .      .      2      . RRRo
      2      .      .      .      .      .      1      1 RRoRoR
      2      .      1      .      .      .      1      . RoRRR
      2      .      .      .      1      .      .      . RoRoRRo
      2      .      1      .      .      .      1      . RooRo
      2      .      .      .      .      1      .      1 oRRoRo
      2      .      .      .      .      1      1      . oRooR
      2      .      .      .      1      .      1      . ooRR
      2      1      .      .      .      .      1      . ooRRoR
      2      .      .      .      1      .      .      . ooRo
      1      .      .      .      .      .      1      . RRRR
      1      .      .      .      .      .      .      1 RRRoR
      1      .      1      .      .      .      .      . RRoRR
      1      .      .      .      1      .      .      . RRoo
      1      .      1      .      .      .      .      . RoRRoRoR
      1      .      .      .      1      .      .      . RoRoRoRo
      1      .      .      .      .      .      1      . RoRooR
      1      .      .      .      .      .      1      . oRRRo
      1      1      .      .      .      .      .      . oRoRoo
      1      1      .      .      .      .      .      . oRooRo
      1      .      1      .      .      .      .      . ooRRo
      1      .      1      .      .      .      .      . ooRRoRo
      1      .      .      .      .      .      1      . ooRoRR
      1      .      1      .      .      .      .      . oooRoR
 ------ ------ ------ ------ ------ ------ ------ ------ -------
   8754    350   1675    406    220    751   2103   1629 Total

The long tails of these tables suggest that the words without
core-mantle are combinations of letters, rather than single letters.
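
The mapping of [aoy] to "o" used for the second table can be
reproduced with a few lines of code. The following is only a minimal
sketch, not the script actually used for these notes: the function
names collapse_rounds and merge_counts are hypothetical, and the input
is a handful of rows copied by hand from the condensed R-table rather
than the full data.

```python
from collections import Counter

# A few rows of the condensed table (pattern -> total count).
# Only a hand-copied sample; the real table has many more rows.
condensed = {
    "RaR": 2516, "aR": 1398, "oR": 1201, "RoR": 510,
    "R": 475, "yR": 6, "RyR": 4,
}

# Translation table sending the round letters "a" and "y" to "o".
ROUNDS_TO_O = str.maketrans("ay", "oo")

def collapse_rounds(pattern):
    """Map every round letter [aoy] to "o", leaving R untouched."""
    return pattern.translate(ROUNDS_TO_O)

def merge_counts(table):
    """Add up the counts of all patterns that collapse to the same string."""
    merged = Counter()
    for pattern, count in table.items():
        merged[collapse_rounds(pattern)] += count
    return merged

merged = merge_counts(condensed)
for pattern, count in merged.most_common():
    print(f"{count:6d} {pattern}")
```

With this sample, RaR, RoR and RyR merge into RoR (2516 + 510 + 4 =
3030) and aR, oR and yR merge into oR (1398 + 1201 + 6 = 2605), which
agrees with the first two rows of the mapped table.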
We can see that double R's and double O's are rare, but not rare
enough to be classed as errors:

  crust-only words with doubled R's = 409 (4.7% of crust-only words)
  crust-only words with doubled O's = 227 (2.6% of crust-only words)

Triple letters are quite rare: there is only one "oao", and only 14
RRR's, most of them in the stars section:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
      7      .      .      1      .      1      3      1 RRR
      2      .      .      .      .      .      2      . RRRy
      1      .      .      .      .      .      1      . RRRR
      1      .      .      .      .      .      .      1 RRRaR
      1      .      .      .      .      .      1      . RaRRR
      1      .      1      .      .      .      .      . RoRRR
      1      .      .      .      .      .      1      . oRRRy
 ------ ------ ------ ------ ------ ------ ------ ------ -------

In fact all possible patterns of alternating R's and O's do occur
with good frequency, including words consisting of a single R or a
single O.

Note also that the F's are not entirely final. Here are the words
without core-mantle in which an F is followed by a final round letter:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
      6      .      .      2      .      .      1      2 aFy
      3      .      2      .      .      .      .      1 DaFo
      2      .      1      .      .      1      .      . DaFy
      1      .      1      .      .      .      .      . oDaFo
      1      .      .      .      .      .      .      . DoFy
      1      .      .      1      .      .      .      . aDaFy
      1      .      .      1      .      .      .      . oDFy
      1      .      .      .      .      .      .      1 oDaFy
 ------ ------ ------ ------ ------ ------ ------ ------ -------
     16      0      4      4      0      1      1      4 Total

(The percents are relative to the total crust-only words.)

And here are those where the F is followed by some other R-letter:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
      3      .      2      .      .      .      .      . DaFD
      3      .      .      .      .      .      3      . aFaD
      3      .      .      .      .      1      2      . aFaF
      2      .      2      .      .      .      .      . DaFF
      2      .      2      .      .      .      .      . DaFoD
      2      1      .      .      .      1      .      . aFF
      2      .      .      .      .      .      2      . DaFDy
      1      .      .      .      1      .      .      . DaFaFDy
      1      .      .      .      .      .      1      . DDaFaD
      1      .      .      .      .      .      .      . DaDaFF
      1      1      .      .      .      .      .      . DoDoDaFF
      1      .      1      .      .      .      .      . FaF
      1      .      1      .      .      .      .      . FoD
      1      .      .      .      .      .      1      . aFD
      1      .      .      .      .      .      1      . aFoF
      1      .      .      .      .      .      1      . oDaFaD
      1      .      .      .      .      .      .      . oFaF
      1      .      .      .      .      .      1      . oFoD
      1      .      .      .      1      .      .      . oaFD
 ------ ------ ------ ------ ------ ------ ------ ------ -------
     29      2      8      0      2      2     12      0 Total

Now let's look at the words with non-empty core-mantle.

We observed before that the round letters are mostly confined to the
crust. Of the 26258 word occurrences with core-mantle, 1533
occurrences (5.8%) violate this rule. Of these, 865 would have a
single core except for the intrusion of "a" or "o" (class I
exceptions). The remaining 668 have other intrusions as well
(class II).

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 feature
 ------ ------ ------ ------ ------ ------ ------ ------ -------
  35128   1312   6726   1364    702   2827  10110   6559 All words

    865     42    302     78     31     41    190     13 Class I
   2.5%   3.2%   4.5%   5.7%   4.4%   1.5%   1.9%   0.2% Percent

    668     17    145     44     24     53    210     73 Class II
   1.9%   1.3%   2.2%   3.2%   3.4%   1.9%   2.1%   1.1% Percent

   1533     59    447    122     55     94    400     86 Class I+II
   4.4%   4.5%   6.6%   8.9%   7.8%   3.3%   4.0%   1.3% Percent

Note that class II exceptions are fairly rare in all sections, and,
as can be seen below, are more diverse than class I exceptions.
Moreover, 137 of the class II exceptions have "y" intrusions. Since
"y" is usually found at the end of words, it seems plausible that
most, if not all, class II exceptions are accidental joins of two
"normal" words. Their relative abundance in the cos/zod sections
could be due to the general crowding of text in those sections.

It is possible that many or all class I exceptions have the same
origin, since "Xo" is one of the most common word patterns in all
sections.

Here are the class I exceptional core-mantle patterns:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
    568     31    180     48     19     26    130     12 «XoM»
    169      5     72     16      8      4     39      1 «XoMX»
     69      3     23     11      3      7     12      . «MoX»
     28      1     16      1      1      1      2      . «XoX»
      7      .      2      1      .      .      2      . «MXoX»
      6      .      2      1      .      .      3      . «XaM»
      5      1      3      .      .      .      .      . «XoMXX»
      3      .      .      .      .      .      .      . «MaX»
      2      .      1      .      .      .      1      . «MaXX»
      2      1      .      .      .      .      1      . «XXoMX»
      2      .      2      .      .      .      .      . «XXoM»
      2      .      1      .      .      1      .      . «XaMX»
      2      .      .      .      .      2      .      . «XooM»
 ------ ------ ------ ------ ------ ------ ------ ------ -------
    865     42    302     78     31     41    190     13 Class I
   2.5%   3.2%   4.5%   5.7%   4.4%   1.5%   1.9%   0.2% Percent

Here are all class II exceptional core-mantle patterns with "y"
intrusions:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
     39      .     15      4      1      4      9      3 «XyM»
     20      .      9      3      .      1      4      3 «XyMX»
     15      .      9      .      .      .      2      2 «MyM»
     14      .      4      1      .      .      5      1 «MyX»
      7      .      4      .      .      .      1      . «MXyM»
      5      .      2      2      1      .      .      . «MXyMX»
      5      .      3      1      .      .      .      . «MyMX»
      5      .      .      .      .      1      3      1 «XRyM»
      3      .      1      .      .      .      1      . «MyRX»
      3      .      1      .      .      .      .      1 «XXyM»
      2      .      2      .      .      .      .      . «XyX»
      1      .      .      .      .      1      .      . «MRyM»
      1      .      1      .      .      .      .      . «MRyX»
      1      .      .      .      .      .      1      . «MXRyM»
      1      .      1      .      .      .      .      . «MXoMyX»
      1      1      .      .      .      .      .      . «MXoRyX»
      1      .      1      .      .      .      .      . «MoRyM»
      1      .      1      .      .      .      .      . «MoRyX»
      1      .      .      .      .      .      .      1 «MyRM»
      1      .      .      .      1      .      .      . «MyXX»
      1      .      .      .      .      .      .      1 «MyqoM»
      1      .      .      1      .      .      .      . «XMXyM»
      1      .      .      .      .      .      1      . «XRyMX»
      1      .      .      .      .      .      1      . «XRyMoM»
      1      .      .      .      .      1      .      . «XRyX»
      1      .      .      .      .      .      .      . «XRyoM»
      1      .      1      .      .      .      .      . «XoRyMX»
      1      .      1      .      .      .      .      . «XoyMX»
      1      .      1      .      .      .      .      . «XoyM»
      1      1      .      .      .      .      .      . «XyoM»
 ------ ------ ------ ------ ------ ------ ------ ------ -------
    137      2     57     12      3      8     28     13 Class II (w/ y)
   0.4%   0.2%   0.8%   0.9%   0.4%   0.3%   0.3%   0.2% Percent

And here are all other class II exceptional core-mantle patterns:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
     82      4      9      6      1      5     27     18 «MoRX»
     54      4     10      3      6      5     13      6 «MoM»
     36      .      .      1      2      4     12      6 «MaRX»
     36      1      1      .      3      3     13      4 «XoRM»
     35      1     12      3      .      3     11      1 «MXoM»
     30      1      7      2      .      1     12      3 «XoRX»
     22      1      .      1      .      2     10      5 «MoRM»
     18      .      .      .      .      1     11      1 «XoRMX»
     16      .      5      1      2      1      5      . «MoMX»
     15      .      6      .      .      2      5      . «MXoRX»
     15      .      4      3      1      2      3      . «XoeM»
     13      .      1      .      2      3      3      . «MaM»
     12      .      .      .      .      1      7      3 «MoRMX»
     11      .      7      .      1      .      1      1 «MXoMX»
      9      .      .      .      .      .      6      2 «MaRM»
      7      .      .      4      1      1      .      . «XoeMX»
      6      2      .      .      1      .      2      . «MoRXX»
      5      .      1      .      .      .      4      . «MXoRM»
      5      .      1      .      .      .      1      2 «MoRaRX»
      5      .      .      .      .      .      5      . «XaRM»
      4      .      1      .      .      .      1      . «MXoeM»
      4      .      2      1      .      .      .      . «XoMoM»
      3      .      2      .      .      .      .      1 «MoRRX»
      3      .      .      .      .      1      1      . «XMaRX»
      3      .      .      .      .      .      3      . «XRaRM»
      2      .      .      .      .      1      .      1 «MRaRX»
      2      .      2      .      .      .      .      . «MXaM»
      2      .      .      .      .      .      2      . «MXoRMX»
      2      .      .      .      .      .      2      . «MaRMX»
      2      .      .      .      .      .      1      . «MoRaX»
      2      .      2      .      .      .      .      . «MoXM»
      2      1      .      .      .      .      1      . «MoeM»
      2      .      .      1      .      .      1      . «MoeoMX»
      2      .      .      .      .      .      2      . «XRaM»
      2      .      .      .      .      .      2      . «XXoRM»
      2      .      1      .      .      .      1      . «XaRX»
      2      .      1      .      .      .      .      . «XoMXoM»
      2      .      2      .      .      .      .      . «XoRRX»
      1      .      .      .      .      .      .      . «MRaM»
      1      .      1      .      .      .      .      . «MRoMXM»
      1      .      .      .      .      .      .      1 «MRoRM»
      1      .      .      .      .      .      1      . «MXRaM»
      1      .      .      .      .      .      .      1 «MXRoRM»
      1      .      .      .      .      .      1      . «MXaRM»
      1      .      1      .      .      .      .      . «MXoRRX»
      1      .      .      .      .      1      .      . «MXoRXM»
      1      .      1      .      .      .      .      . «MXoRoX»
      1      .      .      .      .      1      .      . «MXoeMX»
      1      .      .      .      .      .      1      . «MaRXX»
      1      .      .      .      .      .      1      . «MaRaRoM»
      1      .      .      .      .      .      .      . «MaRoRM»
      1      .      .      .      .      .      1      . «MaRoRX»
      1      .      .      .      .      .      1      . «MaeMX»
      1      .      .      .      .      1      .      . «MaiifX»
      1      .      .      .      .      .      .      . «MaleM»
      1      .      .      .      .      .      1      . «MaoRM»
      1      .      .      .      .      .      .      1 «MoMXoM»
      1      .      .      .      1      .      .      . «MoMoMX»
      1      .      1      .      .      .      .      . «MoMoX»
      1      .      1      .      .      .      .      . «MoRRXX»
      1      .      .      .      .      .      .      1 «MoRRaM»
      1      .      .      .      .      .      1      . «MoRXoRM»
      1      .      .      .      .      .      .      1 «MoRoMX»
      1      .      .      .      .      1      .      . «MoRoM»
      1      .      .      .      .      .      .      1 «MoRoRX»
      1      .      1      .      .      .      .      . «MoRoXX»
      1      .      .      .      .      .      .      . «MoXRX»
      1      .      1      .      .      .      .      . «MoXoM»
      1      .      .      .      .      .      .      . «MoaRX»
      1      .      .      .      .      .      1      . «MoeX»
      1      .      .      .      .      .      .      . «MoeaRX»
      1      .      1      .      .      .      .      . «MoeoM»
      1      .      .      .      .      1      .      . «XMXoRX»
      1      .      .      .      .      1      .      . «XMaMX»
      1      .      1      .      .      .      .      . «XMoM»
      1      .      .      1      .      .      .      . «XMoRX»
      1      .      .      .      .      1      .      . «XRaRX»
      1      .      .      .      .      .      1      . «XRaRoM»
      1      .      .      1      .      .      .      . «XRoMXoM»
      1      .      .      .      .      .      1      . «XXoRX»
      1      .      .      1      .      .      .      . «XXoeM»
      1      .      .      .      .      .      1      . «XaRMX»
      1      .      .      .      .      .      .      . «XaRoRM»
      1      .      .      1      .      .      .      . «XoMRX»
      1      .      .      .      .      .      .      . «XoMXM»
      1      .      .      1      .      .      .      . «XoMXoMX»
      1      .      .      1      .      .      .      . «XoMoRX»
      1      .      .      .      .      1      .      . «XoRMXMX»
      1      .      .      .      .      .      1      . «XoRMXX»
      1      .      .      .      .      .      .      . «XoRoRM»
      1      .      1      .      .      .      .      . «XoRoX»
      1      .      .      .      .      .      1      . «XoeMXX»
      1      .      .      .      .      1      .      . «XooRMX»
      1      .      1      .      .      .      .      . «XoqoM»
 ------ ------ ------ ------ ------ ------ ------ ------ -------
    531     15     88     32     21     45    182     60 Class II (no y)
   1.5%   1.1%   1.3%   2.3%   3.0%   1.6%   1.8%   0.9% Percent

Now let's look at the prefixes and suffixes of the 26258 words that
have a core/mantle.
Of these, 26128 have prefixes consisting entirely of R's and O's.
(The bulk of the exceptions are prefixes "e" and "oe", which may be
mis-transcribed "c" and "oc".)

First, the counts of R-O prefixes according to the number of O's
and R's:

  O-letters ->            0     1     2     3   ALL
  ------------------- ----- ----- ----- ----- -----
  0 R-letters         12534 10789    34     . 23357
  1 R-letters          1546  1035    30     .  2611
  2 R-letters            10   134    13     1   158
  3 R-letters             1     .     1     .     2
  ALL                 14091 11958    78     1 26128

We see that almost all prefixes have at most one O letter and at
most one R letter. These account for 25904 of the 26128 prefixes
(99.1%). Strangely there is only a weak correlation between the
number of R's and the number of O's.

Here are the R-O prefix counts:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
  12534    490   3081    420    207   1048   3365   2003 «
   9324    326   1217    356    237    656   3253   1996 o«
   1546     20    222     34      3     59    679    384 R«
   1432     54    358     87     16    212    287    107 y«
    780     24     43      8      5     49    212    308 oR«
     90      2      1      3      7      8     48      5 aR«
     86      3     27      7      .      4     25     12 Ro«
     73      .      6      3      .      4     13     37 RoR«
     53      1     17      3      .      9      4     12 Ry«
     52      .      4      2      2      1     13     21 RaR«
     33      3      .      .      1      3     19      . a«
     15      .     11      .      .      1      1      2 oy«
     14      .      6      .      1      1      3      2 Ra«
     13      2      5      .      .      1      1      1 yo«
     12      1      2      2      .      1      5      1 yR«
     11      2      1      .      .      1      2      4 oRo«
     10      .      1      .      .      .      6      2 RR«
      7      .      .      2      .      .      2      1 aRo«
      5      1      2      1      .      .      1      . oo«
      4      1      1      .      .      .      .      2 oRR«
      3      .      1      .      .      1      1      . Ray«
      3      .      .      .      .      .      .      1 RyR«
      3      .      .      1      .      .      2      . aRoR«
      2      1      .      .      .      .      1      . RaRo«
      2      .      .      1      .      .      1      . aRR«
      2      .      .      2      .      .      .      . aRaR«
      2      .      2      .      .      .      .      . oRa«
      2      .      .      .      .      1      .      1 oRoR«
      2      .      1      .      .      .      .      1 oRy«
      2      1      .      .      .      .      1      . ooR«
      1      .      1      .      .      .      .      . RRR«
      1      .      .      .      .      .      1      . RaRoR«
      1      .      .      .      .      .      .      . RaRy«
      1      .      .      .      .      .      .      . RoRo«
      1      .      1      .      .      .      .      . Roo«
      1      .      .      .      .      1      .      . aRa«
      1      .      .      .      .      .      1      . oRaRy«
      1      .      .      .      .      .      .      . oRaR«
      1      .      .      .      .      .      .      . yRoR«
      1      .      1      .      .      .      .      . yoR«
      1      .      1      .      .      .      .      . yy«
 ------ ------ ------ ------ ------ ------ ------ ------ -------
  26128    932   5013    932    479   2061   7947   4903 Total

Again, mapping all round letters to "o":

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
  12534    490   3081    420    207   1048   3365   2003 «
  10789    383   1575    443    254    871   3559   2103 o«
   1546     20    222     34      3     59    679    384 R«
    882     27     46     13     12     58    265    314 oR«
    153      4     50     10      1     14     32     26 Ro«
    128      .     10      5      2      5     26     59 RoR«
     34      3     19      1      .      2      3      3 oo«
     23      2      4      2      .      2      4      6 oRo«
     10      .      1      .      .      .      6      2 RR«
      9      .      .      3      .      1      2      1 oRoR«
      6      1      1      1      .      .      1      2 oRR«
      4      1      .      .      .      .      1      . RoRo«
      4      .      2      .      .      1      1      . Roo«
      3      1      1      .      .      .      1      . ooR«
      1      .      1      .      .      .      .      . RRR«
      1      .      .      .      .      .      1      . RoRoR«
      1      .      .      .      .      .      1      . oRoRo«
 ------ ------ ------ ------ ------ ------ ------ ------ -------

Now the suffixes. Of the 26258 words that have a core/mantle, 26075
have a suffix consisting exclusively of R's and O's. Here are the
suffix counts according to the number of R's and O's:

  O-letters ->            0     1     2     3     4   ALL
  ------------------- ----- ----- ----- ----- ----- -----
  0 R-letters           299  7838    83     1     .  8221
  1 R-letters           641 13857  1377    10     1 15886
  2 R-letters            29   853   894    69     2  1847
  3 R-letters             5    10    73    29     2   119
  4 R-letters             .     .     1     .     1     2
  ALL                   974 22558  2428   109     6 26075
  Rel.percent          3.7% 86.5%  9.3%  0.4%  0.0%100.0%

So essentially all suffixes consist of 0..2 R-letters and 0..2
O-letters. There is a visible correlation between the two counts.

Here are all R-O suffixes:

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
   6936    288   1586    304    128    461   1947   1342 »y
   4863     35    679    100     73    382   1816    990 »aR
   4727      7     26     45     29    580   1761   1858 »Ry
   4211    382   1616    166     73    173    712    324 »oR
   1051     79    209     90     32    111    250     10 »oRy
    883     51    291     51     36     38    272      6 »o
    741     11     96     49     23     64    277    133 »R
    685      .     12     14     15     81    369     82 »RaR
    457     23    117     29     25     24    119      5 »oRaR
    299     12     78     20      2     45     58     26 »
    159      1     12      5      9     16     41     30 »aRy
    120      2      3      2      8      8     67      3 »aRaR
     93      8     43      2      2      5      5     12 »oRRy
     92      4     21     11      5      6     31      . »oaR
     79      8     34      2      2      4     14      5 »oRoR
     74      8     28      5      .      5     15      . »oy
     56      .      3      .      .      7     26     14 »RoR
     54      7     21      .      4      2     11      1 »oRR
     49      .      4      1      .      2     17     14 »aRoR
     46      .      5      3      1      9      5      9 »aRRy
     38      3     11      5      .      2      5      5 »yR
     29      .      4      2      .      2     15      3 »RR
     26      .     10      2      .      .      3      3 »aRR
     26      .     13      2      1      2      5      3 »yRy
     21      .      .      2      1      2      6      8 »RRy
     20      .      6      .      5      1      3      . »oRaRy
     19      1      7      1      .      2      4      1 »a
     19      1      8      .      2      .      .      1 »oRRaR
     18      1     12      .      1      2      .      . »oRoRy
     18      1      6      2      .      .      8      . »ooR
     15      .      1      2      .      1      6      1 »aRoRy
     14      .     10      1      .      .      .      . »yRaR
     13      2      .      2      .      2      6      . »Ro
     11      .      .      2      .      2      4      1 »RaRy
     11      .      1      1      .      4      4      . »aRRaR
     10      1      5      .      .      .      1      2 »oRo
      8      .      .      1      .      .      6      1 »RaRaR
      8      .      .      .      .      .      3      4 »RyR
      7      .      .      .      .      1      3      2 »RaRoR
      7      .      1      .      .      .      3      1 »aRo
      6      1      2      .      .      2      1      . »RoRy
      6      .      .      1      1      2      2      . »RyRy
      6      .      .      1      .      1      2      1 »oRaRaR
      6      .      1      1      .      .      1      . »oaRy
      5      .      1      .      .      1      3      . »RRR
      5      1      .      .      .      1      2      1 »Ra
      5      .      .      .      .      .      3      1 »RaRRy
      5      .      .      .      .      2      2      . »RoRaR
      5      .      3      1      .      .      .      . »ay
      5      .      3      .      .      .      .      1 »oRoRR
      4      .      1      .      1      .      2      . »aRaRy
      4      .      1      .      .      .      1      . »oRaRR
      4      .      1      .      .      .      3      . »oRaRoR
      3      .      .      1      .      .      1      . »RaRR
      3      1      .      .      .      .      1      . »oRRaRy
      3      .      1      2      .      .      .      . »oRRoR
      3      .      1      .      .      .      2      . »oRa
      3      .      2      1      .      .      .      . »oRoRaR
      3      .      2      .      .      .      .      1 »oRyR
      2      .      .      .      .      .      1      1 »RRaR
      2      .      .      .      .      .      2      . »RaRoRy
      2      .      .      .      .      .      .      2 »RyoR
      2      .      .      .      .      2      .      . »aRRR
      2      .      .      1      .      .      1      . »aRaRaR
      2      .      .      .      .      1      1      . »aRoRaR
      2      .      .      .      .      .      1      . »aRyR
      2      .      1      .      .      .      .      1 »aoR
      2      .      .      1      .      .      1      . »oRRR
      2      .      .      1      .      .      .      . »oRaRRy
      2      .      1      .      .      .      1      . »oo
      2      .      .      .      .      2      .      . »ooRy
      2      .      1      .      .      .      .      . »yRR
      2      .      1      .      .      .      .      . »yRaRy
      2      .      .      .      .      .      1      1 »yRoR
      2      .      1      .      .      .      .      . »yaR
      1      .      1      .      .      .      .      . »RRaRRy
      1      .      .      .      .      1      .      . »RRaRy
      1      .      .      .      .      .      1      . »RRo
      1      .      .      .      .      .      1      . »RaRRo
      1      .      .      .      .      1      .      . »RaRo
      1      .      1      .      .      .      .      . »RoRRy
      1      .      .      .      .      .      1      . »RoaR
      1      .      1      .      .      .      .      . »Roy
      1      .      .      .      .      .      1      . »RyRR
      1      .      .      .      .      .      1      . »Ryo
      1      .      .      .      .      .      .      . »aRRa
      1      .      .      .      .      .      .      1 »aRRoR
      1      .      .      .      .      1      .      . »aRa
      1      .      1      .      .      .      .      . »aRaRR
      1      .      .      .      .      .      1      . »aRaRo
      1      .      .      .      .      .      1      . »aRaRoR
      1      .      .      .      .      .      .      . »aRooR
      1      .      .      .      .      .      1      . »aRyRaR
      1      .      .      .      .      .      1      . »aRyRy
      1      .      .      .      .      .      .      . »aaR
      1      .      .      1      .      .      .      . »ao
      1      .      .      .      .      .      .      1 »aoy
      1      .      .      .      .      .      1      . »ayRy
      1      .      .      .      .      .      1      . »oRaRo
      1      .      .      .      .      .      1      . »oRaRoRy
      1      .      .      .      .      .      1      . »oRaRoaR
      1      .      .      .      .      .      .      . »oRaaR
      1      .      1      .      .      .      .      . »oRoRRy
      1      .      .      .      .      .      1      . »oRoRaRoR
      1      .      1      .      .      .      .      . »oRoRo
      1      .      1      .      .      .      .      . »oRoaR
      1      .      1      .      .      .      .      . »oRoaRy
      1      .      .      .      .      .      .      . »oRyRaR
      1      .      .      .      .      1      .      . »oRyRy
      1      .      .      .      .      .      1      . »oaRaR
      1      .      1      .      .      .      .      . »oaRoR
      1      .      .      .      .      .      1      . »oaRoRy
      1      .      1      .      .      .      .      . »oyR
      1      .      .      .      .      .      1      . »oyRy
      1      .      .      .      .      .      .      1 »yRRRy
      1      .      1      .      .      .      .      . »yRRo
      1      .      .      1      .      .      .      . »yRa
      1      .      1      .      .      .      .      . »yRaRoR
      1      1      .      .      .      .      .      . »yo
      1      .      .      .      .      .      1      . »yoR
      1      .      .      .      .      .      1      . »yoaRy
 ------ ------ ------ ------ ------ ------ ------ ------ -------
  26175    940   5015    937    479   2064   7952   4913 Total RO

Here again are the suffixes, with all [aoy] mapped to "o":

    tot  pha.2  hea.1  cos.2  zod.1  heb.1  str.2  bio.1 pattern
 ------ ------ ------ ------ ------ ------ ------ ------ -------
   9112    420   2306    271    146    557   2533   1319 »oR
   7838    340   1884    356    164    501   2223   1349 »o
   4745     10     26     47     29    583   1769   1859 »Ro
   1258     81    241     98     42    130    302     46 »oRo
    749      .     15     14     15     88    398    100 »RoR
    741     11     96     49     23     64    277    133 »R
    726     33    170     35     35     38    219     29 »oRoR
    299     12     78     20      2     45     58     26 »
    141      8     49      5      3     14     10     21 »oRRo
    117      5     30     13      5      6     40      1 »ooR
     83      9     32      7      .      5     16      . »oo
     82      7     32      2      4      2     14      4 »oRR
     64      1     22      2      7      5     14      1 »oRoRo
     34      1     10      3      2      4      4      2 »oRRoR
     29      .      4      2      .      2     15      3 »RR
     24      1      2      3      1      7      7      1 »RoRo
     22      .      .      2      1      2      7      8 »RRo
     21      .      4      3      .      2      9      1 »oRoRoR
     20      .      .      1      .      3     11      3 »RoRoR
     10      .      5      .      .      .      1      1 »oRoRR
     10      .      1      1      .      2      3      . »ooRo
      7      .      1      .      .      .      4      1 »RoRRo
      5      .      1      .      .      1      3      . »RRR
      4      .      .      1      .      .      2      . »RoRR
      4      .      .      1      .      2      1      . »oRRR
      3      .      .      .      .      .      1      2 »RooR
      3      1      .      .      .      .      1      . »oRRoRo
      3      .      1      1      .      .      .      . »oRoRRo
      3      .      1      .      .      .      .      . »oRooR
      2      .      .      .      .      .      1      1 »RRoR
      2      .      .      .      .      .      2      . »RoRoRo
      2      .      1      .      .      .      1      . »Roo
      2      .      1      .      .      .      1      . »ooRoR
      1      .      1      .      .      .      .      . »RRoRRo
      1      .      .      .      .      1      .      . »RRoRo
      1      .      .      .      .      .      .      1 »oRRRo
      1      .      .      .      .      .      1      . »oRoRoRo
      1      .      .      .      .      .      1      . »oRoRoRoR
      1      .      .      .      .      .      1      . »oRoRooR
      1      .      1      .      .      .      .      . »oRooRo
      1      .      .      .      .      .      1      . »ooRoRo
      1      .      .      .      .      .      .      1 »ooo
      1      .      .      .      .      .      1      . »oooRo
 ------ ------ ------ ------ ------ ------ ------ ------ -------

CONCLUSIONS

What can we conclude from all this?
We seem to have five `axioms' that are as true as they can be:

  (1) there is at most one gallows letter in each word;

  (2) the mantle letters occur clustered, except possibly for
      an inserted [aoy] after a table;

  (3) except for [qaoy], the crust prefix consists of at most
      one letter [dlrs];

  (4) except for [aoy], the crust suffix has the form
      { ~ d l s r }{ ~ d l il iil m im iim n in iin iiin r ir iir s },
      where "~" denotes the empty string;

  (5) the gallows and tables rarely occur alone, while the
      dealers and rounds do.

Another possible axiom is that the letter "e" and its multiples
always follow one of the letters [kth] or, more rarely, [odsrl].
That still fails to account for the occurrences of "e" strings at
the beginning of a word (71 single "e", 37 "ee", 28 "eee"). Looking
at the actual instances, however, it seems plausible that they (and
perhaps some of the other occurrences too) could be badly written
instances of "c", "o", "ch", or "che".

Given the broad distribution of the mantle strings, we can rule out
the possibility that each mantle string represents a distinct
letter. On the other hand, the data strongly suggest that the
mantles are composed of units larger than a single EVA letter. The
`alphabet' that I have been assuming ("k", "ke", "ch", "che", etc.)
seems consistent with the data; but other `alphabets' may work as
well.
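
Axioms (1), (3) and (4) are simple enough to be checked mechanically.
The following is a rough sketch only, not one of the scripts used for
these notes. It assumes that each word has already been split into
crust prefix, core-mantle and crust suffix by some other tool; the
function names are hypothetical, and platform gallows such as "ckh"
are counted simply by their gallows letter.

```python
import re

GALLOWS = "ktpf"

# Second factor of axiom (4); longer groups first for readability.
FINAL_GROUPS = ("iiin", "iil", "iim", "iin", "iir",
                "il", "im", "in", "ir",
                "d", "l", "m", "n", "r", "s")

# Axiom (4): an optional letter from {d l s r}, then an optional final group.
SUFFIX_RE = re.compile(r"[dlsr]?(?:%s)?" % "|".join(FINAL_GROUPS))

def at_most_one_gallows(word):
    """Axiom (1): the word contains at most one gallows letter [ktpf]."""
    return sum(word.count(g) for g in GALLOWS) <= 1

def prefix_ok(prefix):
    """Axiom (3): ignoring [qaoy], the prefix is at most one of [dlrs]."""
    stripped = re.sub(r"[qaoy]", "", prefix)
    return re.fullmatch(r"[dlrs]?", stripped) is not None

def suffix_ok(suffix):
    """Axiom (4): ignoring [aoy], the suffix matches the two-factor form."""
    stripped = re.sub(r"[aoy]", "", suffix)
    return SUFFIX_RE.fullmatch(stripped) is not None

if __name__ == "__main__":
    for s in ("aiin", "dy", "olaiin", "dalar"):
        print(s, suffix_ok(s))
```

For instance, the suffix "olaiin" reduces to "liin", which has the
form "l" followed by "iin" and is accepted, whereas "dalar" reduces to
"dlr", which has three R-letters and is rejected.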