Back from the math colloquium, I decided to review the \c/ stroke in JSA encoding, to see if strings of \c/s could be parsed unambiguously into letters.

First of all, let's review the grouping of \ci*/ into letters:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/ci/a/g' \
      | compare-contexts -lctx 0 -rctx 0 -colw 20 \
          'ai*' \
          'oi*'

      710 0.59 ai         1642 0.60 o
      330 0.27 aiii       1075 0.39 oi
      130 0.11 aii           9 0.00 oiii
       35 0.03 a             3 0.00 oii
        4 0.00 aiiii     ----- ---- ----
    ----- ---- ----       2729 1.00 TOT
     1209 1.00 TOT

Hm, it seems that \ci/ is not really a letter; it is most often attached to the following \i/ strings.

Let's retry with some more context, and removing the \qo/ combination:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/_qo/_A/g' \
          -e 's/ci/a/g' \
      | compare-contexts -lctx 0 -rctx 1 -colw 20 \
          'ai*' \
          'oi*'

      403 0.33 aix         760 0.50 oix
      329 0.27 aiiiu       260 0.17 oq
      271 0.22 ais         250 0.17 ol
      109 0.09 aiiu        171 0.11 ois
       32 0.03 aij          35 0.02 oc
       19 0.02 aiis         17 0.01 o_
       15 0.01 ax            9 0.01 oiiiu
        8 0.01 ac            4 0.00 oij
        4 0.00 as            1 0.00 oiiu
        4 0.00 aiu           1 0.00 oiis
        4 0.00 aiiiiu    ----- ---- ----
        4 0.00 a_         1508 1.00 TOT
        2 0.00 al
        2 0.00 aiix
        1 0.00 aq
        1 0.00 ao
        1 0.00 aiiis
    ----- ---- ----
     1209 1.00 TOT

Let's look at the .dic file, instead of .wds, to lessen the effect of common words:

    cat bio-j-jsa-gut.dic \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/_qo/_A/g' \
          -e 's/ci/a/g' \
      | compare-contexts -lctx 0 -rctx 1 -colw 20 \
          'ai*' \
          'oi*'

      126 0.35 aix         233 0.49 oix
       86 0.24 ais          64 0.13 oq
       48 0.13 aiiiu        64 0.13 ois
       22 0.06 aiiu         61 0.13 ol
       21 0.06 aij          32 0.07 oc
       14 0.04 aiis         13 0.03 o_
       11 0.03 ax            5 0.01 oiiiu
        8 0.02 ac            4 0.01 oij
        4 0.01 as            1 0.00 oiiu
        4 0.01 aiiiiu        1 0.00 oiis
        3 0.01 a_        ----- ---- ----
        2 0.01 al          478 1.00 TOT
        2 0.01 aiu
        2 0.01 aiix
        1 0.00 aq
        1 0.00 ao
        1 0.00 aiiis
    ----- ---- ----
      356 1.00 TOT

Hm, it seems quite possible that the \o/ in \oiiiu/, \oij/, \oiiu/, \oiis/ is actually a misreading of \ci/.
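For readers without the compare-contexts tool, here is a minimal sketch of how a single zero-context column (-lctx 0 -rctx 0) of such a tally might be approximated with standard utilities. The function name tally_matches and the five-word input are made up for illustration; the real tool may differ in sorting, truncation to the column width, and handling of overlapping matches.

```sh
#!/bin/sh
# tally_matches PATTERN
# Reads one word per line on stdin and prints "count fraction string"
# for each distinct maximal match of PATTERN (a basic regex), most
# frequent first, followed by a TOT line.
# Hypothetical helper, NOT the author's compare-contexts.
# Relies on `grep -o`, which GNU and BSD grep provide.
tally_matches() {
  LC_ALL=C grep -o -e "$1" \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $1, $2 }' \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | LC_ALL=C awk '
        { n[NR] = $1; s[NR] = $2; tot += $1 }
        END {
          for (i = 1; i <= NR; i++)
            printf "%5d %4.2f %s\n", n[i], n[i] / tot, s[i]
          printf "%5d %4.2f TOT\n", tot, 1
        }'
}

# Made-up word list (not taken from the .wds file), run through the
# same kind of sed preprocessing as in the notes.
printf '%s\n' cix ciix ciix ox oiix \
  | sed -e 's/^/_/' -e 's/$/_/' -e 's/ci/a/g' \
  | tally_matches 'ai*'
```

Because grep -o reports non-overlapping leftmost-longest matches, each run such as \aiii/ is counted once as a whole, which is the behavior the tables above appear to assume.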
Let's now compare the occurrences of \i/ strings after \c/ against other letters:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
      | compare-contexts -lctx 0 -rctx 0 -colw 20 \
          'cii*' \
          '[^ci]ii*'

      710 0.59 cii        1075 0.73 oi
      330 0.27 ciiii       361 0.24 _i
      130 0.11 ciii         13 0.01 yi
       35 0.03 ci            9 0.01 oiii
        4 0.00 ciiiii        6 0.00 ji
    ----- ---- ----          5 0.00 xi
     1209 1.00 TOT           4 0.00 gi
                             3 0.00 oii
                             1 0.00 si
                         ----- ---- ----
                          1477 1.00 TOT

Hm. According to this table, strings of two or more \i/s occur only after a \c/ stroke. The exceptions \oiii/ (9 instances) and \oii/ (3 instances) can easily be explained as misreadings of \ciiii/ (330 instances) and \ciii/ (130 instances). If the probability of misreading \ci/ as \o/ is independent of context, then we can expect that about 20 \oi/s are actually \cii/s (710 instances).

Let's retry with some additional context:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
      | compare-contexts -lctx 0 -rctx 1 -colw 20 \
          'cii*' \
          '[^ci]ii*'

      403 0.33 ciix        893 0.60 oix
      329 0.27 ciiiiu      282 0.19 _ix
      271 0.22 ciis        178 0.12 ois
      109 0.09 ciiiu        79 0.05 _is
       32 0.03 ciij          9 0.01 oiiiu
       19 0.02 ciiis         8 0.01 yix
       15 0.01 cix           6 0.00 jix
        8 0.01 cic           4 0.00 yis
        4 0.00 cis           4 0.00 oij
        4 0.00 ciiu          3 0.00 xix
        4 0.00 ciiiiiu       3 0.00 gix
        4 0.00 ci_           2 0.00 xis
        2 0.00 cil           2 0.00 oiiu
        2 0.00 ciiix         1 0.00 yij
        1 0.00 ciq           1 0.00 six
        1 0.00 cio           1 0.00 oiis
        1 0.00 ciiiis        1 0.00 gis
    ----- ---- ----      ----- ---- ----
     1209 1.00 TOT        1477 1.00 TOT

According to this table, it seems that:

  The pairs \ix/, \is/, and \ij/ are letters: they can appear after \o/, \ci/, and word-begin, but also after other strokes like \c/, \y/, \j/, \g/, \s/, \x/.

  Strings of one or more \i/ ending with the \u/ plume can only occur after \c/. The exceptions \oiiiu/ (9) and \oiiu/ (2) can be explained as misreadings of \ciiiiu/ (329) and \ciiiu/ (109). That would suggest that about 3% of the \o/s are actually \ci/s.

  Strings of two or more \i/s with the \s/, \j/, and \x/ plumes can appear only after \c/. The exception \oiis/ (1) can be explained as a misreading of \ciiis/ (19).

Here is the \c/ column again, manually sorted by last letter:

        4 0.00 ci_

        8 0.01 cic

        2 0.00 cil

        1 0.00 cio

        1 0.00 ciq

       32 0.03 ciij

        4 0.00 cis
      271 0.22 ciis
       19 0.02 ciiis
        1 0.00 ciiiis

        4 0.00 ciiu
      109 0.09 ciiiu
      329 0.27 ciiiiu
        4 0.00 ciiiiiu

       15 0.01 cix
      403 0.33 ciix
        2 0.00 ciiix
    ----- ---- ----
     1209 1.00 TOT

From all this data, it seems we can draw the following hypotheses:

  The strings \ij/, \is/, and \ix/ are letters.

  It is possible that \iis/ is a rare letter, too.

  The pair \ci/ is often a letter, but sometimes it is not. In particular, \cix/ is the letter \ix/ following a \c/.

  The only frequent strings that end with the \u/ plume are \ciiiu/ and \ciiiiu/.

The last observation has a number of possible explanations:

  (1) \ciiiu/ and \ciiiiu/ are letters; or

  (2) \iiu/ and \iiiu/ are letters that can occur only after \ci/; or

  (3) \iiiu/ and \iiiiu/ are letters that can occur only after \c/; or

  (4) \iiiu/ is a letter that can occur after \c/ or \ci/.

The mixed hypotheses

  (5) \iiu/ and \iiiu/ are letters that can occur after \c/ or \ci/

  (6) \iiiu/ and \iiiiu/ are letters that can occur after \c/ or \ci/

are rather unlikely, given the low frequency of \ciiu/ and \ciiiiiu/ (4 instances each).

Hypothesis (2) has the merit that it provides an alternative explanation for the rare occurrences of \oiiu/ and \oiiiu/, not depending on transcription errors.

Let's be conservative, and lump only \ix/, \ij/, \is/, \iiu/ as letters, leaving out the first \i/ of \iiiu/ and \iis/.
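The misreading estimates quoted earlier (roughly 3%, and about 20 suspect \oi/s) can be checked with a little arithmetic. The sketch below is only one plausible way to get figures of that size: it assumes the rate is the ratio of the suspect \o/ count to the count of the matching \c/ string, pools the two pairs from the zero-context table, and scales by the 710 instances of \cii/. The helper names ratio and expected are hypothetical, and the notes do not say which formula was actually used.

```sh
#!/bin/sh
# ratio O CI: print O/CI as a percentage with one decimal place.
# Assumption: the misreading rate is estimated as suspect-o count
# divided by the count of the corresponding c-string.
ratio() {
  LC_ALL=C awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 100 * o / c }'
}

# expected RATE_PERCENT COUNT: print COUNT * RATE / 100, rounded.
expected() {
  LC_ALL=C awk -v r="$1" -v n="$2" 'BEGIN { printf "%.0f\n", r * n / 100 }'
}

# Counts copied from the zero-context table of \i/ strings.
echo "oiii (9) vs ciiii (330): $(ratio 9 330)%"
echo "oii  (3) vs ciii  (130): $(ratio 3 130)%"

# Pooled rate over both pairs: (9 + 3) / (330 + 130).
pooled=$(ratio 12 460)
echo "pooled rate: $pooled%"

# Scale the pooled rate by the 710 instances of cii.
echo "oi possibly misread from cii: $(expected "$pooled" 710)"
```

Under these assumptions the per-pair rates fall between 2% and 3%, and the scaled count lands in the high teens, which is compatible with the round figures given in the text.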
Here is a table of these \i/ letters and their left contexts:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/ci/a/g' \
      | compare-contexts -lctx 1 -rctx 0 -colw 17 \
          'i*k' \
          'i*e' \
          'i*r' \
          'i*n'

       32 0.86 ak          893 0.55 oe          271 0.48 ar          329 0.72 ain
        4 0.11 ok          403 0.25 ae          178 0.32 or          109 0.24 an
        1 0.03 yk          282 0.17 _e           79 0.14 _r            9 0.02 oin
    ----- ---- ---          15 0.01 ce           19 0.03 air           4 0.01 cn
       37 1.00 TOT           8 0.00 ye            4 0.01 yr            4 0.01 aii
                             6 0.00 je            4 0.01 cr            2 0.00 on
                             3 0.00 ge            2 0.00 er        ----- ---- ---
                             3 0.00 e             1 0.00 oir         457 1.00 TOT
                             2 0.00 aie           1 0.00 gr
                             1 0.00 se            1 0.00 aii
                         ----- ---- ---       ----- ---- ---
                          1616 1.00 TOT         560 1.00 TOT

Note the occurrences of \y/ before the \i/ letters, except \m/ and \n/. This data confirms our previous guess that the \cy/ group is merely the final form of \ci/. Let's do this reduction, too:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/cy/a/g' \
          -e 's/ci/a/g' \
      | compare-contexts -lctx 1 -rctx 0 -colw 17 \
          'i*k' \
          'i*e' \
          'i*r' \
          'i*n'

       33 0.89 ak          893 0.55 oe          275 0.49 ar          329 0.72 ain
        4 0.11 ok          411 0.25 ae          178 0.32 or          109 0.24 an
    ----- ---- ---         282 0.17 _e           79 0.14 _r            9 0.02 oin
       37 1.00 TOT          15 0.01 ce           19 0.03 air           4 0.01 cn
                             6 0.00 je            4 0.01 cr            4 0.01 aii
                             3 0.00 ge            2 0.00 er            2 0.00 on
                             3 0.00 e             1 0.00 oir       ----- ---- ---
                             2 0.00 aie           1 0.00 gr          457 1.00 TOT
                             1 0.00 se            1 0.00 aii
                         ----- ---- ---       ----- ---- ---
                          1616 1.00 TOT         560 1.00 TOT

Now, let's check whether the combinations \cy/, \ci/ and \cg/ behave like \c/ on the left.
To reduce the number of distinct patterns, I will collapse \cs/ to \c/, and erase all gallows:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql][jg]//' \
          -e 's/cs/c/g' \
          -e 's/ci/a/g' \
          -e 's/cy/a/g' \
          -e 's/cg/8/g' \
          -e 's/c\([^ca8o]\)/C\1/g' \
          -e 's/o\([^ca8o]\)/O\1/g' \
      | compare-contexts -lctx 1 -rctx 0 -colw 20 \
          '[ca8o]*C' \
          '[ca8o]*8' \
          '[ca8o]*a' \
          '[ca8o]*O'

       10 0.19 _cccC       456 0.22 _ccc8       443 0.11 _ccc8a      517 0.46 _O
        6 0.11 xC          322 0.16 _8          418 0.10 qoa         157 0.14 qO
        6 0.11 _C          207 0.10 qoc8        264 0.07 _8a         102 0.09 xO
        5 0.09 _ccC        201 0.10 qocc8       211 0.05 _ccca        65 0.06 _cccO
        4 0.07 _ccccC      150 0.07 xccc8       203 0.05 qoc8a        61 0.05 _cO
        3 0.06 xccccC       95 0.05 _oc8        196 0.05 qocc8a       41 0.04 _ccO
        3 0.06 OC           74 0.04 _cccc8      180 0.04 _oa          36 0.03 sO
        2 0.04 xcC          65 0.03 _occ8       175 0.04 _cccca       33 0.03 qoO
        2 0.04 qocccC       57 0.03 xcc8        174 0.04 xa           26 0.02 _8O
        2 0.04 qoaC         54 0.03 _cc8        140 0.03 xccc8a       18 0.02 _oO
        1 0.02 xccC         51 0.02 x8          108 0.03 _a            9 0.01 xcccO
        1 0.02 scccC        35 0.02 qoccc8       93 0.02 _oc8a         8 0.01 _ocO
        1 0.02 sccC         31 0.02 xc8          87 0.02 _ccccc        6 0.01 xccO
        1 0.02 qoccC        29 0.01 _occc8       85 0.02 qocca         6 0.01 qocO
        1 0.02 qocC         27 0.01 _c8          85 0.02 _ca           6 0.01 _ccccO
        1 0.02 qcc8aC       26 0.01 _8ccc8       72 0.02 _cccc8        6 0.01 _8ccO
        1 0.02 _occca       18 0.01 _ccccc       68 0.02 xccca         4 0.00 _occO
        1 0.02 _cC          16 0.01 _accc8       64 0.02 sa            4 0.00 _ccc8O
        1 0.02 _acccc       14 0.01 _acc8        62 0.02 _occ8a        4 0.00 _8cccO
        1 0.02 _aC          12 0.01 xcccc8       59 0.01 _cca          3 0.00 jO
        1 0.02 _8cccc       12 0.01 sccc8        55 0.01 xcc8a         3 0.00 _cc8O
    ----- ---- ----         12 0.01 _ac8         50 0.01 xcca          2 0.00 xcO
       54 1.00 TOT          11 0.01 _ccccc       49 0.01 _cc8a         2 0.00 qoccO
                             7 0.00 qo8          48 0.01 x8a           2 0.00 _occcO
                             6 0.00 _o8          45 0.01 qoca          2 0.00 _ccccc
                             5 0.00 qcc8         40 0.01 _occa         2 0.00 _acccO
                             5 0.00 _8cc8        34 0.01 qoccc8        1 0.00 xccccO
                             4 0.00 scccc8       29 0.01 xc8a          1 0.00 x8O
                             4 0.00 _a8          28 0.01 _occc8        1 0.00 uO
                             4 0.00 _8c8         27 0.01 _c8a          1 0.00 qoc8O
                             3 0.00 qocccc       26 0.01 _8ccc8        1 0.00 jcccO
                             3 0.00 qoa8         22 0.01 _oca          1 0.00 _oaO
                             3 0.00 qccc8        21 0.01 _occca        1 0.00 _coO
                             3 0.00 _8cccc       18 0.00 _ccccc        1 0.00 _c8aO
                             2 0.00 xoccc8       17 0.00 qoccca        1 0.00 _8cccc
                             2 0.00 xo8          16 0.00 xcccca    ----- ---- ----
                             2 0.00 xccccc       15 0.00 _accc8     1134 1.00 TOT
                             2 0.00 xa8          14 0.00 _acc8a
                             2 0.00 s8           12 0.00 sccca
                             2 0.00 _cocc8       12 0.00 _ac8a
                             2 0.00 _acccc       12 0.00 _aa
                             2 0.00 _8acc8       11 0.00 xcccc8
                             2 0.00 _8ac8        11 0.00 xca
                             1 0.00 xccccc       10 0.00 sccc8a
                             1 0.00 x8ccc8        9 0.00 _ccccc
                             1 0.00 x88           9 0.00 _acca
                             1 0.00 scc8          8 0.00 _accca
                             1 0.00 sa8           7 0.00 ja
                             1 0.00 qoc8cc        6 0.00 xccccc
                             1 0.00 qoc8c8        6 0.00 _o8a
                             1 0.00 qoacc8        6 0.00 _ccccc
                             1 0.00 qo8ccc        5 0.00 qo8a
                             1 0.00 qo8cc8        5 0.00 _acccc
                             1 0.00 qcccc8        5 0.00 _8cc8a
                             1 0.00 jcc8          4 0.00 scccc8
                             1 0.00 jc8           4 0.00 qcc8a
                             1 0.00 j8            4 0.00 qca
                             1 0.00 _occo8        4 0.00 _aca
                             1 0.00 _oa8          4 0.00 _a8a
                             1 0.00 _o8ccc        4 0.00 _8ccca
                             1 0.00 _co8          4 0.00 _8c8a
                             1 0.00 _ccocc        3 0.00 xoa
                             1 0.00 _ccocc        3 0.00 ua
                             1 0.00 _cco8         3 0.00 qocccc
                             1 0.00 _8oc8         3 0.00 qoa8a
                             1 0.00 _8cccc        3 0.00 qccc8a
                         ----- ---- ----          3 0.00 jca
                          2063 1.00 TOT           3 0.00 _occcc
                                                  3 0.00 _ccoa
                                                  3 0.00 _8cccc
                                                  3 0.00 _8cccc
                                                  3 0.00 _8cca
                                                  2 0.00 xoccc8
                                                  2 0.00 xccccc
                                                  2 0.00 xccccc
                                                  2 0.00 scca
                                                  2 0.00 qcca
                                                  2 0.00 jcca
                                                  2 0.00 ga
                                                  2 0.00 _cocc8
                                                  2 0.00 _acccc
                                                  2 0.00 _acccc
                                                  2 0.00 _8acca
                                                  2 0.00 _8acc8
                                                  2 0.00 _8ac8a
                                                  2 0.00 _8aa
                                                  1 0.00 xocca
                                                  1 0.00 xo8a
                                                  1 0.00 xccccc
                                                  1 0.00 xc8ca
                                                  1 0.00 xa8a
                                                  1 0.00 x8ccc8
                                                  1 0.00 x88a
                                                  1 0.00 scccca
                                                  1 0.00 scc8a
                                                  1 0.00 sca
                                                  1 0.00 sacca
                                                  1 0.00 sa8a
                                                  1 0.00 s8a
                                                  1 0.00 qocccc
                                                  1 0.00 qoc8cc
                                                  1 0.00 qoc8c8
                                                  1 0.00 qoacc8
                                                  1 0.00 qoaa
                                                  1 0.00 qo8cca
                                                  1 0.00 qo8cc8
                                                  1 0.00 qo8aca
                                                  1 0.00 qccca
                                                  1 0.00 qcc8cc
                                                  1 0.00 qaa
                                                  1 0.00 qa
                                                  1 0.00 jcc8a
                                                  1 0.00 jc8a
                                                  1 0.00 j8a
                                                  1 0.00 _occo8
                                                  1 0.00 _oc8cc
                                                  1 0.00 _oaccc
                                                  1 0.00 _oa8a
                                                  1 0.00 _o8ccc
                                                  1 0.00 _coccc
                                                  1 0.00 _co8a
                                                  1 0.00 _ccocc
                                                  1 0.00 _ccocc
                                                  1 0.00 _ccocc
                                                  1 0.00 _cco8a
                                                  1 0.00 _cccoa
                                                  1 0.00 _ccccc
                                                  1 0.00 _ccccc
                                                  1 0.00 _cccaa
                                                  1 0.00 _ccc8c
                                                  1 0.00 _ccc8a
                                                  1 0.00 _ccacc
                                                  1 0.00 _caa
                                                  1 0.00 _aoa
                                                  1 0.00 _acccc
                                                  1 0.00 _8occa
                                                  1 0.00 _8oc8a
                                                  1 0.00 _8cccc
                                                  1 0.00 _8aca
                                              ----- ---- ----
                                               4017 1.00 TOT

Let's recount, with narrower left contexts (all the \c/s and one more letter):

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql][jg]//' \
          -e 's/cs/c/g' \
          -e 's/ci/a/g' \
          -e 's/cy/a/g' \
          -e 's/cg/8/g' \
          -e 's/c\([^ca8o]\)/C\1/g' \
      | compare-contexts -lctx 1 -rctx 0 -colw 20 \
          'c*C' \
          'c*8' \
          'c*a' \
          'c*o'

       10 0.19 _cccC       486 0.23 _ccc8      1949 0.47 8a         1230 0.45 qo
        6 0.11 xC          367 0.17 _8          613 0.15 oa         1066 0.39 _o
        6 0.11 _C          305 0.14 oc8         221 0.05 _ccca       110 0.04 xo
        5 0.09 aC          269 0.13 occ8        216 0.05 _a           80 0.03 _co
        5 0.09 _ccC        150 0.07 xccc8       180 0.04 _cccca       68 0.02 _ccco
        4 0.07 _ccccC       77 0.04 _cccc8      175 0.04 xa           55 0.02 _cco
        3 0.06 xccccC       67 0.03 occc8       128 0.03 occa         37 0.01 8o
        3 0.06 oC           60 0.03 _cc8         92 0.02 _ca          36 0.01 so
        2 0.04 xcC          57 0.03 xcc8         89 0.02 _ccccc        9 0.00 xccco
        2 0.04 occcC        53 0.03 x8           73 0.02 _cca          6 0.00 xcco
        1 0.02 xccC         32 0.02 _c8          68 0.02 xccca         6 0.00 _cccco
        1 0.02 scccC        31 0.01 xc8          67 0.02 oca           6 0.00 8cco
        1 0.02 sccC         21 0.01 o8           66 0.02 sa            4 0.00 8ccco
        1 0.02 occC         19 0.01 _ccccc       50 0.01 xcca          3 0.00 jo
        1 0.02 ocC          17 0.01 acc8         39 0.01 occca         3 0.00 ao
        1 0.02 accccC       16 0.01 accc8        16 0.00 xcccca        2 0.00 xco
        1 0.02 _cC          14 0.01 ac8          12 0.00 sccca         2 0.00 accco
        1 0.02 8ccccC       12 0.01 xcccc8       11 0.00 xca           2 0.00 _ccccc
    ----- ---- ----         12 0.01 sccc8         8 0.00 8cca          1 0.00 xcccco
       54 1.00 TOT          11 0.01 a8            7 0.00 ja            1 0.00 uo
                            11 0.01 _ccccc        7 0.00 _ccccc        1 0.00 jccco
                             5 0.00 qcc8          6 0.00 xccccc        1 0.00 8cccco
                             4 0.00 scccc8        4 0.00 qca       ----- ---- ----
                             3 0.00 qccc8         4 0.00 occcca     2729 1.00 TOT
                             3 0.00 occcc8        4 0.00 8ccca
                             2 0.00 xccccc        3 0.00 ua
                             2 0.00 s8            3 0.00 jca
                             2 0.00 acccc8        3 0.00 8cccca
                             1 0.00 xccccc        2 0.00 xccccc
                             1 0.00 scc8          2 0.00 scca
                             1 0.00 qcccc8        2 0.00 qcca
                             1 0.00 jcc8          2 0.00 qa
                             1 0.00 jc8           2 0.00 jcca
                             1 0.00 j8            2 0.00 ga
                         ----- ---- ----          1 0.00 scccca
                          2114 1.00 TOT           1 0.00 sca
                                                  1 0.00 qccca
                                                  1 0.00 _ccccc
                                                  1 0.00 8ca
                                              ----- ---- ----
                                               4131 1.00 TOT
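As a rough illustration of what a one-letter left context (-lctx 1 -rctx 0) tally involves, the sketch below prefixes the pattern with a single-character wildcard and counts the resulting strings. The function tally_with_left_context and the sample words are hypothetical, and this is an approximation only: grep -o returns non-overlapping matches, so a match whose context character was consumed by the previous match is missed, and nothing here truncates long contexts to the column width the way the tables above appear to.

```sh
#!/bin/sh
# tally_with_left_context PATTERN
# Reads one word per line on stdin and prints "count string" for each
# distinct match of PATTERN together with the single character that
# precedes it, most frequent first.
# Hypothetical helper, NOT the author's compare-contexts.
# Relies on `grep -o`, which GNU and BSD grep provide.
tally_with_left_context() {
  LC_ALL=C grep -o -e ".$1" \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $1, $2 }' \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ printf "%5d %s\n", $1, $2 }'
}

# Made-up words (not from the .wds file). The sed step pads each word
# with _ and marks the last c of a run as C when the next stroke is
# not one of c, a, 8, o, as in the notes.
printf '%s\n' cccl xcl occl xcl \
  | sed -e 's/^/_/' -e 's/$/_/' -e 's/c\([^ca8o]\)/C\1/g' \
  | tally_with_left_context 'c*C'
```

Because the leftmost match always starts at the character just before a run of \c/s, each key carries the whole run plus one context letter, matching the shape of entries such as _cccC or xC in the tables.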