Let's look more closely at the tall letters:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      | compare-contexts -lctx 0 -rctx 0 -colw 24 \
          '[czlqjg]*lj[czlqjg]*' \
          '[czlqjg]*qj[czlqjg]*' \
          '[czlqjg]*lg[czlqjg]*' \
          '[czlqjg]*qg[czlqjg]*'

      571 0.36 lj             230 0.33 qj              10 0.62 lgccc           53 0.34 qgccc
      376 0.24 ljcc           161 0.23 qjc              2 0.12 lg              50 0.32 qg
      322 0.21 ljc            118 0.17 qjcc             2 0.12 clgccc           8 0.05 qgzcc
       38 0.02 cccljc          28 0.04 qjccc            1 0.06 lgcc             8 0.05 qgcc
       32 0.02 ljccc           27 0.04 cccqjc           1 0.06 cclg             4 0.03 qgzc
       27 0.02 zccljc          18 0.03 ccccqjc      ----- ---- ----             4 0.03 cqgcc
       26 0.02 zcccljc         14 0.02 zccqjc          16 1.00 TOT              4 0.03 cqgc
       23 0.01 ccccljc         14 0.02 qjzcc                                    4 0.03 cccqgcc
       18 0.01 ljzcc           11 0.02 zcccqjc                                  3 0.02 cccqg
       17 0.01 zcclj            9 0.01 cqjcc                                    2 0.01 zccqgcc
       14 0.01 ccclj            8 0.01 zcccqj                                   2 0.01 zcccqgc
       10 0.01 zccljcc          7 0.01 zccqj                                    2 0.01 qgcccc
       10 0.01 cccljcc          7 0.01 cqjc                                     2 0.01 cqg
        9 0.01 cljcc            5 0.01 zccqjcc                                  2 0.01 ccqgzccc
        9 0.01 cljc             5 0.01 qjzc                                     1 0.01 zqgcc
        9 0.01 cccclj           5 0.01 ccqj                                     1 0.01 zccqgccc
        6 0.00 zcccljcc         4 0.01 cccqj                                    1 0.01 zccqg
        6 0.00 cclj             3 0.00 zcqj                                     1 0.01 ccqgccc
        5 0.00 zccclj           3 0.00 qcqjc                                    1 0.01 cccqgccc
        4 0.00 ljzc             3 0.00 ccccqj                                   1 0.01 ccccqgcc
        4 0.00 ljcccc           2 0.00 zcccqjcc                             ----- ---- ----
        4 0.00 ccccljcc         2 0.00 qjccz                                  154 1.00 TOT
        2 0.00 zclj             2 0.00 qcqj
        2 0.00 qcljcc           2 0.00 cccqjcc
        2 0.00 ljczcc           2 0.00 ccccqjcc
        2 0.00 ljczc            1 0.00 zcqjcc
        2 0.00 ccljcc           1 0.00 zccccqjcc
        2 0.00 ccljc            1 0.00 qjczc
        1 0.00 zzcljcc          1 0.00 qjcz
        1 0.00 zzcclj           1 0.00 qjcccc
        1 0.00 zzcccljc         1 0.00 qcqjccc
        1 0.00 zljcc            1 0.00 qcqjcc
        1 0.00 zlj              1 0.00 cqjccz
        1 0.00 zcljc            1 0.00 cqj
        1 0.00 zccljccc         1 0.00 ccqjc
        1 0.00 qlj          ----- ---- ----
        1 0.00 ljcz           700 1.00 TOT
        1 0.00 ljccz
        1 0.00 ljccccljc
        1 0.00 clj
        1 0.00 cccljccc
    ----- ---- ----
     1565 1.00 TOT

These statistics confirm the identification of the \l/ and \q/ in gallows.
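The tool compare-contexts used above is a local script that is not listed in these notes. As a rough, hypothetical stand-in for one column of its output (with -lctx 0 -rctx 0, i.e. no context), the POSIX sh/awk function below collects, in each word, the leftmost-longest non-overlapping matches of one extended regexp and tabulates them as count, relative frequency, and string, followed by a TOT line. The name count_matches and this matching rule are assumptions, not a description of what compare-contexts actually does.

```shell
#!/bin/sh
# count_matches PATTERN  (hypothetical stand-in, NOT the original
# compare-contexts tool).
# Reads one word per line on stdin.  For every non-overlapping,
# leftmost-longest match of the extended regexp PATTERN in each word,
# records the matched string; then prints "count freq string" lines,
# most frequent first, followed by a TOT line.
count_matches() {
  awk -v pat="$1" '
    {
      s = $0
      # Take the leftmost match, then continue scanning after it.
      while (s != "" && match(s, pat) && RLENGTH > 0) {
        print substr(s, RSTART, RLENGTH)
        s = substr(s, RSTART + RLENGTH)
      }
    }' \
  | sort | uniq -c | sort -k1,1nr -k2,2 \
  | awk '
    { n[NR] = $1; w[NR] = $2; tot += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%5d %4.2f %s\n", n[i], n[i] / tot, w[i]
      print "----- ---- ----"
      printf "%5d %4.2f %s\n", tot, 1, "TOT"
    }'
}
```

It would be fed the same sed output as above, once per pattern, e.g. ending the pipeline with count_matches '[czlqjg]*qg[czlqjg]*'; the four resulting columns could then be placed side by side with pr -m, as is done elsewhere in these notes.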
The questions to decide now are whether \Hcc/ and \zcc/ are letters or composites \H/+\cc/ and \z/+\cc/. Here `H' stands for any gallows \[ql][jg]/:

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql][jg]/H/' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      > .bar

    foreach f ( z H )
      cat .bar \
        | enum-contexts -vPAT='[^czH]'"${f}"'[^czH]' \
        | sed -e 's/.$//g' \
        | wfreq \
        > .${f}.L
      cat .bar \
        | enum-contexts -vPAT='[^czH]'"${f}cc"'[^czH]' \
        | sed -e 's/.$//g' \
        | wfreq \
        > .${f}cc.L
      cat .bar \
        | enum-contexts -vPAT='[^czH]'"cc"'[^czH]' \
        | sed -e 's/^.//g' \
        | wfreq \
        > .cc.R
      cat .bar \
        | enum-contexts -vPAT='[^czH]'"${f}cc"'[^czH]' \
        | sed -e 's/^.//g' \
        | wfreq \
        > .${f}cc.R
      pr -m -s' ' -t -i' '1 -w 96 .${f}cc.L .${f}.L .${f}cc.R .cc.R \
        | expand
    end

For each f, the four columns are: the left contexts of f+\cc/, the left contexts of f alone, the right contexts of f+\cc/, and the right contexts of \cc/ alone. For f = `z':

      326 0.74 _zcc         136 0.90 _z           296 0.67 zcc8          54 0.39 cc8
       65 0.15 ezcc           6 0.04 ez           108 0.25 zcca          47 0.34 cca
       22 0.05 8zcc           6 0.04 az            35 0.08 zcco          23 0.17 cco
       14 0.03 azcc           2 0.01 oz         ----- ---- ----          11 0.08 cce
        9 0.02 rzcc           1 0.01 qz           439 1.00 TOT            2 0.01 ccr
        3 0.01 ozcc       ----- ---- ----                                 1 0.01 cc_
    ----- ---- ----         151 1.00 TOT                              ----- ---- ----
      439 1.00 TOT                                                      138 1.00 TOT

For f = `H':

      380 0.76 oHcc         668 0.79 oH           319 0.64 Hcc8          54 0.39 cc8
       61 0.12 eHcc          87 0.10 _H           169 0.34 Hcca          47 0.34 cca
       33 0.07 _Hcc          68 0.08 eH            13 0.03 Hcco          23 0.17 cco
       26 0.05 aHcc          18 0.02 aH         ----- ---- ----          11 0.08 cce
        1 0.00 8Hcc           1 0.00 qH           501 1.00 TOT            2 0.01 ccr
    ----- ---- ----       ----- ---- ----                                 1 0.01 cc_
      501 1.00 TOT          842 1.00 TOT                              ----- ---- ----
                                                                        138 1.00 TOT

From these numbers, it seems plausible that `Hcc' and `zcc' are composites: their left contexts are distributed much like those of `H' and `z' alone, and their right contexts (`8', `a', `o') are a subset of those of `cc'.
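The loop above depends on the local tools enum-contexts and wfreq, which are not listed here. Assuming that enum-contexts prints each match of PAT found in each word and that wfreq tallies identical lines by decreasing count, one context column could be approximated with the hypothetical function ctx_freq below (the name and its L/R argument are made up for this sketch). It looks for X flanked on both sides by characters outside [czH], keeps X plus its left or right neighbour, and lets the trailing neighbour of one match serve as the leading neighbour of the next.

```shell
#!/bin/sh
# ctx_freq X SIDE  (hypothetical stand-in for the
#   enum-contexts | sed | wfreq  pipelines; NOT the original tools).
# Reads one word per line.  For each occurrence of X whose neighbours on
# both sides lie outside [czH], prints X with its left neighbour
# (SIDE=L) or its right neighbour (SIDE=R), then tallies the results as
# "count freq string", most frequent first, plus a TOT line.
ctx_freq() {
  awk -v x="$1" -v side="$2" '
    BEGIN { pat = "[^czH]" x "[^czH]" }
    {
      s = $0
      while (match(s, pat)) {
        m = substr(s, RSTART, RLENGTH)
        if (side == "L")
          print substr(m, 1, length(m) - 1)   # drop the right neighbour
        else
          print substr(m, 2)                  # drop the left neighbour
        # Resume at the right neighbour, so it can also be the left
        # neighbour of the next occurrence.
        s = substr(s, RSTART + RLENGTH - 1)
      }
    }' \
  | sort | uniq -c | sort -k1,1nr -k2,2 \
  | awk '
    { n[NR] = $1; w[NR] = $2; tot += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%5d %4.2f %s\n", n[i], n[i] / tot, w[i]
      print "----- ---- ----"
      printf "%5d %4.2f %s\n", tot, 1, "TOT"
    }'
}
```

For instance, ctx_freq zcc L < .bar should roughly correspond to .zcc.L, and ctx_freq cc R < .bar to .cc.R; words like _zccc8_, where \zcc/ runs into another `c', are not counted.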
Collecting all [czH] patterns again, this time splitting the gallows into `H' (\lj/, \qj/) and `P' (\lg/, \qg/):

    cat bio-j-jsa-gut.wds \
      | sed \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
          -e 's/[ql]j/H/' \
          -e 's/[ql]g/P/' \
          -e 's/cs/z/g' \
          -e 's/ij/k/g' \
          -e 's/ix/e/g' \
          -e 's/is/r/g' \
          -e 's/iiu/n/g' \
          -e 's/y/i/g' \
          -e 's/ci/a/g' \
          -e 's/cg/8/g' \
      | enum-contexts -vPAT='[czHP][czHP]*' -vCTX=0 \
      | wfreq

      795 0.20 H
      493 0.13 Hcc
      484 0.12 ccc
      482 0.12 Hc
      439 0.11 zcc
      152 0.04 z
      138 0.04 cc
       87 0.02 zccc
       70 0.02 zc
       70 0.02 cccc
       65 0.02 cccHc
       63 0.02 Pccc
       60 0.02 Hccc
       52 0.01 P
       41 0.01 zccHc
       41 0.01 ccccHc
       37 0.01 zcccHc
       32 0.01 Hzcc
       24 0.01 zccH
       20 0.01 cHcc
       19 0.00 cHc
       18 0.00 cccH
       15 0.00 zccHcc
       13 0.00 zcccH
       12 0.00 ccccH
       12 0.00 cccHcc
       11 0.00 cccz
       11 0.00 ccH
        9 0.00 c
        9 0.00 Pcc
        9 0.00 Hzc
        8 0.00 zcccHcc
        8 0.00 Pzcc
        6 0.00 zzcc
        6 0.00 ccccHcc
        6 0.00 Hcccc
        5 0.00 zcH
        5 0.00 ccccz
        4 0.00 zcccz
        4 0.00 cccPcc
        4 0.00 cPcc
        4 0.00 cPc
        4 0.00 cH
        4 0.00 Pzc
        3 0.00 cccP
        3 0.00 ccHc
        3 0.00 Hczc
        3 0.00 Hccz
        2 0.00 zcccPc
        2 0.00 zccPcc
        2 0.00 ccz
        2 0.00 ccPzccc
        2 0.00 ccHcc
        2 0.00 cPccc
        2 0.00 cP
        2 0.00 Pcccc
        2 0.00 Hczcc
        2 0.00 Hcz
        1 0.00 zzcccHc
        1 0.00 zzccH
        1 0.00 zzcHcc
        1 0.00 zcz
        1 0.00 zccz
        1 0.00 zccccHcc
        1 0.00 zcccc
        1 0.00 zccPccc
        1 0.00 zccP
        1 0.00 zccHccc
        1 0.00 zcHcc
        1 0.00 zcHc
        1 0.00 zPcc
        1 0.00 zHcc
        1 0.00 zH
        1 0.00 ccccc
        1 0.00 ccccPcc
        1 0.00 cccPccc
        1 0.00 cccHccc
        1 0.00 ccPccc
        1 0.00 ccP
        1 0.00 cHccz
        1 0.00 cHccc
    ----- ---- ----
     3906 1.00 TOT

Based on the analysis above, it seems that, after collapsing \ci/, \cy/, \cg/, and \cs/, and mapping the tall characters \[lq]j/ to `H' and \[lq]g/ to `P', we can parse most strings consisting of `c', `z', `H', and `P' into the following "letters" (the frequencies are for isolated occurrences only):

    freq  letter  code
    ----  ------  ----
     795  H       E
      52  P       P
     152  z       Z
     138  cc      M
      70  zc      R
     482  Hc      I
     484  ccc     O
     439  zcc     A
     493  Hcc     U
      19  cHc     X
       4  cPc     Y
      20  cHcc    K
       4  cPcc    G

Here is my best guess for the parsing of those strings:

    freq       string      best parsing      alt parsings
    ---- ----  ----------  ----------------  --------------------
     795 0.20  H           (H)
     493 0.13  Hcc         (Hcc)
     484 0.12  ccc         (ccc)
     482 0.12  Hc          (Hc)
     439 0.11  zcc         (zcc)
     152 0.04  z           (z)
     138 0.04  cc          (cc)
      87 0.02  zccc        (z)(ccc)          (zc)(cc)
      70 0.02  zc          (zc)
      70 0.02  cccc        (cc)(cc)
      65 0.02  cccHc       (ccc)(Hc)         (cc)(cHc)
      63 0.02  Pccc        (P)(ccc)
      60 0.02  Hccc        (H)(ccc)          (Hc)(cc)
      52 0.01  P
      41 0.01  zccHc       (zcc)(Hc)         (zc)(cHc)
      41 0.01  ccccHc      (ccc)(cHc)        (cc)(cc)(Hc)
      37 0.01  zcccHc      (zcc)(cHc)        (zc)(cc)(Hc)
      32 0.01  Hzcc        (H)(zcc)          (H)(z)(cc)
      24 0.01  zccH        (zcc)(H)
      20 0.01  cHcc        (cHcc)
      19 0.00  cHc         (cHc)
      18 0.00  cccH        (ccc)(H)
      15 0.00  zccHcc      (zcc)(Hcc)
      13 0.00  zcccH       (z)(ccc)(H)       (zc)(cc)(H)
      12 0.00  ccccH       (cc)(cc)(H)
      12 0.00  cccHcc      (ccc)(Hcc)        (cc)(cHcc)
      11 0.00  cccz        (ccc)(z)
      11 0.00  ccH         (cc)(H)
       9 0.00  c
       9 0.00  Pcc         (P)(cc)
       9 0.00  Hzc         (H)(zc)
       8 0.00  zcccHcc     (zcc)(cHcc)       (zc)(cc)(Hcc), (z)(cc)(cHcc)
       8 0.00  Pzcc        (P)(zcc)
       6 0.00  zzcc        (z)(zcc)
       6 0.00  ccccHcc     (ccc)(cHcc)       (cc)(cc)(Hcc)
       6 0.00  Hcccc       (Hcc)(cc)         (Hc)(ccc)
       5 0.00  zcH         (zc)(H)
       5 0.00  ccccz       (cc)(cc)(z)
       4 0.00  zcccz       (zc)(cc)(z)       (z)(ccc)(z)
       4 0.00  cccPcc      (ccc)(P)(cc)      (cc)(cPcc)
       4 0.00  cPcc        (cPcc)
       4 0.00  cPc         (cPc)
       4 0.00  cH
       4 0.00  Pzc         (P)(zc)
       3 0.00  cccP        (ccc)(P)
       3 0.00  ccHc        (cc)(Hc)
       3 0.00  Hczc        (Hc)(zc)
       3 0.00  Hccz        (H)(cc)(z)
       2 0.00  zcccPc      (zcc)(cPc)
       2 0.00  zccPcc      (zcc)(Pcc)        (z)(cc)(P)(cc)
       2 0.00  ccz         (cc)(z)
       2 0.00  ccPzccc     (cc)(P)(zc)(cc)   (cc)(P)(z)(ccc)
       2 0.00  ccHcc       (cc)(Hcc)
       2 0.00  cPccc       (cPc)(cc)
       2 0.00  cP
       2 0.00  Pcccc       (Pc)(ccc)         (P)(cc)(cc)
       2 0.00  Hczcc       (Hc)(zcc)
       2 0.00  Hcz         (Hc)(z)
       1 0.00  zzcccHc     (z)(zcc)(cHc)     (z)(zc)(cc)(Hc)
       1 0.00  zzccH       (z)(zcc)(H)
       1 0.00  zzcHcc      (z)(z)(cHcc)      (z)(zc)(H)(cc)
       1 0.00  zcz         (zc)(z)
       1 0.00  zccz        (zcc)(z)
       1 0.00  zccccHcc    (zc)(cc)(cHcc)    (z)(cc)(cc)(H)(cc)
       1 0.00  zcccc       (zc)(ccc)         (z)(cc)(cc)
       1 0.00  zccPccc     (zc)(cPc)(cc)     (z)(cc)(P)(ccc), (z)(cc)(Pc)(cc)
       1 0.00  zccP        (zcc)(P)
       1 0.00  zccHccc     (zc)(cHc)(cc)     (zcc)(H)(ccc), (zcc)(Hc)(cc)
       1 0.00  zcHcc       (zc)(Hcc)         (z)(cHcc)
       1 0.00  zcHc        (zc)(Hc)          (z)(cHc)
       1 0.00  zPcc        (z)(Pcc)
       1 0.00  zHcc        (z)(Hcc)
       1 0.00  zH          (z)(H)
       1 0.00  ccccc       (cc)(ccc)         (ccc)(cc)
       1 0.00  ccccPcc     (cc)(cc)(P)(cc)   (ccc)(cPcc)
       1 0.00  cccPccc     (cc)(cPc)(cc)     (ccc)(P)(ccc)
       1 0.00  cccHccc     (cc)(cHc)(cc)     (ccc)(H)(ccc)
       1 0.00  ccPccc      (cc)(P)(ccc)
       1 0.00  ccP         (cc)(P)
       1 0.00  cHccz       (cHcc)(z)
       1 0.00  cHccc       (cHc)(cc)

This still doesn't look quite right, but let's try it anyway. Here is a sed script, jsa2hip, that recodes the superanalytic encoding into a "hip" encoding along these lines (note that it writes `E' for the gallows `H'):

    jsa2hip
    -------------------------------------------------
    #! /n/gnu/bin/sed -f
    # Recoding superanalytic to "hip" encoding:
    /^[^#]/s/ij/k/g
    /^[^#]/s/ix/e/g
    /^[^#]/s/is/r/g
    /^[^#]/s/iiu/n/g
    /^[^#]/s/y/i/g
    /^[^#]/s/ci/a/g
    /^[^#]/s/cg/8/g
    /^[^#]/s/cs/z/g
    /^[^#]/s/iin/m/g
    /^[^#]/s/in/m/g
    /^[^#]/s/ir/v/g
    /^[^#]/s/qj/E/g
    /^[^#]/s/qg/P/g
    /^[^#]/s/lj/E/g
    /^[^#]/s/lg/P/g
    # Parsing of [czPE] strings:
    /^[^#]/s/[zcEP][zcEP][zcEP][zcEP][zcEP][zcEP][zcEP][zcEP]*/@/g
    /^[^#]/s/zccEcc/AU/g
    /^[^#]/s/ccccEc/OX/g
    /^[^#]/s/zcccEc/AX/g
    /^[^#]/s/ccccE/MME/g
    /^[^#]/s/zccEc/AI/g
    /^[^#]/s/cccEc/OI/g
    /^[^#]/s/zccc/RM/g
    /^[^#]/s/cccc/MM/g
    /^[^#]/s/Pccc/PO/g
    /^[^#]/s/Eccc/EO/g
    /^[^#]/s/Ezcc/EA/g
    /^[^#]/s/zccE/AE/g
    /^[^#]/s/cEcc/K/g
    /^[^#]/s/cccE/OE/g
    /^[^#]/s/cccz/OZ/g
    /^[^#]/s/Ecc/U/g
    /^[^#]/s/ccc/O/g
    /^[^#]/s/zcc/A/g
    /^[^#]/s/cEc/X/g
    /^[^#]/s/ccE/ME/g
    /^[^#]/s/Ec/I/g
    /^[^#]/s/cc/M/g
    /^[^#]/s/zc/R/g
    /^[^#]/s/E/E/g
    /^[^#]/s/z/Z/g
    /^[^#]/s/P/P/g
    -------------------------------------------------

Applying it to the interlinear file:

    extract-words-from-interlin \
      -recode jsa2hip \
      -chars "qoa8HPZAEIOUMXRKermnkvc@" \
      bio-j-jsa.evt \
      bio-j-hip

     lines   words     bytes  file
    ------  ------  --------  ------------
      7054    7054     36231  bio-j-hip.wds
      1967    1967     13458  bio-j-hip.dic
      4658    4658     22234  bio-j-hip-gut.wds
       862     862      4575  bio-j-hip-gut.dic
       843     843      2464  bio-j-hip-fun.wds
         5       5        24  bio-j-hip-fun.dic
      1553    1553     11533  bio-j-hip-bad.wds
      1100    1100      8859  bio-j-hip-bad.dic

Digraph counts (edited):

               .     q     o     a     8     R     M     A     O     P     E     Z     I     U     e     r     m     n     k     v     X     K     c     @   TOT
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    .             1239   965   161   363   129   149   440   436    65    91   149    32    29   282    79     .     .     .     .     7     8    16    18  4658
    q        .     .  1227     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .  1247
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    o       21     .     .     .    18     .     9     .     .    61   714     .   394   380   893   178     9     .     .     .     7    10     .     .  2727
    a     2794     .     .     .    11     .     .    14     9     .    23     .    19    26   411   275   333   109    33    19     .     .     .     .  4104
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    R        .     .    26    22    31     .   107     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   199
    M        .     .    32   132   142     .    95     .     .     .    36    11     .     .    11     .     .     .     .     .     .     .     .     .   468
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    A        .     .    40   125   322     .     .     .     .     .    25     .    41    15     .     .     .     .     .     .    37     .     .     .   609
    O        .     .    40   167   404     .     .     .     .     7    18    11    77     .     .     .     .     .     .     .    41     .     .     .   765
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    e      825     .   105   114    53    36    49    71   154    10    76     7    43    61     .     .     .     .     .     .     .     .     .     .  1614
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    P        .     .    37    17     .     .    22     8    66     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   165
    8       50     .    37  1948     .     9    18    22    16     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .  2113
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    E       10     .    75   795     .     9     .    32    61     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   996
    r      401     .    36    64     .     .     .     9    16     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   539
    Z       42     .    63    73     .     .     .     7     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   196
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    I        .     .    20   173   391     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    12     .   608
    U        .     .    10   179   321     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   513
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    m      339     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   342
    n      114     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   115
    k       36     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    37
    v       19     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    20
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    X        .     .     .    86    10     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .   101
    K        .     .     .    14    10     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    26
    c        .     .     .    11    12     .     .     .     .    12     .     .     .     .     .     .     .     .     .     .     .     .     .     .    48
    @        .     .     .    11    13     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .     .    24
            ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT   4658  1247  2727  4104  2113   199   468   609   765   165   996   196   608   513  1614   539   342   115    37    20   101    26    48    24 22234

There are some nice things in this table: \ccc/ = `O' and \cscc/ = `A' come out similar, and so do \cc/ = `M' and \csc/ = `R'. There are also some surprises, such as the similarity between \qg,lg/ = `P' and \cg/ = `8', or among \lj,qj/ = `E', \cs/ = `Z', and \is/ = `r'.

The slight differences between members of the same class may be telling us something, too:

    \cc/ and \csc/ are similar, but
      only \cc/ is followed by \lj/, \qj/, \cs/, or \ix/

    \ccc/ and \cscc/ are similar, but
      only \ccc/ is followed by \lg/, \qg/, or \cs/
      only \cscc/ is followed by \ljcc/ or \qjcc/

    \ljc/ and \ljcc/ are similar, but
      only \ljc/ is followed by unparsed \c/
      only \ljcc/ can be preceded by \cscc/

Also, \cgci/ is probably a letter; indeed, \cg/ is followed by \ci/ 91% of the time, although \ci/ also occurs in other contexts.

What can we conclude from these bits, really?
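The digraph table and the 91% figure for \cg/ followed by \ci/ are both simple counts of adjacent symbol pairs. A minimal sketch for recomputing such counts from a word file in the "hip" encoding (for example bio-j-hip-gut.wds, assuming one word per line and one character per letter) is given below. The helpers digraphs and follow_pct are hypothetical, not the tools that produced the table, and since that table was edited, their output need not match it cell for cell.

```shell
#!/bin/sh
# digraphs  (hypothetical helper, NOT the tool that made the table above)
# Reads one word per line, one character per letter, and prints
# "x y count" for each ordered pair of adjacent symbols, sorted.
# The word boundary is written as "." at both ends of the word.
digraphs() {
  awk '
    {
      w = "." $0 "."                        # mark the word boundaries
      for (i = 1; i < length(w); i++)
        cnt[substr(w, i, 1) " " substr(w, i + 1, 1)]++
    }
    END { for (k in cnt) print k, cnt[k] }' \
  | sort
}

# follow_pct X Y  (hypothetical helper)
# Reads "x y count" lines as printed by digraphs and prints the
# percentage of occurrences of X that are immediately followed by Y.
follow_pct() {
  awk -v x="$1" -v y="$2" '
    $1 == x { row += $3; if ($2 == y) hit += $3 }
    END { printf "%.1f\n", (row > 0) ? 100 * hit / row : 0 }'
}
```

For example, digraphs < bio-j-hip-gut.wds | follow_pct 8 a would give the percentage of `8' (\cg/) followed by `a' (\ci/), which is the row-8, column-a entry of the table divided by its row total.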