Again, ignoring repeated words:

  cat bio-j-jsa-gut.dic \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]//' \
        -e 's/cs/c/g' \
        -e 's/ci/a/g' \
        -e 's/cy/a/g' \
        -e 's/cg/8/g' \
        -e 's/c\([^ca8o]\)/C\1/g' \
    | compare-contexts -lctx 1 -rctx 0 -colw 20 \
        'c*C' \
        'c*8' \
        'c*a' \
        'c*o'

      5 0.14 xC          79 0.19 _8         331 0.35 8a         258 0.41 _o
      5 0.14 aC          37 0.09 _ccc8      108 0.11 _a         163 0.26 qo
      3 0.08 xccccC      34 0.08 x8          79 0.08 oa          56 0.09 xo
      3 0.08 oC          30 0.07 xccc8       75 0.08 xa          31 0.05 _cco
      3 0.08 _cccC       27 0.06 xcc8        39 0.04 sa          30 0.05 _co
      3 0.08 _ccC        26 0.06 occc8       38 0.04 _ccca       19 0.03 so
      2 0.05 xcC         23 0.05 oc8         37 0.04 _cca        19 0.03 _ccco
      2 0.05 occcC       21 0.05 occ8        32 0.03 _ca         18 0.03 8o
      2 0.05 _ccccC      21 0.05 _cc8        23 0.02 _cccca       7 0.01 xccco
      1 0.03 xccC        18 0.04 o8          22 0.02 xccca        5 0.01 8cco
      1 0.03 scccC       14 0.03 _cccc8      21 0.02 occca        4 0.01 xcco
      1 0.03 sccC        10 0.02 a8          18 0.02 xcca         4 0.01 _cccco
      1 0.03 occC         9 0.02 _ccccc      16 0.02 occa         3 0.00 jo
      1 0.03 ocC          9 0.02 _ccccc      14 0.01 _ccccc       3 0.00 ao
      1 0.03 accccC       8 0.02 xc8         10 0.01 xcccca       2 0.00 xco
      1 0.03 _cC          7 0.02 xcccc8       8 0.01 xca          2 0.00 accco
      1 0.03 _C           7 0.02 _c8          8 0.01 sccca        2 0.00 _ccccc
      1 0.03 8ccccC       6 0.01 accc8        7 0.01 oca          2 0.00 8ccco
  ----- ---- ----         6 0.01 acc8         7 0.01 ja           1 0.00 xcccco
     37 1.00 TOT          4 0.01 sccc8        7 0.01 8cca         1 0.00 uo
                          4 0.01 qcc8         6 0.01 _ccccc       1 0.00 jccco
                          4 0.01 ac8          4 0.00 qca          1 0.00 8cccco
                          3 0.01 scccc8       3 0.00 xccccc   ----- ---- ----
                          2 0.00 xccccc       3 0.00 occcca     632 1.00 TOT
                          2 0.00 s8           3 0.00 jca
                          2 0.00 qccc8        3 0.00 8ccca
                          2 0.00 occcc8       2 0.00 xccccc
                          2 0.00 acccc8       2 0.00 ua
                          1 0.00 xccccc       2 0.00 scca
                          1 0.00 scc8         2 0.00 qcca
                          1 0.00 qcccc8       2 0.00 qa
                          1 0.00 jcc8         2 0.00 jcca
                          1 0.00 jc8          2 0.00 ga
                          1 0.00 j8           2 0.00 8cccca
                      ----- ---- ----         1 0.00 scccca
                        423 1.00 TOT          1 0.00 sca
                                              1 0.00 qccca
                                              1 0.00 _ccccc
                                              1 0.00 8ca
                                          ----- ---- ----
                                            943 1.00 TOT

These data suggest that, ignoring gallows and \s/-plumes, the \c/
strings almost always end in \ci/, \cy/, \cg/ or \o/: only 37 of the
2035 matches counted above end in some other letter.

Let's look again at the \c/ strings and their relationship to \s/-plumes and gallows.
For simplicity, let's map all gallows to `H', and \cs/ to `z';
for consistency, let's map \cy/ and \ci/ to `a', \cg/ to `8'.

  cat bio-j-jsa-gut.wds \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]/H/' \
        -e 's/cs/z/g' \
        -e 's/ij/k/g' \
        -e 's/ix/e/g' \
        -e 's/is/r/g' \
        -e 's/iiu/n/g' \
        -e 's/y/i/g' \
        -e 's/ci/a/g' \
        -e 's/cg/8/g' \
    | compare-contexts -lctx 0 -rctx 0 -colw 24 \
        '[czH][czH]*a' \
        '[czH][czH]*o' \
        '[czH][czH]*8' \
        '[czH][czH]*[^czHao8]'

    718 0.40 Ha             106 0.30 Ho             379 0.23 Hc8             18 0.20 z_
    169 0.09 Hcca            63 0.18 zo             319 0.19 Hcc8            11 0.12 cce
    133 0.07 ccca            35 0.10 zcco           317 0.19 ccc8             9 0.10 cccz_
    108 0.06 zcca            34 0.10 ccco           296 0.18 zcc8             7 0.08 H_
     79 0.04 Hca             24 0.07 zco             84 0.05 Hccc8            6 0.07 He
     70 0.04 za              23 0.07 cco             54 0.03 cc8              5 0.05 ccccz_
     54 0.03 cccHca          19 0.05 Hco             51 0.03 zccc8            4 0.04 zce
     48 0.03 cca             13 0.04 Hcco            28 0.02 cccc8            4 0.04 zcccz_
     40 0.02 ccccHca          6 0.02 Hccco           22 0.01 Hzcc8            3 0.03 zccH_
     39 0.02 zccHca           4 0.01 cHco            21 0.01 zc8              3 0.03 Hcn
     39 0.02 cccca            4 0.01 Hzcco           10 0.01 cccHcc8          2 0.02 zc_
     37 0.02 zcccHca          3 0.01 zccco            9 0.01 cccHc8           2 0.02 ccz_
     33 0.02 zccca            3 0.01 cccco            9 0.01 cHcc8            2 0.02 ccr
     33 0.02 Hccca            1 0.00 zzcco            9 0.01 Hzc8             2 0.02 cccH_
     21 0.01 zccHa            1 0.00 zccHo            6 0.00 zcccHcc8         1 0.01 ze
     18 0.01 zca              1 0.00 zcHo             6 0.00 zccHcc8          1 0.01 zcr
     18 0.01 cccHa            1 0.00 zcHco            6 0.00 cHc8             1 0.01 zccz_
     14 0.01 cHcca            1 0.00 czco             5 0.00 H8               1 0.01 cq
     14 0.01 Hzcca            1 0.00 ccccco           4 0.00 zzcc8            1 0.01 cn
     13 0.01 zcccHa           1 0.00 ccccHco          4 0.00 c8               1 0.01 cl
     12 0.01 ccccHa           1 0.00 cccHo            3 0.00 ccccHcc8         1 0.01 cccHc_
     12 0.01 cHca             1 0.00 cccHco           3 0.00 ccHc8            1 0.01 cc_
     11 0.01 zccHcca          1 0.00 ccHo             2 0.00 zcccHc8          1 0.01 cHccz_
     11 0.01 ccHa             1 0.00 cHcco            2 0.00 zccHc8           1 0.01 Hcz_
      6 0.00 cccHcca          1 0.00 Hzco             2 0.00 cccHccc8         1 0.01 Hcr
      6 0.00 cHa          ----- ---- ----             2 0.00 cHccc8           1 0.01 Hccz_
      5 0.00 ca             349 1.00 TOT              2 0.00 Hcccc8           1 0.01 Hccccl
      5 0.00 Hcccca                                   1 0.00 zzcHcc8      ----- ---- ----
      4 0.00 zcHa                                     1 0.00 zcz8            91 1.00 TOT
      4 0.00 ccccHcca                                 1 0.00 zccHccc8
      3 0.00 Hzca                                     1 0.00 ccHzccc8
      2 0.00 zcccHcca                                 1 0.00 ccHccc8
      2 0.00 zHcca                                    1 0.00 ccHcc8
      2 0.00 cccza                                    1 0.00 Hczc8
      2 0.00 Hczcca                                   1 0.00 Hcz8
      1 0.00 zzcccHca                                 1 0.00 Hccz8
      1 0.00 zzcca                                ----- ---- ----
      1 0.00 zzccHa                                1664 1.00 TOT
      1 0.00 zcccca
      1 0.00 zccccHcca
      1 0.00 zccHccca
      1 0.00 zcHcca
      1 0.00 zHa
      1 0.00 ccHzccca
      1 0.00 ccHcca
      1 0.00 cHccca
      1 0.00 Hczca
      1 0.00 Hccza
  ----- ---- ----
   1798 1.00 TOT

From these tables it seems (again) that \ci/ and \o/ are equivalent;
and that \cg/ is similar, but not as much. Also, virtually all \c/
strings end with these three letters; only a very few end with \ix/.

Below I have marked with `#/*', `@/&', `+', and `-' the contexts
with at least one frequency greater than 0.20, 0.08, 0.04, and 0.02,
respectively:

  #   718 0.40 Ha         #   106 0.30 Ho         *   379 0.23 Hc8
  @   169 0.09 Hcca       &    63 0.18 zo         @   319 0.19 Hcc8
  &   133 0.07 ccca       &    35 0.10 zcco       &   317 0.19 ccc8
  &   108 0.06 zcca       &    34 0.10 ccco       &   296 0.18 zcc8
  *    79 0.04 Hca        +    24 0.07 zco        +    84 0.05 Hccc8
  &    70 0.04 za         +    23 0.07 cco        +    54 0.03 cc8
  -    54 0.03 cccHca     *    19 0.05 Hco        -    51 0.03 zccc8
  +    48 0.03 cca        @    13 0.04 Hcco            28 0.02 cccc8
       40 0.02 ccccHca    +     6 0.02 Hccco           22 0.01 Hzcc8
       39 0.02 zccHca           4 0.01 cHco       +    21 0.01 zc8
       39 0.02 cccca            4 0.01 Hzcco           10 0.01 cccHcc8
       37 0.02 zcccHca    -     3 0.01 zccco      -     9 0.01 cccHc8
  -    33 0.02 zccca            3 0.01 cccco            9 0.01 cHcc8
  +    33 0.02 Hccca            1 0.00 zzcco            9 0.01 Hzc8
       21 0.01 zccHa            1 0.00 zccHo            6 0.00 zcccHcc8
  +    18 0.01 zca              1 0.00 zcHo             6 0.00 zccHcc8
       18 0.01 cccHa            1 0.00 zcHco            6 0.00 cHc8
       14 0.01 cHcca            1 0.00 czco       #     5 0.00 H8
       14 0.01 Hzcca            1 0.00 ccccco           4 0.00 zzcc8
       13 0.01 zcccHa           1 0.00 ccccHco          4 0.00 c8
       12 0.01 ccccHa           1 0.00 cccHo            3 0.00 ccccHcc8
       12 0.01 cHca       -     1 0.00 cccHco           3 0.00 ccHc8
       11 0.01 zccHcca          1 0.00 ccHo             2 0.00 zcccHc8
       11 0.01 ccHa             1 0.00 cHcco            2 0.00 zccHc8

Checking again, without repeated words:

  cat bio-j-jsa-gut.dic \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]/H/' \
        -e 's/cs/z/g' \
        -e 's/ij/k/g' \
        -e 's/ix/e/g' \
        -e 's/is/r/g' \
        -e 's/iiu/n/g' \
        -e 's/y/i/g' \
        -e 's/ci/a/g' \
        -e 's/cg/8/g' \
    | compare-contexts -lctx 0 -rctx 0 -colw 24 \
        '[czH][czH]*a' \
        '[czH][czH]*o' \
        '[czH][czH]*8' \
        '[czH][czH]*[^czHao8]'

    130 0.31 Ha              59 0.35 Ho              40 0.14 Hc8             12 0.18 z_
     28 0.07 cca             21 0.12 zo              37 0.13 ccc8             8 0.12 cce
     24 0.06 ccca            12 0.07 zco             33 0.12 Hcc8             6 0.09 H_
     23 0.06 Hcca            11 0.06 Hco             23 0.08 zcc8             5 0.08 He
     19 0.05 zcca            11 0.06 Hcco            23 0.08 cc8              4 0.06 zcccz_
     15 0.04 za              10 0.06 zcco            23 0.08 Hccc8            3 0.05 zce
     15 0.04 Hccca           10 0.06 cco             12 0.04 zc8              3 0.05 ccccz_
     15 0.04 Hca             10 0.06 ccco            11 0.04 zccc8            2 0.03 zc_
     11 0.03 cHca             4 0.02 cHco            11 0.04 Hzcc8            2 0.03 ccz_
     10 0.02 ccHa             4 0.02 Hzcco            9 0.03 cccc8            2 0.03 ccr
      9 0.02 cHcca            3 0.02 cccco            7 0.02 Hzc8             2 0.03 cccz_
      8 0.02 zccHa            2 0.01 Hccco            6 0.02 cHcc8            2 0.03 cccH_
      8 0.02 Hzcca            1 0.01 zzcco            5 0.02 cccHcc8          1 0.02 ze
      7 0.02 cccca            1 0.01 zccco            5 0.02 cHc8             1 0.02 zcr
      6 0.01 zca              1 0.01 zccHo            5 0.02 H8               1 0.02 zccz_
      6 0.01 cccHa            1 0.01 zcHo             4 0.01 zcccHcc8         1 0.02 zccH_
      6 0.01 cHa              1 0.01 zcHco            3 0.01 ccccHcc8         1 0.02 cq
      5 0.01 zccca            1 0.01 czco             3 0.01 cccHc8           1 0.02 cn
      5 0.01 zcccHca          1 0.01 ccccco           3 0.01 c8               1 0.02 cl
      5 0.01 ccccHca          1 0.01 ccccHco          2 0.01 zccHcc8          1 0.02 cccHc_
      5 0.01 cccHca           1 0.01 cccHo            2 0.01 cccHccc8         1 0.02 cc_
      5 0.01 ca               1 0.01 cccHco           2 0.01 ccHc8            1 0.02 cHccz_
      4 0.01 zccHcca          1 0.01 ccHo             2 0.01 cHccc8           1 0.02 Hcz_
      4 0.01 zccHca           1 0.01 cHcco            1 0.00 zzcc8            1 0.02 Hcr
      4 0.01 ccccHcca         1 0.01 Hzco             1 0.00 zzcHcc8          1 0.02 Hcn
      4 0.01 Hcccca       ----- ---- ----             1 0.00 zcz8             1 0.02 Hccz_
      3 0.01 zcccHa         170 1.00 TOT              1 0.00 zcccHc8          1 0.02 Hccccl
      3 0.01 zcHa                                     1 0.00 zccHccc8     ----- ---- ----
      3 0.01 Hzca                                     1 0.00 zccHc8          66 1.00 TOT
      2 0.00 zHcca                                    1 0.00 ccHzccc8
      2 0.00 cccza                                    1 0.00 ccHccc8
      2 0.00 ccccHa                                   1 0.00 ccHcc8
      2 0.00 cccHcca                                  1 0.00 Hczc8
      2 0.00 Hczcca                                   1 0.00 Hcz8
      1 0.00 zzcccHca                                 1 0.00 Hccz8
      1 0.00 zzcca                                    1 0.00 Hcccc8
      1 0.00 zzccHa                               ----- ---- ----
      1 0.00 zcccca                                 284 1.00 TOT
      1 0.00 zccccHcca
      1 0.00 zcccHcca
      1 0.00 zccHccca
      1 0.00 zcHcca
      1 0.00 zHa
      1 0.00 ccHzccca
      1 0.00 ccHcca
      1 0.00 cHccca
      1 0.00 Hczca
      1 0.00 Hccza
  ----- ---- ----
    414 1.00 TOT

Here is a summary of the most common \czH/ contexts. The numbers are
the percentages from the tables above. Due to an earlier mix-up, all
percentages except those in the `H' row were computed relative to the
totals minus the `H' entry.

                in .wds         in .dic
              ----------      ----------
              ci  cb  cg      ci  cb  cg
              --  --  --      --  --  --
    c          0   0   0       2   0   1

    z          6  26   0       5  19   0
    zc         2  10   1       2  11   4

    H         40  30   0      31  35   2
    Hc         7   8  23       5  10  14
    Hcc       16   5  19       8  10  12

    cc         4   9   3      10   9   8
    ccc       12  14  19       8   9  13
    zcc       10  14  18       7   9   8

    cccc       4   1   2       2   3   3
    zccc       3   1   3       2   1   4
    Hccc       3   2   5       5   2   8

    cccHc      5   0   1       2   1   1
    ccccHc     4   0   0       2   1   0
    zccHc      4   0   0       1   0   0
    zcccHc     3   0   0       3   0   0

It seems that

  \c/ alone is not a letter

  \z/, \zc/ are similar letters (most common before \o/)

  \cc/, \ccc/, \zcc/ are similar letters
    (equally common before \a/, \o/, \8/).

  \H/ is a common letter (most common before \a/ and \o/)

  \Hc/, \Hcc/ are similar letters (most common before \o/ and \8/).
    The latter does not seem to be a group \H/-\cc/.

  \Hccc/ may be a letter (most common before \a/ and \o/)
    but may be a group \Hc/-\cc/ or \H/-\ccc/

  \cccc/, \zccc/ may be rare letters
    (equally common before \a/, \o/, \8/),
    but may also be groups \cc/-\cc/ and \zc/-\cc/

Let's enumerate these various patterns:

  cat bio-j-jsa-gut.wds \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]/H/' \
        -e 's/cs/z/g' \
        -e 's/ij/k/g' \
        -e 's/ix/e/g' \
        -e 's/is/r/g' \
        -e 's/iiu/n/g' \
        -e 's/y/i/g' \
        -e 's/ci/a/g' \
        -e 's/cg/8/g' \
    | compare-contexts -lctx 0 -rctx 0 -colw 17 \
        '[^czH]z[^czH]' \
        '[^czH]zc[^czH]' \
        '[^czH]cc[^czH]' \
        '[^czH]ccc[^czH]' \
        '[^czH]zcc[^czH]' \
        '[^czH]H[^czH]' \
        '[^czH]Hc[^czH]' \
        '[^czH]Hcc[^czH]'

     66 0.44 _za      19 0.27 _zco     22 0.16 _cc8    192 0.40 _ccc8   219 0.50 _zcc8
     63 0.42 _zo      16 0.23 _zca     19 0.14 _cca     97 0.20 eccc8    77 0.18 _zcca
      6 0.04 _z_      13 0.19 ezc8     17 0.12 _cco     78 0.16 _ccca    43 0.10 ezcc8
      5 0.03 ez_       6 0.09 _zc8     15 0.11 ecca     44 0.09 eccca    30 0.07 _zcco
      5 0.03 az_       3 0.04 _zce     15 0.11 ecc8     25 0.05 _ccco    19 0.04 8zcc8
      1 0.01 qza       3 0.04 8zco      9 0.07 _cce     11 0.02 8ccc8    18 0.04 ezcca
      1 0.01 oza       2 0.03 ezco      7 0.05 occ8      7 0.01 rccca     8 0.02 azcc8
      1 0.01 oz_       2 0.03 ezca      6 0.04 8cca      6 0.01 rccc8     6 0.01 rzcc8
      1 0.01 eza       1 0.01 rzc8      5 0.04 8cc8      6 0.01 accc8     6 0.01 azcca
      1 0.01 aza       1 0.01 ezce      3 0.02 qcc8      5 0.01 eccco     4 0.01 ezcco
      1 0.01 _ze       1 0.01 ezc_      3 0.02 ecco      4 0.01 occc8     3 0.01 rzcca
   ---- ---- ----      1 0.01 _zcr      3 0.02 8cco      3 0.01 8ccco     2 0.00 ozcca
    151 1.00 TOT       1 0.01 _zc_      2 0.01 rcca      2 0.00 occca     2 0.00 8zcca
                       1 0.01 8zc8      2 0.01 occa      1 0.00 qccc8     1 0.00 ozcc8
                    ---- ---- ----      2 0.01 jcca      1 0.00 accco     1 0.00 8zcco
                      70 1.00 TOT       2 0.01 ecce      1 0.00 accca  ---- ---- ----
                                        1 0.01 rccr      1 0.00 8ccca   439 1.00 TOT
                                        1 0.01 qcca   ---- ---- ----
                                        1 0.01 jcc8    484 1.00 TOT
                                        1 0.01 ecc_
                                        1 0.01 acc8
                                        1 0.01 _ccr
                                     ---- ---- ----
                                      138 1.00 TOT

    604 0.72 oHa     305 0.63 oHc8    254 0.51 oHcc8
     61 0.07 eHa      66 0.14 oHca    120 0.24 oHcca
     51 0.06 oHo      31 0.06 eHc8     31 0.06 eHcca
     49 0.06 _Ho      27 0.06 _Hc8     29 0.06 eHcc8
     35 0.04 _Ha      14 0.03 oHco     20 0.04 _Hcc8
     18 0.02 aHa      14 0.03 aHc8     16 0.03 aHcc8
      6 0.01 oH_       7 0.01 eHca     10 0.02 aHcca
      5 0.01 eHo       4 0.01 aHca      7 0.01 _Hcca
      4 0.00 oHe       3 0.01 oHcn      6 0.01 oHcco
      3 0.00 oH8       3 0.01 _Hco      6 0.01 _Hcco
      2 0.00 eHe       2 0.00 eHco      1 0.00 eHcco
      2 0.00 _H8       2 0.00 _Hca      1 0.00 8Hcca
      1 0.00 qHo       2 0.00 8Hc8   ---- ---- ----
      1 0.00 _H_       1 0.00 oHcr    501 1.00 TOT
   ---- ---- ----   ---- ---- ----
    842 1.00 TOT     481 1.00 TOT

Ditto, distinct words only:

  cat bio-j-jsa-gut.dic \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]/H/' \
        -e 's/cs/z/g' \
        -e 's/ij/k/g' \
        -e 's/ix/e/g' \
        -e 's/is/r/g' \
        -e 's/iiu/n/g' \
        -e 's/y/i/g' \
        -e 's/ci/a/g' \
        -e 's/cg/8/g' \
    | compare-contexts -lctx 0 -rctx 0 -colw 17 \
        '[^czH]z[^czH]' \
        '[^czH]zc[^czH]' \
        '[^czH]cc[^czH]' \
        '[^czH]ccc[^czH]' \
        '[^czH]zcc[^czH]' \
        '[^czH]H[^czH]' \
        '[^czH]Hc[^czH]' \
        '[^czH]Hcc[^czH]'

     21 0.44 _zo       8 0.22 ezc8     10 0.14 ecc8     15 0.21 eccc8     8 0.15 ezcc8
     11 0.23 _za       7 0.19 _zco      8 0.11 _cca     11 0.15 eccca     8 0.15 _zcc8
      5 0.10 az_       4 0.11 _zca      7 0.10 ecca     10 0.14 _ccc8     6 0.12 ezcca
      4 0.08 ez_       3 0.08 8zco      7 0.10 _cco      5 0.07 _ccca     6 0.12 _zcco
      1 0.02 qza       2 0.06 ezco      6 0.08 _cce      4 0.06 rccca     5 0.10 _zcca
      1 0.02 oza       2 0.06 ezca      5 0.07 _cc8      4 0.06 eccco     3 0.06 rzcca
      1 0.02 oz_       2 0.06 _zce      5 0.07 8cca      4 0.06 _ccco     3 0.06 ezcco
      1 0.02 eza       2 0.06 _zc8      2 0.03 rcca      3 0.04 occc8     3 0.06 azcca
      1 0.02 aza       1 0.03 rzc8      2 0.03 qcc8      3 0.04 accc8     3 0.06 8zcc8
      1 0.02 _ze       1 0.03 ezce      2 0.03 occa      3 0.04 8ccc8     2 0.04 rzcc8
      1 0.02 _z_       1 0.03 ezc_      2 0.03 occ8      2 0.03 rccc8     1 0.02 ozcca
  ----- ---- ----      1 0.03 _zcr      2 0.03 jcca      2 0.03 occca     1 0.02 ozcc8
     48 1.00 TOT       1 0.03 _zc_      2 0.03 ecce      1 0.01 qccc8     1 0.02 azcc8
                       1 0.03 8zc8      2 0.03 8cco      1 0.01 accco     1 0.02 8zcco
                   ----- ---- ----      2 0.03 8cc8      1 0.01 accca     1 0.02 8zcca
                      36 1.00 TOT       1 0.01 rccr      1 0.01 8ccco ----- ---- ----
                                        1 0.01 qcca      1 0.01 8ccca    52 1.00 TOT
                                        1 0.01 jcc8  ----- ---- ----
                                        1 0.01 ecco     71 1.00 TOT
                                        1 0.01 ecc_
                                        1 0.01 acc8
                                        1 0.01 _ccr
                                    ----- ---- ----
                                       71 1.00 TOT

     72 0.35 oHa      23 0.34 oHc8     13 0.19 oHcc8
     34 0.17 _Ho       8 0.12 oHco      9 0.13 oHcca
     23 0.11 eHa       8 0.12 eHc8      9 0.13 eHcc8
     19 0.09 oHo       6 0.09 oHca      7 0.10 eHcca
     19 0.09 _Ha       4 0.06 eHca      6 0.09 oHcco
     16 0.08 aHa       4 0.06 aHca      6 0.09 _Hcc8
      5 0.02 oH_       4 0.06 aHc8      5 0.07 aHcc8
      5 0.02 eHo       4 0.06 _Hc8      4 0.06 _Hcco
      3 0.01 oHe       2 0.03 eHco      3 0.04 aHcca
      3 0.01 oH8       1 0.01 oHcr      3 0.04 _Hcca
      2 0.01 eHe       1 0.01 oHcn      1 0.01 eHcco
      2 0.01 _H8       1 0.01 _Hco      1 0.01 8Hcca
      1 0.00 qHo       1 0.01 _Hca  ----- ---- ----
      1 0.00 _H_       1 0.01 8Hc8     67 1.00 TOT
  ----- ---- ----  ----- ---- ----
    205 1.00 TOT      68 1.00 TOT

Obviously the subset of these letters that begins with `H' is
different from the others in that the `H' is usually preceded by \o/,
whereas the rest is usually word-initial (in .wds) or preceded by
\ix/ (in .dic).

Let's see what we have left out:

  cat bio-j-jsa-gut.wds \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]/H/' \
        -e 's/cs/z/g' \
        -e 's/ij/k/g' \
        -e 's/ix/e/g' \
        -e 's/is/r/g' \
        -e 's/iiu/n/g' \
        -e 's/y/i/g' \
        -e 's/ci/a/g' \
        -e 's/cg/8/g' \
    | enum-contexts -vPAT='[czH][czH]*' -vCTX=0 \
    | egrep -v -e '^(z|zc|cc|ccc|zcc|H|Hc|Hcc)$' \
    | wfreq

    123 0.15 Hccc
     87 0.11 zccc
     70 0.09 cccc
     65 0.08 cccHc
     41 0.05 zccHc
     41 0.05 ccccHc
     40 0.05 Hzcc
     39 0.05 zcccHc
     25 0.03 zccH
     24 0.03 cHcc
     22 0.03 cHc
     21 0.03 cccH
     17 0.02 zccHcc
     16 0.02 cccHcc
     13 0.02 zcccH
     13 0.02 Hzc
     12 0.02 ccccH
     12 0.02 ccH
     12 0.02 c
     11 0.01 cccz
      8 0.01 zcccHcc
      8 0.01 Hcccc
      7 0.01 ccccHcc
      6 0.01 zzcc
      6 0.01 cH
      5 0.01 zcH
      5 0.01 ccccz
      4 0.01 zcccz
      3 0.00 ccHc
      3 0.00 cHccc
      3 0.00 Hccz
      2 0.00 zccHccc
      2 0.00 zHcc
      2 0.00 ccz
      2 0.00 cccHccc
      2 0.00 ccHzccc
      2 0.00 ccHcc
      2 0.00 Hczcc
      2 0.00 Hczc
      2 0.00 Hcz
      1 0.00 zzcccHc
      1 0.00 zzccH
      1 0.00 zzcHcc
      1 0.00 zcz
      1 0.00 zccz
      1 0.00 zccccHcc
      1 0.00 zcccc
      1 0.00 zcHcc
      1 0.00 zcHc
      1 0.00 zH
      1 0.00 czc
      1 0.00 ccccc
      1 0.00 ccHccc
      1 0.00 cHccz
  ----- ---- ----
    794 1.00 TOT

Again, with some context:

  cat bio-j-jsa-gut.wds \
    | sed \
        -e 's/^/_/g' \
        -e 's/$/_/g' \
        -e 's/[ql][jg]/H/' \
        -e 's/cs/z/g' \
        -e 's/ij/k/g' \
        -e 's/ix/e/g' \
        -e 's/is/r/g' \
        -e 's/iiu/n/g' \
        -e 's/y/i/g' \
        -e 's/ci/a/g' \
        -e 's/cg/8/g' \
    | enum-contexts -vPAT='[^czH][czH][czH]*[^czH]' -vCTX=0 \
    | egrep -v -e '[^czH](z|zc|cc|ccc|zcc|H|Hc|Hcc)[^czH]' \
    | wfreq

     51 0.06 _cccHca
     41 0.05 oHccc8
     38 0.05 _zccc8
     37 0.05 _zccHca
     35 0.04 _zcccHca
     35 0.04 _ccccHca
     34 0.04 _Hccc8
     32 0.04 _cccca
     23 0.03 _zccca
     21 0.03 oHccca
     20 0.03 _cccc8
     19 0.02 _zccHa
     16 0.02 oHzcc8
     16 0.02 _cccHa
     12 0.02 _zcccHa
     12 0.02 _ccccHa
     11 0.01 _zccHcca
     10 0.01 _ccHa
      9 0.01 _cccHc8
      8 0.01 eHccc8
      8 0.01 _cccz_
      8 0.01 _cccHcc8
      8 0.01 _Hccca
      7 0.01 oHzcca
      7 0.01 ezccc8
      7 0.01 _cHcca
      6 0.01 oHzc8
      6 0.01 _zccHcc8
      6 0.01 _cccHcca
      6 0.01 _Hccco
      5 0.01 ezccca
      5 0.01 ecccc8
      5 0.01 _zcccHcc8
      5 0.01 _Hzcca
      4 0.01 ocHcca
      4 0.01 ocHca
      4 0.01 ecccca
      4 0.01 eccccHca
      4 0.01 _zzcc8
      4 0.01 _cHcc8
      4 0.01 _cHca
      4 0.01 _cHc8
      4 0.01 _Hzcc8
      3 0.00 rzccc8
      3 0.00 oHcccca
      3 0.00 jca
      3 0.00 ecccHca
      3 0.00 eHccca
      3 0.00 azccca
      3 0.00 _zccco
      3 0.00 _zccH_
      3 0.00 _zcHa
      3 0.00 _ccccz_
      3 0.00 _ccHc8
      3 0.00 _cHco
      3 0.00 _cHa
      3 0.00 _Hzcco
      2 0.00 rcccHa
      2 0.00 qcHcc8
      2 0.00 qcHc8
      2 0.00 qcHa
      2 0.00 ocHcc8
      2 0.00 oHcccc8
      2 0.00 ezcccz_
      2 0.00 ezcccHca
      2 0.00 ezccHca
      2 0.00 ezccHa
      2 0.00 acHca
      2 0.00 _zcccHcca
      2 0.00 _zcccHc8
      2 0.00 _zccHc8
      2 0.00 _zHcca
      2 0.00 _cccza
      2 0.00 _ccccHcca
      2 0.00 _ccccHcc8
      2 0.00 _cccHccc8
      2 0.00 _cccH_
      2 0.00 _Hzc8
      2 0.00 8zccca
      2 0.00 8zccc8
      2 0.00 8c8
      1 0.00 rcn
      1 0.00 rccz_
      1 0.00 rcccz_
      1 0.00 rcccca
      1 0.00 rcccc8
      1 0.00 qca
      1 0.00 qcHccc8
      1 0.00 qcHcca
      1 0.00 qcHca
      1 0.00 ozccHo
      1 0.00 occHcc8
      1 0.00 ocHco
      1 0.00 ocHccz_
      1 0.00 ocHcco
      1 0.00 oHzca
      1 0.00 oHczcca
      1 0.00 oHczca
      1 0.00 oHczc8
      1 0.00 oHcz_
      1 0.00 oHcz8
      1 0.00 oHccza
      1 0.00 oHccz_
      1 0.00 oHccz8
      1 0.00 oHccccl
      1 0.00 jczco
      1 0.00 jc8
      1 0.00 ezcz8
      1 0.00 ezcccHcc8
      1 0.00 ezcccHa
      1 0.00 ezcHa
      1 0.00 eccz_
      1 0.00 eccccz_
      1 0.00 ecccco
      1 0.00 eccccHcca
      1 0.00 eccccHcc8
      1 0.00 ecccHcc8
      1 0.00 eccHa
      1 0.00 eHzcca
      1 0.00 eHzcc8
      1 0.00 eHczcca
      1 0.00 azcccca
      1 0.00 azccc8
      1 0.00 accccz_
      1 0.00 acccca
      1 0.00 accccHcca
      1 0.00 accccHca
      1 0.00 acccc8
      1 0.00 acHcca
      1 0.00 acHa
      1 0.00 aHzcco
      1 0.00 aHzcc8
      1 0.00 aHzca
      1 0.00 aHcccca
      1 0.00 aHccca
      1 0.00 aHccc8
      1 0.00 _zzcco
      1 0.00 _zzcccHca
      1 0.00 _zzcca
      1 0.00 _zzccHa
      1 0.00 _zzcHcc8
      1 0.00 _zccz_
      1 0.00 _zcccz_
      1 0.00 _zccccHcca
      1 0.00 _zccHccca
      1 0.00 _zccHccc8
      1 0.00 _zcHo
      1 0.00 _zcHco
      1 0.00 _zcHcca
      1 0.00 _zHa
      1 0.00 _cccco
      1 0.00 _ccccco
      1 0.00 _ccccHco
      1 0.00 _cccHo
      1 0.00 _cccHco
      1 0.00 _cccHc_
      1 0.00 _ccHzccc8
      1 0.00 _ccHo
      1 0.00 _ccHccc8
      1 0.00 _ccHcca
      1 0.00 _cHccca
      1 0.00 _cHccc8
      1 0.00 _Hzco
      1 0.00 _Hzca
      1 0.00 _Hcccca
      1 0.00 8zcccz_
      1 0.00 8cccco
      1 0.00 8cccca
      1 0.00 8cccc8
      1 0.00 8cccHcc8
      1 0.00 8Hzcca
  ----- ---- ----
    785 1.00 TOT

These may be groups of the letters above.

So here is the situation for maximal `czH' strings (after collapsing
\ci/, \cy/ to `a', and \cg/ to `8', and all gallows to `H'):

  string  freq  plausible interpretations
  ------  ----  -------------------------
  c          8  invalid.
  z        151  a letter.
  H        842  a letter.

  cc       138  a letter.
  zc        70  a letter.
  Hc       481  a letter.
  cz         -  invalid.
  zz         -  invalid.
  Hz         -  invalid.
  cH         6  invalid.
  zH         1  invalid, or z+H.
  HH         -  invalid.

  ccc       71  a letter.
  zcc       52  a letter, or z+cc.
  Hcc       67  a letter, or H+cc.
  czc        1  invalid.
  zzc        -  invalid.
  Hzc       13  a letter, or H+zc.
  cHc       22  a letter (gallows with platform?).
  zHc        -  invalid.
  HHc        -  invalid.
  ccz        2  invalid, or cc+z.
  zcz        1  invalid, or zc+z.
  Hcz        2  invalid, or Hc+z.
  czz        -  invalid.
  zzz        -  invalid.
  Hzz        -  invalid.
  cHz        -  invalid.
  zHz        -  invalid.
  HHz        -  invalid.
  ccH       12  a letter, or cc+H.
  zcH        5  a letter, or zc+H.
  HcH        -  invalid.
  czH        -  invalid.
  zzH        -  invalid.
  HzH        -  invalid.
  cHH        -  invalid.
  zHH        -  invalid.
  HHH        -  invalid.

  cccc      70  a letter, or cc+cc.
  zccc      87  a letter, or zc+cc, or z+ccc.
  Hccc     123  a letter, or Hc+cc, or H+ccc.
  czcc       -  invalid.
  zzcc       6  invalid, or z+zcc, or z+z+cc.
  Hzcc      40  a letter, or H+zcc, or H+z+cc.
  cHcc      24  a letter.
  zHcc       2  invalid, or z+Hcc, or z+H+cc.
  HHcc       -  invalid.
  cczc       -  invalid.
  zczc       -  invalid.
  Hczc       2  invalid, or Hc+zc.
  czzc       -  invalid.
  zzzc       -  invalid.
  Hzzc       -  invalid.
  cHzc       -  invalid.
  zHzc       -  invalid.
  HHzc       -  invalid.
  ccHc       3  invalid, or cc+Hc.
  zcHc       1  invalid, or zc+Hc.
  HcHc       -  invalid.
  czHc       -  invalid.
  zzHc       -  invalid.
  HzHc       -  invalid.
  cHHc       -  invalid.
  zHHc       -  invalid.
  HHHc       -  invalid.
  cccz      11  a letter, or ccc+z.
  zccz       1  invalid, or zcc+z, or z+cc+z.
  Hccz       3  invalid, or Hcc+z, or H+cc+z.
  czcz       -  invalid.
  zzcz       -  invalid.
  Hzcz       -  invalid.
  cHcz       -  invalid.
  zHcz       -  invalid.
  HHcz       -  invalid.
  cczz       -  invalid.
  zczz       -  invalid.
  Hczz       -  invalid.
  czzz       -  invalid.
  zzzz       -  invalid.
  Hzzz       -  invalid.
  cHzz       -  invalid.
  zHzz       -  invalid.
  HHzz       -  invalid.
  ccHz       -  invalid.
  zcHz       -  invalid.
  HcHz       -  invalid.
  czHz       -  invalid.
  zzHz       -  invalid.
  HzHz       -  invalid.
  cHHz       -  invalid.
  zHHz       -  invalid.
  HHHz       -  invalid.
  cccH      21  a letter, or ccc+H.
  zccH      25  a letter, or zcc+H, or z+cc+H.
  HccH       -  invalid.
  czcH       -  invalid.
  zzcH       -  invalid.
  HzcH       -  invalid.
  cHcH       -  invalid.
  zHcH       -  invalid.
  HHcH       -  invalid.
  cczH       -  invalid.
  zczH       -  invalid.
  HczH       -  invalid.
  czzH       -  invalid.
  zzzH       -  invalid.
  HzzH       -  invalid.
  cHzH       -  invalid.
  zHzH       -  invalid.
  HHzH       -  invalid.
  ccHH       -  invalid.
  zcHH       -  invalid.
  HcHH       -  invalid.
  czHH       -  invalid.
  zzHH       -  invalid.
  HzHH       -  invalid.
  cHHH       -  invalid.
  zHHH       -  invalid.
  HHHH       -  invalid.
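
The "or X+Y" alternatives in the interpretation column can be
generated mechanically. What follows is only a rough sketch in
Python, not one of the tools used above: it assumes that the candidate
letters are exactly the eight strings enumerated earlier (z, zc, cc,
ccc, zcc, H, Hc, Hcc), and the names LETTERS, maximal_runs and
decompositions are made up for this illustration. It lists every way
of splitting a maximal `czH' string into candidate letters; a string
that is itself a candidate letter comes back as a one-element split.

```python
import re

# Assumption: the candidate letters are the eight strings enumerated
# in the compare-contexts runs above; nothing else is treated as a letter.
LETTERS = ("z", "H", "cc", "zc", "Hc", "ccc", "zcc", "Hcc")


def maximal_runs(word):
    """Return the maximal runs of c, z and H in an already-mapped word."""
    return re.findall(r"[czH]+", word)


def decompositions(s, letters=LETTERS):
    """Return every way to split s into a sequence of candidate letters."""
    if s == "":
        return [[]]  # exactly one way to split the empty string
    result = []
    for letter in letters:
        if s.startswith(letter):
            # Peel off one candidate letter and split the remainder.
            for rest in decompositions(s[len(letter):], letters):
                result.append([letter] + rest)
    return result


if __name__ == "__main__":
    for s in ("Hccc", "zccc", "Hzcc", "cccHc", "cH"):
        splits = ["+".join(d) for d in decompositions(s)]
        print(s, "->", ", ".join(splits) if splits else "no split")
```

For example, `Hccc' splits as H+ccc and Hc+cc, which are the two
groupings listed for it in the table, while `cH' has no split at all
because a lone `c' is not among the candidate letters. The sketch says
nothing about which reading is right; it only enumerates the readings
that the candidate set allows.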