Let's have another quick look at the A/B differences in midfix
frequencies of Note-009.txt. Perhaps the difference will become
sharper (or disappear) if we collapse k/t and replace ch=sh=ee,
cth=ete, etc.

    foreach guy ( Friedman.f Currier.c )
      foreach lang ( A.a B.b )
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -v -e '- -' \
          | eva2erb \
          | sort | uniq -c | expand | sort -k1,1nr \
          > .he${lang:e}-${guy:e}-unifs-all.frq
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print $1}' \
          | eva2erb \
          | sort | uniq -c | expand | sort -k1,1nr \
          > .he${lang:e}-${guy:e}-prefs-all.frq
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print $2}' \
          | eva2erb \
          | sort | uniq -c | expand | sort -k1,1nr \
          > .he${lang:e}-${guy:e}-midfs-all.frq
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print $3}' \
          | eva2erb \
          | sort | uniq -c | expand | sort -k1,1nr \
          > .he${lang:e}-${guy:e}-suffs-all.frq
        foreach elem ( pref midf suff unif )
          set file = "he${lang:e}-${guy:e}-${elem}s-all"
          echo "${file}.frq -> ${file}.fmt"
          cat .${file}.frq \
            | compute-freqs \
            | gawk '\
                BEGIN {\
                  printf "by '"${guy:r}"'\nlanguage '"${lang:r}"'\n"; \
                  printf "freq pc '"${elem}"'ix\n---- -- ----------------\n";} \
                /./ {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                END {printf "---- -- ----------------\n%4d 99 TOTAL\n",t;} \
              ' \
            > .${file}.fmt
        end
      end
    end

    dicio-wc .he{a,b}-{f,c}-{pref,midf,suff,unif}s-all.fmt

    foreach elem ( pref midf suff unif )
      set tfiles = ( )
      foreach guy ( f c )
        foreach lang ( a b )
          set file = "he${lang}-${guy}-${elem}s-all"
          set tfiles = ( ${tfiles} .${file}.fmt )
        end
      end
      pr -m -t -i' '1 -w 88 ${tfiles} \
        | expand \
        > .herbal-${elem}-cmp.txt
    end

    dicio-wc .herbal-{pref,midf,suff,unif}-cmp.txt

Looking at the suffix frequencies, it seems that the main difference
between A and B is that the latter uses "d" instead of some letter
that should be in the midfix.
If we eliminate the "-do" suffix and renormalize, the B distribution
looks like this (third column):

    by Friedman              by Friedman              by Friedman
    language A               language B               language B minus -do
    freq pc suffix           freq pc suffix           freq pc suffix
    ---- -- -------------    ---- -- -------------    ---- -- -------------
    2200 37 -o                642 26 -do               583 32 -o
    1008 17 -ol               583 24 -o                254 14 -or
     960 16 -or               254 10 -or               183 10 -ol
     365  6 -oiin             183  8 -ol               145  8 -oiin
     239  4 -odo              145  6 -oiin             116  6 -odo
     127  2 -om               116  5 -odo               63  4 -
     124  2 -                  63  3 -                  41  2 -om
      85  1 -odoiin            41  2 -om                34  2 -doiin
    ---- -- -------------    ---- -- -------------    ---- -- -------------
    5967 99 TOTAL            2431 99 TOTAL            1789 99 TOTAL

I still can't see much resemblance in the midfixes. Moreover, the B
midfixes seem longer on average. So it is not a matter of moving
some suffix letter to the midfix.

Basically the B language does not use the ckh/cth gallows.
Perhaps ckh = ked, or something of the sort?

    by Friedman              by Friedman              by Currier               by Currier
    language A               language B               language A               language B
    freq pc midfix           freq pc midfix           freq pc midfix           freq pc midfix
    ---- -- -------------    ---- -- -------------    ---- -- -------------    ---- -- -------------
    1595 27 -ee-              590 24 -k-              1472 28 -ee-              404 21 -k-
     913 15 -k-               279 12 -eee-             865 16 -k-               240 13 -ke-
     856 14 -kee-             274 11 -ke-              705 13 -kee-             219 12 -eee-
     459  8 -eke-             269 11 -ee-              385  7 -eke-             202 11 -ee-
     418  7 -eee-             261 11 -kee-             316  6 -eee-             187 10 -kee-
     155  3 -ke-               76  3 -eeeke-           128  2 -eeok-             62  3 -eeeke-
     152  3 -keee-             72  3 -keee-            128  2 -keee-             48  3 -eeee-
     132  2 -eeok-             60  3 -eeek-            101  2 -ke-               48  3 -eeek-
     110  2 -pee-              49  2 -pee-             100  2 -pee-              44  2 -keee-
      99  2 -eeee-             48  2 -eeee-             75  1 -epe-              34  2 -pee-
      93  2 -ekee-             48  2 -p-                72  1 -eeee-             34  2 -peee-
      81  1 -epe-              39  2 -peee-             69  1 -eeeke-            33  2 -eke-
      73  1 -eeeke-            33  1 -eke-              66  1 -eeokee-           32  2 -p-
      60  1 -p-                25  1 -ekee-             56  1 -ekee-             23  1 -ekee-
      57  1 -eek-              24  1 -eek-              55  1 -p-                18  1 -eek-
      55  1 -eeokee-           20  1 -eeekee-           52  1 -eek-              16  1 -eeeeke-

Well, since the prefixes seem OK, let's compare the midfix+suffix
together:

    foreach guy ( Friedman.f )
      foreach lang ( A.a B.b )
        cat Note-009/he${lang:e}-${guy:e}.factored \
          | grep -e '- -' \
          | gawk '/./ {print ($2 $3)}' \
          | sed -e 's/--//g' \
          | eva2erb \
          | sort | uniq -c | expand | sort -k1,1nr \
          > .he${lang:e}-${guy:e}-tails-all.frq
        foreach elem ( tail )
          set file = "he${lang:e}-${guy:e}-${elem}s-all"
          echo "${file}.frq -> ${file}.fmt"
          cat .${file}.frq \
            | compute-freqs \
            | gawk '\
                BEGIN {\
                  printf "by '"${guy:r}"'\nlanguage '"${lang:r}"'\n"; \
                  printf "freq pc '"${elem}"'ix\n---- -- ----------------\n";} \
                /./ {printf "%4d %2d %s\n",$1,int($2*100+0.5),$3; t+=$1;} \
                END {printf "---- -- ----------------\n%4d 99 TOTAL\n",t;} \
              ' \
            > .${file}.fmt
        end
      end
    end

    dicio-wc .he{a,b}-{f}-{tail}s-all.fmt

    foreach elem ( tail )
      set tfiles = ( )
      foreach guy ( f )
        foreach lang ( a b )
          set file = "he${lang}-${guy}-${elem}s-all"
          set tfiles = ( ${tfiles} .${file}.fmt )
        end
      end
      pr -m -t -i' '1 -w 88 ${tfiles} \
        | expand \
        > .herbal-${elem}-cmp.txt
    end

    dicio-wc .herbal-{tail}-cmp.txt

     lines   words     bytes file
    ------ ------- --------- ------------
       774    3722     41887 .herbal-tail-cmp.txt

    by Friedman                 by Friedman
    language A                  language B
    freq pc tailix              freq pc tailix
    ---- -- ----------------    ---- -- ----------------
     404  7 -keeo                153  6 -kor
     395  7 -eeo                 150  6 -kedo
     370  6 -eeol                129  5 -keedo
     337  6 -eeor                117  5 -eeedo
     197  3 -ko                  108  4 -koiin
     189  3 -kol                  88  4 -kol
     176  3 -ekeo                 87  4 -eedo
     166  3 -eeeo                 71  3 -keeo
     162  3 -koiin                65  3 -ko
     146  2 -keeor                55  2 -eeekeo
     132  2 -keeol                51  2 -eeeo
     119  2 -kor                  31  1 -keeedo
      99  2 -keeeo                31  1 -keo
      91  2 -eeeor                30  1 -eeor
      81  1 -ekeor                30  1 -kom
      80  1 -ekeol                29  1 -eeeko
      78  1 -eeoiin               28  1 -eeo
      64  1 -eeodo                28  1 -keeeo
      61  1 -eeeeo                27  1 -eeol
      60  1 -eeoko                24  1 -eeodo
      57  1 -ekeeo                23  1 -keodo
      48  1 -eeeol                23  1 -koin
      47  1 -eeekeo               22  1 -eeeor
      44  1 -keol                 21  1 -eeeodo
      42  1 -eeodoiin             21  1 -ekeo
      40  1 -eeoekeo              20  1 -kodo
      39  1 -eeokeeo              19  1 -eeeeo
      39  1 -keo                  19  1 -peeedo
      37  1 -keeodo               18  1 -keol
      33  1 -eeom                 18  1 -peedo
      30  1 -peeo                 17  1 -eeeol
      29  1 -kom                  16  1 -eeeekeo
      28  1 -keor                 13  1 -eeko
      27  1 -peeor                12  1 -eeeedo
      24  0 -ekeodo               12  1 -eeekeeo
      24  0 -epeo                 12  1 -ekeeo
      24  0 -kodo                 12  1 -koldo
      23  0 -keeoiin              11  1 -eedoiin
      22  0 -eeoro                11  1 -eeekedo
      21  0 -eekeeo               11  1 -koir
      20  0 -ekeoiin              10  0 -ekeedo
      19  0 -eee                  10  0 -k
      19  0 -k                    10  0 -keeodo
      19  0 -koldo                10  0 -kolo
      18  0 -eeoin                 9  0 -keor
      17  0 -eeeko                 9  0 -peeo
      17  0 -eeer                  8  0 -eeed
      17  0 -eeod                  8  0 -eeekoiin
      17  0 -ekeom                 8  0 -por
      17  0 -kod                   7  0 -eedol
      16  0 -eeekeeo               7  0 -eeedoiin
      16  0 -eeko                  7  0 -eeee
      16  0 -eeolo                 7  0 -eeek
      16  0 -eeon                  7  0 -eeer
      16  0 -keeod                 7  0 -eeoko
      16  0 -peeol                 7  0 -ekedo
      15  0 -eeeodo                7  0 -koro
      15  0 -eeokol                7  0 -peeeo
      15  0 -eer                   6  0 -eeekol
      15  0 -keeeeo                6  0 -ked
      14  0 -eekoiin               6  0 -kedoiin
      14  0 -eeoeeo                6  0 -keedor
      14  0 -eeokoiin              6  0 -keeol
      14  0 -epeol                 6  0 -poiin
      14  0 -keeom                 5  0 -eed
      13  0 -eeoldo                5  0 -eee
      13  0 -ekeeeo                5  0 -eeekeedo
      12  0 -ee                    5  0 -eeekeeeo
      12  0 -eeeer                 5  0 -eeekor
      12  0 -eeeoiin               5  0 -kedor
      12  0 -eeoo                  5  0 -keeor
      11  0 -ekeeor                5  0 -keer
      11  0 -keeeor                5  0 -kodoiin
      11  0 -keodo                 5  0 -koror
      11  0 -kodoiin               5  0 -peeedor
      11  0 -koin                  4  0 -eedom
      10  0 -eedo                  4  0 -eedor
      10  0 -eekor                 4  0 -eeedor
      10  0 -eeok                  4  0 -eeeekeeo
      10  0 -eeokeo                4  0 -eeeer
      10  0 -ekeeol                4  0 -eeeoekeo
      10  0 -epeor                 4  0 -eeepeedo
      10  0 -kee                   4  0 -eeepo
      10  0 -keeeol                4  0 -eekoiin
      10  0 -koo                   4  0 -eer
      10  0 -peeeo                 4  0 -keed
       9  0 -eeeeor                4  0 -keeeeo
       9  0 -eeokor                4  0 -keeod
       9  0 -ekeeodo               4  0 -peedoiin
       8  0 -eeeeol                4  0 -peeol
       8  0 -eeeodoiin             3  0 -eedolo
       8  0 -eeeom                 3  0 -eeedol
       8  0 -eeodol                3  0 -eeeeko
       8  0 -eeodor                3  0 -eeeked
       8  0 -eeokeeeo              3  0 -eeeod
       8  0 -eeokeeol              3  0 -eekedo
    ---- -- ----------------    ---- -- ----------------
    5967 99 TOTAL               2431 99 TOTAL

Inspired by Landini's paper, let me prepare a graph of
A-freq × B-freq for each segment:

    foreach guy ( Friedman.f )
      foreach elem ( pref midf suff unif tail )
        set pfile = "herbal-${guy:e}-${elem}s-all"
        set afile = "hea-${guy:e}-${elem}s-all"
        set bfile = "heb-${guy:e}-${elem}s-all"
        echo "${afile}.frq, ${bfile}.frq -> ${pfile}.plt"
        /n/gnu/bin/join \
            -a 1 -a 2 -e 0 \
            -j1 2 -j2 2 \
            -o1.1,2.1,0 \
            ${afile}.frq ${bfile}.frq \
          > .${pfile}.plt
        plot-lang-diffs ${guy:r} ${elem} ${pfile}.plt
      end
    end

    dicio-wc .he-{f}-{tail}s-all.fmt
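
The join step above pairs each segment's A count with its B count,
using 0 when the segment occurs in only one language. Note that join
expects both inputs to be sorted on the join field, whereas the .frq
files are sorted by count. A minimal sketch that does not depend on
the input order is given below. It is written in Bourne shell with
awk rather than csh, it assumes plain "count word" lines as produced
by "uniq -c", and the name merge_freqs is hypothetical (it is not one
of the tools used in these notes).

```sh
#!/bin/sh
# Sketch only: merge_freqs is a hypothetical helper, not part of the original scripts.
# Input:  two files of "count word" lines (e.g. the output of "uniq -c").
# Output: "countA countB word" lines, with 0 for a word absent from one file,
#         sorted by word in the C locale.
merge_freqs() {
  awk '
    NF < 2              { next }                                # skip blank or malformed lines
    FILENAME == ARGV[1] { a[$2] = $1; seen[$2] = 1; next }      # counts from the first (A) file
                        { b[$2] = $1; seen[$2] = 1 }            # counts from the second (B) file
    END {
      # a[w] + 0 yields 0 for a word that was never seen in that file
      for (w in seen) printf "%d %d %s\n", a[w] + 0, b[w] + 0, w
    }
  ' "$1" "$2" | LC_ALL=C sort -k3,3
}
```

With the file names used above, a call might look like
"merge_freqs .hea-f-tails-all.frq .heb-f-tails-all.frq", with the
output redirected to the corresponding .plt file. The two input files
must have different names, since the sketch tells them apart by name.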