Now extract line-initial and line-final words:

  foreach lang ( a b )
    # bol: keep the first word after each marker line (one containing / or =)
    cat he${lang}-f-eva.wds \
      | gawk 'BEGIN {s=1} /[\/=]/ {s=1;next}; /./ {if(s)print; s=0}' \
      > he${lang}-f-bol.wds
    # eol: keep the last word seen before each marker line
    cat he${lang}-f-eva.wds \
      | gawk 'BEGIN {w=""} /[\/=]/ {if(w!=""){print w;w=""};next}; /./ {w=$0}' \
      > he${lang}-f-eol.wds
  end

  dicio-wc he{a,b}-f-{bol,eol}.wds

    lines   words     bytes file
    ------ ------- --------- ------------
      1216    1216      7610 hea-f-bol.wds
      1216    1216      7010 hea-f-eol.wds
       362     362      2353 heb-f-bol.wds
       362     362      2039 heb-f-eol.wds

  foreach lang ( a b )
    foreach ext ( bol eol )
      # Bracket each word as {word}, move the leading and trailing runs of
      # [qoaydirslmngj] outside the braces ("sh" is shielded as X meanwhile).
      # A word with nothing left inside the braces stays whole (a unifix);
      # otherwise the braces become "- -", giving prefix-, -midfix-, -suffix.
      cat he${lang}-f-${ext}.wds \
        | sed \
            -e 's/sh/X/g' \
            -e 's/$/}/' \
            -e 's/^/{/' \
            -e 's/{\([qoaydirslmngj][qoaydirslmngj]*\)/\1{/' \
            -e 's/\([qoaydirslmngj][qoaydirslmngj]*\)}/}\1/' \
            -e 's/X/sh/g' \
            -e 's/{}/\./' \
            -e 's/\.//g' \
            -e 's/{/- -/' \
            -e 's/}/- -/' \
        > .he${lang}-f-${ext}.fwd
    end
  end

    lines   words     bytes file
    ------ ------- --------- ------------
      1216    3152     13418 .hea-f-bol.fwd
      1216    2668     11366 .hea-f-eol.fwd
       362     926      4045 .heb-f-bol.fwd
       362     792      3329 .heb-f-eol.fwd

  foreach lang ( a b )
    foreach ext ( bol eol )
      cat .he${lang}-f-${ext}.fwd \
        | grep -v -e '- -' \
        > .he${lang}-f-${ext}-unifs-all.wds
      cat .he${lang}-f-${ext}.fwd \
        | grep -e '- -' \
        | gawk '/./ {print $2}' \
        > .he${lang}-f-${ext}-midfs-all.wds
    end
    cat .he${lang}-f-bol.fwd \
      | grep -e '- -' \
      | gawk '/./ {print $1}' \
      > .he${lang}-f-bol-prefs-all.wds
    cat .he${lang}-f-eol.fwd \
      | grep -e '- -' \
      | gawk '/./ {print $3}' \
      > .he${lang}-f-eol-suffs-all.wds
  end

  dicio-wc .he{a,b}-f-[be]ol-{prefs,midfs,suffs,unifs}-all.wds

    lines   words     bytes file
    ------ ------- --------- ------------
       968     968      2737 .hea-f-bol-prefs-all.wds
       282     282       754 .heb-f-bol-prefs-all.wds
       968     968      5645 .hea-f-bol-midfs-all.wds
       282     282      1673 .heb-f-bol-midfs-all.wds
       726     726      4085 .hea-f-eol-midfs-all.wds
       215     215      1166 .heb-f-eol-midfs-all.wds
       726     726      3128 .hea-f-eol-suffs-all.wds
       215     215       909 .heb-f-eol-suffs-all.wds
       248     248      1176 .hea-f-bol-unifs-all.wds
       490     490      2302 .hea-f-eol-unifs-all.wds
        80      80
400 .heb-f-bol-unifs-all.wds
       147     147       665 .heb-f-eol-unifs-all.wds

  foreach f ( .he[ab]-f-[eb]ol-{prefs,midfs,suffs,unifs}-all.wds )
    cat ${f} \
      | sort | uniq -c | expand | sort +0 -1nr \
      > ${f:r}.frq
  end

  dicio-wc .he[ab]-f-[eb]ol-{prefs,midfs,suffs,unifs}-all.frq

    lines   words     bytes file
    ------ ------- --------- ------------
       169     338      2661 .hea-f-bol-midfs-all.frq
        34      68       419 .hea-f-bol-prefs-all.frq
       119     238      1852 .hea-f-eol-midfs-all.frq
       110     220      1486 .hea-f-eol-suffs-all.frq
        77     154      1180 .heb-f-bol-midfs-all.frq
        22      44       259 .heb-f-bol-prefs-all.frq
        59     118       880 .heb-f-eol-midfs-all.frq
        54     108       722 .heb-f-eol-suffs-all.frq
        93     186      1203 .hea-f-bol-unifs-all.frq
       149     298      1986 .hea-f-eol-unifs-all.frq
        35      70       464 .heb-f-bol-unifs-all.frq
        82     164      1061 .heb-f-eol-unifs-all.frq

  pr -m -w 80 -e -t \
      .labels-s-prefs-all.frq \
      .he{a,b}-f-bol-prefs-all.frq \
      Note-009/he{a,b}-f-prefs-all.frq \
    | expand \
    > .prefs-joint.frq

    all labels     herbal-A bol   herbal-B bol   herbal-A all   herbal-B all
    freq prefix    freq prefix    freq prefix    freq prefix    freq prefix
    ---- --------- ---- --------- ---- --------- ---- --------- ---- ---------
194 o- 384 - 136 - 3656 - 1234 - 54 - 154 o- 58 y- 807 o- 490 o- 19 y- 141 qo- 22 qo- 603 qo- 300 qo- 8 ol- 138 y- 19 d- 424 y- 216 y- 4 d- 71 d- 17 o- 201 d- 57 ol- 3 dy- 17 s- 9 l- 55 s- 35 d- 2 a- 11 so- 6 ol- 33 ol- 26 l- 2 da- 7 oy- 1 a- 20 so- 10 dy- 2 dar- 6 ol- 1 al- 15 l- 9 a- 2 so- 4 l- 1 ara- 13 dy- 6 s- 1 adair- 4 yo- 1 dy- 12 r- 5 al- 1 al- 3 dy- 1 lo- 10 oy- 3 a:i- 1 ala- 3 or- 1 lqo- 9 or- 3 dal- 1 alam- 2 od- 1 q- 6 da:i- 3 lo- 1 ali- 2 os- 1 qol- 6 do- 2 da:i- 1 arar- 2 q- 1 r- 6 od- 2 do- 1 aro- 2 yd- 1 s- 5 os- 2 or- 1 do- 1 dain- 1 sol- 5 yo- 2 qol- 1 dol- 1 dls- 1 ss- 4 qod- 2 r- 1 il- 1 dor- 1 yd- 4 ro- 1 a:ii- 1 oal- 1 i- 1 yo- 4 sol- 1 ad- 1 oar- 1 lor- 1 yol- 3 a- 1 ao- 1 or- 1 oldai- 3 da- 1 ar- 1 oyd- 1 ols- 3 dol- 1 ara- 1 q- 1 oor- 3 lo- 1 da- 1 qo- 1 oqo- 3 sy- 1 dalo- 1 s- 1 oso- 3 yd- 1 dol- 1 siiir- 1 qod- 2 al- 1 dor- 1 soi- 1 qoo- 2 da:in- 1 lol- 1
sol- 1 qy- 2 dal- 1 lqo- 1 yd- 1 ro- 2 dor- 1 o:n- 1 yy- 1 syd- 2 old- 1 od- 1 ydarai- 2 qoda:i- 1 olo- 1 yol- 1 :i- 1 orol- 1 :iiin- 1 oy- 1 a:i- 1 sa:i- 1 ar- 1 say- 1 da:iinr 1 sol- 1 dao- 1 ss- 1 dar- 1 sy- 1 darod- 1 yd- 1 day- 1 yo- 1 dl- 1 yol- 1 dls- 1 ds- 1 dyo- 1 lol- 1 lor- 1 ls- 1 oda:i- 1 olda:i- 1 ols- 1 oly- 1 oo- 1 oor- 1 oqo- 1 ora- 1 ory- 1 oso- 1 qol- 1 qoo- 1 qor- 1 qos- 1 qoy- 1 rolo- 1 sa- 1 sa:i- 1 soo- 1 syd- 1 ydara:i 1 yol- 1 yr-

Comparing the beginning-of-line statistics with those of all words, we can see that:

  * In language A, the ratio y-/o- changes from 0.90 at beginning of line to 0.53 over all words; whereas in language B it changes from 3.41 to 0.44. Yet another argument for the thesis that y- is merely a more ornate form of o-.

  * Otherwise, the major prefix frequencies seem roughly the same. This is encouraging, since it suggests that line breaks and word spaces are similar: if line breaks are true word boundaries, then the same is true of most spaces.

  * The bol sample has a smaller set of prefixes, but that seems to be just about the expected number given the ratio of sample sizes (1:6).

  * The prefix frequencies in labels are significantly different from those in any of the four word samples.
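The y-/o- ratios quoted in the first item can be recomputed directly from the two-column listings (count, then prefix). What follows is only a minimal sketch, written in POSIX sh and awk rather than the csh used in these notes; the helper name `frq_ratio` is hypothetical, and the sample lines are copied by hand from the herbal-A bol column rather than read from the actual `.frq` file.

```sh
# frq_ratio NUM DEN: read "count item" lines on stdin and print
# count(NUM) / count(DEN) to two decimals.
# Hypothetical helper, not part of the original scripts.
frq_ratio() {
  awk -v num="$1" -v den="$2" '
    $2 == num { n = $1 }
    $2 == den { d = $1 }
    END {
      if (d > 0) printf "%.2f\n", n / d
      else print "undefined"
    }
  '
}

# First rows of the herbal-A bol column, copied from the table above.
printf '%s\n' '384 -' '154 o-' '141 qo-' '138 y-' '71 d-' \
  | frq_ratio 'y-' 'o-'
```

With these five rows the sketch prints 0.90, the herbal-A beginning-of-line figure. Feeding it a whole `.frq` listing on stdin should work the same way, assuming the file keeps the "count item" layout produced by `uniq -c`.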
Now for the suffixes:

  pr -m -w 80 -e -t \
      .labels-s-suffs-all.frq \
      .he{a,b}-f-eol-suffs-all.frq \
      Note-009/he{a,b}-f-suffs-all.frq \
    | expand \
    > .suffs-joint.frq

    all labels     herbal-A eol   herbal-B eol   herbal-A all   herbal-B all
    freq suffix    freq suffix    freq suffix    freq suffix    freq suffix
    ---- --------- ---- --------- ---- --------- ---- --------- ---- ---------
41 -y 190 -y 53 -y 1816 -y 639 -dy 19 -ol 58 -aiin 31 -am 903 -ol 533 -y 17 -ar 41 -ody 30 -dy 705 -or 168 -ar 16 -al 29 -am 9 -aiin 360 -o 143 -aiin 14 -or 29 -ol 8 - 316 -aiin 111 -ody 11 -ody 28 -om 8 -ar 218 -ody 97 -ol 10 -dy 28 -or 6 -ain 174 -ar 84 -al 9 -aly 20 - 6 -al 124 - 63 - 7 - 20 -al 6 -ody 104 -al 51 -or 6 -os 18 -ar 5 -ary 76 -odaiin 44 -o 5 -alar 14 -oldy 3 -daiin 76 -s 40 -am 4 -aiin 13 -ory 3 -dam 71 -os 34 -daiin 4 -ain 12 -dy 2 -ald 69 -od 33 -d 4 -air 11 -ain 2 -ardam 66 -om 31 -os 4 -aram 11 -an 2 -d 61 -am 30 -s 4 -ary 11 -oly 2 -ol 49 -oiin 28 -dar 4 -dal 9 -od 2 -or 48 -ain 26 -ain 4 -oldy 8 -odaiin 1 -a 40 -oldy 19 -od 4 -orain 8 -os 1 -aiily 35 -oy 15 -air 3 -am 8 -s 1 -ainqod 29 -an 14 -dal 3 -o 6 -a 1 -alaiin 29 -oly 11 -aly 3 -odar 6 -o 1 -alam 27 -ory 10 -aldy 3 -oly 5 -aldy 1 -alas 26 -odar 9 -odaiin 3 -r 5 -old 1 -aldy 24 -a 7 -dain 3 -s 5 -ordy 1 -aly 23 -dy 7 -dam 2 -alaiin 5 -yd 1 -amdy 15 -odal 6 -a 2 -aldy 5 -ydy 1 -ara 14 -oaiin 6 -ary 2 -alody 4 -ald 1 -aram 14 -yd 6 -odar 2 -an 4 -ary 1 -arar 12 -d 6 -oy 2 -araiin 4 -d 1 -ardy 12 -n 5 -dair 2 -aral 4 -m 1 -aro 12 -ydy 4 -dol 2 -as 4 -oaiin 1 -aros 10 -air 4 -oar 2 -d 4 -oy 1 -da 10 -l 4 -odal 2 -oaiin 4 -ys 1 -daly 10 -odain 4 -oldy 2 -oaly 3 -odain 1 -dol 10 -ols 4 -sy 2 -olar 3 -odal 1 -dolaii 9 -ordy 3 -araiin 2 -ols 3 -odar 1 -dydy 9 -sy 3 -aral 2 -om 3 -oiin 1 -dym 8 -aldy 3 -arar 2 -yd 3 -olo 1 -m 8 -odol 3 -ardy 1 -aday 3 -ols 1 -o 8 -olol 3 -as 1 -ainy 3 -yds 1 -oam 8 -r 3 -daly 1 -airdy 2 -dm 1 -odydy 7 -ady 3 -dor 1 -airy 2 -n 1 -odys 7 -ald 3 -dydy 1 -aj 2 -odody 1 -old 7 -old 3 -oly 1 -ala 2 -olm 1 -olody
7 -olo 3 -ydy 1 -alain 2 -olol 1 -ols 6 -aiir 2 -adaiin 1 -alal 1 -adaiin 1 -oram 6 -aly 2 -ady 1 -alalg 1 -aiiin 1 -orar 6 -ary 2 -ainr 1 -alaly 1 -aiim 1 -rodal 6 -oar 2 -ald 1 -alam 1 -aiind 1 -saiin 6 -odam 2 -amdy 1 -ald 1 -aiiny 1 -san 6 -olor 2 -an 1 -aldar 1 -air 1 -ym 6 -ydaiin 2 -aram 1 -aldm 1 -alod 1 -yom 5 -as 2 -ardam 1 -aldo 1 -alody 1 -yoram 5 -m 2 -da 1 -algar 1 -arar 5 -odaly 2 -dody 1 -aloiir 1 -ardl 5 -odan 2 -l 1 -alrar 1 -ariin 5 -on 2 -oiin 1 -alsain 1 -arm 5 -oror 2 -ols 1 -alsy 1 -aro 5 -ys 2 -oody 1 -alyd 1 -aroiin 4 -ay 2 -oram 1 -any 1 -arom 4 -daiin 2 -orar 1 -ao 1 -aryd 4 -dal 2 -so 1 -aralar 1 -as 4 -oal 2 -yl 1 -araldy 1 -da 4 -oiiin 1 -ad 1 -aralgy 1 -daiin 4 -oraiin 1 -ai:dy 1 -arar 1 -dal 4 -osy 1 -aiiin 1 -aro 1 -dam 3 -adaiin 1 -aiily 1 -dagy 1 -ds 3 -dain 1 -aiiny 1 -daiir 1 -l 3 -iin 1 -aiir 1 -dajy 1 -ld 3 -odl 1 -airaii 1 -dar 1 -oas 3 -olody 1 -airy 1 -din 1 -odaiin 3 -ooiin 1 -alaiin 1 -dorgy 1 -odam 3 -oor 1 -alaiin 1 -g 1 -odan 3 -orody 1 -alam 1 -iir 1 -odary 3 -orory 1 -alas 1 -lairgy 1 -odd 3 -osaiin 1 -aldaii 1 -ldam 1 -oddal 3 -ydal 1 -aldar 1 -m 1 -odoldy 3 -yds 1 -alody 1 -oaldy 1 -oiiin 2 -aiiin 1 -als 1 -odady 1 -olal 2 -alod 1 -amar 1 -odaiin 1 -oldam 2 -als 1 -ara 1 -odaiir 1 -oldar 2 -dam 1 -aro 1 -odals 1 -oloaii 2 -dm 1 -arodai 1 -odol 1 -olodal 2 -oary 1 -aror 1 -oj 1 -olom 2 -odody 1 -aros 1 -olaiin 1 -olsy 2 -odor 1 -asal 1 -olam 1 -on 2 -odys 1 -ay 1 -olarol 1 -ora 2 -olaiin 1 -daiiin 1 -oldain 1 -oraiin 2 -oldam 1 -dalo 1 -olg 1 -orar 2 -oldar 1 -dalor 1 -olinj 1 -ord 2 -olm 1 -dly 1 -oloara 1 -ordm 2 -olom 1 -do 1 -olor 1 -orly 2 -orar 1 -dolaii 1 -ora 1 -orm 2 -orol 1 -ds 1 -orad 1 -orods 2 -orom 1 -dsairy 1 -oraj 1 -oroiin 2 -osar 1 -dyd 1 -oraldy 1 -orom 1 -aar 1 -dyldy 1 -oram 1 -oror 1 -ad 1 -dym 1 -orol 1 -orory 1 -adam 1 -i:dy 1 -ory 1 -oross 1 -adar 1 -lain 1 -osal 1 -oryd 1 -aii:dy 1 -lal 1 -osam 1 -osory 1 -aii:m 1 -ls 1 -osar 1 -osy 1 -aii:od 1 -ly 1 -osarar 1 -oyd 1 -aii:s 1 -m 1 -osdy 1 
-r 1 -aiilm 1 -oal 1 -oys 1 -sm 1 -aiind 1 -oam 1 -ral 1 -sordy 1 -aiinda 1 -odain 1 -sas 1 -sy 1 -aiiny 1 -odair 1 -sody 1 -yddor 1 -aind 1 -odalai 1 -sos 1 -yqoldy 1 -ainos 1 -odaly 1 -sy 1 -airin 1 -odam 1 -yar 1 -alody 1 -odody 1 -yda 1 -aor 1 -odydy 1 -ydal 1 -arar 1 -odys 1 -ydary 1 -arasy 1 -olal 1 -ydy 1 -ardl 1 -olar 1 -ys 1 -ariin 1 -old 1 -ysam 1 -arm 1 -olody 1 -aro 1 -olol 1 -aroiin 1 -olor 1 -arom 1 -oraiin 1 -aryd 1 -oyl 1 -da 1 -riin 1 -dan 1 -rodal 1 -doly 1 -saiin 1 -dom 1 -san 1 -draird 1 -sar 1 -ds 1 -sdy 1 -i:s 1 -ym 1 -ir 1 -yom 1 -ld 1 -yoram 1 -lol 1 -yr 1 -lor 1 -lsy 1 -ly 1 -oain 1 -oair 1 -oan 1 -oarom 1 -oas 1 -oda 1 -odaiin 1 -odaiir 1 -odair 1 -odairo 1 -odals 1 -odary 1 -odd 1 -oddal 1 -oddy 1 -odo 1 -odoaly 1 -odoldy 1 -odoral 1 -odr 1 -oii:s 1 -oiir 1 -oin 1 -olal 1 -olda 1 -oldain 1 -oldal 1 -oldm 1 -oldom 1 -oloaii 1 -olodal 1 -oloiin 1 -ololor 1 -olols 1 -ololy 1 -olr 1 -olraii 1 -olsy 1 -oo 1 -ooaiin 1 -ora 1 -orain 1 -oral 1 -oraly 1 -orari: 1 -orary 1 -ord 1 -ordaii 1 -ordm 1 -orl 1 -orly 1 -orm 1 -orodo 1 -orods 1 -oroiin 1 -oross 1 -ors 1 -oryd 1 -osory 1 -oyd 1 -ra 1 -raiin 1 -rrr 1 -ry 1 -saiin 1 -sal 1 -sm 1 -so 1 -sody 1 -sor 1 -sordy 1 -soy 1 -yaiin 1 -yays 1 -ydain 1 -ydainy 1 -yddor 1 -ydlo 1 -ydm 1 -yl 1 -yly 1 -yoar 1 -yol 1 -ysaiin 1 -ysaiin

This is intriguing: the ratio -dy/-y, which is the clearest indicator of language B, is more marked in general words than in line-final words. The ratios (A and B, respectively) are 0.012 and 1.199 for all text, but 0.063 and 0.566 for line-final words.

The relative frequencies of {-al,-ol,-ar,-or} at end-of-line are about half of their overall frequencies, for both languages. The frequency of these suffixes in labels is intermediate between the general frequencies, but higher than the end-of-line frequency.

Of the occurrences of -am in language A, 48% are at end-of-line; in language B, 78% are at end-of-line.
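The two -am percentages follow from the -am counts in the eol and all columns of the suffix table (29 of 61 for herbal-A, 31 of 40 for herbal-B). A minimal sketch of that arithmetic, in POSIX sh and awk rather than the csh of these notes; `eol_share` is a hypothetical helper name, and the counts are typed in by hand from the table.

```sh
# eol_share EOL_COUNT ALL_COUNT: print the percentage of an item's
# occurrences that fall at end of line, to one decimal.
# Hypothetical helper, not part of the original scripts.
eol_share() {
  awk -v eol="$1" -v all="$2" 'BEGIN {
    if (all > 0) printf "%.1f\n", 100 * eol / all
    else print "undefined"
  }'
}

eol_share 29 61   # -am, herbal-A: eol count vs. count over all words
eol_share 31 40   # -am, herbal-B
```

This prints 47.5 and 77.5, which round to the 48% and 78% quoted above.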
Presumably -am is an abbreviation (used at e-o-l to avoid a line break), or something that occurs mostly at end-of-sentence.

The {-o,-a}/-y ratio is 0.073 for labels; 0.063 and 0.037 for end-of-line (A and B, respectively); and 0.211 and 0.093 for all herbal words (A and B). Yet one more argument that -y is a fancy version of -o or -a.

Here is a summary of the most important suffix classes:

                  labels      herbal-A eol  herbal-B eol  herbal-A all  herbal-B all
                  ----------  ------------  ------------  ------------  ------------
    -[yoa]        44 (14%)    202 (28%)     55 (26%)      2200 (37%)    583 (24%)
    -[oa][lr]     66 (21%)    95 (13%)      18 (8.4%)     1886 (32%)    400 (16%)
    -[aoy]d[yoa]  12 (3.8%)   41 (5.6%)     6 (2.8%)      232 (3.9%)    114 (4.7%)
    -d[yoa]       10 (3.2%)   13 (1.8%)     31 (14%)      24 (0.4%)     642 (26%)
    -             7 (2.2%)    20 (2.8%)     8 (3.7%)      124 (2.1%)    63 (2.6%)
    -aiin         4 (1.2%)    58 (8.0%)     9 (4.2%)      316 (5.3%)    143 (5.9%)

Labels seem to use generally less -aiin, -dy, -y, -o.

Now let's look at unifixes at either extremity:

  pr -m -w 64 -e -t \
      .labels-s-unifs-all.frq \
      .hea-f-bol-unifs-all.frq \
      .hea-f-eol-unifs-all.frq \
      Note-009/hea-f-unifs-all.frq \
    | expand \
    > .unifs-a-joint.frq

    all labels      herbal-A bol    herbal-A eol    herbal-A all
    freq unifix     freq unifix     freq unifix     freq unifix
    ---- ---------  ---- ---------  ---- ---------  ---- ---------
       6 am           46 daiin       107 daiin       412 daiin
       6 ar           16 m            30 dy           88 dy
       3 ary          12 or           20 dam          87 s
       2 dy           11 dain         19 dar          74 dain
       2 gy            7 dar          18 dal          71 dar
       2 odor          7 dor          18 s            52 or
       2 sal           7 sor          12 d            47 dal
       2 sar           6 oaiin        11 dain         45 ol
       2 sary          6 saiin        11 sy           40 dol
       2 siiir         5 dol           8 saiin        34 dam
       1 aiin          5 iin           7 am           34 dor
       1 ainaly        5 ol            6 da           33 saiin
       1 ainam         5 sol           6 dan          26 dair
       1 airar         4 doiin         6 dom          24 sy
       1 al            4 soiin         6 or           19 odaiin
       1 alols         3 in            6 sal          18 ar
       1 aly           3 l             5 aiin         17 d
       1 araly         3 odaiin        5 ar           16 m
       1 arar          3 olor          5 dary         16 sor
       1 araydy        3 sar           5 ody          16 y
       1 arody         3 y             5 ol           15 aiin
       1 asy           3 ydaiin        4 dair         15 r
       1 daiin         2 dair          4 r            15 sol
       1 daiindy       2 lor           4 raiin        14 sar
       1 dainy         2 oain          4 sos          13 qodaiin
       1 dal           2 odar          4 y            12 sal
       1 dalary        2 qo            3
daiiin 10 al
       1 daliir        2 qoaiin        3 dol          10 oaiin
       1 dalsy         2 qody          3 n            10 ody
       1 dan           2 qor           3 sar           9 am
       1 dar           2 soy           3 sol           9 dan
       1 daramgal      2 yol           3 ydaiin        9 do

Label unifixes and text unifixes seem largely disjoint, except for {am,al,ary,dy,sal,sar}. The longer label unifixes occur once each; the exceptions are sary and siiir.

The unifix "iin" occurs with nonzero frequency at b-o-l but is almost absent elsewhere. Checking the text, one can see that almost all occurrences are due to a common word (usually "daiin") being split across a line break.

The "m" unifix, which is common at beginning of line, is a Friedman transcription bug: in the Currier transcription most of those "m" are actually "g"s attached to the end of the previous line.

  pr -m -w 64 -e -t \
      .labels-s-unifs-all.frq \
      .heb-f-bol-unifs-all.frq \
      .heb-f-eol-unifs-all.frq \
      Note-009/heb-f-unifs-all.frq \
    | expand \
    > .unifs-b-joint.frq

    all labels      herbal-B bol    herbal-B eol    herbal-B all
    freq unifix     freq unifix     freq unifix     freq unifix
    ---- ---------  ---- ---------  ---- ---------  ---- ---------
       6 am           19 daiin        11 dam          86 daiin
       6 ar            7 saiin         7 dy           55 or
       3 ary           6 dar           6 daiin        50 dar
       2 dy            4 iin           6 dal          40 aiin
       2 gy            4 m             4 am           39 ar
       2 odor          3 dair          4 dar          31 ol
       2 sal           3 ol            3 aiin         28 dy
       2 sar           3 or            3 ain          25 dal
       2 sary          3 r             3 ar           25 saiin
       2 siiir         2 dor           3 daly         17 dam
       1 aiin          2 sor           3 ldy          14 ody
       1 ainaly        1 aiin          3 ody          12 oraiin
       1 ainam         1 ain           3 ol           12 s
       1 airar         1 ar            3 or            9 al
       1 al            1 daiir         3 s             8 olaiin
       1 alols         1 darar         3 saiin         8 r
       1 aly           1 iir           2 al            7 ain
       1 araly         1 laiin         2 aram          7 dair
       1 arar          1 lodaiin       2 d             7 odaiin
       1 araydy        1 oaiin         2 da            7 y
       1 arody         1 odair         2 daram         6 dol
       1 asy           1 olair         2 dary          6 raiin
       1 daiin         1 qoaiin        2 od            5 am
       1 daiindy       1 rodaiin       2 oldam         5 dain
       1 dainy         1 saiir         2 oraiin        5 dor
       1 dal           1 sair          2 r             5 m
       1 dalary        1 sairain       2 raiin         5 sair
       1 daliir        1 sar           2 sa            5 sar
       1 dalsy         1 saraiir       2 y             4 araiin
       1 dan           1 sol           1 a             4 daly
       1 dar           1 solaiin       1 aldy          4 iin
       1 daramgal      1 y             1 alod          4 ldy

The disjointness of label and text unifixes apparently holds for language B too.
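The overlap between two unifix listings can also be checked mechanically instead of by eye. The following is a minimal sketch in POSIX sh and awk (not the csh of these notes); `common_items` is a hypothetical helper, and the two sample files are short hand-copied excerpts of the "all labels" and "herbal-A eol" columns, not the real `.frq` files.

```sh
# common_items A.frq B.frq: print the items (second column) of B that
# also occur in A. Both files hold "count item" lines.
# Hypothetical helper; assumes A.frq is not empty.
common_items() {
  awk 'NR == FNR { seen[$2] = 1; next } ($2 in seen) { print $2 }' "$1" "$2"
}

# Short excerpts copied from the tables above, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' '6 am' '3 ary' '2 dy' '2 gy' '2 odor' '2 sal' > "$tmp/labels.frq"
printf '%s\n' '107 daiin' '30 dy' '20 dam' '19 dar' '7 am' '6 sal' > "$tmp/eol.frq"
common_items "$tmp/labels.frq" "$tmp/eol.frq"
rm -r "$tmp"
```

On these excerpts the sketch lists dy, am and sal, in the order they appear in the second file. Run on the full listings it would report every shared unifix, including those that occur only once.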