Hacking at the Voynich manuscript - Side notes 507 A complete factorization of words in the original EVA alphabet Last edited on 1999-01-31 06:22:12 by stolfi OBSOLETE This is partly a remake of work from Notebook-1.txt and Notebook-2.txt, originally done between 97-07-05 and 97-07-16. Summary of previous relevant tasks: I obtained Landini's interlinear transcription of the VMs, version 1.6 (landini-interln16.evt) from http://sun1.bham.ac.uk/G.Landini/evmt/intrln16.zip. [Notebook-1.txt] Around 97-11-01 I split landini-interln16.evt into many files, with one text unit per page. [Notebook-12.txt] On 97-11-05 I mapped those files from FSG and other ad-hoc alphabets to EVA. [Notebook-12.txt] The files are L16-eva/fNNxx.YY, and a machine-readable description of their contents and logical order is in L16-eva/INDEX. Then I started going back to redoing some of the previous tasks using the new encoding. I extracted the Currier (;C>) and Friedman-I (;F>) versions of the "bio" section, in EVA alphabet, as files bio-{c,f}-eva.evt. I also built the associated text files and word lists bio-{c,f}-eva{-{gut,fun,bad},{wds,dic,frq},.txt}. [Note-001.txt] Afer mapping the data to a reduced alphabet (ERA) [Note-003.txt], I identified a paradigm which consists of 267 prefixes combined with 147 suffixes; the latter are maximal strings of the form [doirlmn]*. If we use just the 91 most common prefixes (>= 3 occurrences) and the most common 24 suffixes(>= 20 occurrences), we can reproduce 5867 of the 6166 original words (95.1%). [Note-006.txt] 97-11-11 stolfi =============== Now that we have a possible paradigm, let's redo it in the original EVA alphabet, and see whether there are any subtler patterns. In EVA, the suffix presumably can be defined as a maximal suffix of a word that has the form [daoirsl]*[rlmnyd]. Let's see how many words we can factor this way: cat bio-f-eva-gut.wds \ | sed -e 's:\([daoirsl]*[rslmnyd]\)$:- -\1:' \ | egrep -e '- -' \ > Note-007/.factored dicio-wc bio-f-eva-gut.wds Note-007/.factored lines words bytes file ------ ------- --------- ------------ 6182 6182 37279 bio-f-eva-gut.wds 6095 12190 55190 Note-007/.factored These are the 87 words in Friedman's version that don't factor as above: cat bio-f-eva-gut.wds \ | sed -e 's:\([daoirsl]*[rslmnyd]\)$:- -\1:' \ | egrep -v -e'- -' \ | sort | uniq -c | expand \ | sort +0 -1nr \ | format-counts lo(6) qoka(6) da(5) sa(4) dalo(3) olo(3) ora(3) qo(3) ra(3) shek(3) o(2) ota(2) ro(2) a(1) chckh(1) checkho(1) chek(1) chep(1) cholo(1) chop(1) dcheo(1) ka(1) lch(1) lka(1) lshdyqo(1) lshee(1) lsho(1) oka(1) ola(1) olda(1) olka(1) olkee(1) olqo(1) opa(1) oqo(1) oro(1) oroka(1) ot(1) qoeea(1) qok(1) qokede(1) qop(1) qopalo(1) qot(1) rsh(1) rshee(1) saino(1) sh(1) she(1) sheo(1) sheolo(1) shet(1) ta(1) tcho(1) ykeda(1) Here are the 74 words of the Currer file that don't factor as above: cat bio-c-eva-gut.wds \ | sed -e 's:\([daoirsl]*[rslmnyd]\)$:- -\1:' \ | egrep -v -e'- -' \ | sort | uniq -c | expand \ | sort +0 -1nr \ | format-counts lo(6) o(4) shek(4) q(3) qo(3) qok(3) qokag(3) da(2) dalo(2) sh(2) chcth(1) che(1) checkho(1) chee(1) cheef(1) cheg(1) chek(1) chkag(1) cholo(1) dalog(1) dok(1) lch(1) le(1) lkamo(1) lko(1) loe(1) log(1) lshdyqo(1) lsho(1) ok(1) olo(1) olqo(1) olshee(1) opa(1) oqo(1) orairaro(1) ot(1) otag(1) qokap(1) qokesh(1) qokyrlshe(1) qop(1) qot(1) rag(1) saino(1) shedea(1) sheolo(1) sheq(1) tchalolkee(1) teyteg(1) ycheolt(1) ysheeyqo(1) These are the 23 unfactorable words that occur in both versions: checkho chek cholo da dalo lch lo lshdyqo lsho o olo olqo opa oqo ot qo qok qop qot saino sh shek sheolo Thus we can probably ignore them for now... Now, let's collect the prefixes and suffixes: cat Note-007/.factored \ | gawk '/./ {print $1}' \ | revbytes | sort | uniq | revbytes \ > Note-007/.prefs-all.dic cat Note-007/.factored \ | gawk '/./ {print $2}' \ | sort | uniq \ > Note-007/.suffs-all.dic dicio-wc Note-007/.prefs-all.dic Note-007/.suffs-all.dic lines words bytes file ------ ------- --------- ------------ 526 526 3588 Note-007/.prefs-all.dic 206 206 1248 Note-007/.suffs-all.dic Great. Now let's count their occurrences and list the most important: cat Note-007/.factored \ | gawk '/./ {print $1}' \ | revbytes | sort | revbytes | uniq -c | expand \ | sort +0 -1nr \ > Note-007/.prefs-all.frq The 31 most important prefixes (at least 30 occurrences), accounting for 4627 words -(1291) qok-(497) she-(380) che-(362) qokee-(224) qoke-(211) q-(159) ok-(132) ot-(132) qot-(124) lche-(83) ch-(74) shee-(74) chee-(73) ote-(70) t-(65) olk-(62) sh-(62) qote-(61) k-(56) oke-(56) okee-(49) qotee-(48) chckh-(45) olche-(45) otee-(34) lshe-(33) olkee-(33) te-(32) olshe-(30) pche-(30) The next 129 prefixes (less than 30 but at least 3 occurrences), accounting for another 1040 words: ke-(29) sheckh-(28) kee-(26) dshe-(25) olke-(25) shckh-(24) p-(23) chcth-(22) qokch-(20) shek-(20) checkh-(19) ykee-(18) dche-(17) opche-(17) rche-(17) checth-(16) tche-(16) lk-(15) tee-(15) yshe-(15) chek-(14) lch-(14) shcth-(12) chckhe-(11) otch-(11) qokche-(11) rshe-(11) qolche-(10) shckhe-(10) shecth-(10) yche-(10) cheek-(9) cthe-(9) okch-(9) otche-(9) qokshe-(9) tch-(9) y-(9) yte-(9) chet-(8) chk-(8) cth-(8) dch-(8) kche-(8) olch-(8) pch-(8) qolk-(8) tshe-(8) yk-(8) ytee-(8) lkee-(7) lshee-(7) olt-(7) op-(7) qopche-(7) sche-(7) sshe-(7) yt-(7) lke-(6) qoee-(6) qotch-(6) sheet-(6) shet-(6) solche-(6) solkee-(6) ychee-(6) yke-(6) yshee-(6) cht-(5) dshee-(5) kshe-(5) lchee-(5) okche-(5) otshe-(5) psh-(5) qokeee-(5) qolkee-(5) qotche-(5) rch-(5) salche-(5) sheek-(5) shk-(5) chcthe-(4) cheet-(4) chete-(4) cph-(4) cphe-(4) kch-(4) lt-(4) oche-(4) ofche-(4) okshe-(4) olsh-(4) olshee-(4) opshe-(4) oshe-(4) polche-(4) qoky-(4) qop-(4) shcthe-(4) sheckhe-(4) shok-(4) sht-(4) solk-(4) chcphe-(3) chke-(3) ckhe-(3) dee-(3) dyk-(3) lcheckh-(3) lkche-(3) lsh-(3) ltee-(3) okesh-(3) olchee-(3) olok-(3) opch-(3) otsh-(3) polsh-(3) qee-(3) qet-(3) qockh-(3) qockhe-(3) qokeche-(3) qopch-(3) qopshe-(3) qotsh-(3) rshee-(3) sokee-(3) The other 366 prefixes (less than 3 occurrences), accounting for another 428 words: checthe-(2) chte-(2) ckh-(2) dalche-(2) dalsh-(2) dchee-(2) de-(2) dke-(2) dlche-(2) dsh-(2) dy-(2) dykee-(2) dyt-(2) dyte-(2) ee-(2) eee-(2) fche-(2) keee-(2) ksh-(2) lcheckhe-(2) lpche-(2) lsheckh-(2) lshet-(2) ockh-(2) ockhe-(2) octh-(2) of-(2) olchcth-(2) olfche-(2) olkch-(2) olkeey-(2) olkshe-(2) olpche-(2) olte-(2) oltee-(2) oltshe-(2) opchee-(2) oq-(2) oqok-(2) orche-(2) polch-(2) pshe-(2) qche-(2) qe-(2) qocthe-(2) qodee-(2) qoeee-(2) qofche-(2) qolch-(2) qolke-(2) qolshe-(2) qolshee-(2) qotshe-(2) rchek-(2) salkee-(2) shecph-(2) sheke-(2) shepch-(2) sk-(2) solke-(2) yfche-(2) yq-(2) acth-(1) ain-(1) ak-(1) akee-(1) alche-(1) alk-(1) alke-(1) alocfh-(1) alshee-(1) arote-(1) ary-(1) at-(1) atche-(1) cfh-(1) cfhee-(1) chckhy-(1) cheak-(1) chech-(1) checkhe-(1) checphe-(1) chedch-(1) chedche-(1) chedchey-(1) chedy-(1) chedyk-(1) cheeke-(1) cheke-(1) chekee-(1) cheok-(1) cheolch-(1) chep-(1) chepche-(1) cheyq-(1) chf-(1) chfee-(1) chkch-(1) chkee-(1) chlchpshee-(1) chok-(1) cholche-(1) cholkeee-(1) chot-(1) chpche-(1) chsh-(1) chytee-(1) daiin-(1) dain-(1) dairy-(1) dalk-(1) dalkeee-(1) daly-(1) darchee-(1) dchckhe-(1) dcheckh-(1) dchok-(1) deee-(1) dk-(1) dkee-(1) dkshe-(1) doke-(1) dolch-(1) dolche-(1) dolchee-(1) dolfche-(1) dolke-(1) dorch-(1) dotee-(1) dsholy-(1) dye-(1) dyke-(1) dyksh-(1) efche-(1) ek-(1) eke-(1) ep-(1) epch-(1) et-(1) fch-(1) fshe-(1) kech-(1) kolch-(1) kolsh-(1) korch-(1) kot-(1) lchcph-(1) lchcphe-(1) lchcth-(1) ldche-(1) lf-(1) lkede-(1) loche-(1) loee-(1) loiin-(1) lok-(1) lokee-(1) lolk-(1) lolke-(1) lpch-(1) lshckh-(1) lsheckhe-(1) lsheet-(1) lte-(1) n-(1) ochee-(1) octhe-(1) odche-(1) oddche-(1) oee-(1) oekee-(1) ofch-(1) okalch-(1) okech-(1) okeedy-(1) okeee-(1) okeshe-(1) okeylch-(1) okiin-(1) okok-(1) oky-(1) okylk-(1) olaiin-(1) olcheckh-(1) olcth-(1) olee-(1) oleee-(1) oleese-(1) olkeedy-(1) olkeee-(1) olkeeyshe-(1) olkesh-(1) olkeshe-(1) ollch-(1) oloke-(1) ololche-(1) ololkee-(1) olotche-(1) olotee-(1) olpockh-(1) olsht-(1) oltsh-(1) olty-(1) oly-(1) opalk-(1) opalshe-(1) opsh-(1) oqofche-(1) ork-(1) orshee-(1) osh-(1) oshep-(1) otalsh-(1) otalshe-(1) otchot-(1) otech-(1) otechy-(1) otedee-(1) otyot-(1) oyshe-(1) palch-(1) pchcfh-(1) pdalsh-(1) pe-(1) pok-(1) poldak-(1) poldshe-(1) polkee-(1) polshe-(1) poltesh-(1) porshe-(1) psche-(1) pshee-(1) pyke-(1) qckh-(1) qckhe-(1) qcphe-(1) qcth-(1) qcthdy-(1) qcthe-(1) qeedee-(1) qeee-(1) qek-(1) qekch-(1) qepche-(1) qete-(1) qoche-(1) qodche-(1) qodyke-(1) qoeekee-(1) qokech-(1) qokechckh-(1) qokede-(1) qokeedyqok-(1) qokeeylshe-(1) qokolche-(1) qokolk-(1) qokop-(1) qokych-(1) qolchee-(1) qolkch-(1) qolkeee-(1) qolkesh-(1) qolsh-(1) qopolk-(1) qopsh-(1) qoqok-(1) qoqokee-(1) qorch-(1) qorsh-(1) qoshe-(1) qoteee-(1) qoty-(1) qoyk-(1) qp-(1) qsolkee-(1) qyk-(1) racth-(1) ralch-(1) rchee-(1) reat-(1) rkch-(1) rolch-(1) rolche-(1) rsh-(1) rt-(1) sak-(1) salt-(1) sch-(1) schckh-(1) schcth-(1) schee-(1) schet-(1) scthe-(1) shckhee-(1) shcthey-(1) shech-(1) shecphe-(1) shecthe-(1) shecthedch-(1) sheecth-(1) sheecthe-(1) sheee-(1) sheekch-(1) sheeke-(1) sheekee-(1) sheete-(1) shekee-(1) sheok-(1) shepche-(1) shepshe-(1) shete-(1) shey-(1) sheyk-(1) shke-(1) shockh-(1) shocphe-(1) shoe-(1) shoksh-(1) shot-(1) skee-(1) sockhe-(1) sok-(1) sokchee-(1) solchk-(1) soltche-(1) soltee-(1) sot-(1) spch-(1) ssh-(1) ssheckh-(1) sshek-(1) sshkch-(1) st-(1) talsh-(1) tchdolt-(1) tchee-(1) tchet-(1) teae-(1) teche-(1) tedy-(1) teyte-(1) tocthe-(1) tok-(1) tolke-(1) torolsh-(1) tot-(1) tsheokee-(1) tyqok-(1) ychckh-(1) ycheckhe-(1) ychecth-(1) yckhe-(1) ydarshe-(1) yep-(1) ykch-(1) ykeech-(1) ykesh-(1) ykshe-(1) ylch-(1) yok-(1) yqok-(1) yrche-(1) yshche-(1) ytch-(1) yty-(1) Now for the suffixes: cat Note-007/.factored \ | gawk '/./ {print $2}' \ | sort | uniq -c | expand \ | sort +0 -1nr \ > Note-007/.suffs-all.frq The 24 most significant suffixes (at least 20 occurrences), accounting for 5489 words: -dy(1782) -y(1254) -ol(563) -aiin(458) -al(269) -ar(177) -or(130) -daiin(121) -dal(96) -dar(92) -ain(90) -saiin(56) -d(55) -s(53) -sol(51) -oly(37) -dol(34) -l(29) -aly(26) -am(25) -oldy(24) -lol(23) -r(22) -sal(22) The 61 intermediate-frequency ones (less than 20, at least 3 occurrences), accounting for another 471 words: -iin(18) -raiin(18) -m(17) -sor(17) -olol(15) -sar(15) -olor(14) -lor(13) -air(12) -ary(12) -dam(12) -rol(12) -aldy(11) -dor(11) -ly(11) -olaiin(11) -ral(11) -dain(10) -ldy(10) -ory(10) -sy(10) -ody(9) -oiin(9) -orol(9) -dair(8) -arol(7) -as(7) -daly(7) -dary(7) -oraiin(7) -rar(7) -ror(7) -soiin(7) -ady(6) -alor(6) -ry(6) -sdy(6) -alol(5) -daldy(5) -dl(5) -loly(5) -aiiin(4) -an(4) -aror(4) -ls(4) -ols(4) -ram(4) -ay(3) -laiin(3) -lal(3) -lar(3) -n(3) -odaiin(3) -oin(3) -olain(3) -olal(3) -olar(3) -orar(3) -oroiin(3) -oroly(3) -sair(3) The 121 least significant ones (less than 3 occurrences), accounting for only 135 words: -ail(2) -alom(2) -dairol(2) -ldaiin(2) -lddy(2) -lr(2) -odal(2) -odar(2) -osal(2) -oy(2) -roiin(2) -sain(2) -saly(2) -sarol(2) -aiir(1) -alaldy(1) -ald(1) -aloir(1) -aloly(1) -alory(1) -als(1) -alsy(1) -aol(1) -aos(1) -aoy(1) -aral(1) -araly(1) -aram(1) -arar(1) -ardy(1) -arodl(1) -arody(1) -arolom(1) -daiiin(1) -dail(1) -dairoldy(1) -dalal(1) -dalar(1) -dalary(1) -dalol(1) -darady(1) -daral(1) -darary(1) -dardy(1) -darody(1) -darol(1) -darom(1) -daror(1) -daroram(1) -darory(1) -doldy(1) -dolor(1) -doly(1) -doraiin(1) -dr(1) -drol(1) -iror(1) -laiiin(1) -lam(1) -larol(1) -ld(1) -ldal(1) -ldalor(1) -ldar(1) -ldol(1) -ll(1) -lldar(1) -lod(1) -loldy(1) -lolom(1) -lolor(1) -lom(1) -loraiin(1) -lorain(1) -lsairor(1) -lsan(1) -lsar(1) -oaan(1) -oaiin(1) -oal(1) -oar(1) -oiiiin(1) -oiiin(1) -olam(1) -olarar(1) -olardy(1) -old(1) -oldair(1) -oldal(1) -ollom(1) -oloiin(1) -olom(1) -oloraiin(1) -oloral(1) -olorol(1) -olsaly(1) -orain(1) -orair(1) -oral(1) -oram(1) -oraror(1) -orols(1) -orory(1) -os(1) -rain(1) -raly(1) -rary(1) -rd(1) -rl(1) -roly(1) -rom(1) -saiirol(1) -saral(1) -sas(1) -sls(1) -sodar(1) -soldy(1) -solor(1) -soly(1) -soraiin(1) -sorar(1) Let's compare the suffixes of a few common prefixes: set tfiles = ( ) set totw = 0 set sufw = 7 set digs = 3 echo " " foreach f ( q ok ot qok qot lche ch she che qoke qokee ) echo "$f-" set g = "Note-007/.suffs-${f}.frq" /bin/rm -f ${g} echo "frq" "$f-" \ | gawk '/./ {printf "%'"${digs}"'s %-'"${sufw}"'s\n", $1, $2}' \ >> ${g} echo "--------------" "--------------" \ | gawk '/./ {printf "%.'"${digs}"'s %.'"${sufw}"'s\n", $1, $2}' \ >> ${g} cat Note-007/.factored \ | egrep '^'"${f}"'-' \ | gawk '/./ {print $2}' \ | revbytes | sort | revbytes | uniq -c \ | gawk '/./ {printf "%'"${digs}"'d %s\n", $1, $2}' \ | sort +0 -1nr \ >> ${g} @ totw = ${totw} + ${digs} + 1 + ${sufw} + 1 set tfiles = ( ${tfiles} ${g} ) end pr -m -s' ' -t -i' '1 -w ${totw} ${tfiles} \ | expand \ > Note-007/prefs-cmp.txt Here are some conjugations: frq q- --- ------- 126 -ol 8 -or 7 -oly 3 -ody 1 -oal 1 -oar 1 -odaiin 1 -odar ... ... frq ok- frq ot- frq qok- frq qot- --- ------- --- ------- --- ------- --- ------- 52 -aiin 38 -aiin 184 -aiin 32 -aiin 22 -al 26 -ar 102 -al 29 -al 14 -ar 22 -al 59 -y 25 -y 9 -ain 16 -y 51 -ain 13 -ar 8 -y 8 -ol 39 -ar 9 -ol 5 -ol 5 -ain 21 -ol 5 -ain 4 -aly 3 -aly 8 -or 2 -am 3 -air 2 -aldy 6 -aly 2 -as ... ... ... ... ... ... ... ... frq lche- frq ch- frq she- frq che- frq qoke- frq qokee- --- ------- --- ------- --- ------- --- ------- --- ------- --- ------- 50 -dy 18 -dy 228 -dy 193 -dy 157 -dy 143 -dy 20 -y 12 -ol 83 -y 84 -y 37 -y 73 -y 3 -d 7 -y 28 -ol 28 -ol 3 -or 4 -d 3 -ol 5 -al 11 -or 9 -or 2 -ar 1 -dal 1 -al 4 -ar 7 -dal 8 -s 2 -d 1 -dor 1 -am 4 -l 4 -daiin 6 -al 2 -dal 1 -r 1 -ar 3 -dar 3 -al 6 -daiin 2 -dar 1 -s 1 -dal 2 -am 3 -ar 5 -d 1 -al ... ... ... ... ... ... ... ... ... ... ... ... OK, let's generate a table with the main prefixes and suffixes: cat Note-007/.prefs-all.frq \ | sort +0 -1nr \ | gawk '($1 >= 3) {print $2}' \ | revbytes | sort | revbytes \ > Note-007/.prefs-top.dic cat Note-007/.suffs-all.frq \ | sort +0 -1nr \ | gawk '($1 >= 15) {print $2}' \ > Note-007/.suffs-top.dic dicio-wc Note-007/.prefs-top.dic Note-007/.suffs-top.dic lines words bytes file ------ ------- --------- ------------ 160 160 980 Note-007/.prefs-top.dic 30 30 141 Note-007/.suffs-top.dic cat Note-007/.factored \ | count-diword-freqs \ -v rows=Note-007/.prefs-top.dic \ -v cols=Note-007/.suffs-top.dic \ -v digits=3 \ > Note-007/pref-suff-wds-table.txt cat Note-007/.factored \ | fgrep -w -f Note-007/.prefs-top.dic \ | fgrep -w -f Note-007/.suffs-top.dic \ | wc This set of prefixes and suffixes covers 5176 of the 6166 original words (84%)! Let's compute the corresponding numbers without taking word repetitions into account: cat bio-f-eva-gut.dic \ | sed -e 's:\([daoirsl]*[rslmnyd]\)$:- -\1:' \ | egrep -e '- -' \ > Note-007/.dic-factored dicio-wc bio-f-eva-gut.dic Note-007/.dic-factored lines words bytes file ------ ------- --------- ------------ 1342 1342 8998 bio-f-eva-gut.dic 1287 2574 12603 Note-007/.dic-factored Note that 55 distinct words were not factored. cat Note-007/.dic-factored \ | gawk '/./ {print $1}' \ | revbytes | sort | uniq | revbytes \ > Note-007/.prefs-all.dic cat Note-007/.dic-factored \ | gawk '/./ {print $2}' \ | sort | uniq \ > Note-007/.suffs-all.dic dicio-wc Note-007/.prefs-all.dic Note-007/.suffs-all.dic lines words bytes file ------ ------- --------- ------------ 526 526 3588 Note-007/.prefs-all.dic 206 206 1248 Note-007/.suffs-all.dic cat Note-007/.dic-factored \ | gawk '/./ {print $1}' \ | revbytes | sort | revbytes | uniq -c | expand \ | sort +0 -1nr \ > Note-007/.prefs-all.frq cat Note-007/.dic-factored \ | gawk '/./ {print $2}' \ | sort | uniq -c | expand \ | sort +0 -1nr \ > Note-007/.suffs-all.frq Here are the prefixes, with number of *distinct* words using them: The 175 prefs with at least two occurrences, accounting for 936 distinct words: -(171) qok-(25) ch-(24) che-(21) ok-(20) ot-(19) q-(19) t-(18) she-(17) k-(16) qot-(15) sh-(15) qoke-(13) p-(12) lche-(11) ote-(11) chee-(10) olk-(10) y-(8) lk-(7) olche-(7) op-(7) qokee-(7) chckh-(6) cth-(6) lshe-(6) oke-(6) okee-(6) olshe-(6) qote-(6) qotee-(6) rche-(6) shee-(6) tch-(6) chk-(5) ke-(5) lch-(5) pche-(5) shek-(5) te-(5) tee-(5) yche-(5) yt-(5) cht-(4) cph-(4) dch-(4) dche-(4) dshe-(4) dshee-(4) kee-(4) lchee-(4) olke-(4) olt-(4) otch-(4) pch-(4) psh-(4) qolk-(4) qop-(4) rshe-(4) sche-(4) shk-(4) tche-(4) ychee-(4) yk-(4) yshe-(4) ytee-(4) chek-(3) chet-(3) cthe-(3) kshe-(3) lkee-(3) lt-(3) oche-(3) okch-(3) olkee-(3) opch-(3) opshe-(3) otee-(3) polche-(3) polsh-(3) qet-(3) qokche-(3) qoky-(3) qolche-(3) qolkee-(3) qopch-(3) qotch-(3) rch-(3) salche-(3) sheckh-(3) shet-(3) sht-(3) solche-(3) solk-(3) sshe-(3) tshe-(3) ykee-(3) yshee-(3) chckhe-(2) chcth-(2) chcthe-(2) checkh-(2) checthe-(2) chete-(2) ckh-(2) ckhe-(2) cphe-(2) dalche-(2) dchee-(2) dee-(2) dlche-(2) dsh-(2) dy-(2) dyk-(2) dyt-(2) dyte-(2) ee-(2) eee-(2) kch-(2) kche-(2) lcheckhe-(2) lkche-(2) lke-(2) lpche-(2) lsh-(2) lshee-(2) ltee-(2) octh-(2) of-(2) okche-(2) okesh-(2) okshe-(2) olch-(2) olchee-(2) olkeey-(2) olkshe-(2) olok-(2) olshee-(2) oltshe-(2) opche-(2) oqok-(2) oshe-(2) otche-(2) otshe-(2) polch-(2) pshe-(2) qche-(2) qe-(2) qee-(2) qockh-(2) qockhe-(2) qocthe-(2) qodee-(2) qofche-(2) qokch-(2) qokeche-(2) qokeee-(2) qokshe-(2) qolch-(2) qolshe-(2) qopche-(2) qotche-(2) qotsh-(2) qotshe-(2) salkee-(2) shckhe-(2) shcthe-(2) sheckhe-(2) shok-(2) sk-(2) solke-(2) solkee-(2) yfche-(2) yq-(2) yte-(2) The 351 prefixes that account for only one word each: acth-(1) ain-(1) ak-(1) akee-(1) alche-(1) alk-(1) alke-(1) alocfh-(1) alshee-(1) arote-(1) ary-(1) at-(1) atche-(1) cfh-(1) cfhee-(1) chckhy-(1) chcphe-(1) cheak-(1) chech-(1) checkhe-(1) checphe-(1) checth-(1) chedch-(1) chedche-(1) chedchey-(1) chedy-(1) chedyk-(1) cheek-(1) cheeke-(1) cheet-(1) cheke-(1) chekee-(1) cheok-(1) cheolch-(1) chep-(1) chepche-(1) cheyq-(1) chf-(1) chfee-(1) chkch-(1) chke-(1) chkee-(1) chlchpshee-(1) chok-(1) cholche-(1) cholkeee-(1) chot-(1) chpche-(1) chsh-(1) chte-(1) chytee-(1) daiin-(1) dain-(1) dairy-(1) dalk-(1) dalkeee-(1) dalsh-(1) daly-(1) darchee-(1) dchckhe-(1) dcheckh-(1) dchok-(1) de-(1) deee-(1) dk-(1) dke-(1) dkee-(1) dkshe-(1) doke-(1) dolch-(1) dolche-(1) dolchee-(1) dolfche-(1) dolke-(1) dorch-(1) dotee-(1) dsholy-(1) dye-(1) dyke-(1) dykee-(1) dyksh-(1) efche-(1) ek-(1) eke-(1) ep-(1) epch-(1) et-(1) fch-(1) fche-(1) fshe-(1) kech-(1) keee-(1) kolch-(1) kolsh-(1) korch-(1) kot-(1) ksh-(1) lchcph-(1) lchcphe-(1) lchcth-(1) lcheckh-(1) ldche-(1) lf-(1) lkede-(1) loche-(1) loee-(1) loiin-(1) lok-(1) lokee-(1) lolk-(1) lolke-(1) lpch-(1) lshckh-(1) lsheckh-(1) lsheckhe-(1) lsheet-(1) lshet-(1) lte-(1) n-(1) ochee-(1) ockh-(1) ockhe-(1) octhe-(1) odche-(1) oddche-(1) oee-(1) oekee-(1) ofch-(1) ofche-(1) okalch-(1) okech-(1) okeedy-(1) okeee-(1) okeshe-(1) okeylch-(1) okiin-(1) okok-(1) oky-(1) okylk-(1) olaiin-(1) olchcth-(1) olcheckh-(1) olcth-(1) olee-(1) oleee-(1) oleese-(1) olfche-(1) olkch-(1) olkeedy-(1) olkeee-(1) olkeeyshe-(1) olkesh-(1) olkeshe-(1) ollch-(1) oloke-(1) ololche-(1) ololkee-(1) olotche-(1) olotee-(1) olpche-(1) olpockh-(1) olsh-(1) olsht-(1) olte-(1) oltee-(1) oltsh-(1) olty-(1) oly-(1) opalk-(1) opalshe-(1) opchee-(1) opsh-(1) oq-(1) oqofche-(1) orche-(1) ork-(1) orshee-(1) osh-(1) oshep-(1) otalsh-(1) otalshe-(1) otchot-(1) otech-(1) otechy-(1) otedee-(1) otsh-(1) otyot-(1) oyshe-(1) palch-(1) pchcfh-(1) pdalsh-(1) pe-(1) pok-(1) poldak-(1) poldshe-(1) polkee-(1) polshe-(1) poltesh-(1) porshe-(1) psche-(1) pshee-(1) pyke-(1) qckh-(1) qckhe-(1) qcphe-(1) qcth-(1) qcthdy-(1) qcthe-(1) qeedee-(1) qeee-(1) qek-(1) qekch-(1) qepche-(1) qete-(1) qoche-(1) qodche-(1) qodyke-(1) qoee-(1) qoeee-(1) qoeekee-(1) qokech-(1) qokechckh-(1) qokede-(1) qokeedyqok-(1) qokeeylshe-(1) qokolche-(1) qokolk-(1) qokop-(1) qokych-(1) qolchee-(1) qolkch-(1) qolke-(1) qolkeee-(1) qolkesh-(1) qolsh-(1) qolshee-(1) qopolk-(1) qopsh-(1) qopshe-(1) qoqok-(1) qoqokee-(1) qorch-(1) qorsh-(1) qoshe-(1) qoteee-(1) qoty-(1) qoyk-(1) qp-(1) qsolkee-(1) qyk-(1) racth-(1) ralch-(1) rchee-(1) rchek-(1) reat-(1) rkch-(1) rolch-(1) rolche-(1) rsh-(1) rshee-(1) rt-(1) sak-(1) salt-(1) sch-(1) schckh-(1) schcth-(1) schee-(1) schet-(1) scthe-(1) shckh-(1) shckhee-(1) shcth-(1) shcthey-(1) shech-(1) shecph-(1) shecphe-(1) shecth-(1) shecthe-(1) shecthedch-(1) sheecth-(1) sheecthe-(1) sheee-(1) sheek-(1) sheekch-(1) sheeke-(1) sheekee-(1) sheet-(1) sheete-(1) sheke-(1) shekee-(1) sheok-(1) shepch-(1) shepche-(1) shepshe-(1) shete-(1) shey-(1) sheyk-(1) shke-(1) shockh-(1) shocphe-(1) shoe-(1) shoksh-(1) shot-(1) skee-(1) sockhe-(1) sok-(1) sokchee-(1) sokee-(1) solchk-(1) soltche-(1) soltee-(1) sot-(1) spch-(1) ssh-(1) ssheckh-(1) sshek-(1) sshkch-(1) st-(1) talsh-(1) tchdolt-(1) tchee-(1) tchet-(1) teae-(1) teche-(1) tedy-(1) teyte-(1) tocthe-(1) tok-(1) tolke-(1) torolsh-(1) tot-(1) tsheokee-(1) tyqok-(1) ychckh-(1) ycheckhe-(1) ychecth-(1) yckhe-(1) ydarshe-(1) yep-(1) ykch-(1) yke-(1) ykeech-(1) ykesh-(1) ykshe-(1) ylch-(1) yok-(1) yqok-(1) yrche-(1) yshche-(1) ytch-(1) yty-(1) Now, the top 18 suffixes, accounting for 963 distinct words: -y(292) -dy(248) -ol(79) -aiin(45) -al(39) -ar(38) -d(36) -or(32) -s(28) -dal(18) -dar(18) -am(16) -l(16) -ain(14) -oldy(13) -r(11) -aly(10) -ary(10) The 58 suffixes with less than 10 but at least 2 distinct prefixes, accounting for 194 distinct words: -aldy(8) -air(7) -daiin(7) -dor(7) -m(7) -oly(7) -ody(6) -sy(6) -ady(5) -alor(5) -as(5) -dam(5) -dol(5) -oiin(5) -sdy(5) -alol(4) -lor(4) -ly(4) -olol(4) -oraiin(4) -aiiin(3) -ay(3) -dair(3) -ldy(3) -lol(3) -olaiin(3) -orol(3) -ory(3) -ail(2) -alom(2) -arol(2) -dain(2) -dairol(2) -daldy(2) -daly(2) -dary(2) -ldaiin(2) -lddy(2) -lr(2) -ls(2) -n(2) -odaiin(2) -odal(2) -odar(2) -oin(2) -olain(2) -olal(2) -olar(2) -olor(2) -ols(2) -orar(2) -oroiin(2) -osal(2) -oy(2) -ry(2) -sain(2) -sal(2) -sol(2) The 130 suffixes that occur with only one prefix: -aiir(1) -alaldy(1) -ald(1) -aloir(1) -aloly(1) -alory(1) -als(1) -alsy(1) -an(1) -aol(1) -aos(1) -aoy(1) -aral(1) -araly(1) -aram(1) -arar(1) -ardy(1) -arodl(1) -arody(1) -arolom(1) -aror(1) -daiiin(1) -dail(1) -dairoldy(1) -dalal(1) -dalar(1) -dalary(1) -dalol(1) -darady(1) -daral(1) -darary(1) -dardy(1) -darody(1) -darol(1) -darom(1) -daror(1) -daroram(1) -darory(1) -dl(1) -doldy(1) -dolor(1) -doly(1) -doraiin(1) -dr(1) -drol(1) -iin(1) -iror(1) -laiiin(1) -laiin(1) -lal(1) -lam(1) -lar(1) -larol(1) -ld(1) -ldal(1) -ldalor(1) -ldar(1) -ldol(1) -ll(1) -lldar(1) -lod(1) -loldy(1) -lolom(1) -lolor(1) -loly(1) -lom(1) -loraiin(1) -lorain(1) -lsairor(1) -lsan(1) -lsar(1) -oaan(1) -oaiin(1) -oal(1) -oar(1) -oiiiin(1) -oiiin(1) -olam(1) -olarar(1) -olardy(1) -old(1) -oldair(1) -oldal(1) -ollom(1) -oloiin(1) -olom(1) -oloraiin(1) -oloral(1) -olorol(1) -olsaly(1) -orain(1) -orair(1) -oral(1) -oram(1) -oraror(1) -orols(1) -oroly(1) -orory(1) -os(1) -raiin(1) -rain(1) -ral(1) -raly(1) -ram(1) -rar(1) -rary(1) -rd(1) -rl(1) -roiin(1) -rol(1) -roly(1) -rom(1) -ror(1) -saiin(1) -saiirol(1) -sair(1) -saly(1) -sar(1) -saral(1) -sarol(1) -sas(1) -sls(1) -sodar(1) -soiin(1) -soldy(1) -solor(1) -soly(1) -sor(1) -soraiin(1) -sorar(1) cat Note-007/.prefs-all.frq \ | sort +0 -1nr \ | gawk '($1 >= 2) {print $2}' \ | revbytes | sort | revbytes \ > Note-007/.prefs-top.dic cat Note-007/.suffs-all.frq \ | sort +0 -1nr \ | gawk '($1 >= 5) {print $2}' \ > Note-007/.suffs-top.dic dicio-wc Note-007/.prefs-top.dic Note-007/.suffs-top.dic lines words bytes file ------ ------- --------- ------------ 175 175 1068 Note-007/.prefs-top.dic 33 33 152 Note-007/.suffs-top.dic cat Note-007/.dic-factored \ | count-diword-freqs \ -v rows=Note-007/.prefs-top.dic \ -v cols=Note-007/.suffs-top.dic \ -v digits=3 \ > Note-007/pref-suff-dic-table.txt cat Note-007/.dic-factored \ | fgrep -w -f Note-007/.prefs-top.dic \ | fgrep -w -f Note-007/.suffs-top.dic \ | wc These prefixes and suffixes account for 712 out of the original 1342 words (53%). Not very impressive...