Hacking at the Voynich manuscript - Side notes
506 An alternative complete factorization of words in ERA alphabet

Last edited on 1999-01-31 06:21:20 by stolfi

OBSOLETE

  This is partly a remake of work from Notebook-1.txt and Notebook-2.txt,
  originally done between 97-07-05 and 97-07-16.

  Summary of previous relevant tasks:

    I obtained Landini's interlinear transcription of the VMs, version
    1.6 (landini-interln16.evt) from
    http://sun1.bham.ac.uk/G.Landini/evmt/intrln16.zip. [Notebook-1.txt]
    
    Around 97-11-01 I split landini-interln16.evt into many files, with one
    text unit per file. [Notebook-12.txt]
    
    On 97-11-05 I mapped those files from FSG and other ad-hoc
    alphabets to EVA.  [Notebook-12.txt] The files are
    L16-eva/fNNxx.YY, and a machine-readable description of their
    contents and logical order is in L16-eva/INDEX.
    
    Then I started redoing some of the previous tasks
    using the new encoding.
    
    I extracted the Currier (;C>) and Friedman-I (;F>)
    versions of the "bio" section, in EVA alphabet, as files
    bio-{c,f}-eva.evt. I also built the associated text files and word lists
    bio-{c,f}-eva{-{gut,fun,bad},{wds,dic,frq},.txt}. [Note-001.txt]
    
    Eventually I decided that it was necessary to map the data
    to a reduced alphabet (ERA) that identifies similar letters,
    both to reduce transcription and sampling noise and to
    make the results more manageable. Accordingly,
    I created the files bio-{c,f}-era-gut.{wds,dic}. [Note-003.txt]

    After some ad hoc hacking, I tentatively identified a paradigm
    that combines 156 prefixes with 219 suffixes, where each suffix
    is the maximal terminal string of the form [edoirlmn]*. [Note-005.txt]
    
97-11-11 stolfi
===============
  
    It seems that we have too many suffixes and too few prefixes.
    Attaching the "e"s to the suffix is probably a mistake.
    For one thing, some "e"s come from 'c[pftk]h' gallows,
    which should be in the prefix. Also, the "e"s always occur
    at the left end of the suffixes, so moving them to the prefix
    leaves the rest of each suffix intact.
    
    So let's redo Note-005.txt with the "e"s in the prefix.
    That is, a suffix is now the maximal terminal string of the
    form /[doirlmn]*/.
    
    cat bio-f-era-gut.wds \
      | sed -e 's:\([doirlmn]*\)$:- -\1:' \
      > Note-006/.factored
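    The sed pattern works because the leftmost match of [doirlmn]*$
    starts at the first letter of the maximal terminal run, so every
    word splits in exactly one way (a word made only of suffix letters
    gets the empty prefix "-"). A minimal Python sketch of the same
    split, assuming one ERA word per input line; the function names are
    hypothetical, and the optional suffix_letters argument is only there
    to compare with the Note-005 rule, which also let "e" into the suffix:

```python
import re

def factor(word: str, suffix_letters: str = "doirlmn") -> tuple[str, str]:
    """Split `word` into (prefix, suffix), where the suffix is the longest
    terminal run of letters drawn from `suffix_letters` (possibly empty)."""
    # re.search finds the leftmost match, i.e. the start of the maximal run.
    m = re.search("[" + suffix_letters + "]*$", word)
    return word[: m.start()], word[m.start():]

def factored_line(word: str) -> str:
    """Format a word the way the sed command does: 'prefix- -suffix'."""
    pre, suf = factor(word)
    return f"{pre}- -{suf}"

for w in ["okeedo", "do", "oke", "cheol"]:
    print(factored_line(w))
# okee- -do
# - -do
# oke- -
# che- -ol
```

    With suffix_letters="edoirlmn", factor("okeedo", "edoirlmn") gives
    ("ok", "eedo"), which shows where the "e"s went under the old rule.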

    dicio-wc bio-f-era-gut.wds Note-006/.factored
  
     lines   words     bytes file        
    ------ ------- --------- ------------
      6166    6166     34893 bio-f-era-gut.wds
      6166   12332     53391 Note-006/.factored

  Now, let's collect the prefixes and suffixes:
  
    cat Note-006/.factored \
      | gawk '/./ {print $1}' \
      | revbytes | sort | uniq | revbytes \
      > Note-006/.prefs-all.dic

    cat Note-006/.factored \
      | gawk '/./ {print $2}' \
      | sort | uniq \
      > Note-006/.suffs-all.dic

    dicio-wc Note-006/.prefs-all.dic Note-006/.suffs-all.dic
  
     lines   words     bytes file        
    ------ ------- --------- ------------
       267     267      1858 Note-006/.prefs-all.dic
       147     147       901 Note-006/.suffs-all.dic
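    The revbytes | sort | uniq | revbytes idiom lists the distinct
    prefixes ordered by their spelling read from right to left, so that
    prefixes with the same ending (all the ...che, all the ...kee) come
    out next to each other. A Python sketch of that ordering, assuming
    revbytes just reverses each line and sort compares plain bytes (as
    under LC_ALL=C); the helper name is hypothetical:

```python
def reverse_sorted_unique(items):
    """Distinct items, ordered by their spelling read right to left."""
    return sorted(set(items), key=lambda s: s[::-1])

prefixes = ["oke", "che", "ok", "lche", "che", "k", "okee"]
print(reverse_sorted_unique(prefixes))
# ['okee', 'che', 'lche', 'oke', 'k', 'ok']
```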

  Great. Now let's count the occurrences of each prefix and list the most frequent ones:

    cat Note-006/.factored \
      | gawk '/./ {print $1}' \
      | revbytes | sort | revbytes | uniq -c | expand \
      | sort +0 -1nr \
      > Note-006/.prefs-all.frq
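    The uniq -c | sort +0 -1nr stage turns the prefix column into a
    frequency list, most frequent first, which is what the prefix-(count)
    listings below are made of. The same step with Python's
    collections.Counter, as a sketch that assumes the "prefix- -suffix"
    line format produced by the sed command (the sample lines are made up):

```python
from collections import Counter

def column_frequencies(lines, column):
    """Count the values in one whitespace-separated column,
    most frequent first (ties keep first-seen order)."""
    counts = Counter(line.split()[column] for line in lines if line.strip())
    return counts.most_common()

factored = ["ok- -oin", "oke- -do", "ok- -ol", "- -do", "oke- -do", "ok- -oin"]
print(" ".join(f"{p}({n})" for p, n in column_frequencies(factored, 0)))
# prints: ok-(3) oke-(2) -(1)
```

    Ties may come out in a different order than with sort, but the
    counts are the same.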

  The 27 most frequent prefixes (at least 30 occurrences each),
  accounting for 5449 words:

    -(1510) ok-(923) che-(748) oke-(414) okee-(383) chee-(147)
    ch-(139) k-(123) lche-(116) cheke-(113) olche-(88) olk-(80)
    cheeke-(77) ke-(62) okch-(54) chek-(53) okche-(52) olkee-(44)
    dche-(43) rche-(42) kee-(41) kche-(37) oche-(35) chekee-(32)
    pche-(32) opche-(31) olke-(30)
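  The listing above and the two that follow split the prefixes into
  frequency bands (at least 30, 3 to 29, and fewer than 3 occurrences)
  and give, for each band, the number of prefixes and the number of
  words they account for. A sketch of that bookkeeping in Python, with
  the band thresholds as a parameter; the function name is hypothetical
  and the sample uses a few of the counts listed above:

```python
def frequency_bands(freqs, thresholds):
    """Group (item, count) pairs into bands by decreasing count thresholds.

    With thresholds (30, 3) the bands are: count >= 30, 3 <= count < 30,
    and count < 3.  Returns one (number_of_items, total_count) pair per band.
    Assumes the thresholds are decreasing and the counts are non-negative.
    """
    bounds = list(thresholds) + [0]
    bands = [[0, 0] for _ in bounds]
    for _item, n in freqs:
        # Index of the first band whose lower bound the count reaches.
        k = next(i for i, lo in enumerate(bounds) if n >= lo)
        bands[k][0] += 1
        bands[k][1] += n
    return [tuple(b) for b in bands]

sample = [("-", 1510), ("ok-", 923), ("cheek-", 24), ("p-", 24), ("chech-", 2), ("chch-", 1)]
print(frequency_bands(sample, (30, 3)))  # prints [(2, 2433), (2, 48), (2, 3)]
```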

  The next 64 prefixes (fewer than 30 but at least 3 occurrences),
  accounting for another 507 words:

    cheek-(24) p-(24) chk-(22) lk-(20) lch-(19) kch-(16) olch-(16)
    eke-(15) ekee-(14) op-(14) lchee-(13) ochee-(13) pch-(13)
    rolche-(12) dch-(11) olchee-(11) lkee-(10) oekee-(10) rolkee-(10)
    cheekee-(9) rch-(9) oee-(8) oeke-(8) ofche-(8) okech-(8) opch-(8)
    chok-(7) dchee-(7) lke-(7) okeee-(7) chke-(6) ek-(6) polch-(6)
    rchee-(6) dok-(5) ee-(5) epee-(5) lcheeke-(5) polche-(5) rolk-(5)
    cheok-(4) doke-(4) epe-(4) okeche-(4) olkch-(4) olkche-(4)
    rchek-(4) rk-(4) chepche-(3) chepee-(3) dee-(3) dokee-(3)
    dolch-(3) dolche-(3) eee-(3) fche-(3) kok-(3) kolch-(3)
    lcheekee-(3) lkche-(3) olok-(3) orche-(3) rok-(3) rokee-(3)

  The other 176 prefixes (fewer than 3 occurrences), accounting for 
  another 210 words:
    
    chech-(2) chedche-(2) cheepe-(2) cheepee-(2) chep-(2) chepch-(2)
    de-(2) dke-(2) dlche-(2) e-(2) keee-(2) lchek-(2) lcheke-(2)
    lpche-(2) odche-(2) odee-(2) oeee-(2) of-(2) okede-(2) okolch-(2)
    okolche-(2) okolk-(2) olcheke-(2) olfche-(2) olkech-(2) olkeee-(2)
    olpche-(2) ook-(2) opchee-(2) opolk-(2) orch-(2) rcheke-(2)
    rolch-(2) rolke-(2) chch-(1) chedch-(1) chedok-(1) cheee-(1)
    cheeeke-(1) cheeekee-(1) cheekch-(1) cheekeedch-(1) chekeee-(1)
    cheolch-(1) chf-(1) chfee-(1) chkch-(1) chkee-(1) chlchpchee-(1)
    choe-(1) choeke-(1) choepee-(1) chokch-(1) chokee-(1) cholche-(1)
    cholkeee-(1) chop-(1) chpche-(1) dcheeke-(1) dchekee-(1) dchok-(1)
    deee-(1) dk-(1) dkche-(1) dkee-(1) doe-(1) dokch-(1) dolchee-(1)
    dolfche-(1) dolk-(1) dolke-(1) dolkeee-(1) dorch-(1) dorchee-(1)
    eedee-(1) efche-(1) efe-(1) efeee-(1) ekch-(1) ep-(1) epch-(1)
    epche-(1) fch-(1) kchdolk-(1) kchee-(1) kchek-(1) kcheokee-(1)
    kech-(1) keche-(1) keoe-(1) keoke-(1) koekee-(1) kolke-(1)
    korch-(1) korolch-(1) lcheek-(1) lchepe-(1) lchepee-(1) ldche-(1)
    lf-(1) lkede-(1) loche-(1) loee-(1) lok-(1) lokee-(1) lolk-(1)
    lolke-(1) lpch-(1) och-(1) ochche-(1) ocheeke-(1) ocheekee-(1)
    ocheke-(1) ochep-(1) oddche-(1) odoke-(1) odorche-(1) oeekee-(1)
    oep-(1) ofch-(1) okchok-(1) okecheke-(1) okedee-(1) okeech-(1)
    okeeolche-(1) okeolch-(1) okoch-(1) okok-(1) okook-(1) okop-(1)
    olcheeke-(1) olchk-(1) olee-(1) oleee-(1) oleere-(1) oleke-(1)
    olkeche-(1) olkeeoche-(1) ollch-(1) oloefe-(1) olokche-(1)
    oloke-(1) olokee-(1) ololche-(1) ololkee-(1) olpoeke-(1) ooche-(1)
    opolche-(1) orchee-(1) ork-(1) orok-(1) oroke-(1) pchee-(1)
    pchefe-(1) pdolch-(1) pe-(1) pok-(1) poke-(1) poldche-(1)
    poldok-(1) polkech-(1) polkee-(1) porche-(1) prche-(1) rcheeke-(1)
    rchkch-(1) rekee-(1) reok-(1) rkch-(1) rkee-(1) roeke-(1)
    roekee-(1) rokchee-(1) rolchk-(1) rolkche-(1) rpch-(1)

  Now for the suffixes:

    cat Note-006/.factored \
      | gawk '/./ {print $2}' \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      > Note-006/.suffs-all.frq

  The 24 most frequent suffixes (at least 20 occurrences),
  accounting for 5796 words:
  
    -do(1781) -o(1275) -ol(828) -oin(564) -or(315) -dol(131)
    -doin(129) -dor(104) -rol(96) -roin(86) -r(70) -olo(69) -d(53)
    -ror(47) -oldo(36) -oro(27) -lol(26) -om(25) -olor(24) -ro(24)
    -l(23) -olol(23) -(20) -orol(20)

  The 33 intermediate-frequency ones (fewer than 20 but
  at least 3 occurrences), accounting for another 264 words:

    -odo(19) -in(18) -lo(15) -m(15) -oloin(15) -lor(14) -dom(13)
    -oir(13) -dolo(11) -oroin(11) -ldo(9) -oror(9) -doir(8) -doldo(7)
    -doro(7) -lr(6) -odoin(6) -rdo(6) -dl(5) -lolo(5) -olr(5) -oo(5)
    -orolo(5) -rolo(5) -rom(5) -loin(4) -olom(4) -on(4) -n(3) -ool(3)
    -oor(3) -roir(3) -rorol(3)

  The 90 least frequent ones (fewer than 3 occurrences), accounting for 
  only 106 words:
  
    -dolol(2) -dolor(2) -dool(2) -dorodo(2) -dororo(2) -ldoin(2)
    -ldol(2) -lom(2) -loroin(2) -odol(2) -odor(2) -oil(2) -old(2)
    -ololo(2) -olorol(2) -orom(2) -doil(1) -doindo(1) -doinl(1)
    -doirodo(1) -doirol(1) -doiroldo(1) -dolord(1) -doloro(1) -door(1)
    -dordo(1) -doroin(1) -dorol(1) -dorom(1) -doror(1) -dororom(1)
    -dr(1) -drol(1) -ino(1) -ld(1) -lddo(1) -ldolor(1) -ldor(1) -ll(1)
    -lldor(1) -lod(1) -loinm(1) -loldo(1) -lolom(1) -lolor(1)
    -lorol(1) -lroiror(1) -lron(1) -lror(1) -nl(1) -od(1) -odoirol(1)
    -odorol(1) -oinolo(1) -olddo(1) -oldoir(1) -oldol(1) -ollom(1)
    -olod(1) -oloino(1) -oloir(1) -ololdo(1) -olordo(1) -oloro(1)
    -oloroin(1) -oloror(1) -olro(1) -olrolo(1) -ooin(1) -oolor(1)
    -ooo(1) -ooon(1) -ordo(1) -orodl(1) -orodo(1) -oroir(1) -orolom(1)
    -orolr(1) -ororo(1) -ororor(1) -rl(1) -rlr(1) -rodor(1) -roino(1)
    -roirol(1) -roldo(1) -rolor(1) -roro(1) -roroin(1) -roror(1)

  Let's compare the suffixes of a few common prefixes:
  
    set tfiles = ( )
    set totw = 0
    set sufw = 7
    set digs = 3
    echo " "
    foreach  f ( k ok ke oke che kee okee chee lche ch chek )
    
      set g = "Note-006/.suffs-${f}.frq"
      echo "$f-"
      /bin/rm -f ${g}
      echo "frq" "$f-" \
        | gawk '/./ {printf "%'"${digs}"'s %-'"${sufw}"'s\n", $1, $2}' \
        >> ${g}
      echo "--------------" "--------------" \
        | gawk '/./ {printf "%.'"${digs}"'s %.'"${sufw}"'s\n", $1, $2}' \
        >> ${g}
      cat Note-006/.factored \
        | egrep '^'"${f}"'-' \
        | gawk '/./ {print $2}' \
        | revbytes | sort | revbytes | uniq -c \
        | gawk '/./ {printf "%'"${digs}"'d %s\n", $1, $2}' \
        | sort +0 -1nr \
        >> ${g}
        
      @ totw = ${totw} + ${digs} + 1 + ${sufw} + 1
      set tfiles = ( ${tfiles} ${g} )
    
    end
    
    pr -m -s' ' -t -i' '1 -w ${totw} ${tfiles} \
      | expand \
      > Note-006/prefs-cmp.txt
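  The csh loop builds one small frequency file per prefix (a header, a
  rule, then "count suffix" lines sorted by count) and pastes them side
  by side with pr -m. A Python sketch of the same side-by-side table,
  assuming the "prefix- -suffix" line format; the function name and the
  sample lines are made up, and rows limits each column to its most
  frequent suffixes, like the truncated tables below:

```python
from collections import Counter
from itertools import zip_longest

def suffix_table(lines, prefixes, rows=8, digs=3, sufw=7):
    """Side-by-side columns of 'count suffix' pairs, one column per prefix.

    `lines` are factored words of the form 'prefix- -suffix'; each column
    lists the `rows` most frequent suffixes seen with that prefix.
    """
    by_prefix = {p: Counter() for p in prefixes}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        pre, suf = fields[0].rstrip("-"), fields[1]
        if pre in by_prefix:
            by_prefix[pre][suf] += 1

    columns = []
    for p in prefixes:
        col = [f"{'frq':>{digs}} {p + '-':<{sufw}}", f"{'-' * digs} {'-' * sufw}"]
        col += [f"{n:>{digs}d} {s:<{sufw}}" for s, n in by_prefix[p].most_common(rows)]
        columns.append(col)

    # Pad short columns with blanks so that the merged rows stay aligned.
    blank = " " * (digs + 1 + sufw)
    return "\n".join(
        " ".join(cells).rstrip() for cells in zip_longest(*columns, fillvalue=blank)
    )

sample = ["k- -oin", "k- -ol", "k- -oin", "ok- -oin", "ok- -o", "ok- -oin", "ok- -ol"]
print(suffix_table(sample, ["k", "ok"], rows=3))
```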

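  It seems that "k-" and "ok-" have a somewhat different "conjugation" than 
  most other prefixes: they combine mostly with -oin, -ol, -or, and -o,
  and -do is not among their eight most frequent suffixes, whereas -do
  is the most common suffix of ke-, oke-, che-, lche-, kee-, and okee-: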

    frq k-      frq ok-    
    --- ------- --- -------
     37 -oin    387 -oin   
     35 -ol     225 -ol    
     19 -or     118 -o     
      8 -o      111 -or    
      4 -om      13 -olo   
      4 -oro      8 -oldo  
      3 -oldo     6 -oir   
      2 -odo      6 -om    
    ... ...     ... ...    

    frq ke-     frq oke-    frq che-    frq lche-   frq kee-    frq okee-  
    --- ------- --- ------- --- ------- --- ------- --- ------- --- -------
     47 -do     305 -do     422 -do      72 -do      25 -do     242 -do    
      7 -ol      69 -o      169 -o       27 -o        9 -o      118 -o     
      3 -o        9 -dor     65 -ol       5 -ol       3 -d        7 -d     
      1 -dol      8 -ol      28 -or       4 -d        3 -ol       5 -r     
      1 -dool     6 -or      13 -dol      1 -dol      1 -dom      3 -ol    
      1 -dor      3 -d       11 -r        1 -dom                  2 -dor   
      1 -olo      3 -dol     10 -doin     1 -dor                  2 -ro    
      1 -or       2 -doin     8 -d        1 -doro                 1 -dol   
    ... ...     ... ...     ... ...     ... ...     ... ...     ... ...    

    frq chee-   frq ch-     frq chek-   
    --- ------- --- ------- --- ------- 
     66 -o       41 -ol      35 -o      
     62 -do      20 -do       5 -       
      6 -ol      20 -o        5 -or     
      6 -r       16 -or       4 -oin    
      2 -n        6 -l        2 -om     
      1 -d        4 -dor      1 -ol     
      1 -dor      4 -r        1 -r      
      1 -oin      3 -oldo               
    ... ...     ... ...     ... ...    

  OK, let's generate a table with the main prefixes and suffixes:
  
    cat Note-006/.prefs-all.frq \
      | sort +0 -1nr \
      | gawk '($1 >= 3) {print $2}' \
      | revbytes | sort | revbytes \
      > Note-006/.prefs-top.dic
    
    cat Note-006/.suffs-all.frq \
      | sort +0 -1nr \
      | gawk '($1 >= 20) {print $2}' \
      > Note-006/.suffs-top.dic
    
    dicio-wc Note-006/.prefs-top.dic Note-006/.suffs-top.dic
    
     lines   words     bytes file        
    ------ ------- --------- ------------
        91      91       558 Note-006/.prefs-top.dic
        24      24       110 Note-006/.suffs-top.dic

    cat Note-006/.factored \
      | count-diword-freqs \
          -v rows=Note-006/.prefs-top.dic \
          -v cols=Note-006/.suffs-top.dic \
          -v digits=3 \
      > Note-006/pref-suff-wds-table.txt
      
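    cat Note-006/.factored \
      | gawk -v pf=Note-006/.prefs-top.dic -v sf=Note-006/.suffs-top.dic \
          'BEGIN { while ((getline w < pf) > 0) P[w] = 1; while ((getline w < sf) > 0) S[w] = 1 } ($1 in P) && ($2 in S) { n++ } END { print n+0 }'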

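  This set of prefixes and suffixes covers at most 5796 of the 6166
  original words (94.0%), since only 5796 words have one of the 24 top
  suffixes. A pair of fgrep -w -f filters reports 5867 words (95.1%),
  but that over-counts: the lone "-" entries for the empty prefix and
  the empty suffix also match the empty half of other lines.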
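  Coverage has to be tested column by column: a word counts only if
  its prefix is in the prefix list and its suffix is in the suffix
  list. A Python sketch that builds a prefix-by-suffix count table
  (roughly the kind of table count-diword-freqs is used for above,
  though that tool's output format is not reproduced) and sums the
  covered cells; the function names and sample lines are illustrative:

```python
from collections import Counter

def pref_suff_table(lines):
    """Counts of (prefix, suffix) pairs from lines of the form 'P- -S'."""
    pairs = (tuple(line.split()) for line in lines)
    return Counter(p for p in pairs if len(p) == 2)

def coverage(table, top_prefixes, top_suffixes):
    """Words whose prefix is in top_prefixes AND whose suffix is in top_suffixes."""
    P, S = set(top_prefixes), set(top_suffixes)
    return sum(n for (pre, suf), n in table.items() if pre in P and suf in S)

lines = ["- -do", "- -oldo", "ok- -oin", "oke- -", "olpoeke- -", "ok- -oin"]
table = pref_suff_table(lines)
print(coverage(table, ["-", "ok-", "oke-"], ["-do", "-oin", "-"]))  # prints 4
```

  On these sample lines a pair of fgrep -w -f filters would accept all
  six, because a lone "-" pattern matches the empty prefix of
  "- -oldo" and the empty suffix of "olpoeke- -"; the column-wise test
  rejects both.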
      
  Let's compute the corresponding numbers without taking word 
  repetitions into account:
  
    cat bio-f-era-gut.dic \
      | sed -e 's:\([doirlmn]*\)$:- -\1:' \
      | egrep -e '- -' \
      > Note-006/.dic-factored
  
    dicio-wc bio-f-era-gut.dic Note-006/.dic-factored
    
     lines   words     bytes file        
    ------ ------- --------- ------------
       763     763      5164 bio-f-era-gut.dic
       763    1526      7453 Note-006/.dic-factored
    
  Let's extract the prefixes and suffixes again (the lists should not change):
  
    cat Note-006/.dic-factored \
      | gawk '/./ {print $1}' \
      | revbytes | sort | uniq | revbytes \
      > Note-006/.prefs-all.dic

    cat Note-006/.dic-factored \
      | gawk '/./ {print $2}' \
      | sort | uniq \
      > Note-006/.suffs-all.dic

    dicio-wc Note-006/.prefs-all.dic Note-006/.suffs-all.dic

     lines   words     bytes file        
    ------ ------- --------- ------------
       267     267      1858 Note-006/.prefs-all.dic
       147     147       901 Note-006/.suffs-all.dic

    cat Note-006/.dic-factored \
      | gawk '/./ {print $1}' \
      | revbytes | sort | revbytes | uniq -c | expand \
      | sort +0 -1nr \
      > Note-006/.prefs-all.frq
      
    cat Note-006/.dic-factored \
      | gawk '/./ {print $2}' \
      | sort | uniq -c | expand \
      | sort +0 -1nr \
      > Note-006/.suffs-all.frq
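  From here on the counts are taken over the dictionary of distinct
  words (the .dic file) rather than over the running text (the .wds
  file), so each prefix is credited once per different word that uses
  it, no matter how often that word occurs. A sketch of the two ways of
  counting, reusing the [doirlmn]* split of the sed command; the
  function names and the toy word list are made up:

```python
import re
from collections import Counter

SUFFIX = re.compile(r"[doirlmn]*$")

def prefix_of(word):
    """The word minus its maximal terminal run of [doirlmn] letters."""
    return word[: SUFFIX.search(word).start()]

def prefix_counts(words, distinct):
    """Prefix frequencies over word tokens, or over distinct words."""
    if distinct:
        words = set(words)
    return Counter(prefix_of(w) + "-" for w in words)

text = ["okeedo", "okeedo", "okeeo", "chedo", "chedo", "chedo", "okeedo"]
print(dict(prefix_counts(text, distinct=False)))  # prints {'okee-': 4, 'che-': 3}
print(dict(prefix_counts(text, distinct=True)))
```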
      
  Here are the prefixes, with the number of *distinct* words using them:
  
  The 98 prefixes that occur in at least two distinct words, accounting for
  594 distinct words:
  
    -(123) ok-(32) ch-(25) che-(22) oke-(16) k-(15) lche-(12)
    okee-(11) chee-(10) olk-(10) p-(10) cheke-(9) olche-(9) ke-(8)
    lk-(8) oche-(8) olkee-(8) rche-(8) chek-(7) chk-(6) eke-(6)
    lch-(6) okch-(6) op-(6) pche-(6) dch-(5) dchee-(5) kch-(5) kee-(5)
    lchee-(5) opch-(5) opche-(5) pch-(5) rch-(5) dche-(4) kche-(4)
    ochee-(4) oekee-(4) okech-(4) olke-(4) polch-(4) rchee-(4)
    rolche-(4) cheeke-(3) chekee-(3) chok-(3) ee-(3) ek-(3) ekee-(3)
    epe-(3) kolch-(3) lkee-(3) oeke-(3) ofche-(3) okche-(3) olchee-(3)
    polche-(3) rk-(3) rolk-(3) chech-(2) chedche-(2) cheekee-(2)
    chep-(2) chepche-(2) chke-(2) dee-(2) dlche-(2) dok-(2) doke-(2)
    dokee-(2) dolch-(2) dolche-(2) e-(2) eee-(2) epee-(2) kok-(2)
    lcheekee-(2) lkche-(2) lke-(2) lpche-(2) odche-(2) odee-(2)
    oee-(2) of-(2) okeche-(2) okede-(2) okeee-(2) okolch-(2) okolk-(2)
    olch-(2) olkch-(2) olkche-(2) olkech-(2) olok-(2) rok-(2)
    rolch-(2) rolke-(2) rolkee-(2)
  
  The 169 prefixes that account for only one word each:
  
    chch-(1) chedch-(1) chedok-(1) cheee-(1) cheeeke-(1) cheeekee-(1)
    cheek-(1) cheekch-(1) cheekeedch-(1) cheepe-(1) cheepee-(1)
    chekeee-(1) cheok-(1) cheolch-(1) chepch-(1) chepee-(1) chf-(1)
    chfee-(1) chkch-(1) chkee-(1) chlchpchee-(1) choe-(1) choeke-(1)
    choepee-(1) chokch-(1) chokee-(1) cholche-(1) cholkeee-(1)
    chop-(1) chpche-(1) dcheeke-(1) dchekee-(1) dchok-(1) de-(1)
    deee-(1) dk-(1) dkche-(1) dke-(1) dkee-(1) doe-(1) dokch-(1)
    dolchee-(1) dolfche-(1) dolk-(1) dolke-(1) dolkeee-(1) dorch-(1)
    dorchee-(1) eedee-(1) efche-(1) efe-(1) efeee-(1) ekch-(1) ep-(1)
    epch-(1) epche-(1) fch-(1) fche-(1) kchdolk-(1) kchee-(1)
    kchek-(1) kcheokee-(1) kech-(1) keche-(1) keee-(1) keoe-(1)
    keoke-(1) koekee-(1) kolke-(1) korch-(1) korolch-(1) lcheek-(1)
    lcheeke-(1) lchek-(1) lcheke-(1) lchepe-(1) lchepee-(1) ldche-(1)
    lf-(1) lkede-(1) loche-(1) loee-(1) lok-(1) lokee-(1) lolk-(1)
    lolke-(1) lpch-(1) och-(1) ochche-(1) ocheeke-(1) ocheekee-(1)
    ocheke-(1) ochep-(1) oddche-(1) odoke-(1) odorche-(1) oeee-(1)
    oeekee-(1) oep-(1) ofch-(1) okchok-(1) okecheke-(1) okedee-(1)
    okeech-(1) okeeolche-(1) okeolch-(1) okoch-(1) okok-(1)
    okolche-(1) okook-(1) okop-(1) olcheeke-(1) olcheke-(1) olchk-(1)
    olee-(1) oleee-(1) oleere-(1) oleke-(1) olfche-(1) olkeche-(1)
    olkeee-(1) olkeeoche-(1) ollch-(1) oloefe-(1) olokche-(1)
    oloke-(1) olokee-(1) ololche-(1) ololkee-(1) olpche-(1)
    olpoeke-(1) ooche-(1) ook-(1) opchee-(1) opolche-(1) opolk-(1)
    orch-(1) orche-(1) orchee-(1) ork-(1) orok-(1) oroke-(1) pchee-(1)
    pchefe-(1) pdolch-(1) pe-(1) pok-(1) poke-(1) poldche-(1)
    poldok-(1) polkech-(1) polkee-(1) porche-(1) prche-(1) rcheeke-(1)
    rchek-(1) rcheke-(1) rchkch-(1) rekee-(1) reok-(1) rkch-(1)
    rkee-(1) roeke-(1) roekee-(1) rokchee-(1) rokee-(1) rolchk-(1)
    rolkche-(1) rpch-(1)

  Now the suffixes. The 22 that occur with at least 5 distinct prefixes,
  accounting for 595 distinct words:

    -o(165) -do(130) -ol(56) -or(39) -oin(27) -d(26) -r(19) -dol(17)
    -dor(15) -(14) -olo(12) -om(11) -oldo(10) -l(8) -odo(8) -oro(7)
    -ro(6) -dom(5) -m(5) -oir(5) -olor(5) -orol(5)

  The 30 suffixes with fewer than 5 but at least 2 distinct prefixes,
  accounting for 73 distinct words:
  
    -doin(4) -oo(4) -rdo(4) -doir(3) -doldo(3) -oloin(3) -olol(3)
    -olr(3) -oroin(3) -rol(3) -dolo(2) -dool(2) -doro(2) -ldo(2)
    -ldoin(2) -lo(2) -loin(2) -lol(2) -lor(2) -lr(2) -n(2) -odoin(2)
    -odol(2) -old(2) -ololo(2) -olom(2) -oor(2) -orolo(2) -orom(2)
    -oror(2)
  
  The 95 suffixes that occur with only one prefix:
  
    -dl(1) -doil(1) -doindo(1) -doinl(1) -doirodo(1) -doirol(1)
    -doiroldo(1) -dolol(1) -dolor(1) -dolord(1) -doloro(1) -door(1)
    -dordo(1) -dorodo(1) -doroin(1) -dorol(1) -dorom(1) -doror(1)
    -dororo(1) -dororom(1) -dr(1) -drol(1) -in(1) -ino(1) -ld(1)
    -lddo(1) -ldol(1) -ldolor(1) -ldor(1) -ll(1) -lldor(1) -lod(1)
    -loinm(1) -loldo(1) -lolo(1) -lolom(1) -lolor(1) -lom(1)
    -loroin(1) -lorol(1) -lroiror(1) -lron(1) -lror(1) -nl(1) -od(1)
    -odoirol(1) -odor(1) -odorol(1) -oil(1) -oinolo(1) -olddo(1)
    -oldoir(1) -oldol(1) -ollom(1) -olod(1) -oloino(1) -oloir(1)
    -ololdo(1) -olordo(1) -oloro(1) -oloroin(1) -olorol(1) -oloror(1)
    -olro(1) -olrolo(1) -on(1) -ooin(1) -ool(1) -oolor(1) -ooo(1)
    -ooon(1) -ordo(1) -orodl(1) -orodo(1) -oroir(1) -orolom(1)
    -orolr(1) -ororo(1) -ororor(1) -rl(1) -rlr(1) -rodor(1) -roin(1)
    -roino(1) -roir(1) -roirol(1) -roldo(1) -rolo(1) -rolor(1) -rom(1)
    -ror(1) -roro(1) -roroin(1) -rorol(1) -roror(1)
  
    cat Note-006/.prefs-all.frq \
      | sort +0 -1nr \
      | gawk '($1 >= 2) {print $2}' \
      | revbytes | sort | revbytes \
      > Note-006/.prefs-top.dic
    
    cat Note-006/.suffs-all.frq \
      | sort +0 -1nr \
      | gawk '($1 >= 5) {print $2}' \
      > Note-006/.suffs-top.dic
    
    dicio-wc Note-006/.prefs-top.dic Note-006/.suffs-top.dic
    
     lines   words     bytes file        
    ------ ------- --------- ------------
        98      98       600 Note-006/.prefs-top.dic
        22      22        95 Note-006/.suffs-top.dic

    cat Note-006/.dic-factored \
      | count-diword-freqs \
          -v rows=Note-006/.prefs-top.dic \
          -v cols=Note-006/.suffs-top.dic \
          -v digits=3 \
      > Note-006/pref-suff-dic-table.txt
      
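    cat Note-006/.dic-factored \
      | gawk -v pf=Note-006/.prefs-top.dic -v sf=Note-006/.suffs-top.dic \
          'BEGIN { while ((getline w < pf) > 0) P[w] = 1; while ((getline w < sf) > 0) S[w] = 1 } ($1 in P) && ($2 in S) { n++ } END { print n+0 }'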

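  These prefixes and suffixes account for at most 534 of the 763
  distinct words (70%). That is the count reported by a pair of
  fgrep -w -f filters, which accept every line whose prefix and suffix
  are both in the lists, but also some lines whose empty prefix or
  empty suffix matches a lone "-" pattern. Not terribly impressive,
  but still...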
  
  Perhaps we are being too greedy, by including in the 
  suffix letters that belong in the prefix.