Hacking at the Voynich manuscript - Side notes
057 Statistics of crust-mantle-core components

Last edited on 2000-10-22 12:22:12 by stolfi

INTRODUCTION

  This note analyzes the statistics of the
  `multiple layer' (crust/mantle/core) components
  of Voynichese words.
  
  [ Should be merged with some previous notes. ]
  
  [ Redone on 2000-10-05 to exclude the letter <v>
    (which occurs only in cos.1, mostly alone),
    and weird combinations such as <cthh>, <ith>,
    <cs>, etc. (which are very few, and just as
    weird as other weirdos). ]
  
THE DATA

  The source data is the majority edition derived from interlinear
  release 1.6e6, already chopped into pages and sections:

    ln -s ../045/subsecs-m text-subsecs

    ln -s ../019/unit-to-type.tbl

    ln -s ../../columnate
    ln -s ../../combine-counts
    ln -s ../../compare-counts
    ln -s ../../compute-cum-freqs
    ln -s ../../compute-freqs
    ln -s ../../count-digraph-freqs
    ln -s ../../count-diword-freqs
    ln -s ../../count-word-lengths
    ln -s ../../select-units
    ln -s ../../tabulate-frequencies
    ln -s ../../words-from-evt
    
  Get (sub)section names:

    set secs = ( `cat text-subsecs/all.names` )
    set secscm = `echo ${secs} | tr ' ' ','`

  Paper directory:

    set trdir = "/home/staff/stolfi/papers/voynich-words/techrep"

  Extract text words and label words, separately, for each section:

    mkdir data-raw
    mkdir data-raw/words data-raw/labels

    foreach sec ( ${secs} )
      echo ${sec}
      cat text-subsecs/${sec}.evt \
        | select-units \
            -v types='parags,starred-parags,circular-lines,circular-text,radial-lines,titles' \
            -v table=unit-to-type.tbl \
        | words-from-evt \
        > data-raw/words/${sec}.wds
    end

    foreach sec ( ${secs} )
      echo ${sec}
      cat text-subsecs/${sec}.evt \
        | select-units \
            -v types='labels,words' \
            -v table=unit-to-type.tbl \
        | words-from-evt \
        > data-raw/labels/${sec}.wds
    end
    
  Separate the good and bad words, for each good section:

    mkdir data
    mkdir data/{words,labels}

    mkdir data-bad
    mkdir data-bad/{words,labels}

    foreach f ( words labels )
      foreach sec ( ${secs} )
        echo ${sec}
        cat data-raw/$f/${sec}.wds \
          | condense-valid-groups \
          | egrep -v '[^CESTKPFadefiklmnopqrsty]' \
          | expand-valid-groups \
          > data/$f/${sec}.wds
        cat data-raw/$f/${sec}.wds \
          | condense-valid-groups \
          | egrep '[^CESTKPFadefiklmnopqrsty]' \
          | expand-valid-groups \
          > data-bad/$f/${sec}.wds
      end
    end
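
  The good/bad split above is an alphabet test on the condensed form
  of each word. A minimal Python sketch of that test follows. The
  alphabet is copied from the egrep pattern; the function names are
  hypothetical, and the condense-valid-groups / expand-valid-groups
  steps are not reproduced (the input is assumed to be condensed
  already).

```python
# Alphabet copied from the egrep character class in the pipeline above.
VALID_CHARS = set("CESTKPFadefiklmnopqrsty")


def is_good(condensed_word):
    """True when every character belongs to the valid alphabet."""
    return all(ch in VALID_CHARS for ch in condensed_word)


def separate(condensed_words):
    """Split a word list into (good, bad), preserving input order."""
    good = [w for w in condensed_words if is_good(w)]
    bad = [w for w in condensed_words if not is_good(w)]
    return good, bad


# Hypothetical input: two ordinary words, the excluded letter <v>,
# and a word with an unreadable character.
good, bad = separate(["daiin", "okedy", "v", "qo*ar"])
print(good)  # words made only of valid characters
print(bad)   # words containing at least one other character
```

  Every input word lands in exactly one of the two lists, which is
  what the paired "egrep -v" / "egrep" commands achieve.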
    
  Copy section names to handy places:

    foreach dir ( data-raw data data-bad )
      foreach f ( words labels )
        cp -av text-subsecs/all.names ${dir}/${f}/all.names
        cat text-subsecs/all.names \
          | grep -v 'unk' \
          > ${dir}/${f}/.ok.names
      end
    end

BASIC CHARACTER PAIR FREQUENCIES

    foreach f ( words labels )
      cat data/$f/*.wds \
        | condense-valid-groups \
        | sed -e 's/^/_/g' -e 's/$/_/g' \
        | count-digraph-freqs \
            -v pad='_' \
            -v chars='_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
            -v showentropy=1 \
            -v showstrangeness=1 \
        > data/$f/.digraphs.tbl
    end
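
  As a rough illustration of what the padded digraph count tabulates,
  here is a minimal Python sketch. It only counts adjacent character
  pairs after padding each word with "_" at both ends; the entropy and
  strangeness columns of count-digraph-freqs are not reproduced, since
  their definitions are not given in this note. The function name is
  hypothetical.

```python
from collections import Counter


def count_digraphs(words, pad="_"):
    """Count adjacent character pairs, padding each word at both ends."""
    counts = Counter()
    for word in words:
        padded = pad + word + pad
        # Pair every character with its right-hand neighbour.
        for left, right in zip(padded, padded[1:]):
            counts[left + right] += 1
    return counts


counts = count_digraphs(["dy", "dy", "y"])
for pair, n in sorted(counts.items()):
    print(pair, n)
```

  The padding makes word-initial and word-final letters show up as
  pairs with "_", so they are tabulated together with the internal
  pairs.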

ANALYZING CORRELATIONS BETWEEN VARIOUS PARTS OF THE WORD

  In what follows we analyze the frequency and correaltions
  between various word components (precrust, premantle,
  core, sufmantle, sufcrust).

PREPARING A TEST SAMPLE

  Generate a word sample:

    set sec = "bio.1"
    
    echo "data/*/*.wds -> .sample.wds"
    /bin/rm -f .sample.wds
    foreach f ( words labels )
      foreach sec ( `cat data/${f}/all.names` )
        cat data/${f}/${sec}.wds \
          | head -100 \
          >> .sample.wds
      end
    end
    cat .sample.wds \
      | sort | uniq \
      | gawk '/./{printf "%7.5f %s\n", rand(), $0;}' \
      | sort -b -k1,1g \
      > .xxx
    cat .xxx | gawk '/./{print $2;}' > .sample.wds
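
  The sampling step above removes duplicates and then puts the words
  in random order. A minimal Python sketch of the same idea follows.
  It uses random.shuffle instead of sorting on random keys, and the
  function name and the fixed seed are assumptions made here for
  reproducibility.

```python
import random


def shuffled_unique(words, seed=0):
    """Return the distinct non-empty words in a reproducible random order."""
    distinct = sorted(set(w for w in words if w))
    rng = random.Random(seed)  # assumed seed; fixes the order between runs
    rng.shuffle(distinct)
    return distinct


sample = shuffled_unique(["daiin", "ol", "daiin", "", "chedy"])
print(sample)
```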

TESTING THE WORD ANALYSIS SCRIPTS

  Testing the element-factoring script:
  
    echo ".sample.wds -> .sample.els"
    cat .sample.wds \
      | factor-words -f factor-text.gawk \
          -v hicsmash=1 -v esplit=1 \
      > .sample.els
    head -10 .sample.els
    
      {_}{a}{d}{_}{e}{_}{e}{o}{d}{y}
      {_}{_}{k}{_}{e}{_}{sh}{_}{e}{y}
      {_}{_}{p}{_}{ch}{_}{e}{_}{d}{y}
      {_}{o}{f}{a}{in}{_}
      {_}{y}{t}{_}{e}{_}{d}{y}
      {_}{_}{ch}{_}{cth}{o}{d}{y}
      {_}{_}{sh}{_}{e}{_}{e}{o}{r}{_}
      {_}{y}{t}{a}{in}{_}
      {_}{_}{s}{a}{in}{o}
      {_}{_}{ch}{_}{e}{_}{t}{y}
      ...
  
    cat .sample.wds \
      | factor-words -f factor-text.gawk \
          -v eelump=1 \
      > .sample.els
    head -10 .sample.els

      {_}{a}{d}{_}{ee}{o}{d}{y}
      {_}{_}{ke}{_}{she}{y}
      {_}{_}{p}{_}{che}{_}{d}{y}
      {_}{o}{f}{a}{in}{_}
      {_}{y}{te}{_}{d}{y}
      {_}{_}{ch}{_}{cth}{o}{d}{y}
      {_}{_}{sh}{_}{ee}{o}{r}{_}
      {_}{y}{t}{a}{in}{_}
      {_}{_}{s}{a}{in}{o}
      {_}{_}{che}{_}{t}{y}
      ...

  Testing the component-splitting script:
  
    echo ".sample.wds -> .sample.wsp"
    mv .sample.wsp .sample.wsp~
    cat .sample.wds \
      | factor-words -f factor-text.gawk -v eelump=1 \
      | split-words  -v omods=0 \
      > .sample.wsp
    diff .sample.wsp~ .sample.wsp
    head -10 .sample.wsp
    
      {a}{d}({ee}){o}{d}{y}
      (<{ke}>{she}){y}
      (<{p}>{che}){d}{y}
      {o}(<{f}>){a}{in}
      {y}(<{te}>){d}{y}
      ({ch}<{cth}>){o}{d}{y}
      ({sh}{ee}){o}{r}
      {y}(<{t}>){a}{in}
      {s}{a}{in}{o}
      ({che}<{t}>){y}
      ...
      
  Testing effect of "omods=1":

    echo ".sample.wds -> .sample-o.wsp"
    mv .sample-o.wsp .sample-o.wsp~
    cat .sample.wds \
      | factor-words -f factor-text.gawk -v eelump=1 \
      | split-words  -v omods=1 \
      > .sample-o.wsp
    diff .sample-o.wsp~ .sample-o.wsp
    head -10 .sample-o.wsp
    diff .sample{,-o}.wsp | prettify-diff-output
    
      --- 14c14 -----------------------
      < ({ch}){o}(<{t}>){o}{l}

      > ({ch}{o}<{t}>){o}{l}

      --- 35c35 -----------------------
      < ({ch}){o}(<{k}>{ch}){y}

      > ({ch}{o}<{k}>{ch}){y}

      --- 71c71 -----------------------
      < {o}(<{t}>){o}(<{k}>){o}{l}

      > {o}(<{t}{o}><{k}>){o}{l}

      --- 94c94 -----------------------
      < ({che}){o}(<{k}>{che}<{t}>)

      > ({che}{o}<{k}>{che}<{t}>)

      --- 102c102 -----------------------
      < ({ch}){o}(<{t}>{sh}){o}{l}

      > ({ch}{o}<{t}>{sh}){o}{l}

      --- 117c117 -----------------------
      < ({sh}){o}({sh}){y}

      > ({sh}{o}{sh}){y}

      ...

  OK, let's create the split-word statistics per section. The option
  settings below are appropriate for model building (see Note 058).

    foreach f ( words labels )
      foreach sec ( `cat data/${f}/all.names` ) 
        set ifile = "data/${f}/${sec}.wds"
        set ofile = "data/${f}/${sec}.wsp"
        echo "${ifile} -> ${ofile}"
        mv ${ofile} ${ofile}~
        cat ${ifile} \
          | factor-words -f factor-text.gawk -v eelump=1 \
          | split-words -v omods=1 \
          > ${ofile}
        diff ${ofile}~ ${ofile} \
          | prettify-diff-output \
          > data/${f}/${sec}.diffs
      end
      (cd data/${f} && dicio-wc *.wsp )
      (cd data/${f} && dicio-wc *.diffs )
    end
    
      lines   words     bytes file        
    ------- ------- --------- ------------
       6555    6555    104974 bio.1.wsp
        146     146      1295 cos.1.wsp
       1353    1353     21389 cos.2.wsp
        713     713     10885 cos.3.wsp
       6703    6703    103030 hea.1.wsp
        823     823     13194 hea.2.wsp
       2820    2820     44853 heb.1.wsp
        510     510      8163 heb.2.wsp
        858     858     12969 pha.1.wsp
       1309    1309     20578 pha.2.wsp
        670     670     10955 str.1.wsp
      10097   10097    170197 str.2.wsp
        202     202      3012 unk.1.wsp
        134     134      2216 unk.2.wsp
         44      44       687 unk.3.wsp
        292     292      4927 unk.4.wsp
        309     309      5147 unk.5.wsp
        431     431      7067 unk.6.wsp
        357     357      5533 unk.7.wsp
          0       0         0 unk.8.wsp
        701     701     10843 zod.1.wsp

      lines   words     bytes file        
    ------- ------- --------- ------------
        142     142      2530 bio.1.wsp
          9       9       223 cos.1.wsp
        237     237      4164 cos.2.wsp
         82      82      1551 cos.3.wsp
          1       1        21 hea.1.wsp
          0       0         0 hea.2.wsp
          0       0         0 heb.1.wsp
          0       0         0 heb.2.wsp
         86      86      1696 pha.1.wsp
        143     143      2665 pha.2.wsp
          0       0         0 str.1.wsp
          0       0         0 str.2.wsp
          0       0         0 unk.1.wsp
          0       0         0 unk.2.wsp
          0       0         0 unk.3.wsp
         14      14       219 unk.4.wsp
          0       0         0 unk.5.wsp
          0       0         0 unk.6.wsp
          0       0         0 unk.7.wsp
          2       2        27 unk.8.wsp
        287     287      5389 zod.1.wsp

  Consistency check:
  
    foreach f ( words labels )
      foreach sec ( ${secs} ) 
        echo ${f}/${sec}
        cat data/${f}/${sec}.wsp | tr -d '()<>{}' > .tmp
        diff .tmp data/${f}/${sec}.wds
      end
    end
    
    /bin/rm -f data/{words,labels}/*.diffs
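
  The consistency check relies on the fact that deleting the
  delimiters `()<>{}' from a split word must give back the original
  word. A minimal Python sketch of the same check, with hypothetical
  function names:

```python
from itertools import zip_longest

# Same character set as the tr -d '()<>{}' command above.
_STRIP = str.maketrans("", "", "()<>{}")


def strip_delimiters(split_word):
    """Delete the component and element delimiters from a split word."""
    return split_word.translate(_STRIP)


def mismatches(split_words, plain_words):
    """Return (line number, stripped word, plain word) for differing lines."""
    bad = []
    pairs = zip_longest(split_words, plain_words)
    for lineno, (split_word, plain) in enumerate(pairs, start=1):
        stripped = None if split_word is None else strip_delimiters(split_word)
        if stripped != plain:
            bad.append((lineno, stripped, plain))
    return bad


print(strip_delimiters("({ch}<{cth}>){o}{d}{y}"))
print(mismatches(["(<{p}>{che}){d}{y}"], ["pchedy"]))
```

  An empty result plays the role of an empty diff; a missing line on
  either side is reported with None in its place.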
  
EXTRACTING COMPONENT STATISTICS

  Usually we consider only words that have a simple structure (at most
  one core component and at most one core+mantle component). The
  non-simple words are tabulated separately.
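
  A minimal Python sketch of the simplicity test follows. It assumes,
  from the sample output shown earlier, that `()' encloses each
  mantle-plus-core group and `<>' encloses each core; the function
  name is hypothetical and the real select-simple-words script may
  apply further conditions.

```python
def is_simple(split_word):
    """At most one '(...)' group and at most one '<...>' core."""
    return split_word.count("(") <= 1 and split_word.count("<") <= 1


# Split words taken from the omods=0 / omods=1 comparison above.
print(is_simple("({ch}){o}(<{t}>){o}{l}"))    # two "(...)" groups
print(is_simple("({ch}{o}<{t}>){o}{l}"))      # one group, one core
print(is_simple("{o}(<{t}{o}><{k}>){o}{l}"))  # two cores in one group
```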
  
  Here are the components considered, and their brief explanations:

    set comps = ( pmcns mcn c  p s  m n pm ns )
    
      tag      word subset                 counted component(s)
      -------  -------------------------   ---------------------------------
      pmcns    simple, all words           entire word.       
                                                      
      mcn      simple, all words           core+mantle.
                                           
      c        simple, all words           core
       
      p        simple, w/ core or mantle.  crust prefix.
                                           
      s        simple, w/ core or mantle.  crust suffix.
                                           
      m        simple, w/ core             mantle prefix.   
                                                      
      n        simple, w/ core             mantle suffix.   
                                                             
      pm       simple, w/ core             crust+mantle prefix
      
      ns       simple, w/ core             crust+mantle suffix
      
  Testing the component extractor:

    echo ".sample.wds -> .sample-o.wsp"
    cat .sample.wds \
      | factor-words -f factor-text.gawk -v eelump=1 \
      | split-words -v omods=1 \
    > .sample-o.wsp
    foreach item ( ${comps} )
      echo ".sample-o.wsp -> .sample-${item}.pct"
      cat .sample-o.wsp \
        | extract-components \
            -f get-components.gawk \
            -v select=${item} \
        > .sample-${item}.pct
    end
  
  Testing the simple/nonsimple separation:
  
    cat .sample-o.wsp \
      | select-simple-words -v complex=0 \
      > .simple.wsp
      
    cat .sample-o.wsp \
      | select-simple-words -v complex=1 \
      > .nonsimple.wsp
      
    diff .sample-o.wsp .simple.wsp \
      | sed -e '/^[^<]/d' -e 's/< //' \
      > .notsimple.wsp
      
    diff .nonsimple.wsp .notsimple.wsp

    dicio-wc .sample-o.wsp .simple.wsp .nonsimple.wsp .notsimple.wsp

  Let's create the component item statistics per section:

    mkdir -p stats/words stats/labels
    
    foreach ctag ( ${comps} )
      analyze-components ${ctag}
    end
  
EXTRACTING COMPONENT PAIR STATISTICS

  We next gather statistics about pairs of components.  Here are
  the pairs we consider:

    set pairs = ( k-w tc-y tf-z tw-w pm-c p-mcn c-ns mcn-s p-s pm-ns m-n )
    
      tag      word subset                left component       right component
      -------  -------------------------  -------------------  --------------------
      pm-c     simple, w/ core            crust+mantle prefix  core.
         
      c-ns     simple, w/ core            core                 mantle+crust suffix.
                                                               
      p-mcn    simple, w/ core or mantle  crust prefix         complete mantle+core.
                                                               
      mcn-s    simple, w/ core or mantle  complete mantle+core crust suffix.
                                                               
      pm-ns    simple, w/ core            crust+mantle prefix  mantle+crust suffix.

      m-n      simple, w/ core            mantle prefix        mantle suffix.

      p-s      simple, w/ core or mantle  crust prefix         crust suffix.
                                                               
  The following pairs are a bit special:                     
                                                               
      tag      word subset                left component       right component
      -------  -------------------------  -------------------  --------------------
      tc-y     all simple                 type of component    coarse component.

      tf-z     all simple                 type of component    fine component.

      tw-w     all simple                 type of word         the word.

      k-w      all non-simple             number of "peaks"    the word.
    
    The `fine components' of a word are the crust prefix, mantle
    prefix, core, mantle suffix, and crust suffix, whose `types' are
    the letters "p", "m", "c", "n", "s", respectively. When the core
    is empty, the mantle components "m" and "n" cannot be separated so
    they are extracted as a single component "mn". Similarly when the
    core and mantle are empty the crust is a single component "ps".
    
    The `coarse components' are the crust+mantle prefix (type "pm"), the 
    core ("c"), and the mantle+crust suffix ("ns").  In words without core,
    there is only one coarse component (type "pmns").
    
    The `type' of a word is the string of types of its non-empty
    components; thus "qoteedy" has fine type "pcns" (assuming that the
    "ee" are treated as mantle not crust).
    
    In all cases, the output word has the element brace delimiters `{}'
    deleted. Empty elements are omitted. On the other hand, the words
    listed under the pair tags `tw-w' and `k-w' have their contiguous
    crust/mantle/core components marked off with `()' and `<>'.
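
    The definitions above can be sketched in Python as follows. This
    is only an illustration, not the get-components.gawk code: it
    assumes that a simple split word has the layout crust prefix,
    optional `(...)' group holding the mantle, optional `<...>' core
    nested inside that group, crust suffix. All function names are
    hypothetical.

```python
import re

# Assumed layout of a simple split word (element braces already optional):
#   prefix ( mantle-prefix < core > mantle-suffix ) suffix
_SIMPLE = re.compile(
    r"^(?P<p>[^()<>]*)"
    r"(?:\((?P<m>[^()<>]*)(?:<(?P<c>[^()<>]*)>(?P<n>[^()<>]*))?\))?"
    r"(?P<s>[^()<>]*)$"
)


def _parse(split_word):
    word = split_word.replace("{", "").replace("}", "")
    match = _SIMPLE.match(word)
    if match is None:
        raise ValueError("not a simple split word: " + split_word)
    return match.groupdict()


def fine_components(split_word):
    """List of (type, text) for the non-empty fine components."""
    g = _parse(split_word)
    if g["c"] is not None:
        parts = [("p", g["p"]), ("m", g["m"]), ("c", g["c"]),
                 ("n", g["n"]), ("s", g["s"])]
    elif g["m"]:
        # No core: the mantle cannot be separated into "m" and "n".
        parts = [("p", g["p"]), ("mn", g["m"]), ("s", g["s"])]
    else:
        # No core and no mantle: the crust is a single component.
        parts = [("ps", g["p"] + g["s"])]
    return [(tag, text) for tag, text in parts if text]


def coarse_components(split_word):
    """List of (type, text) for the non-empty coarse components."""
    g = _parse(split_word)
    if g["c"] is not None:
        parts = [("pm", g["p"] + g["m"]), ("c", g["c"]),
                 ("ns", g["n"] + g["s"])]
    else:
        parts = [("pmns", g["p"] + (g["m"] or "") + g["s"])]
    return [(tag, text) for tag, text in parts if text]


def word_type(components):
    """Concatenate the types of the given components."""
    return "".join(tag for tag, _ in components)


print(word_type(fine_components("qo(<t>ee)dy")))
print(coarse_components("({che}<{t}>){y}"))
```

    The first call uses the "qoteedy" example from the text, written
    here with the "ee" placed inside the mantle group.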

  Testing the pairs extractor:
  
    foreach item ( ${pairs} )
      echo ".sample-o.wsp -> .sample-${item}.pct"
      cat .sample-o.wsp \
        | extract-components \
            -f get-components.gawk \
            -v select=${item} \
        > .sample-${item}.pct
    end

  Let's create the pair counts:

    foreach ptag ( ${pairs} )
      analyze-pairs ${ptag}
    end
  
TESTS WITH "DISASSEMBLED" PLATFORM GALLOWS

  For the tests in this section, we considered "c", "h" (except in
  "ch" and "sh"), "i" (before gallows), and "e" (anywhere) as separate
  mantle elements. Moreover, we mapped those letters to "e". So a
  platform gallows letter like "cth" or "ith" was turned into three
  elements "{e}{t}{e}".
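
  A minimal Python sketch of the letter mapping for platform gallows
  follows. It covers only the case described above, where "c" or "i",
  a gallows letter, and "h" become "e", the gallows, and "e". The
  handling of isolated "c" or "h" and the actual element splitting
  done by factor-words are not reproduced, and the function name is
  hypothetical.

```python
import re

# A platform gallows is assumed to be "c" or "i", a gallows letter, then "h".
_PLATFORM = re.compile(r"[ci]([ktpf])h")


def disassemble(word):
    """Rewrite each platform gallows as e + gallows + e."""
    return _PLATFORM.sub(r"e\1e", word)


print(disassemble("chcthody"))  # "ch" is kept, "cth" is rewritten
print(disassemble("chedy"))     # no platform gallows, unchanged
```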

  NOTE: The tables in this section are still based on the word files
  generated prior to the 2000-10-05 remake. That version of the 
  text allowed a few more words, e.g. words with "hh" or isolated "c".
  It is probably not worth redoing this analysis just because of
  a few tokens.

    foreach sec ( ${secs} ) 
      echo ${sec}
      cat data/words/${sec}.wds \
        | select-simple-words \
        | factor-words -f factor-text.gawk \
            -v hicsmash=1 -v esplit=1 \
        | split-words \
            -v omods=0 \
        | tr -d '{}' \
        > data/words/${sec}.wsp
    end
  
  Let's create the component statistics per section:

    foreach ctag ( ${comps} )
      analyze-components ${ctag}
    end
  
    foreach ptag ( ${pairs} )
      analyze-pairs ${ptag}
    end

  The following two tables may tell us whether the mantle is more
  attached to the prefix or to the core:
  
    Anomalies for p-mcn (crust prefix)×(mantle+core):

      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                                           -    
                                   -   -                   -                   -                   -   -   c    
                           -   -   k   k           -   -   t   -   -       -   p           -   -   c   s   h    
               T       -   k   k   c   e       -   t   t   c   e   e       p   c   -   -   c   s   h   h   e   -
               O   -   k   e   c   h   e   -   t   e   c   h   k   t   -   c   h   c   s   h   h   e   e   k   e
               T   k   e   e   h   e   e   t   e   e   h   e   e   e   p   h   e   h   h   e   e   e   e   e   e
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      o-     +14  +5  +6  +4  +3  +2   . +10  +8  +8  +5  +3  -1  -1  +4  +2  +4  -6  -8  -9  -9 -10  -9 -12  -1
      qo-    +12  +8  +9  +8  +6  +4  +1  +9  +6  +6  +6  +4  +2  -2  +1  +2  +2  -9 -12 -10 -14 -11 -12 -10  +2
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      dy-     +0   .  +1  -1  +1   .   .  +3  +2  +2  +4   .   .   .   .  +1   .  -4  -2  -2  -4  -1   .  +2  -1
      y-     +10  +1  +2  +3  +3   .  -2  +6  +4  +5  +4  +1  -6  -6  -1   .   .   .  -1   .   .  +3   .  -9  -9
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      q-      -1   .   .  -1  -2   .  +1  -2  -2  -1  -1   .  +7  +6  +1   .   .   .  -2  -7  -4  -1   .  +2  +8
      a-      -1   .  -2  -5   .   .   .  +1  -1   .   .   .  +6  +5  +5   .   .  -5  -1  -6  -1  -1   .  +3   .
      so-     -1  -2  -3   .  +2   .   .  +2  +2  -1  +4   .  +1  +3   .   .   .  -5  +2  -6  -4  -1   .  +2  +2
      lo-     -1  -2  -3   .  -1   .   .  +4   .   .   .  +1  +1  +1   .   .   .  -4  -1  -3  -3   .   .  +3  +5
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      l-      +7  +4  +4  +5   .  +5  +3  -1  -1   .  -6  -4  -7  -8  -3  -4   .  +4  +2  +7  +5  +4  +3  -4  -6
      ol-     +6  +4  +5  +6   .   .  +3  -2  -1  -2  -3  -3  -6  -5  -5  -7   .  +2  +1  +4  +5  +3  +2  -3  +1
      al-     +0  +3   .  +1  -1  -1  -1  -4  -1  -2  -2   .   .   .   .  +2  -1  +2  +2  +1  +2   .   .  +1  +1
      qol-    +0  +1  -3  +3  -1  -1   .  -4  -1   .  -1   .   .   .   .   .  -1  -2  +2  +4  +1  +2  +2  +1  -1
      sol-    -1   .  +2  +1   .  +1   .  -3  -1   .  -1   .   .   .   .  +1   .  -2  -1  +1   .  -1   .  +2   .
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      -      +19  -3  -3  -4  -1  -3  -7   .  -2  -3   .   .   .  +5   .   .   .  +6  +6  +4  +5  +2  +3  +4  -8
      d-      +3  -5  -2  -7  -5  -4  -3  -3  -6  -5  -4  -4   .   .  -4   .  -3 +10 +10  +6  +9  +8  +7  +3  +6
      s-      +0   .  -4  -3  -1  -1  -1  -4  -3  -2  -2   .   .   .   .   .  -1  +6   .  +5  +6   .  +1  +3  +1
      r-      +0  -6  -3  -3  -1  -1   .  -4  -1  -1  -1   .   .   .   .   .   .  +4  -1  +6  +5  +3  +4  +2  -1
      or-     -1  -4  -3  -5  +1   .  +1  -3  -1   .   .   .  +1   .   .   .   .  +1   .  +2   .  +2  +1  +3   .
      dal-    -1  -3   .  -5  -1   .  +1  -3   .   .   .   .  +1   .   .   .   .  +4  +4   .  -1  -1   .  +3   .
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT    +43 +35 +31 +32 +27 +24 +21 +32 +28 +27 +27 +23 +23 +26 +24 +23 +24 +33 +29 +33 +31 +26 +25 +23 +22


    Anomalies for pm-c (crust+mantle prefix)×(core):
  
      ----- ----- --- --- --- ---
                T                
                O   -   -   -   -
                T   k   t   p   f
      ----- ----- --- --- --- ---
      o-      +20   .  +2   .  -2
      qo-     +18  +2  +1   .  -3
      y-      +13   .  +2   .  -2
      -       +19  -3   .  +3   .
      ----- ----- --- --- --- ---
      e-      +14  -5  +2  +2   .
      ----- ----- --- --- --- ---
      che-    +12   .   .   .   .
      cho-     +9  -1  +1   .   .
      ch-      +7   .   .  +2  -2
      she-     +6  +1   .   .  -1
      chee-    +5   .  +1  -3  +1
      choe-    +4  -2   .   .  +1
      cheo-    +1   .   .  +1  -1
      ----- ----- --- --- --- ---
      shee-    +4   .  +1   .  -2
      sho-     +2   .  +1   .  -1
      sh-      +1  +1   .  -4  +2
      shoe-    +2  -3   .   .  +4
      ----- ----- --- --- --- ---
      qoe-     +3  +1   .   .  -2
      oe-      +4  -1   .  +1   .
      qe-      +0  -1   .   .  +1
      ----- ----- --- --- --- ---
      l-       +8  +3  -3   .   .
      ol-      +8  +2  -3  -1  +1
      al-      -1  +2  -7   .  +4
      qol-     -2  +2  -4   .  +1
      ----- ----- --- --- --- ---
      dy-      +0  -1  +1  -2  +1
      ----- ----- --- --- --- ---
      TOT     +42 +39 +37 +30 +25

  Note the much lower anomalies for pairs "pm-c" than for "p-mcn".

  Same for suffixes:
  
    Anomalies for mcn-s (core+mantle)×(crust suffix):

      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                 -                                
                                     -                                           o                                
                                     d                                       -   d       -                        
                             -   -   a                                   -   o   a       a   -   -               -
                 T       -   d   d   i               -   -   -   -   -   o   l   i       i   a   a   -   -   -   a
                 O   -   d   a   a   i   -   -   -   o   o   o   o   o   d   d   i       i   i   i   a   a   a   l
                 T   d   y   r   l   n   s   y   o   l   r   s   d   m   y   y   n   -   n   n   r   r   l   m   y
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      k-        +9 -12 -16 -11 -10 -10 -12  -1  -3  +1   .  -3   .  -1  -2  +3  -3  +3 +14 +16 +10 +10 +11  +8  +8
      t-        +8 -10 -18  -9  -8  -8 -11  -1  -3  +2  +1  -2  -2  -4  -1  +4  +2  +4 +12 +12  +9 +10  +9  +8  +5
      p-        +1  -5 -12   .  +1  +1  -4  -8  -9   .   .  -5   .  +1  -6  +2  +3  +5  +8  -1  +9  +5  +4  +3  +3
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      chee-     +2   .  +2  +2  -2  +3 +10  +4  +4  -3   .  +3   .  -2  +1   .  -2   .  -4  -3   .  -4  -2  -3  -1
      shee-     +1  +4  +4   .  +3  -2  +7  +3  +1  -2   .  -1  -2  -1  -3   .  -1  +8  -5   .   .  -3  -4  -2   .
      ee-       +1   .  +1  +2  -2  +4  +8  -1   .  -3   .   .  -2   .  -3   .  -2   .  -3  -1   .   .   .  +1  +1
      che-      +9  +2  +4  +4  +3  +4  +3   .  +2  +1  +1  +2  -1  +1  +1  -3   . -10  -5  -2  -7  +2   .  -1  -3
      she-      +7  +3  +6  -1  +2  +3   .   .  +1  +1   .  +1  +2  -2  +1  -1  -2  +5  -2  -6  -5   .   .  -5  -4
      keee-     -1  +2  +1  -1  +1   .  +6  +3   .  -5  -8  +2   .  +1  -1  +2   .   .  -2   .  +2  -4  -3   .  +2
      kee-      +5  +4  +7  +1  +3  -2  +4  +5  +4   .   .  +2   .  -2  +3  -3  -1  -6  -3  -6  -4   .  -2  -1  -4
      tee-      +3  +4  +6  +3  +2   .  +5  +3  +2  -2  -6  +5   .  -2  +3   .   .  -4  -5  -3  -1  -4  -1   .  -1
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      te-       +5   .  +5  +5  +1  +4  -1  -2   .  +1  -1  +5   .  -1  +3   .  +3  -5  -3  -2  -3  -1   .  -3  -3
      ke-       +7  +1  +6  +3  +3  +1  -3  -1   .  +2  +1  +1  +1  +2  +4  -1  -2  -7  -4   .  -5  -2   .  -1  -2
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      ch-      +11  -3  -2   .   .   .   .  -4   .  +3  +3  +1  -1  +1   .   .  +4  -5  +2  +1  -5  +2  +1  +1  -2
      sh-       +7  -2  -3  -2  -2  -1  -4  -3  +6  +3  +3  -1  +1  -1  +1  +2  +4  +3  +1   .  -4  +2   .   .  -2
      kch-      +4  +3  +1   .  +1  -3   .   .  +3  +1  +3  -2  +4  +2   .  -3  -5   .  +1   .  -2   .  -1  -3  -1
      tch-      +4   .   .  -3   .   .  -1   .  +1  +1  +3   .  +1   .   .  -3   .  +1   .  -1  -2   .  -1  +2  -1
      pch-      +1   .  +1  +4   .  +2  -4  -3  -3   .  +1  -5   .   .   .   .  +4   .   .  -2  +1  +2  +2  -1  +1
      pche-     +1   .  +5  +7  +5  +7   .  -1  -2   .  -3   .  -2  -1   .   .  -1  -1  -3  -2   .   .  -6  -1   .
      kche-     +0  +1  +6  -2  -1   .   .  +2  +2  -1   .  -3   .   .  +1   .  +3  +1  -3   .  +1  -5  -5  -1  +1
      tche-     +0  +4  +5  +4  +1   .  +1  +1  +2  -3  -3   .   .   .   .   .   .  -2   .   .  +1  -5  -5  -1   .
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      cheke-    +0  +4  -1  -1   .   .   .  +5  -7  -4  -3  -2   .  +2  -6  +1   .  +4  +2   .  +2  -3  +3   .  +3
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      ete-      +3  -2  -7  -5  -4  -4  -2   .  +1  +2  +4  -3  +2  +6  +2   .   .  +2  +3  +3  -1  +3   .   .  -1
      eke-      +0  -3  -5  -2  -1  -1  -2  +1   .  +2  +1  +3   .  +1  +1  +1  -1  +1   .   .  +1  -4  +1  +4   .
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT      +43 +23 +36 +22 +21 +21 +25 +38 +29 +33 +31 +24 +21 +19 +29 +18 +21 +24 +30 +28 +21 +30 +29 +24 +19


    Anomalies for c-ns (core)×(mantle+crust suffix):

      --- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                          -                    
                  -       -       -   -                                                   c   -   -   -   -    
                  e   -   e   -   e   a   -   -                               -   -   -   h   c   c   c   c    
              T   o   e   e   e   e   i   a   a   -   -   -   -   -       -   e   e   c   e   h   h   h   h    
              O   d   d   d   e   e   i   i   i   a   a   a   o   o   -   e   o   o   h   d   e   d   o   o    
              T   y   y   y   y   y   n   n   r   r   l   m   l   r   y   y   l   r   y   y   y   y   l   r   -
      --- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      k-    +15  +2  +4  +4  +3  +2  +1  +6  -1   .  +2  +1  -1  -1  +1   .   .   .  -2  -5  -3  -3  -3  -3  -3
      t-    +13  +2  +4  +2  +1   .   .  +3  -1   .  +1  +2   .   .  +1   .   .   .  -1  -4  -2  -3  -2  -1  -3
      p-     +4  -1  -6  -1  -3  -1   .  -7   .  -1  -1   .  +1  +1  -1   .   .   .  +2  +6  +4  +3  +3  +2  +2
      f-     +0  -3  -2  -6  -1   .  -1  -2  +2   .  -1  -3   .  +1   .   .  -1   .  +1  +4  +1  +3  +2  +2  +4
      --- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT   +42 +23 +29 +29 +30 +21 +30 +28 +21 +28 +28 +22 +26 +23 +30 +30 +25 +22 +26 +25 +24 +23 +22 +21 +22

  Note again that the anomalies are much lower for "c-ns" than for "mcn-s".

  Considering the anomalies of "mcn-s" above, it seems that "ke-" and "te-"
  look quite unlike "k-" and "t-" to the following suffix. That may be a sign
  that "e" is an independent letter, akin to the tables.
  
  Also "ete-" ("cth-" or "ith-" in the original) looks quite different from "te-"
  and very similar to "ch-".
  
  Looking at the "p-s" and "pm-ns" tables:
  
      Anomalies (crust prefix)×(crust suffix):

      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                       -        
                                                                                   -   o        
                                   -                                               d   d        
                                   a           -       -                       -   a   a   -    
               T       -   -   -   i   -   -   o       a           -   -       d   i   i   a   -
               O   -   d   o   o   i   a   a   d   -   i   -       o   a   -   a   i   i   i   o
               T   y   y   l   r   n   r   l   y   o   n   s   -   s   m   d   r   n   n   r   d
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      -      +18   .  -1  +2  +2  -1   .   .  +2  +1  -3   .  +1   .  -2  -1   .   .  +2  -3  -1
      o-     +15  -1  -1   .   .  +2  +2  +3   .   .  +1   .  -1   .   .  -3  -1  -1  -1   .  -1
      qo-    +13   .  +1  +1   .  +3  +2  +4   .   .  +4  -3  -1  -2   .   .  -2  -1  -5   .   .
      y-      +9   .  -2   .  +2  +1  +1   .  +3  +2  -2   .  -1   .   .  -2  -1  -1   .   .   .
      l-      +6   .  +3   .  -1  +2  +1  +1  +1  +1  +1  +1   .  -4  +1  +2  -1  -3  -5  -1  -1
      ol-     +5  +1  +2   .   .  +2  +1  +1   .   .  +3  +1  -2  -3  +1  +1  -3   .  -4  +1  -2
      d-      +3  +1   .  +3  +6  -4  -5  -1   .  +3  -4  +1   .  +3  -3   .  +1  -2  +1  -3   .
      s-      +0   .  -2  +1  +1  +1  -2   .   .  +1  -2   .  +1  +3  -2  -2  -1  +1  +2   .   .
      q-      -1  -1  -2   .  +1   .  +1  -1  +1  -3  -1  -2   .   .  +2   .  +2   .  +1   .   .
      al-     -1  -1  -1  -3  -4   .  +1  -3  -4  -3  +3   .   .   .  +3  +3  +3  +1  +2  +2   .
      r-      -2   .   .  -2  -4  -5  -1   .  -2   .  -4  +3   .  +1   .  +2  +3  +1  +2  +1  +1
      qol-    -3  +1  +2  -4  -3  -1  -3  -3  -2  -3  +4   .  +1   .   .   .  +1  +2  +3  +2  +2
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT    +44 +38 +36 +33 +31 +31 +30 +29 +29 +29 +28 +25 +25 +24 +24 +24 +23 +22 +21 +21 +21
  
      Anomalies (crust+mantle prefix)×(mantle+crust suffix):

      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                            -                                    
                    -                               -       c   -   -   -               -        
                    a   -                           e   -   h   c   c   c       -   -   e   -   -
                T   i   a   -   -   -   -   -       e   c   e   h   h   h   -   e   e   o   e   e
                O   i   i   a   a   a   o   o   -   d   h   d   e   d   o   e   o   o   d   e   d
                T   n   n   r   l   m   l   r   y   y   y   y   y   y   l   y   l   r   y   y   y
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      l-       +5  +5  +4  +3   .  +1   .  -6  -2  +5  -4  +5  +2  +3  -4  -8  -2  -6   .   .  +4
      ol-      +5  +4  +5  +1  +1  +1  -3   .   .  +4  -1  +4   .  -1  -5  -5  -1  -5  -2   .  +3
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      o-      +17  +2  +2  +2  +3   .  +1   .  -2   .  -1  +1   .   .   .  -6   .  -2  -1  -2  +1
      qo-     +16  +2  +3   .  +3  -1  +1   .  -2  +3   .  +1  -1  +1  -1  -5  -2  -3  -3  -1  +3
      -       +13  +1   .  +1   .  -2  +4  +2  -3   .   .  +4  +2  +2  +3  -7   .  -3  -1  -4   .
      y-      +10  +2  -2  +1   .   .   .  +1  -2   .  +1  +2  +1  +1  +1  -7  -1  -1  +1  -1   .
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      cho-     +5  +2   .  +1  +2  +2  +2  +3  +3  -6  +4  -1  +2  -1  +2  -4   .  -5  -1  -4  -3
      ch-      +3  +4  +3  +3  +5  +1  +3  +1  +2  -7  +2  -3  -1  -4  +4  -3  -6  -1  -2  -1   .
      sh-      +0  +1  -1   .  +2   .  +2   .  +3  -1  +3  +2   .   .   .  -6  -1   .  +3  -4  -4
      sho-     +0   .  -1   .  +1   .  +5  +1  +2  -5  +7  -1  +3   .   .  -2  -1   .  -2  -4  -3
      cheo-    +0   .  +3  -1  -1  +4  +1  +3  +2  -5   .  -1  +2   .   .  -3   .   .  -1  -1  -1
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      shee-    +0  -4  -1   .  -1   .   .   .  +5  +1  +1   .  -1   .   .  +7  -2   .  -1   .  -1
      chee-    +0   .   .   .  -3   .  -2   .  +4   .  -4  -1   .   .   .  +8   .  -1   .  +2   .
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      e-       +3  -5  -6  -3  -7  -4  -4  -4  -2  +4  -2  -3  -5  -2  -2  +8 +13 +13  +7  +6  +2
      che-     +5   .  +1  +1  +1  -3  -3  -6  +5  +2   .  -3   .  -1  -4  +7  -1   .  -2  +1  +3
      qoe-     +0  -4  -3  -1  -2   .   .   .  -5  +2  -2  -1  -1   .   .  +4  +6  +3  +3  +4   .
      oe-      +0   .  -1  -3  -3   .  -2  +1  -7  +3  -4  -1  -1   .  +1  +4  +2  +5  +6  +1  -1
      choe-    -1  -4  -2  -4  -2   .  -1   .   .   .  -1   .   .   .  +1  +7  +1  +5   .  +2   .
      shoe-    -2  -3  -1  -1  -1  +1   .  +1  -5  -2  -2   .   .  +1  +2  +5  +2  +2  +1  +1  -3
      she-     +1  -5  -3  -1  +1  +1  -4   .  +5  +1  +4  -1  -1  +1   .  +7  -3   .  -2  +2  -1
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT     +42 +30 +28 +28 +28 +22 +26 +23 +30 +29 +26 +25 +24 +24 +22 +31 +25 +22 +23 +31 +29

  We see that the anomalies for "p-s" are quite small, meaning
  that the core+mantle essentially decouples the crust prefix from the
  crust suffix.  
  
  The anomalies for "pm-ns" are larger but still relatively modest,
  meaning that the core alone isolates the prefix from the suffix
  fairly effectively. Actually the main anomaly seems to be a general
  attraction between those prefixes that end with single "e-" and
  those suffixes that begin with "-e". Most of this effect is probably
  due to the splitting of "cth" and "ith" as "e-t-e".
  
TESTS WITH INTACT PLATFORM GALLOWS

  Considering the observation about correlation between "e" in prefix
  and in suffix, we repeated the tests above with the gallows kept intact.
  Still, the "e"s were treated as separate elements.  Also, preliminary tests 
  showed that "k" and "t" behave pretty much alike, and the same mostly holds
  for "p" and "f", "ch" and "sh".  So we map those pairs to the most popular
  member ("k", "p", and "ch", respectively), to save space in tables.
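
  As an illustration of the mapping just described, here is a minimal
  Python sketch. It is not the factor-text.gawk script, whose "ktsmash"
  and "chshsmash" options are known here only by name. The function
  name "smash" and the sample words are made up for the example, and
  words are assumed to be plain EVA strings.

```python
# Hypothetical helper: collapse each near-equivalent pair onto its
# more popular member (t -> k, f -> p, sh -> ch).
_GALLOWS_MAP = str.maketrans({"t": "k", "f": "p"})


def smash(word: str) -> str:
    """Return `word` with t/f/sh replaced by k/p/ch."""
    # Replace the digraph first, then the single-letter gallows.
    # Platform gallows follow along: "cth" -> "ckh", "cfh" -> "cph".
    return word.replace("sh", "ch").translate(_GALLOWS_MAP)


if __name__ == "__main__":
    for w in ("shotedy", "qofcthy"):
        print(w, "->", smash(w))
```

  Words that differ only in those letters then share a single table
  row or column, which is the point of the mapping.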
  
  Note: these tests were done WITH labels included in the database.
  
  Testing the factoring script:
  
    echo ".sample.wds -> .sample.els"
    cat .sample.wds \
      | factor-words -f factor-text.gawk \
           -v ktsmash=1 -v chshsmash=1 -v esplit=1 \
      | sort | uniq \
      > .sample.els
  
  OK, let's do it:

    mkdir stats/words
    
    foreach ptag ( ${pairs} )
      analyze-pairs -eelump ${ptag}
    end

ANALYSIS

  [ old data ]

    Anomalies for p-mc (crust prefix)×(mantle+core):

      ---- ----- --- --- --- --- --- --- --- --- --- ---   --- --- ---   --- ---   --- --- --- --- --- --- --- --- ---
                                                                                                                 -   -
                                                                                                 -               c   c
                               -       -   -       -             -                           -   c   -   -       h   h
                       -       k   -   k   k       p   -     -   c   -         -     -       c   h   c   c   -   e   c
               T       k   -   c   k   e   e       c   p     c   k   c     -   e     c   -   h   c   h   h   c   c   k
               O   -   e   k   h   c   e   c   -   h   c     k   h   p     e   e     h   c   e   k   o   e   h   k   h
               T   k   e   e   e   h   e   h   p   e   h     h   e   h     e   e     e   h   e   h   k   k   k   h   e
      ---- ----- --- --- --- --- --- --- --- --- --- ---   --- --- ---   --- ---   --- --- --- --- --- --- --- --- ---
      qo-    +10 +10 +10 +10  +8  +9  +4  +3  +3  +5  +4    +4  +4  -3    +4   .    -9  -8  -9 -10  -8  -9  -8  -7  -8
      o-     +11  +9  +8 +10  +7  +8  +4  +3  +7  +7  +5    +2  +1  -2    +2   .    -6  -4  -7  -9 -10  -9 -10 -10  -9
      ---- ----- --- --- --- --- --- --- --- --- --- ---   --- --- ---   --- ---   --- --- --- --- --- --- --- --- ---
      y-      +7  +5  +6  +6  +5  +7  +2   .  +3  +3  +3    -3  -3  -7    -6  -8    +2  +1  +5  -2  -6  -5  +1  -5  -5
      l-      +5  +4  +6  +4  +5   .  +4   .  -2  +1  -1    -8  -6  -5    -4  -5    +7  +4  +5   .  -4   .  -4  +2  -3
      ol-     +5  +3  +6  +4  +2   .  +3  -2  -1  +3  -4    -6  -6  -1    +1  +2    +4  +2  +4   .  -2  -3  -2  -3  -3
      ---- ----- --- --- --- --- --- --- --- --- --- ---   --- --- ---   --- ---   --- --- --- --- --- --- --- --- ---
      -      +19  -3  -4  -3   .   .  -7  -9   .   .   .    +4  +1   .    -8  -7    +4  +5  +3  +5  +5  +4  +3  +2  +1
      d-      +2  -5  -7  -4  -2  -2  -3  -2  -5  -3  -1    +1  -3  -1    +7  +3    +8 +10  +9  +3  +2   .  -1  -1   .
      s-      +0  -3  -3  -6  -3  -4  -2   .   .  -2  -1    -1   .   .    +2  +2    +5  +4  +1  +3  +2  +3   .  +2  +1
      al-     +0  +1   .  -1  -3  -4  -2   .  -1   .  +1    -2   .  +2     .   .    +1  +1   .   .  +1  +2  +2   .  +1
      r-      +0  -6  -4  -3  -2  -3  -1   .  -1   .  -1    -1   .  +1    -1  +1    +5  +3  +4   .  +1  +4  +1  +1  +2
      qol-    +0   .  +3  -3  -1  -3   .  +1   .  -2  -1    -1   .  +1    -2   .    +3  -1  +2   .  +1  +1  +1  +1  +2
      or-     -1  -7  -7  -6  -2   .   .  +1  -1  -1   .     .   .  +2     .  +1    +1   .  +1  +2  +2  +2  +2  +3  +3
      dal-    -1  -4  -6  -1  -1  -2   .  +1  -1   .   .     .   .  +2    -1   .     .  +3  -3  +1  +2  +2  +2  +2  +3
      ---- ----- --- --- --- --- --- --- --- --- --- ---   --- --- ---   --- ---   --- --- --- --- --- --- --- --- ---
      q-      +0  -1  -2  -3  -3  -5   .   .  +1  -2  -1    +6  +6  +1    +7  +6   -10  -3  -4   .  +1  +1  +1  +1  +2
      so-     +0  -1   .   .  -2  +3  -1   .  -1  -1  -1    +3  +3  +1    +1  +1    -8  -1  -4  +2  +2  +1  +1  +1  +2
      dy-     +0   .   .  +1  -2  +4  -1   .  -1  -1  +2    -1   .  +1    -1  +1    -4  -6  -3   .  +2  +1  +2  +3  +2
      sol-    -1  -2  +1   .  -1  -3  -1   .  -1  -1   .    -1   .  +1    +1   .    +1  -4  -3   .  +2  +1  +3  +1  +2
      a-      -1   .  -5  -4  -2  -2   .  +1  +3  -1   .    +6  +3  +7    -1   .    -7  -7  -3  +1  +2  +2  +2  +2  +3
      ---- ----- --- --- --- --- --- --- --- --- --- ---   --- --- ---   --- ---   --- --- --- --- --- --- --- --- ---
      TOT    +44 +37 +33 +33 +28 +30 +22 +18 +25 +25 +25   +27 +23 +20   +22 +19   +35 +35 +29 +25 +23 +23 +22 +21 +19

  We see strong dependencies between crust prefix and core+mantle.
  Essentially, the proper prefixes "o-", "qo-", "y-", "l-", and "ol-"
  are attracted to cores that begin with naked gallows, and repelled
  by cores that begin with "ch" and "sh".
  
  The strong attraction of "o-" and "qo-" for naked gallows suggests
  that "a" and "o" are modifiers for the following letters.  However,
  the prefixes "sho-" and "cho-" do not seem particularly attracted to 
  gallows (see below), and prefixes like "lo-" are fairly rare.
  These facts argue against "o" being a pre-modifier.

  On the other hand, there seems to be little dependency between the
  prefix and the tail of the core (minus the first letter).

  Interestingly, the platform gallows seem to behave intermediately
  between the naked gallows and the "ch"/"sh" elements.

  More curiously, the isolated "ee" and "eee" elements resemble
  platform gallows in that they are intermediate between naked gallows
  and "ch"/"sh".

    Anomalies for pm-c (crust+mantle prefix)×(core):
  
      ----- ----- --- --- --- ---   --- --- ---
                                          -    
                                          k   -
                            -   -     -   c   k
                T           c   c     k   h   e
                O   -   -   k   p     o   o   o
                T   k   p   h   h     k   k   k
      ----- ----- --- --- --- ---   --- --- ---
      ch-      +8   .   . +13  +6    -6  -5  -5
      che-     +7  +1  +3 +10   .    -5  -4  -4
      cho-     +8  +1   .  +6  +3    -2  -4  -4
      cheo-    +3  -1  -2  +5  +3    -1  -1  -1
      ----- ----- --- --- --- ---   --- --- ---
      -       +15   .  +4  +7  +4    -3  -3 -10
      ----- ----- --- --- --- ---   --- --- ---
      o-      +13  +5  +5   .  -3    -1  -5   .
      qo-     +10  +8  +4  +3  -3    -2  -6  -4
      y-       +5  +7  +6  -2  -6    -2   .  -2
      ----- ----- --- --- --- ---   --- --- ---
      l-       +2  +7  +4  -7  -3     .   .   .
      ol-      +2  +5  +3  -6   .    -1   .   .
      ----- ----- --- --- --- ---   --- --- ---
      chee-    +1   .   .   .  -2     .   .   .
      chy-     +0  -1   .  -5   .    +1  +2  +2
      al-      +0  -1   .  -5   .    +1  +2  +2
      dy-      +0   .  -2  -4   .    +2  +2  +2
      chol-    +0  -1   .  -4   .    +1  +2  +2
      qol-     +0   .  -4  -4   .    +2  +3  +3
      d-       +0  -3  -1   .  -1    +1  +1  +2
      e-       +0  -3   .  -4   .    +3  +2  +2
      qe-      +0  -2  -2  -4   .    +2  +3  +3
      oe-      +0  -2  -4  -2   .    +2  +3  +3
      sol-     -1  -2  -4  -3   .    +2  +3  +3
      ----- ----- --- --- --- ---   --- --- ---
      a-       +1  -6  -2  +2  +4     .  +1  +1
      q-       +0  -5  -3  +5   .    +1  +1  +1
      so-      +0  -2  -6  +2   .    +2  +2  +2
      ----- ----- --- --- --- ---   --- --- ---
      TOT     +42 +41 +31 +32 +23   +15 +13 +13
      
  The double-gallows cores are quite rare - 32, 20 and 20 cases, respectively,
  against ~17,000 for the single-gallows ones.
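
  To put those counts in proportion, the arithmetic can be sketched in
  Python as below. The figures are the ones quoted above; 17000 is the
  approximate total, so the result is only a rough share. The variable
  names are made up for the example.

```python
# Counts quoted in the text: three double-gallows cores, and the
# approximate total for the single-gallows cores.
double_gallows = [32, 20, 20]
single_gallows = 17000  # approximate ("~17,000")

total_double = sum(double_gallows)
share = total_double / (total_double + single_gallows)
print(f"{total_double} double-gallows cases, {share:.2%} of the cores counted")
```

  That is well under one percent, so the three rightmost columns of
  the table rest on very little data.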
  
  While the "ch*-" prefixes show a moderate preference for platform gallows,
  overall the prefix and core seem quite independent.
  
  Comparing the tables for pm-c and p-mcn pairs, we can say that the main 
  discontinuity lies at the mantle-core boundary.

    Anomalies for c-ns (core)×(mantle+crust suffix):

      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                 -                                                
                         -           -                           c   -       -       -       -       -           -
                         a   -   -   e               -       -   h   c       c   -   e       c       c   -   -   e
                 T       i   e   e   e   -   -   -   a   -   c   e   h   -   h   e   o   -   h       h   a   o   e
                 O   -   i   e   d   d   a   a   e   i   o   h   d   e   o   d   o   d   a   o       o   i   d   e
                 T   y   n   y   y   y   r   l   y   n   l   y   y   y   r   y   l   y   m   l   -   r   r   y   y
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      k-       +21  -3  +3  +4  +3  +7   .  +2  -1  +4  -2   .   .   .  -4   .   .  -1  -1   .  -3  -1  -1  -4   .
      p-        +6  -3  +4 -11 -11  -7  +2  +2 -12  -3  +1  +7 +13  +9   .  +9  -7  -8  -2  +8  +5  +6  +4  -4  -5
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      ckh-      +6  +9   .  +4  +7  -5   .  +2 +11  -1  +5  -7  -9  -8  +5  -8  +4  +3  +1  -7  +2  -7  -7  +6  -4
      cph-      +1  +4  +2  +1  +5  -1   .   .  +6  -3  +4  -5  -3  -3  +3  -3  +2  +3   .  -2  -4  -2  -1   .   .
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      kok-      -2  -1  -3   .  -2  +3  +1  -2  -3   .   .  +2   .   .  -2   .   .   .  +1   .  -1   .  +1  +2  +4
      kchok-    -2  -1  -3  -1   .  +2  -2  -2  -3   .  -3  +3   .  +1  -3   .   .  +3   .  +1   .  +2  +1   .  +3
      keok-     -2  -3  -3  +3   .  +1  -2  -2  +2  +3  -5  -1   .   .   .   .   .   .   .  +1   .  +1  +1   .  +3
      ------ ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT      +42 +32 +30 +30 +29 +29 +29 +28 +28 +28 +27 +27 +26 +25 +25 +24 +24 +23 +23 +22 +22 +22 +21 +21 +20

  Apparently, cores with naked "k" and "t" gallows are fairly similar
  to cores with platform gallows, while the "p" and "f" gallows
  are quite aberrant.  
  
  Except for "p" and "f" cores, there is a fairly good independence
  between the core and suffix.

    Anomalies for mcn-s (core+mantle)×(crust suffix):

    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
                                                          -                                                            
                                                          o                                               -            
                    -                                     d                       -                       d            
                    a   -   -           -                 a   -                   o   -           -   -   a            
                T   i   a   a   -   -   a   -     -   -   i   o   -           -   l   o           d   d   i   -        
                O   i   i   i   a   a   l   a     o   o   i   d   o   -       o   d   l     -     a   a   i   d   -   -
                T   n   n   r   r   l   y   m     l   r   n   y   d   o   -   s   y   y     y     l   r   n   y   d   s
    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
    k-        +12 +13 +13  +9 +10 +10  +7  +7    +3  +3   .   .   .  -2  +2  -2  +3   .    -2   -11 -10 -11 -15 -13 -15
    p-         +3  +8   . +10  +6  +4  +1  +1    +2  +1  +2  -5  -1  -7  +5  -5  +2  +1    -9    +1   .  +1 -11  -5  -6
    chok-      +2  +8  +6  +4  +5  +5   .  +5    +1   .  -2  -2  -3   .  +6  -4   .   .     .    -2  -3  -2 -13  -4  -3
    chek-      +0  +5  +8   .  +5  +5  +1  +3    -5  -6  -1   .  -2  -6  +9  -3   .   .    +3    -1  -2  -1  -8  -2  -2
    chk-       +1  +9  +8  +5  +6  +7  +5  +3     .  -4   .  -6   .  -3  -2  -3  +2  -1    -1    -1  -2  -1 -12  -3  -4
    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
    ckh-       +4  +2  +1  -3  +2   .  -3  +1    +5  +5  -1  +3  +1  +3  +1  +1   .  +3    +1    -5  -6  -5  -4  -3  -2
    chckh-     +0  +2  -1   .  -4  +3  +2  +2    -4  -2  -1  -1  +3  -3  +5  -2   .   .    +6    -1  -2  -1   .  +3  -2
    checkh-    -1  -2   .  +3  -3  +1  +2  +3    -4  -5   .  -3   .  -2   .   .  +2  +1    +5    +1   .  +1  -2  +1  -1
    cph-       +0  +5   .  +2  +2   .  +1   .    +3  +1   .  -1   .  -3  -2  -1  +4  +1    -2     .  -1   .  -7  -1  -1
    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
    ch-       +13  +1   .  -6  +2   .  -3   .    +4  +5  +4  +1  -1  +4  -2  +1   .   .    -5     .   .   .   .  -2  -2
    kch-       +8   .  -2  -6   .  -1  -1   .    +3  +5  -1  +1  +2  +4  +1   .  -6  -3     .     .   .  -1  +3  +3  -1
    pch-       +4  -1  -3   .  +1   .  -1   .    +2  +3  +4   .   .  -1  -3  -6  +3  +3    -4     .  +5  +2  +2  -1  -5
    kech-      -1  -2   .  +2  -2  -3  +2   .    -4  -4   .  -4  +3  +1  +1   .  +1  +2     .     .   .   .  +2   .  +1
    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
    che-      +12  -5  -5 -10  +1   .  -5  -3    +3  +2  -1  +2   .  +3   .  +3  -2  -5     .    +3  +2  +4  +7  +3  +1
    ke-        +9  -5  -2  -8  -1   .  -5  -3    +4  +2   .  +5  +1  +1  -9  +4  -1   .    -2    +3  +4  +3  +7  +1  -2
    kche-      +3  -4  -3  -2  -8  -8  -2  -5     .  +1  +1  +4  +2  +5  -1  +1  -3   .    +2     .  +3   .  +8  +5  +2
    pche-      +2  -5  -3   .   .  -7  -1  -2    +1   .  -1  +3  -3  -1  -3   .   .  -1    -1    +5  +7  +7  +8  +2   .
    ckhe-      +0  -4  -1  +1  -2  -5   .  -2    +3  +1  -1  +4  +1  +1  -3  +3   .   .    +3    -1  -2  -1  +3  -1  +1
    chckhe-    -1  -2   .  +3  -3  -1  +3   .    -6  -5  +1  -1   .  -2   .   .  +2  +1    +2    +1   .  +1  +4  +1  -1
    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
    kee-       +7  -4  -9  -6  -1  -2  -6  -1    +1   .   .  +4   .  +5  -9  +4  -3  -1    +4    +3  +3   .  +9  +5  +4
    chee-      +4  -7  -4  -3  -4  -3  -3  -6     .  +2  -2  +1  -2  +5  +4  +3  -2  -3    +3    +1  +2  +1  +6  +3  +9
    ee-        +1  -4  -2   .   .   .   .   .    -2  +1  -1  -1  -3  +1  -1   .  -1   .    -2    -2  +2  +4  +3  +1  +8
    keee-      +0  -4  -1  +1  -5  -4  +1  -1    -4  -3   .   .  -1  +1   .  +2   .   .    +2    +1  -1   .  +4  +4  +7
    eee-       -1  -2   .  +2  -4  -3  +2   .    -8  -5  +2  -4  +1  -3  -1  +2  +1  +2    -1     .  +1   .  +2  +2 +12
    ------- ----- --- --- --- --- --- --- ---   --- --- --- --- --- --- --- --- --- ---   ---   --- --- --- --- --- ---
    TOT       +44 +31 +28 +21 +30 +29 +20 +24   +33 +31 +21 +29 +21 +29 +25 +24 +19 +19   +38   +21 +23 +22 +36 +24 +25

  Apparently, core-mantles ending in "e" are all fairly similar, and
  distinct from core-mantles that end with naked gallows. Core-mantles
  ending with "ch" and "sh", as well as platform gallows, are similar
  to each other, and roughly halfway between "e"-terminated
  core-mantles and naked gallows (actually closer to the former than
  to the latter).
  
  Core-mantles ending with "e" are fairly independent of the suffix.
  Core-mantles ending with naked gallows are more selective - they
  strongly attract suffixes starting with "a", and strongly repel
  suffixes starting with a dealer.

  The platform gallows too seem intermediate between the "ch" elements
  and naked gallows.
  
  Comparing the c-ns and mcn-s tables, we again conclude that the main break is
  at the core-mantle boundary.

    Anomalies for pm-ns (crust+mantle prefix)×(mantle+crust suffix):
    
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                                    -            
                                -                                                   -   -   -   -   c   -   -   -
                                a   -   -                       -   -   -   -   -   e   c   c   c   h   e   e   c
                T           -   i   a   a   -   -   -   -   -   o   e   e   c   e   o   h   h   h   e   e   e   h
                O       -   e   i   i   i   a   a   a   o   o   d   e   d   h   o   d   o   o   d   d   d   e   e
                T   -   y   y   n   n   r   r   l   m   l   r   y   y   y   y   l   y   l   r   y   y   y   y   y
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      o-      +17  -4  -2  -1  +2   .   .  +2  +3   .  +1   .  -2  +1  +2  -1  +1   .   .   .   .   .  +1  -2   .
      qo-     +16  -4  -2   .  +2  +2  -3  +1  +3  -2  +2   .  -3  +3  +3   .   .  -2  -1  -2   .  +1  +4  -2   .
      -       +15  -2   .   .   .  -2  -1  +1   .  -4  +4  +3   .  -1   .   .  +1   .  +1  +2   .  +2   .  -4   .
      y-      +10  -5  -3  -1  +1  -3   .  +1   .  -1   .   .  -1  +2   .  +1   .  +2   .  +1   .  +1  +1   .   .
      l-       +5  -6  -2  -3  +4  +3   .  +3   .   .   .  -7  -4  +4  +4  -5  -1   .  -5  -4  +2  +5  +6  +2  +2
      ol-      +5  -6  -1   .  +3  +4  +1  +1  +1   .  -3  -1  -6  +4  +4  -1   .  -2  -6  -4  -1  +4  +5  +2  +1
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      ch-      +6   .  +9  +6  +2   .  -1  +1  +4   .  +2  +1   .   .  +3  +2  -4   .   .  -3  -8  -4  -5  -5  -4
      che-     +5  +7  +9  +5   .   .  -5  +2  +3   .   .  -5  +2   .  +2  +3  -5  -6   .  -5   .  -1  -3  -5  +1
      cheo-    +1  +4  +5  +4  -1  +1  -1   .  -1  +2   .   .  -1  +2  -1   .  +2  -2  -1  -1  -2  -1  -3  -1   .
      chee-    +0  +7  +6  +4  -1   .   .  +1  -2   .  -1  -1   .  +3  -3  +1  -1   .  -1   .  -2  -2  -4   .   .
      cho-     +7  +3  +4  +3   .  -2  -3  +1  +1   .  +3  +2   .   .  -2  +4   .  -1   .  -1  -2  -3  -6  -1   .
      a-       -1  +1  +4   .  -2  -2  +2  -1   .  +1  -1   .  +1   .   .  -1   .   .   .  +1   .   .  -2  +1   .
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      al-      -1   .  -3  -2  +1  +4  +1  +1  -1  +3  -1   .   .  -3  -2  -3  +1   .   .   .  +2  -1  +2   .  -1
      dy-      -1   .   .   .   .   .   .  -3  -1   .  -1   .   .  -1  +2  +4   .   .   .  +2  +2  -1   .   .  -1
      chy-     -1  +1  +2   .  -1  -3   .   .  -2   .  -1   .   .  -2  -3  +5   .  +3   .  +1   .   .  -1   .   .
      chol-    -1   .  -1  -4  +1   .  +1   .  +2   .  -1   .   .  -4  -2   .   .   .   .  +2   .   .  +2  +4  -1
      q-       -1   .  -3  +1  -3  -1   .  -1  -2  +2  +5  +1   .  -2   .  -3  +1  +3   .  +1   .   .  -2  +2   .
      qol-     -1  +1  -2  -2  -1  +3   .  -3  -2   .  -1   .   .  +3  -2  -1   .   .   .  +1   .  -1  +5  +2   .
      d-       -1   .  -1  -3  -4  -1  +2  -3   .   .   .  +1  +2  -2   .  +2   .   .  +2  +1   .   .  -2  +1  +2
      e-       -1  +1   .  -1   .  -2  +1   .  -1   .  -1   .  +1  -2  -1   .   .   .  +1  +1   .  +1   .  +1   .
      qe-      -1  +1  -1  -3  -1  -1   .   .  +1   .  -1  +3   .  -2  -1  -3  +1   .   .  +1  +5   .  -2  +1   .
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      so-      -1   .  -4  +2  -1  -3   .  -1   .   .   .   .  +4   .  -2   .  +1   .  +2  +2  +1  -1  +2   .   .
      oe-      -1   .  -4   .   .  -1  +1  -1  -1   .  -1  +1  +1  -2  -3  -1   .  +5  +2  +1  +1   .   .  +1   .
      sol-     -1   .  -6  -1   .  +2  +1  -3  -1  +1  -1   .  +1   .  +1  -1   .   .   .  +1   .   .  +4  +1   .
      ----- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT     +42 +22 +32 +28 +30 +28 +21 +29 +28 +23 +27 +25 +21 +30 +29 +27 +24 +23 +22 +22 +24 +26 +29 +20 +25

  There are only weak dependencies between the crust+mantle prefixes
  and suffixes. There is a slight attraction between "ch"-containing
  prefixes and the short suffixes "-", "-y", "-ey"; and a slight repulsion
  between the `standard' prefixes "o-", "-", "qo-", "l-", "ol-", and
  "y-" and the empty suffix "-".
  
    Anomalies for p-s (crust prefix)×(crust suffix):

      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
                                                                                       -                        
                                                                                   -   o                        
                                   -                                               d   d                   -    
                                   a           -       -                       -   a   a   -       -   -   o   -
               T       -   -   -   i   -   -   o       a           -   -       d   i   i   a   -   d   a   l   o
               O   -   d   o   o   i   a   a   d   -   i   -       o   a   -   a   i   i   i   o   a   l   d   l
               T   y   y   l   r   n   r   l   y   o   n   s   -   s   m   d   r   n   n   r   d   l   y   y   y
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      -      +19  +1   .  +5  +4   .  +1   .  +3  +3  -2   .  +1   .  -2  -1   .   .  +1  -4  -1  -1  -4  -2  -1
      o-     +16   .   .  +2  +1  +3  +3  +4  +1   .  +2  -1  -1   .   .  -3  -1  -2  -2   .  -2  -1   .  -1  -1
      qo-    +13  +1  +3  +3  +2  +5  +3  +5  +1   .  +5  -2   .  -2   .   .  -1  -1  -5   .   .  -3   .  -8  -6
      y-     +10   .   .  +2  +3  +2  +2   .  +4  +3  -1  -1  -1   .   .  -2  -1  -1   .   .   .  -2  -1   .  -3
      l-      +6  +1  +5  +1   .  +4  +3  +2  +2  +2  +2  +1   .  -4  +2  +2  -1  -3  -6  -1  -2   .  -3  -4  -6
      ol-     +5  +2  +3  +1  +1  +3  +2  +2   .  +1  +4  +1  -2  -3  +1  +1  -3   .  -5  +1  -2  -2  -2  -3  -1
      d-      +4  +1  +2  +4  +7  -3  -5   .  +1  +4  -4   .   .  +3  -3   .   .  -3   .  -3   .   .  -3   .   .
      s-      +0   .   .  +3  +2  +1  -2   .   .  +2  -1   .   .  +2  -2  -3  -1   .  +1  -1  -1   .   .   .   .
      q-      +0  -1  -1  +2  +1   .  +2  -1  +2  -3  -1  -3   .   .  +2  -1  +2   .   .   .   .   .   .  +1   .
      al-     +0   .   .  -1  -3   .  +2  -2  -3  -2  +4  -1   .  -1  +3  +2  +2   .   .  +1   .   .   .   .   .
      r-      +0   .  +1   .  -4  -5  -1  +1  -1  +1  -3  +2   .   .  -1  +1  +2   .  +1   .   .   .   .  +1  +1
      qol-    -1  +1  +2  -3  -3   .  -3  -2  -2  -2  +4  -1   .   .   .  -1   .  +1  +2   .  +1  +1  +1  +2  +2
      so-     -1  -1  -4   .  -2   .  -2  -1   .  -1  -3  +4  +2   .  -1   .   .  +2  +1   .  +2   .  +1  +1  +3
      dy-     -1   .  -1  -5   .   .  -2   .  -2  -2   .   .   .   .   .  -1   .  +1  +1   .  +1  +1  +3  +2  +3
      sol-    -1  -3   .  -3  -3   .  -3  -1  -2  -2  +2   .   .   .   .  +4   .  +1  +2  +1  +1  +1  +1  +2  +2
      a-      -2  +1  -7  -4  -3  -2  -1   .  -1  -2  -2   .  +3   .  +1   .   .  +1  +2  +2  +1  +1  +2  +2  +2
      or-     -1  -1  -4  -4   .  -2   .  -2  -2   .  -2   .  +1  +1   .   .  +1  +1  +2  +1  +1  +2  +1  +3  +2
      dal-    -2  -2   .  -4  -1  -4  +1  -2  -2  -2  -2   .   .   .   .  +2   .  +1  +2  +1  +1  +1  +2  +2  +2
      ---- ----- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
      TOT    +44 +38 +36 +33 +31 +31 +30 +29 +29 +29 +28 +25 +25 +24 +24 +24 +23 +22 +21 +21 +21 +21 +20 +19 +19
    
  These anomalies are all very low, confirming the independence between the prefixes and suffixes.

ANALYSIS OF UNCOLLAPSED COMPONENTS

  Now let's analyze the various pieces together, without modifications.
  The goal now is to obtain component lists for the probabilistic 
  grammar.
  
  Testing the factoring script:
  
    echo ".sample.wds -> .sample.els"
    cat .sample.wds \
      | factor-words -f factor-text.gawk \
          -v esplit=1 \
      > .sample.els
  
  OK, let's do it:

    mkdir stats/words
    
    foreach ptag ( ${pairs} )
      analyze-pairs -esplit ${ptag}
    end

STATISTICS OF WORDS BY TYPE

  The pairs tw-w are the word itself preceded by its `type' and a dash.
  The `type' contains one letter among "p", "m", "c", "n", "s" for each
  of the crust prefix, mantle prefix, core, mantle suffix, and crust
  suffix that is non-empty (or could be, if ambiguous).
    
    foreach ptag ( tw-w )
      analyze-pairs -esplit ${ptag}
    end
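
  The tagging rule can be sketched in Python as follows. This is only
  an illustration, not the analyze-pairs code: the function names are
  hypothetical, the five components are assumed to be already
  separated, the split of the sample word is a guess made for the
  example, and the "could be, if ambiguous" clause is not handled.

```python
def word_type(p: str, m: str, c: str, n: str, s: str) -> str:
    """One letter per non-empty component, in the order p, m, c, n, s."""
    parts = (p, m, c, n, s)
    return "".join(tag for tag, part in zip("pmcns", parts) if part)


def typed_pair(p: str, m: str, c: str, n: str, s: str) -> str:
    """The word preceded by its type and a dash."""
    word = p + m + c + n + s
    return word_type(p, m, c, n, s) + "-" + word


if __name__ == "__main__":
    # Assumed split: crust prefix "qo", core "k", mantle suffix "ee",
    # crust suffix "dy" (no mantle prefix).
    print(typed_pair("qo", "", "k", "ee", "dy"))
```

  With that assumed split, the mantle prefix is the only empty
  component, so the type is "pcns".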

CHECKING THE EQUIVALENCE OF GALLOWS

  Gather counts of words containing each gallows letter, with that
  letter replaced by "-".  (Exclude words with two or more gallows.)

    foreach f ( t p k f )
      echo .gal-${f}.cts
      cat stats/words/pmcns/tot.frq \
        | egrep -v '[ktpf].*[ktpf]' \
        | gawk ' \
            ($3 ~/['"$f"']/){ \
              gsub(/[ktpf]/,"-",$3); \
              gsub(/[{}]/,"",$3); \
              print $1,$3; \
            } ' \
        > .gal-${f}.cts
    end
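
  For readers who prefer it, the same tally can be sketched in Python.
  This is an illustrative equivalent of the loop above, not a
  replacement for it: the function name and the sample counts are made
  up, and the input is assumed to be a mapping from word to count
  rather than the tot.frq file.

```python
import re
from collections import Counter

GALLOWS = "ktpf"


def gallows_patterns(counts):
    """Tally words with exactly one gallows, keyed by that gallows.

    The gallows is replaced by "-" so that words differing only in
    their gallows letter land on the same pattern.
    """
    tallies = {g: Counter() for g in GALLOWS}
    for word, n in counts.items():
        word = word.replace("{", "").replace("}", "")
        found = re.findall(r"[ktpf]", word)
        if len(found) != 1:
            continue  # no gallows, or two or more
        tallies[found[0]][re.sub(r"[ktpf]", "-", word)] += n
    return tallies


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {"okedy": 3, "otedy": 2, "daiin": 7, "qokeoky": 1}
    for g, tally in gallows_patterns(sample).items():
        print(g, dict(tally))
```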
    
  Gather counts for words with no gallows, and insert "-" in 
  all possible positions.  (Caution: total count of this file
  will be wrong --- add only lines with "-" in first position.)
    
    cat stats/words/pmcns/tot.frq \
      | egrep -v '[ktpf]' \
      | gawk ' \
          /./{ \
            gsub(/[{}]/,"",$3); \
            w = $3; n = length(w); \
            for (i=0; i <= n; i++) \
              { print $1, (substr(w,1,i) "-" substr(w,i+1)); } \
          } ' \
      > .gal-z.cts
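
  The caution about the total is easy to see on a small example. The
  Python sketch below mimics the expansion; the function names and the
  counts are made up for illustration.

```python
def dash_positions(word):
    """All ways of inserting one "-" into `word`, front to back."""
    return [word[:i] + "-" + word[i:] for i in range(len(word) + 1)]


def expand(counts):
    """One (count, pattern) row per word and insertion position."""
    return [(n, pat) for word, n in counts.items()
            for pat in dash_positions(word)]


if __name__ == "__main__":
    rows = expand({"dal": 4, "ol": 6})  # made-up counts
    naive = sum(n for n, _ in rows)
    true_total = sum(n for n, pat in rows if pat.startswith("-"))
    print(dash_positions("dal"))
    print("sum of all rows:", naive)
    print("sum of rows starting with '-':", true_total)
```

  A word of length n yields n+1 rows, so summing every row counts it
  n+1 times. Exactly one of those rows starts with "-", which is why
  restricting the sum to them recovers the true total.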
    
  Join the files and plot them:
    
    join-counts .gal-{t,p,k,f,z}.cts > .gal.mct
    
    plot-gallows-freqs .gal .gal
    
  Count consistency:
  
    cat stats/words/pmcns/tot.frq \
      | egrep -v '[ktpf]' \
      | totalize-fields    

      16548 0

    cat stats/words/pmcns/tot.frq \
      | totalize-fields    

      33352 1

  Listing the words with most essential gallows:
  
    foreach which ( tk pf )
      cat .gal.mct \
        | gawk ' \
              /./{ \
                tk = $1+$3; pf = $2+$4; z = $5; \
                s = '"${which}"'; \
                if (s < 5){next} \
                printf "%+7.4f %s\n", (log(1+z)-log(1+s))/log(10), $0; \
              } ' \
        | sort -b -k 1,1g \
        > .gal-${which}.mcl
    end
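
  The score computed in that loop can be written out in Python as
  below. This is a sketch, assuming that join-counts emits the counts
  in the order t, p, k, f, z (as the gawk code implies); the function
  name and the sample numbers are made up.

```python
import math


def gallows_score(t, p, k, f, z, which="tk", min_count=5):
    """log10((1+z)/(1+s)) for one pattern, or None if s is too small.

    s is the count with a gallows in the "-" slot (t+k or p+f), and
    z is the count of the same pattern with nothing in the slot.
    """
    if which == "tk":
        s = t + k
    elif which == "pf":
        s = p + f
    else:
        raise ValueError("which must be 'tk' or 'pf'")
    if s < min_count:
        return None  # skipped, like the `next' in the gawk code
    return (math.log(1 + z) - math.log(1 + s)) / math.log(10)


if __name__ == "__main__":
    # Made-up counts: 9 occurrences with "t", none without a gallows.
    print(gallows_score(9, 0, 0, 0, 0))
```

  The most negative scores belong to patterns that are common with a
  gallows and rare without one, so after the ascending sort they come
  first in the listing.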