Summary of previous notebooks
=============================

  On 97-07-05 I obtained Landini's interlinear transcription of the VMs, version 1.6
  (landini-interln16.evt) from
  http://sun1.bham.ac.uk/G.Landini/evmt/intrln16.zip
  
  I manually extracted from it a homogeneous, full-text sample
  bio-m-evt.evt, consisting of pages 147-166 (f75r--f84v) of the
  "biological" section, in Currier's Language B, hand 2.  For this
  section the file includes both Currier's and Friedman's
  transcriptions; Currier's seems to be the more complete of the two.
  
  The two versions have many differences (affecting 5-10% of the
  words), and often disagree even in the grouping of symbols: where
  one sees two words the other sees a single word, what is [A] for one
  may be [CI] for the other, and so on.
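
  How the 5-10% figure was estimated is not recorded here.  As a
  minimal sketch, assuming each transcription has already been split
  into a list of words, one way to measure word-level disagreement is
  to align the two lists and count the words that fall outside the
  matching blocks.  The function name and the sample words are
  hypothetical.

```python
from difflib import SequenceMatcher

def word_disagreement(words_a, words_b):
    """Fraction of words in words_a not aligned with an equal word in words_b."""
    if not words_a:
        return 0.0
    # autojunk=False keeps very frequent words from being ignored as "junk".
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(words_a)

# Hypothetical sample: two readings of the same three-word line.
reading_1 = ["qoc", "oc", "cix"]
reading_2 = ["qoc", "oc", "cis"]
rate = word_disagreement(reading_1, reading_2)
```

  This measure treats a split word (two words in one reading, one in
  the other) as a disagreement, which is the kind of difference
  described above.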
  
  So I decided to break all characters down into individual "logical"
  strokes, and to use one (computer) character to encode each stroke.
  I called this new encoding "jsa" (Jorge's Super-Analytic).
  
  After mapping to jsa, I generated a "consensus" version of the
  biological section, extracted the lines tagged ;J> from it, and
  split them into words:
  
    cat bio-m-evt.evt \
      | fsg2jsa \
      > bio-m-jsa.evt
      
    cat bio-m-jsa.evt \
      | make-consensus-interlin \
      > bio-x-jsa.evt
  
    cat bio-x-jsa.evt \
      | grep -E '^<.*;J> ' \
      | sed \
          -e 's/{[^}]*}//g' \
      > bio-j-jsa.evt

    extract-words-from-interlin \
        -chars "qocilgysxju" \
        bio-j-jsa.evt \
        bio-j-jsa
        
     lines   words     bytes file        
    ------ ------- --------- ------------
      7054    7054     62690 bio-j-jsa.wds
      2132    2132     24925 bio-j-jsa.dic
      4661    4661     40897 bio-j-jsa-gut.wds
       992     992      9720 bio-j-jsa-gut.dic
       840     840      2445 bio-j-jsa-fun.wds
         2       2         5 bio-j-jsa-fun.dic
      1553    1553     19348 bio-j-jsa-bad.wds
      1138    1138     15200 bio-j-jsa-bad.dic
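
  For readers without the original tools, the line-selection and
  comment-stripping step of the pipeline above can be sketched in
  Python.  The two regular expressions are taken from the commands;
  the function name and the sample lines (including their locator
  format) are hypothetical, and the custom tools fsg2jsa,
  make-consensus-interlin and extract-words-from-interlin are not
  reproduced.

```python
import re

LOCATOR_J = re.compile(r"^<.*;J> ")        # same pattern as the line filter
BRACE_COMMENT = re.compile(r"\{[^}]*\}")   # same pattern as the sed substitution

def select_j_lines(lines):
    """Keep lines whose locator matches ';J> ' and delete {...} comments."""
    kept = []
    for line in lines:
        if LOCATOR_J.search(line):
            kept.append(BRACE_COMMENT.sub("", line))
    return kept

# Hypothetical sample lines, only to exercise the filter.
sample = [
    "<f75r.P.1;C> qocij.ocij",
    "<f75r.P.1;J> qoc{unclear}ij.ocij",
    "# a comment line",
]
result = select_j_lines(sample)
```

  In the table above, the three word classes add up to the full word
  list: 4661 + 840 + 1553 = 7054.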

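   Digraph counts (row = first symbol, column = second symbol;
   "." = word boundary).  The totals match the 4661 words and 40897
   bytes of bio-j-jsa-gut.wds: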

                  q     o     c     i     l     g     y     s     x     j     u   TOT
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
            .  1398   965  1877   361    60     .     .     .     .     .     .  4661
      q     1     .  1229    18     .     1   154     .     .     .   700     .  2103
      o    21   486     1    63  1087  1071     .     .     .     .     .     .  2729
      c     4   167   176  6137  1209   232  2114  2921  1019     .     .     . 13979
      i     4     1     1     8  1997     2     .     .   560  1616    37   457  4683
      l     .     .     .     .     .     .    16     .     .     .  1566     .  1582
      g    52     .    74  2150     4     4     .     .     .     .     .     .  2284
      y  2790    26     2    47    13    43     .     .     .     .     .     .  2921
      s   463     1    99  1013     1     2     .     .     .     .     .     .  1579
      x   827    24   105   488     5   167     .     .     .     .     .     .  1616
      j    46     .    76  2175     6     .     .     .     .     .     .     .  2303
      u   453     .     1     3     .     .     .     .     .     .     .     .   457
        ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
    TOT  4661  2103  2729 13979  4683  1582  2284  2921  1579  1616  2303   457 40897
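
  A table of this shape can be rebuilt from a word list with a few
  lines of Python.  This is a minimal sketch, not the program that
  produced the table above: it assumes one word per input item and
  pads each word with a "." boundary on both sides, so that a word of
  n symbols contributes n+1 digraphs.  The function names and the
  sample words are hypothetical.

```python
from collections import Counter

BOUNDARY = "."

def digraph_counts(words):
    """Count adjacent symbol pairs, with '.' marking word start and end."""
    counts = Counter()
    for word in words:
        padded = BOUNDARY + word + BOUNDARY
        for first, second in zip(padded, padded[1:]):
            counts[first, second] += 1
    return counts

def format_table(counts, alphabet):
    """Lay the counts out with rows = first symbol, columns = second symbol."""
    symbols = BOUNDARY + alphabet
    lines = ["      " + "".join(f"{b:>6}" for b in symbols) + "   TOT"]
    for a in symbols:
        row = [counts.get((a, b), 0) for b in symbols]
        cells = "".join(f"{n or '.':>6}" for n in row)   # '.' for zero counts
        lines.append(f"{a:>6}" + cells + f"{sum(row):>6}")
    return "\n".join(lines)

# Hypothetical sample words, only to exercise the functions.
words = ["qoc", "oc"]
counts = digraph_counts(words)
table = format_table(counts, "qoc")
```

  With this padding the grand total equals the number of symbols plus
  the number of words, which is consistent with the table above if
  the word file holds one word and one newline per line.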

  Some conclusions we get from this and other data:
  
    \ci/ and \o/ are lexically similar but distinct letters. 
    
    The valid \i/ sequences are \ij/ \is/ \iis/ \iiu/ \iiiu/ \ix/;
    the others are likely to be scribal or transcription errors.
    
    The combination \qo/ occurs only in word-initial position.
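
  The last two conclusions can be checked mechanically against a
  word list.  The sketch below assumes jsa words given as plain
  strings; it tallies each maximal run of \i/ together with the
  symbol that follows it, flags the runs outside the list of valid
  sequences given above, and counts where a given digraph falls
  within words.  The function names and the sample words are
  hypothetical.

```python
import re
from collections import Counter

# Valid \i/ sequences as listed in the conclusions above.
VALID_I = {"ij", "is", "iis", "iiu", "iiiu", "ix"}

# A maximal run of 'i' plus the next symbol, if there is one.
I_RUN = re.compile(r"i+[^i]?")

def i_sequences(words):
    """Count every maximal i-run together with its following symbol."""
    counts = Counter()
    for word in words:
        for match in I_RUN.finditer(word):
            counts[match.group()] += 1
    return counts

def suspicious_i_sequences(words):
    """Return the i-sequences that are not in the valid list."""
    return {seq: n for seq, n in i_sequences(words).items() if seq not in VALID_I}

def positions_of(digraph, words):
    """Tally occurrences of a digraph as word-initial or non-initial."""
    tally = Counter()
    for word in words:
        start = word.find(digraph)
        while start != -1:
            tally["initial" if start == 0 else "non-initial"] += 1
            start = word.find(digraph, start + 1)
    return tally

# Hypothetical sample words, only to exercise the functions.
words = ["qocix", "oqociis", "ci"]
odd = suspicious_i_sequences(words)
where = positions_of("qo", words)
```

  On real data, a non-empty result from suspicious_i_sequences or a
  non-zero "non-initial" tally for "qo" would point to words worth
  rechecking against the manuscript images.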