Jorge's error-tolerant encoding "ECC"

The ECC column is an error-tolerant encoding that I used to prepare the label location map. (The name "ECC" has no particular meaning.)

To produce this encoding, the Voynich characters are broken down into separate pen strokes. Then certain stroke sequences are collapsed back into single characters (not necessarily honoring the original character boundaries), omitting details that are too prone to error, that seem to be meaningless calligraphic variations, or that seem to have little semantic value.

In particular, the ECC encoding ignores

The ECC encoding is, therefore, highly ambiguous. The purpose of this encoding is to remove the main sources of noise that could confuse the statistical analysis, even at the cost of losing some significant information.

The ECC encoding uses only the letters 8coqHPemrkij. Here is the approximate correspondence between Frogguy and ECC:

    Frogguy:  o    a    9    4    8    ig  
    --------  ---- ---- ---- ---- ---- ----
    ECC:      o    o    o    q    8    8   
            
    Frogguy:  x    ix   iix  iiix
    --------  ---- ---- ---- ----
    ECC:      e    e    e    e   
            
    Frogguy:  e    e'   c    t    s    v    iv   iiv  iiiv
    --------  ---- ---- ---- ---- ---- ---- ---- ---- ---- 
    ECC:      c    c    c    c    c    m    m    m    m    
            
    Frogguy:  2    i2   ii2  iii2 r    ir   iir  iiir 
    --------  ---- ---- ---- ---- ---- ---- ---- ---- 
    ECC:      r    r    r    r    r    r    r    r    
            
    Frogguy:  k    ik   iik  iiik eg  
    --------  ---- ---- ---- ---- ----
    ECC:      k    k    k    k    cj  
            
    Frogguy:  qp   lp   eQPt eLPt
    --------  ---- ---- ---- ----
    ECC:      H    H    cHc  cHc 
            
    Frogguy:  dj   fj   eDJt eFJt
    --------  ---- ---- ---- ----
    ECC:      P    P    cPc  cPc 
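The correspondence above can be applied mechanically. The following is only a minimal sketch, not the procedure actually used to build the ECC column: it assumes the input is a plain Frogguy string, that the longest matching table entry should win at each position, and that any character missing from the table (such as a lone i) is copied through unchanged. The names FROGGUY_TO_ECC and to_ecc are hypothetical.

```python
# Frogguy -> ECC table, transcribed from the correspondence above.
FROGGUY_TO_ECC = {
    "o": "o", "a": "o", "9": "o", "4": "q", "8": "8", "ig": "8",
    "x": "e", "ix": "e", "iix": "e", "iiix": "e",
    "e": "c", "e'": "c", "c": "c", "t": "c", "s": "c",
    "v": "m", "iv": "m", "iiv": "m", "iiiv": "m",
    "2": "r", "i2": "r", "ii2": "r", "iii2": "r",
    "r": "r", "ir": "r", "iir": "r", "iiir": "r",
    "k": "k", "ik": "k", "iik": "k", "iiik": "k", "eg": "cj",
    "qp": "H", "lp": "H", "eQPt": "cHc", "eLPt": "cHc",
    "dj": "P", "fj": "P", "eDJt": "cPc", "eFJt": "cPc",
}

# Length of the longest Frogguy group in the table.
MAX_KEY = max(len(key) for key in FROGGUY_TO_ECC)


def to_ecc(frogguy):
    """Convert a Frogguy string to ECC by greedy longest-match lookup.

    Assumption: greedy matching and pass-through of unmapped characters
    are choices made for this sketch, not part of the original method.
    """
    out = []
    pos = 0
    while pos < len(frogguy):
        # Try the longest candidate group first, then shorter ones.
        for size in range(min(MAX_KEY, len(frogguy) - pos), 0, -1):
            chunk = frogguy[pos:pos + size]
            if chunk in FROGGUY_TO_ECC:
                out.append(FROGGUY_TO_ECC[chunk])
                pos += size
                break
        else:
            # No table entry starts here: copy the character as-is.
            out.append(frogguy[pos])
            pos += 1
    return "".join(out)


# Arbitrary test string, chosen only to exercise several table rows.
print(to_ecc("4oqpe89"))  # prints qoHc8o
```

Note that the Frogguy letters o, a, and 9 all come out as ECC o, which is the kind of deliberate information loss described earlier.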

Here is the approximate inverse mapping from ECC to FSG:

    ECC   FSG  
    ----  ---  
    8     8, 7    
    H     H, D       
    P     P, F       
    o     A, G, CI, O   
    om    AM, AIN, CIIIL, AN, CM, CN, OM, ...
    or    AR, CIR, OR, OIR, O2, A2, ...
    ok    AIK, CIIK, OK, ...
    c     C        
    cHc   HZ, DZ      
    cPc   PZ, FZ      
    co    CA, CG, CCI, TI, SI
    cc    T, S, CC     
    cj    6        
    e     E, IE, IIE, IIIE
    i     I        
    k     K, IK, IIK, IIIK
    m     M, N, IM, IL, L
    q     4        
    r     R, IR, IIR, IIIR, 2, I2, II2, III2

Note that the inverse mapping is (on purpose) quite ambiguous.
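To give a feel for how ambiguous the inverse mapping is, here is a small sketch that enumerates the FSG readings compatible with a sequence of ECC codes. It is an illustration only: it assumes the ECC text has already been split into codes, it includes only the table rows that are listed in full (the rows ending in "..." are left out rather than guessed), and the names ECC_TO_FSG and fsg_readings are hypothetical.

```python
from itertools import product

# ECC -> FSG candidates, transcribed from the inverse table above.
# Rows whose candidate list ends in "..." are omitted on purpose.
ECC_TO_FSG = {
    "8": ["8", "7"],
    "H": ["H", "D"],
    "P": ["P", "F"],
    "o": ["A", "G", "CI", "O"],
    "c": ["C"],
    "cHc": ["HZ", "DZ"],
    "cPc": ["PZ", "FZ"],
    "co": ["CA", "CG", "CCI", "TI", "SI"],
    "cc": ["T", "S", "CC"],
    "cj": ["6"],
    "e": ["E", "IE", "IIE", "IIIE"],
    "i": ["I"],
    "k": ["K", "IK", "IIK", "IIIK"],
    "m": ["M", "N", "IM", "IL", "L"],
    "q": ["4"],
    "r": ["R", "IR", "IIR", "IIIR", "2", "I2", "II2", "III2"],
}


def fsg_readings(ecc_codes):
    """Return every FSG string obtainable from a list of ECC codes.

    Assumption: the caller supplies the ECC text already split into
    codes that appear as keys of ECC_TO_FSG.
    """
    choices = [ECC_TO_FSG[code] for code in ecc_codes]
    return ["".join(combo) for combo in product(*choices)]


readings = fsg_readings(["q", "o", "8"])
print(len(readings))  # prints 8
print(readings)
```

Even this three-code sequence has eight possible FSG readings, since the count is the product of the number of candidates for each code.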

Leaving the cj code for FSG 6 was a minor mistake; I should have mapped it to 8, as I did for FSG 7.


Last edited on 97-10-23 by stolfi