Label reference maps

Notes on the source text and encoding

The source text used to prepare this map is Gabriel Landini's interlinear transcription of the VMs, version 1.6.

The file was slightly edited to simplify my processing. The main change was replacing the "anonymous" text locations with specific ones (usually ".P", but sometimes ".P1", "P2", or "R").

From this file, I manually extracted two sub-files: the labels and the paragraphic text (or parags for short).

Labels file

For the purposes of this map, a label is an isolated short bit of Voynich text (a couple of words at most). This definition includes the "day names" from the zodiac diagrams, the labels of stars in the "astro" section, the labels of plants in the "pharma" section, and so forth. It also includes the "words" in the left column of page f66r.

Unfortunately, the interlinear file contains only a small subset of all the labels in the VMs. The bulk comes from these pages:

  location   section  L & H  text type        comments            
  ---------  -------  -----  ---------------  --------------------
  f66r       ?        B  ?   words            left col of a table 
   
  f67r1      astro    ?  ?   labels           on sectors          
  f67v2      cosmo    ?  ?   labels           on diagram          
   
  f68r2      astro    ?  ?   labels           on stars            
  f68v3      astro    ?  ?   labels           on diagram          
  f68v2      astro    ?  ?   labels           radial              
   
  f70v2      zodiac   ?  ?   labels           on stars            
  f70v1      zodiac   ?  ?   labels           on stars            
  f71r       zodiac   ?  ?   labels           on stars            
  f71v       zodiac   ?  ?   labels           on stars            
  f72r1      zodiac   ?  ?   labels           on stars            
  f72r2      zodiac   ?  ?   labels           on stars            
   
  f88r       pharma   A  4   labels           under plants        
  f88v       pharma   A  4   labels           under plants        
  f89r1      pharma   A  ?   labels           under plants        
  f89r2      pharma   A  ?   labels           under plants        
  f89v2      pharma   A  ?   labels           under plants        
  f89v1      pharma   A  ?   labels           under plants        
  f100r      pharma   A? 4?  labels           under plants        
  f100v      pharma   A? 4?  labels           under plants        
  f101v1     pharma   A? 4?  labels           under plants        

Parags file

The parags sub-file consists of all multi-line text blocks that seem to be continuous paragraphs of "prose" text, broken into lines in the usual way. In particular, it includes the right-hand text on page f66r.

(The balance of the interlinear file comprises text in circles, isolated lines, titles, and the columns of isolated letters in f49v and f66r.)

Reencoding

The labels and parags sub-files were converted to a special encoding, named "ECC" for no good reason, that discards those character details that are most prone to transcription error or appear to have low semantic value.

As part of the ECC recoding, all the inter-word spaces (denoted by "." in the interlinear file) were discarded. Line breaks were preserved, however.
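The recoding step can be pictured with a minimal Python sketch. The actual ECC table is not given in these notes, so the `table` argument and the placeholder characters in the usage note are purely hypothetical; only the removal of the "." word separators comes from the text above.

```python
def recode_line(text, table):
    """Fold characters through a lookup table, then drop "." word spaces.

    `table` maps single characters to their replacements. It stands in
    for the real ECC table, which is not reproduced here (assumption).
    """
    folded = text.translate(str.maketrans(table))
    # Inter-word spaces are written as "." and are discarded.
    return folded.replace(".", "")
```

For instance, with a made-up table that folds "b" into "a", `recode_line("ab.cb", {"b": "a"})` gives "aaca". Line breaks are untouched because the function works on one line at a time.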

Mechanical consensus

After the ECC recoding, all the transcriptions for each line were mechanically combined into a "consensus" transcription, producing the encoded labels and the encoded parags files, respectively.
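The combining rule itself is not spelled out here. One plausible reading, sketched below in Python, is a position-by-position vote: a character is kept only where every transcription agrees, and a conflict marker is written otherwise. Both the rule and the "?" marker are assumptions made for illustration, not a description of the program actually used.

```python
from itertools import zip_longest


def consensus(versions, conflict="?"):
    """Combine several transcriptions of one line, column by column.

    Assumes the versions are already aligned character by character.
    A position where the versions differ, or where one version is
    shorter than another, is written as `conflict`.
    """
    out = []
    for column in zip_longest(*versions, fillvalue=None):
        # Keep the character only when the whole column is unanimous.
        out.append(column[0] if len(set(column)) == 1 else conflict)
    return "".join(out)
```

With placeholder strings, `consensus(["abcd", "abxd", "abcd"])` yields "ab?d". Under this reading, a label containing the marker would be one of those with "conflicting transcriptions" that a later cleanup discards.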

The labels file was then sorted and cleaned, discarding duplicates and labels that had conflicting transcriptions or unreadable characters. Any label with an embedded "-" (signifying an intruding star, plant stem, or other element of a drawing) was entered twice: once as a single label, with the "-" removed, and once as two distinct labels, as if the "-" were a line break. After these transformations, there remained 231 distinct labels, ranging between 2 and 18 ECC characters.
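The cleanup just described can be sketched in Python as follows. The function names, the `bad_chars` markers for conflicting or unreadable characters, and the handling of labels with more than one "-" (every piece is kept) are assumptions; the strings in the usage note are placeholders, not real ECC labels.

```python
def expand_label(label):
    """Return the entries generated by one label.

    A label without "-" yields itself. A label with "-" yields the
    joined form (with "-" removed) plus each piece, as if the "-"
    were a line break.
    """
    if "-" not in label:
        return [label]
    pieces = [p for p in label.split("-") if p]
    return ["".join(pieces)] + pieces


def clean_labels(raw_labels, bad_chars="?*"):
    """Drop labels with bad characters, expand "-", sort, de-duplicate."""
    kept = set()
    for label in raw_labels:
        if any(c in bad_chars for c in label):
            continue  # conflicting or unreadable: discard the whole label
        kept.update(expand_label(label))
    return sorted(kept)
```

For example, `clean_labels(["abc-de", "abc", "x?y", "abc"])` returns `["abc", "abcde", "de"]`: the duplicate and the label with "?" disappear, and "abc-de" contributes both its joined form and its two halves.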

The encoded parags file was not subjected to any further cleanup. In particular, embedded "-" codes were left in place. The file comprised 3918 lines and 164571 ECC characters of Voynich text.

Label reference index

Each encoded label was then searched for in the encoded parags file. The position of each match was recorded in an index file, both as a (line-num, char-offset) pair and as a total count of Voynich characters since the beginning of the encoded parags file. The location where the label was first defined was also listed there, in curly braces.
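A minimal Python sketch of such an index is given below. It assumes that matches do not run across line breaks, that line numbers start at 1 and offsets at 0, and that every character of a line (including any embedded "-") counts toward the running total; none of these conventions is stated in the text. The defining location in curly braces is left out.

```python
def build_index(labels, parag_lines):
    """Map each label to a list of (line_num, char_offset, total_offset)."""
    # Number of characters that precede each line in the whole file.
    line_starts = []
    total = 0
    for line in parag_lines:
        line_starts.append(total)
        total += len(line)

    index = {}
    for label in labels:
        hits = []
        for line_num, line in enumerate(parag_lines, start=1):
            pos = line.find(label)
            while pos != -1:
                hits.append((line_num, pos, line_starts[line_num - 1] + pos))
                # Resume one character later so overlapping matches count.
                pos = line.find(label, pos + 1)
        index[label] = hits
    return index
```

With the placeholder lines `["abcab", "cabc"]`, the label "ab" is found at (1, 0, 0), (1, 3, 3) and (2, 1, 6). A plain substring scan is quadratic in the worst case, but for a few hundred labels against a few thousand lines it should be fast enough.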

This index was then used to build the label reference maps and tables.

Another version of the index file, with the ECC labels replaced by their FSG originals, was used to produce a printout of selected pages showing the matches and near matches for selected labels.


Last edited on 97-10-23 by stolfi