A Concordance of the VMS

This page describes a complete word index (concordance) of the VMS, which was announced last year to the Voynich list and has now been rebuilt with some improvements for release 1.6e6 of the EVA interlinear file.


The VMS concordance is merely a sorted list of every word of the VMS (and many short phrases), with context, in the following format:

-------------------------------------------- doineeoeeeoe ----------------------

pha f89v2.P1.3   H        /yche* okeey qoeol daiin-chor chor cheos qol **eey 
hea f35v.P.21    ACFH      shy dchy ckhy dan/doiin chor chor=

-------------------------------------------- doineeoeeo ------------------------

hea f47v.P.11    ACH          key chyky-dchy daiin chy/cho chokeesy chy chy 
hea f47v.P.11    F            key chyky-dchy daiin chy/chy chokeesy chy chy 
hea f32r.P.18    ACF         /chokeol dchoty/doiin shoshy/dol dchol dan=
hea f32r.P.18    H           /chokeol dchoty/doiin sho shy/dol dchol dan=

Each entry lists the section, the page and line, the transcriber codes, some left context, the indexed word/phrase, and some right context.
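
For readers who want to process the listing mechanically, here is a minimal sketch of a parser for one entry line. It assumes the fields are separated by runs of blanks and that the location has the three dot-separated parts seen in the samples above (page, text unit, line). The names Entry and parse_entry are hypothetical, not part of the concordance tools. In this plain-text form the indexed phrase cannot be told apart from its context, so the sketch keeps them together in one field.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    section: str       # e.g. "hea", "pha"
    page: str          # e.g. "f35v"
    unit: str          # e.g. "P" or "P1"
    line: str          # kept as text; assumed, not guaranteed, to be numeric
    transcribers: str  # one letter per transcriber code, e.g. "ACFH"
    text: str          # left context + indexed phrase + right context


# section, location, transcriber codes, then the rest of the line
ENTRY_RE = re.compile(r"^(\S+)\s+(\S+)\s+([A-Z]+)\s+(.*\S)\s*$")


def parse_entry(raw: str) -> Entry:
    """Split one concordance line into its fields (assumed layout)."""
    match = ENTRY_RE.match(raw)
    if match is None:
        raise ValueError(f"not an entry line: {raw!r}")
    section, location, codes, text = match.groups()
    page, unit, line = location.split(".")
    return Entry(section, page, unit, line, codes, text)


entry = parse_entry("hea f35v.P.21    ACFH      shy dchy ckhy dan/doiin chor chor=")
print(entry.page, entry.line, entry.transcribers)
```

The separator lines that carry the pattern (the rows of dashes) do not match the regular expression and would have to be handled separately.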

Ordering of entries

The entries are sorted and grouped on the basis of their "pattern", a string derived from the indexed phrase by discarding some easily confused details (such as spaces, plumes, ligatures, gallows eyes, minor shape details, etc.) and the q prefix, if any. For instance the phrases daiiinchy ckhy and daiin/sho cthy yield the same pattern doineeoeteo, and are thus listed together.
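
The grouping step can be sketched as follows. The function toy_pattern is a deliberately simplified, hypothetical stand-in for the real pattern mapping: it discards only separators and a q at the start of the phrase, and does not attempt the letter-shape equivalences (plumes, ligatures, gallows eyes) that the actual concordance applies. It is meant only to show how phrases with equal patterns end up under one heading.

```python
from collections import defaultdict

SEPARATORS = " /-.,"  # spaces, line breaks, gaps, EVA word spaces


def toy_pattern(phrase: str) -> str:
    """Hypothetical, simplified pattern: drop separators and a leading q."""
    stripped = "".join(ch for ch in phrase if ch not in SEPARATORS)
    # Assumption: only a q at the very start of the phrase is removed.
    return stripped[1:] if stripped.startswith("q") else stripped


def group_by_pattern(phrases):
    """Return {pattern: [phrases]} with the patterns in sorted order."""
    groups = defaultdict(list)
    for phrase in phrases:
        groups[toy_pattern(phrase)].append(phrase)
    return dict(sorted(groups.items()))


for pattern, members in group_by_pattern(["qokeey", "okeey", "o keey", "daiin"]).items():
    print(pattern, members)
```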


This version of the concordance covers all transcriptions in the 1.6e6 interlinear file, not just a "best pick" as in the previous version. In particular it includes Takeshi Takahashi's new complete transcription (code H). It also includes an artificial version (code A, in brighter color) derived from all the other versions by "majority vote", character by character.
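
The "majority vote" can be illustrated with a short sketch. It assumes that the versions of a line have already been aligned to equal length, and that a tie is marked with "*" (the text below says that "*" or "?" denote ties; which of the two is used in which case is not stated here). The function name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(versions, tie_char="*"):
    """Combine aligned transcriptions character by character.

    Assumes all versions have the same length; raises ValueError otherwise
    (including when no versions are given).
    """
    if len({len(v) for v in versions}) != 1:
        raise ValueError("versions must be non-empty and of equal length")
    result = []
    for column in zip(*versions):
        ranked = Counter(column).most_common()
        # A tie for first place is recorded as an unreadable character.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie_char)
        else:
            result.append(ranked[0][0])
    return "".join(result)


# Three readings of the same fragment, two of which agree.
print(majority_vote(["chy/cho", "chy/chy", "chy/cho"]))
# Two readings that disagree in the last character.
print(majority_vote(["sho", "shy"]))
```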

I believe that all existing VMS text is now completely covered by the interlinear, except perhaps for some text in the rightmost 1/3 of the nine-rosette diagram (f85v/f86r).

This concordance includes all single words (text and labels) from the interlinear. It also includes all short phrases (up to 17 characters) that yield the same pattern as one of those words. Finally it includes every short phrase whose pattern occurs in two or more distinct locations.

On the other hand, the concordance does not list words and phrases that contain "unreadable" characters ("*" or "?"). Note that these characters also denote ties in the majority version, so many words are listed only under the individual versions.

For clarity, the EVA word spaces "." and "," are printed as " ". Indexed phrases may span line breaks ("/") and gaps due to figures or vellum defects ("-"), but not paragraph breaks or page boundaries ("="). For this purpose, each label is considered a single paragraph.
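
The spanning rules can be sketched as an enumeration of candidate phrases. This is only an illustration under stated assumptions: the function candidate_phrases is hypothetical, the length limit is counted here in letters only (separators excluded), which may differ from how the concordance counts it, and the exclusion of phrases with unreadable characters is not handled.

```python
import re

WORD_RE = re.compile(r"[^.,/\-]+")  # a word is a run of non-separator characters


def candidate_phrases(eva_text, max_chars=17):
    """List runs of consecutive words that stay inside one paragraph unit."""
    phrases = []
    for unit in eva_text.split("="):  # never cross "=" boundaries
        words = list(WORD_RE.finditer(unit))
        for i, first in enumerate(words):
            letters = 0
            for last in words[i:]:
                letters += len(last.group())
                if letters > max_chars:
                    break
                raw = unit[first.start():last.end()]
                # Word spaces "." and "," are printed as blanks;
                # "/" and "-" are kept as they are.
                phrases.append(re.sub(r"[.,]", " ", raw))
    return phrases


print(candidate_phrases("dan/doiin.chor=shy", max_chars=9))
```

With the small limit used in the example, the phrase dan/doiin spans a line break, while no phrase joins chor to shy across the "=" boundary.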

For technical reasons, the context phrases are always taken from the majority version, even when the indexed string comes from a "minority" one.

Control experiments

As a kind of "control experiment", mainly to illustrate the effect of "pattern sorting", this site also includes full concordances, built on the same principles, for the following texts:

