Hacking at the Voynich manuscript - Side notes
042 Label occurrences in the text

Last edited on 1998-10-28 02:01:59 by stolfi

GOAL

  Looking for occurrences of selected labels in the text.

PHARMA LABELS

  For starters, let's take the labels on page f101v2+f101v1:

    otaldy={Grove's X.6}
    otol=
    yty={Grove's X.9}
    dokor={was }
    orar={was }
    otarar={was }
    otoly={was }
    soraly={was }
    arom=
    orar,am=
    dytoly=
    olkor=
    dolary={was }
    odor=
    olaran=

  From the VMS concordance (see Notes/037) I manually extracted all
  occurrences of these labels in the text. In most cases I took only
  exact matches, but in some cases I tolerated a/o variations. The
  result is in the file f101v2-labels-xref.txt. The entries in that
  file that matter begin with "+" and have the following fields:

    1 2   3    4     5     6     7
    + SEC FNUM LOCUS TRANS LABEL CONTEXT

  Here LOCUS has the form UNIT.LINE, TRANS is a transcriber code
  (uppercase letter), LABEL is the original label, and CONTEXT is its
  occurrence in the text, with some surrounding words.

  To get here I had to do some cleanup. I deleted the bogus references
  in f100v.T and f100v.M (which were misplaced copies of those in
  f101v2).
  I manually inserted the label as field 5, with a terminating "|",
  then filtered through

    gawk -v FS='|' \
      ' /^[#<]/{print;next;} \
        /^ *$/{print;next;} \
        //{gsub(/[ ]/, ".", $2); printf "%-32s %s\n", $1, $2;} \
      '

  Sorting that file by page:

    cat f101v2-labels-xref.txt \
      | grep -E '^[+]' \
      | map-field \
          -v inField=3 -v outField=3 \
          -v table=fnum-to-pnum.tbl \
      | sort -k3,4 -k6,6 \
      | gawk '//{print $2, $3, $4, $5, $6, $7, $8;}' \
      > f101v2-labels-xref-sort.txt

  This file has fields

    1   2    3    4     5     6     7
    SEC PNUM FNUM LOCUS TRANS LABEL CONTEXT

  Counting the number of occurrences of each label:

    cat f101v2-labels-xref-sort.txt \
      | gawk '//{print $6;}' \
      | sort | uniq -c | expand \
      | sort -k1,1nr

         76 otol
         23 yty
         22 orar
         17 otarar
         10 otaldy
          7 odor
          7 otoly
          5 olkor
          4 dokor
          4 dolary
          3 olaran
          3 orar,am
          3 soraly
          2 arom
          2 dytoly

  Again, herbal-only:

    cat f101v2-labels-xref-sort.txt \
      | gawk '(($1=="hea")||($1=="heb")){print $6;}' \
      | sort | uniq -c | expand \
      | sort -k1,1nr

         32 otol
         15 yty
          4 olkor
          4 orar
          4 otarar
          3 odor
          2 dokor
          2 otaldy
          2 otoly
          1 dolary
          1 dytoly
          1 olaran
          1 orar,am

  Tabulating it:

    cat f101v2-labels-xref-sort.txt \
      | format-label-xref \
      > f101v2-labels-xref-sort.plt

  Let's look at the herbal pages that have the low-frequency labels:

    cat f101v2-labels-xref-sort.txt \
      | gawk \
          ' /^[<]/{print;next;} \
            /^ *$/{print; next;} \
            ($1 !~ /hea|heb/) { next; } \
            ($6 ~ /^(olkor|orar|otarar|odor|dokor|otaldy|otoly|dolary|dytoly|olaran|orar[,]*am)$/) {print;} \
          ' \
      | sort -k6,6 -k2,2

    heb 097 f50r  P.2  F dokor   ..ockhody.shos.alol.dy.kar.oky.daiin.okar.
    heb 118 f66v  P.10 F dokor   .....kal.daiin.otal.dakar.otam-yteeod.aiin.

  No obvious resemblance.

    hea 043 f23r  P.11 F dolary  ....dar.ykain-ykyka.dalory=

  No obvious resemblance.

    heb 090 f46v  P.5  F dytoly  ..qokedy.chdy.okedy.dykaly.daiin.chedy.okeedy.

  No obvious resemblance.

    hea 046 f24v  P.13 U odor    ..-oeeey.cheol.chol.odor.sho.do.otolodal-
    heb 089 f46r  P.6  F odor    .....chdalor.sheedy.odor.aiin.opchedy.dykedy.
    hea 103 f53r  P.6  F odor    -ykeodar.oqoor.ockh.odor.chain.qokod-ykchdy.

  The leaf of f101v2[2,8] = odor has some resemblance to that of
  f53r[1,1]. Otherwise, there is no obvious resemblance.

    heb 094 f48v  P.6  F olaran  .okar.otar.or.otees.ol.orain-otal.okytar.chedy.

  No obvious resemblance.

    heb 063 f33r  P.5  F olkor   -pair.oraiin.otaiin.olkor.aiin.okal.otal.
    heb 076 f39v  P.5  F olkor   ..aiin.okaiin.ckhol.ol.kor.otor.opchy-lkedy.
    heb 193 f95r1 P.8  F olkor   chetchdy.chdy.chkam-olkor.chdaiin.chol.kaiin.
    heb 194 f95r2 P.6  U olkor   .....qopchdy.kary-y.olkor.ol.shol.qotar.chdy.

  No obvious resemblance.

    heb 065 f34r  P.15 F orar    .chor.ar.aiiin.daly-or.ar.ykar.ol.al.oky-
    heb 077 f40r  P.6  F orar    ...ar.ar.or.dam-tor.or.ar.shokoram.olshedy.
    heb 097 f50r  P.4  F orar    ....qokchdy.qokaiin.or.ar.alol.keodaiin.olr.
    hea 178 f87v  P.12 F orar    ....-yksho.qos.arol.or.ar.al.daraiinm-saiin.

  No obvious resemblance.

    heb 107 f55r  P.1  F orar,am ...chepaiin.qokchdy.or.arod-okair.or.aiin.chody.

  No obvious resemblance.

    hea 104 f53v  P.13 F otaldy  ....adam-ycthadaiin.otaldy=
    heb 196 f95v1 P.4  F otaldy  .qokal.oty.shekshey.otaldy.okshey.ytshedy.

  No obvious resemblance.

    heb 066 f34v  P.8  F otarar  chkain.otain-ysheos.otar.ar.cho.raiin.cheky.
    heb 076 f39v  P.5  F otarar  ..kor.or.sheky.kain.otar.or.aiin.okaiin.ckhol.
    heb 090 f46v  P.2  F otarar  .okaly.daiin.qokedy.otar.ar.oldy.otedy.saim
    heb 094 f48v  P.6  F otarar  ..-shdy.qokain.okar.otar.or.otees.ol.orain-

  No obvious resemblance.

    hea 071 f37r  P.5  F otoly   okchy.qotchor.chkol.otoly-shor.shol.qokchy.
    heb 094 f48v  P.8  F otoly   chckhedy.ykedy.oldy-otoly.chey.taly.tokar.

  The leaves of f37r[1,1] resemble those of f101v2[1,8]. Otherwise,
  there is no obvious resemblance.

  So the coincidences between this Pharma page and the herbal pages
  seem to be unrelated to the shape of the plants.
  The following pages have an unusual number of occurrences of labels
  from f101v2 (not counting the very popular "otol" and "yty"):

               o    s    o    o    o    d    o    o    d    o    d    o    a    o    y
               r    o    t    d    l    y    t    l    o    t    o    r    r    t    t
               a    r    a    o    k    t    o    a    k    a    l    a    o    o    y
               r    a    r    r    o    o    l    r    o    l    a    r    m    l
                    l    a         r    l    y    a    r    d    r    ,
                    y    r              y         n         y    y    a
                                                                      m
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    heb f39v  |    |    |*   |    |*   |    |    |    |    |    |    |    |    |    |    |
    heb f46v  |    |    |*   |    |    |*   |    |    |    |    |    |    |    |    |    |
    heb f48v  |    |    |*   |    |    |    |*   |*   |    |    |    |    |    |    |    |
    heb f50r  |*   |    |    |    |    |    |    |    |*   |    |    |    |    |    |    |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    unk f76r  |**  |*   |    |    |    |    |    |    |    |    |    |    |    |    |    |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    bio f79v  |    |    |    |    |    |    |    |    |    |*   |*   |    |    |**  |*   |
    bio f81r  |*   |    |    |    |    |    |    |*   |    |    |    |    |    |    |    |
    bio f84r  |*   |    |    |    |    |    |*   |    |    |    |    |    |    |    |    |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    ast f67r1 |    |    |    |    |    |    |    |    |    |*   |*   |    |    |    |    |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    pha f89v2 |    |    |    |**  |    |    |    |    |    |    |    |    |    |**  |    |
    pha f99r  |*   |    |    |    |    |    |    |    |    |*   |    |    |    |    |    |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    unk f85r1 |    |    |*   |*   |    |    |    |    |    |    |    |    |    |*   |    |
    unk f86v6 |    |    |*   |    |    |    |    |    |    |    |    |*   |    |*** |*   |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
    str f58r  |*   |    |    |    |    |    |    |    |    |*   |    |    |    |    |    |
    str f104v |*   |    |*   |    |    |    |    |    |    |    |    |    |    |    |    |
    str f106r |*   |    |*   |    |    |    |    |    |    |    |    |    |    |*   |    |
    str f113v |    |    |****|    |    |    |    |    |    |    |    |    |    |*   |    |
    str f115r |**  |    |*   |    |    |    |    |    |    |    |    |    |    |    |    |
               ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----

  A notable coincidence: "otaldy" and "dolary", which occur together
  on page bio f79v (the Mermaid) also occur as consecutive radial
  labels in the "astro" diagram f67r1: at 00:00 (first) and 11:00
  (last).

  Perhaps f101v2 is bad because the labels were copied from the wrong
  page altogether.
  Note that there are three rows of plants but only two have labels.

  Let's do the same search with the labels on f100v, which seem to
  have been written with more care. Here they are:

    tolchd={was }
    chols={was }
    opchor={was }
    solsy={was }
    soleesos={was }
    ykchochdy={was }
    ykchdy={was }
    dchdy={was }
    dalsy={was }
    okcheor={was }
    ytchol={was }
    dykchal={was }
    chos.cthoral={was }

  The references (extracted manually from the concordance) are in
  f100v-labels-xref.txt. Processing them:

    cat f100v-labels-xref.txt \
      | grep -E '^[+]' \
      | map-field \
          -v inField=3 -v outField=3 \
          -v table=fnum-to-pnum.tbl \
      | sort -k3,4 -k6,6 \
      | gawk '//{print $2, $3, $4, $5, $6, $7, $8;}' \
      > f100v-labels-xref-sort.txt

  This file has fields

    1   2    3    4     5     6     7
    SEC PNUM FNUM LOCUS TRANS LABEL CONTEXT

  Counting the number of occurrences of each label:

    cat f100v-labels-xref-sort.txt \
      | gawk '//{print $6;}' \
      | sort | uniq -c | expand \
      | sort -k1,1nr

         39 otchol
         34 ykchdy
         30 opchor
         11 chols
          7 dchdy
          4 okcheor
          3 tolchd
          2 dalsy
          1 chos.cthoral

  Again, herbal-only:

    cat f100v-labels-xref-sort.txt \
      | gawk '(($1=="hea")||($1=="heb")){print $6;}' \
      | sort | uniq -c | expand \
      | sort -k1,1nr

         32 otchol
         23 opchor
         13 ykchdy
          7 chols
          3 dchdy
          2 okcheor

  Let's look at the herbal pages that have the low-frequency labels:

    cat f100v-labels-xref-sort.txt \
      | gawk \
          ' /^[<]/{print;next;} \
            /^ *$/{print; next;} \
            ($1 !~ /hea|heb/) { next; } \
            ($6 ~ /^(chols|dchdy|okcheor)$/) {print;} \
          ' \
      | sort -k6,6 -k2,2

    hea 005 f3r   P.16 F chols   .otchom.oporar-oteol.chol.s.cheol.ekshy.qokeom
    hea 010 f5v   P.1  U chols   ...char.ytchey.pshod.chols.chodaiin.ytoiiin
    hea 045 f24r  P.18 F chols   ..-ycheol.chol.daiin.chol.s-yol.tol.chol.shom
    hea 051 f27r  P.7  F chols   .cheol.pchy.schey.ly-chals.cham-ytchy.chy
    hea 082 f42v  P.1  F chols   ...sheey.qocho.taiin.shols-chol.chor.dain-
    hea 189 f93r  P.19 U chols   .hodaiin.shody-tchor.shol.s.sheoky-ychockhy
    heb 195 f95v2 P.5  F chols   ......ar-daiin.ykaly.chals.shedaiin.olaiiny

  No obvious resemblance.

    heb 075 f39r  P.5  U dchdy   ..chees.aly.okalchem-dchdy.chdy.ykaiin=
    heb 079 f41r  P.8  F dchdy   ..chedy.chckhy.qokey.dchdy-qokedy.qokyl.cheked
    heb 196 f95v1 P.6  F dchdy   ......=tshdal.qokshy.dchdy.shedy.dkshey.chefar

  No obvious resemblance.

    hea 042 f22v  P.5  U okcheor ..-odaiin.ytaiin-dor.ykcheor.daii**=
    hea 088 f45v  P.8  F okcheor ...dshy.otyol-ytchom-ykcheor.odal.sho.dy.pchom-

  No obvious resemblance.
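
Both label sets go through the same count-and-rank step, once over all sections and once over the herbal sections only. A minimal sketch of that step as one reusable function is given here, assuming the seven-field sorted files (SEC PNUM FNUM LOCUS TRANS LABEL CONTEXT) described earlier. The name rank_labels and its optional section argument are hypothetical, plain awk stands in for gawk, and `-k1,1nr` is the POSIX spelling of the obsolete `sort +0 -1nr` key.

```shell
# rank_labels [SECTION_REGEX] < f100v-labels-xref-sort.txt
# Counts how often each LABEL (field 6) occurs, most frequent first.
# SECTION_REGEX filters on SEC (field 1); the default "." keeps every section.
rank_labels() {
  awk -v sec="${1:-.}" '
    /^[#<]/ || /^ *$/ { next }     # skip comment, header and blank lines
    $1 ~ sec { print $6 }          # keep the label of each selected entry
  ' \
    | sort | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2,2 \
    | awk '{ printf "%7d %s\n", $1, $2 }'   # fixed-width counts on every platform
}
```

For instance, `rank_labels '^(hea|heb)$' < f100v-labels-xref-sort.txt` would produce the herbal-only ranking, and `rank_labels < f101v2-labels-xref-sort.txt` the overall one. Ties are broken alphabetically by label, which is an added convention rather than something the original pipeline guarantees.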