Hacking at the Voynich manuscript - Side notes
046 Producing an interlinear for the EVMT team with EVMT loci

Last edited on 1999-08-27 06:19:48 by stolfi

  The goal of this note is to produce an interim version of the
  interlinear with EVMT loci and in EVMT order, so that the EVMT team
  can compare it to their files.

  This conversion requires discarding most comments and losing some
  text attribute information, hence it is not something I want to do
  now to my "official" interlinear.

ESTABLISHING THE MAPPING BETWEEN STOLFI LOCATORS AND EVMT LOCATORS

  On 1998-12-22 Rene sent me a file with the first 30 characters in
  each line of the upcoming EVMT file, with the almost-definitive EVMT
  locators. From this I proceeded to create a table mapping Stolfi
  line locators to EVMT ones.

  The first step is to reformat Rene's file, which has the locator
  followed immediately by the text, all truncated to 30 columns. To
  reuse the scripts, we need to have the locator on columns 1-19 and
  the first 20 chars of text on columns 20-39:

    cat RZ-locations.cut \
      | sed \
          -e 's/ //g' \
          -e 's/>/> /g' \
      | gawk \
          ' / zan.clp

  Then we collect Stolfi's interlinear and build an analogous file
  with Stolfi's locators:

    cat L16+H-eva/UNITS \
      | gawk -v FS=':' '/./{print $2;}' \
      | egrep -v 'f0' \
      | egrep '[.]' \
      > .data.units

    set units = ( `cat .data.units` )

  Checking:

    ( cd L16+H-eva && ls f[1-9]*.* | egrep -v '[~]$' ) | sort > .foo
    cat .data.units | sort > .bar
    diff .foo .bar

    ( cd L16+H-eva && cat $units ) \
      | basify-weirdos \
      | best-pick \
      > sto.evt

    cat sto.evt \
      | adjust-sto-pages-like-zan-pages \
      | sync-clip-evt -v pageSize=170 \
      > sto.clp

  Let's check whether we have the same set of pages, in the same
  order, with the same number of lines:

    dicio-wc sto.clp zan.clp

    grep '##' sto.clp > sto.pages
    grep '##' zan.clp > zan.pages
    diff sto.pages zan.pages

      168a169
      > ## 
      174d174
      < ## 

  This difference is due to a different ordering of pages.
  Eventually I must fix my UNITS table to match the EVMT order.
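
  The side-by-side comparison depends on both .clp files sharing the
  fixed layout described earlier (locator in columns 1-19, first 20
  characters of text in columns 20-39). The following is only a
  minimal sketch of that layout, not the script actually used here:
  the function name layout_cols_sketch is hypothetical, and the input
  is assumed to have one `<locator>text` entry per line, with dummy
  text in the example.

```shell
# Sketch only: pad the locator to 19 columns and keep 20 chars of text.
# layout_cols_sketch is a hypothetical name; input lines are assumed
# to look like "<f1r.1>sometext" (locator immediately followed by text).
layout_cols_sketch() {
  awk '
    /^</ {
      i = index($0, ">")            # end of the locator
      if (i == 0) next              # skip malformed lines
      loc = substr($0, 1, i)        # "<...>" including both brackets
      txt = substr($0, i + 1)       # everything after the locator
      gsub(/ /, "", txt)            # drop blanks inside the text
      printf "%-19s%s\n", loc, substr(txt, 1, 20)
    }
  '
}

# Example with dummy text (not real transcription data):
printf '<f1r.1>aaaaa.bbbbb.ccccc.ddddd.eeeee\n' | layout_cols_sketch
```

  With this layout the text of every line starts at column 20, so two
  such files pasted side by side line up column for column.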

  OK, now we paste these two files side-by-side:

    /n/gnu/bin/paste -d' ' sto.clp zan.clp > sto-zan.clp

  We then edit sto-zan.clp manually, shifting and permuting the right
  half of each page (locators included) until the two truncated texts
  on each line are two versions of the same VMS line. Unmatched
  half-lines are mapped to fantasy locators on the other file.

  [ More editing done on 1999-08-28 ]

  Then we delete the truncated text columns, leaving only the
  locators (preliminary and interlinear). The resulting file is saved
  as sto-zan-locs.tbl.

CONVERTING THE INTERLINEAR

  [ What follows needs to be redone due to edits of 1999-08-28 ]

  Next, we create a preliminary file inter-m.evt with Stolfi's text
  and EVMT line locators. We discard all unit and page headers, since
  there isn't a 1-1 correspondence between Stolfi and EVMT text
  units. We will rebuild those lines later, after sorting the file in
  EVMT order.

    ( cd L16+H-eva && cat $units ) \
      | unbasify-weirdos \
      > inter.evt

    cat inter.evt \
      | egrep -v '^##' \
      | grep -v 'Last edited' \
      | grep -v ' transcription by ' \
      | map-locations \
          -v table=sto-zan-locs.tbl \
          -v pedantic=1 \
      > inter-m.evt

  Next we need to sort the lines to match Rene's order. First we
  create a file with all EVMT locators in order:

    cat RZ-locations.cut \
      | egrep '<.*[.].*>' \
      | sed \
          -e 's/>.*$//g' \
          -e 's/ zan-orig.locs

  Next we list all new EVMT locators that need to be added to port
  the interlinear:

    # cat zan-orig.locs > zan.locs

    cat inter-m.evt \
      | egrep '^<.*[;].*>' \
      | sed \
          -e 's/[;].>.*$//g' \
          -e 's/ zan-extra.locs

  Next we insert by hand these new locations in the list
  zan-orig.locs, in the proper order, and save the result as
  zan.locs.

  We now create a table that maps EVMT line locators to serial
  numbers:

    cat zan.locs \
      | gawk '/./{n++; print $1, n;}' \
      > zan-loc-to-order.tbl

  We now write a script that inserts the reading order and
  transcriber code in front of each line of the interlinear.
  The script is not entirely trivial because it must also attach the
  order to the #-comments that immediately precede each text line.

    cat inter-m.evt \
      | attach-reading-order \
          -v table=zan-loc-to-order.tbl \
      | sort +0 -3 \
      | gawk '/./{print substr($0,15);}' \
      > inter-ms.evt

  Gzipped it and mailed the URL to Gabriel and Rene.
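
  The decorate-sort-strip idea behind the last pipeline can be
  sketched as follows. This is not the attach-reading-order script
  itself, whose source is not shown in this note. The function name
  attach_order_sketch, the table format ("locator serial", locators
  without angle brackets), and the key layout are all assumptions;
  the key is merely sized to 14 columns in three blank-separated
  fields so that it agrees with the `sort +0 -3` and `substr($0,15)`
  steps above. The sketch uses `sort -k1,3`, the POSIX spelling of
  `sort +0 -3`.

```shell
# Sketch only: prefix each line with a 14-column key "ORDER SEQ T ",
# where ORDER comes from the table, SEQ is the input sequence number
# (keeps lines with equal ORDER in their original relative order), and
# T is the transcriber code. Lines that do not start with "<"
# (#-comments) are held back and get the key of the next text line.
attach_order_sketch() {
  # $1 = order table (assumed "locator serial"), $2 = interlinear file
  awk -v table="$1" '
    BEGIN {
      while ((getline line < table) > 0) {
        n = split(line, f, " ")
        if (n >= 2) order[f[1]] = f[2]
      }
      close(table)
    }
    function emit(ord, trc, text) {
      printf "%05d %05d %1s %s\n", ord, ++seq, trc, text
    }
    /^</ {
      # Assumed locator shape: <page.line;T>
      loc = $0; sub(/^</, "", loc); sub(/[;>].*$/, "", loc)
      trc = "-"
      if (match($0, /;[^>]*>/)) trc = substr($0, RSTART + 1, 1)
      ord = (loc in order) ? order[loc] : 99999   # unknown loci go last
      for (i = 1; i <= npend; i++) emit(ord, trc, pend[i])
      npend = 0
      emit(ord, trc, $0)
      next
    }
    { pend[++npend] = $0 }
    END { for (i = 1; i <= npend; i++) emit(99999, "-", pend[i]) }
  ' "$2"
}

# Usage, mirroring the pipeline above:
#   attach_order_sketch zan-loc-to-order.tbl inter-m.evt \
#     | LC_ALL=C sort -k1,3 \
#     | awk '/./{print substr($0,15)}' > inter-ms.evt
```

  Using the input sequence number as the second key field means the
  sort only has to move whole groups (comments plus their text line)
  into table order, without reordering the lines inside a group.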