Hacking at the Voynich manuscript - Side notes
074 Revising the "U" (Stolfi) transcription

Last edited on 2026-01-10 06:45:13 by stolfi

INTRO

  This note was originally about an attempt to prepare a new version of
  the EVMT ("European Voynich Multi-Transcription") file, to be version
  25e1 (meaning 2.5 release 1). It would pick up from the abandoned
  attempt to build version 20e1 described in Notes/672 and Notes/073.

  However, Rene Zandbergen has created a much improved version (IVTFF)
  of the transcription format and software. So this note is now
  redirected at preparing my own transcriptions (";U" in the
  interlinear) in a format compatible with that new standard. It
  requires, among other things, changing the format of all locators,
  and mapping the encoding I have been using to the one he chose.

  One reason for this effort is to investigate the theory that one-leg
  gallows with hooks are distinct from those without hooks.

LINKS

    ln -s ../.. work
    ln -s work/Notes
    ln -s work/fnum_to_pnum.tbl
    ln -s work/list_weirdos.sh
    ln -s work/argparser.py
    ln -s work/compare_ivtff_files.py
    ln -s work/ivtff_align.py
    ln -s work/ivtff_format.py
    ln -s work/ivtff_frac_word_counts.py
    ln -s work/process_frac_words.py
    ln -s work/read_table.gawk
    ln -s work/bare_bones_evt.gawk
    ln -s work/convert_ligatures_in_evt_file.sed
    ln -s work/validate_25e1_evt_format.gawk

OLD EVT INPUT FILES

  We start from the file Notes/672/text20e1-03.evt, which is the
  version of EVMT 20e1 that was prepared on 2005-02-03 but never
  released. It probably needs extensive checking.

    chmod a-w Notes/672/text20e1-03.evt
    ln -s Notes/672/text20e1-03.evt

  We must massage the ".evt" file a bit before automatic conversion.
  For one thing, it has too many groups like "(cht)", and it is not
  clear what they should really be; they are probably not consistent.

  The plan is to inspect the groups /[(][ci][ktfpzw]*h+[)]/ and
  /[(]sh+[)]/.
  If they ARE ligated in the usual way, just remove the parens, as in
  the EVMT 16e6 format, since they will be converted to
  /CI[KTFPZW]*H*h/ and /SH*h/ by the automatic EVA-XEVE conversion
  scripts. If they are NOT ligated, convert them to symbolic weirdos
  "*{&...}" where "..." is the XEVA code for the ligated combination.

    cp text20e1-03.evt text20e1-30.evt

  Inspected and edited text20e1-30.evt by hand, replacing weirdos and
  weird ligatures by codes like &{OPr} or &{310}.

  !! NOTE: my weirdo codes differ from Rene's. Mapping will be
  necessary at some point.

EXTRACTING MY TRANSCRIPTION

  Removing from "text20e1-30.evt" all transcriptions except ";U":

    cat text20e1-30.evt \
      | remove_non_stolfi_transcriptions.gawk \
      > text25e1-50.evt

  Saved the version "text20e1-30.evt" just in case:

    now=2025-05-31-095000
    mkdir SAVE/${now}
    chmod a-w text20e1-30.evt
    mv -vi text20e1-30.evt SAVE/${now}/

  (It seems that a tentative IVTFF version "text25e1-01.xev" was
  created in 2025-05 from text20e1-30.evt, but then editing continued
  in "text25e1-51.evt". Saved "text25e1-01.xev" just in case.)

  Proceeded to manual edit of "text25e1-51.evt". Split off

    "text25e1-weirdos.txt" - weirdo code definitions.
    "text25e1-intro.txt" - the introductory comments.
    "../076/starps-U.eva" - the Starred Parags section (SPS, f103r to f116r line 30).

  Also extracted from "text20e1-30.evt" the versions by Takeshi (';H')
  and Rene (';Z') of the SPS:

    "../076/starps-H.eva"
    "../076/starps-Z.eva"

  The files "../076/starps-U.eva" and "../076/starps-Z.eva" were
  uniformized until the TEXT of the former was a subset of the latter.
  The #-COMMENTS, however, were not uniformized.

  Then "../076/starps-U.eva" was saved to
  "SAVE/2025-07-15-200047/starps-U.eva", and "../076/starps-Z.eva" was
  renamed "../076/starps-U.evt", with ";U" codes replacing the ";Z"
  codes. See "../076/Note-076.txt".

    ln -s ../076/starps-U.evt

  See also:

    "../073/desc25e1-51.txt" - per-page verbal descriptions.
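  The script "remove_non_stolfi_transcriptions.gawk" is not reproduced
  in this note. As a rough illustration only, the filtering step above
  could look like the Python sketch below. It assumes that each text
  line starts with a locator of the form <page.unit.line;T>, where T is
  a one-letter transcriber code, and that every other line (comments,
  page headers) is to be kept. The function name and the sample lines
  are hypothetical.

```python
import re

# Assumed locator shape of the interlinear file: <page.unit.line;T>,
# where T is a one-letter transcriber code. This pattern is an
# assumption for illustration, not taken from the actual gawk script.
LOCATOR = re.compile(r'^<[^<>;]*;([A-Za-z])>')

def keep_transcriber(lines, code="U"):
    """Yield lines without a transcriber locator (comments, page
    headers) and text lines whose locator carries the given code."""
    for line in lines:
        m = LOCATOR.match(line)
        if m is None or m.group(1) == code:
            yield line

# Hypothetical sample input, made up for this sketch.
sample = [
    "# page f1r",
    "<f1r.P.1;H>  fachys.ykal.ar-",
    "<f1r.P.1;U>  fachys.ykal.ar-",
    "<f1r>  {$I=T}",
]
kept = list(keep_transcriber(sample))
for line in kept:
    print(line)
```

  Lines without a ";T" locator pass through unchanged, so comments and
  page headers survive the filtering.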
  Continued editing "text25e1-51.evt".

  Replaced all weirdo code notations "&\{{NNN}\}" by "&{NNN};".

  Replaced all EVA font notations in comments, from "<{CHAR}>" to
  "@{CHAR}", and from "<{TEXT}>" to "@'{TEXT}'".

  Replaced all inline comments "{...}" by "".

OBTAINING THE LAST IVTFF FILE

  Downloaded the latest version of Rene's "reference" transcription,
  "RF1b-e.txt", from "https://voynich.nu/data/RF1b-e.txt". Renamed it
  "text25rz-40.txt" and made it readonly.

  Needed to change the page of the rosette from "fRos" to "f85v2",
  otherwise all my scripts would break. So made a copy:

    cp text25rz-40.txt text25rz-41.txt
    chmod u+w text25rz-41.txt

  Edited the copy, replacing "fRos" by "f85v2". Also replaced page
  "f101v" by "f101v2" for consistency. Also split off the
  @'okeeey.qokeeey..okeey;.okeey' from line as a separate line ,
  assuming it is a "title", for compatibility with "star25e1.evt".
  Also removed " or?r.m", which seems to be a duplicate of " osaram".

CHANGING THE PARAGRAPH MARKERS

  In the old EVMT format, parags were marked only by "=" at the end of
  the tail line and "-" at the end of every other line.

  As preparation for fixing the paragraph markers:

    Temporarily replacing the line-final "-" by <|>.
    Prefixing every text line after a <|> with <:>.
    Replacing every final "=" with "<$>".

  Looking for lines that do not end with "<$>" or "<|>" and fixing
  them:

    cat text25e1-52.evt \
      | sed \
          -e 's:^[ ]*\([#]\|[@][@]\|$\).*::g' \
          -e 's:^.*::g' \
          -e 's:^.*<[|$]> *$::g' \
      | egrep --color=auto -nH --null -e '.' \
      | sed -e 's:[(]standard input[)]:text25e1-52.evt:g' \
      > .bugs

  Adding start-of-parag markers:

    ./add_parag_markers.gawk text25e1-52.evt \
      > .tmp
    prdiff -Bb text25e1-52.evt .tmp | head -n 200 > .diff

    # now="`yyyy-mm-dd-hhmmss`"
    now="2025-07-15-171634"
    mkdir -p SAVE/${now}
    mv -vi text25e1-52.evt SAVE/${now}/
    chmod a-w SAVE/${now}/text25e1-52.evt
    mv -vi add_parag_markers.gawk SAVE/${now}/
    mv .tmp text25e1-53.evt

SIMPLIFYING THE NAMES

    mv -vi text25e1-53.evt text25e1.evt
    mv -vi desc25e1-53.evt desc25e1.evt
    mv -vi star25e1-53.evt star25e1.evt

SAVING THE CURRENT VERSION

    now="`yyyy-mm-dd-hhmmss`"; echo "now=${now}"
    # now=2025-07-17-140500
    # now=2025-12-05-222721
    # now=2025-12-24-013917
    # now=2026-01-01-074800
    # now=2026-01-07-082054
    # now=2026-01-10-064459
    mkdir -p SAVE/${now}
    cp -av \
      text25e1.evt star25e1.evt \
      text25e1-weirdos.txt text25e1-intro.txt \
      text25rz-41.txt \
      SAVE/${now}
    cp -av ../073/desc25e1.txt SAVE/${now}
    chmod a-w SAVE/${now}/*.{evt,txt}

LISTING PUFFS IN STARRED PARAGS FILE

    ./list_one_gallows_loci.sh star25e1.evt > .fplocs

REPLACING IMPLICIT LIGATURES BY EXPLICIT ONES

  We want to replace implicit ligatures like @'ycthhey' by explicit
  ones like @'y{CTHh}ey'.

  Piped all four files through "convert_ligatures_in_evt_file.sed".

COMPARING STOLFI AND RENE VERSIONS

  Wrote a python3 program "compare_ivtff_files.py" to compare two
  files, line by line, using an optimal alignment algorithm.

  Using it to compare the Starred Parag versions first:

    make -f compare_star_Z_U.make

  First run:

    ??? read 2414 lines from file 0 = starps-Z.eva
    ??? read 1313 lines from file 1 = starps-U.eva
    ??? there were 587 loci from file0 missing in file1
    ???
    ??? read 2414 lines from file 0 = starps-Z.eva
    ??? read 1655 lines from file 1 = starps-H.eva
    ??? there were 1 loci from file0 missing in file1

MERGING CURRENT VERSIONS AND PARSING INTO ELEMENTS

  Let's create a file "join25e1.evt" that has my own transcriptions,
  with the holes filled in from Rene's transcription:

    merge_my_rz_transcriptions.sh 25e1 25rz-41

      4373 .join-js.evt
      1014 .join-rz.evt
      5387 join25e1.evt

  >>> STOPPED HERE 2026-01-01

TO DO

  ??? REPLACE <|> <:> BY [=»«]

LISTING ALL LOCATORS WITH ONE-LEG GALLOWS

  Making a list of all line locators in the full EVT that contain [fp]
  gallows (that may need to be converted to [zw]):

    ./list_one_gallows_loci_from_Z.sh text20e1-30.evt text25e1-51.evt 0

CONVERTING MORE

  First stab at converting the old EVMT to Rene's IVTFF format:

    ./convert_evmt_20e1_to_evmt_25e1.sh

  After many ad-hoc tweaks in the input "text20e1-50.evt", we got the
  above script to process without errors.

  * Each /glyph/ occurrence in the text is defined as a maximal set of
  strokes that are (or presumably were intended to be) connected by
  contact or ligatures. In the XEVT format, each glyph is encoded as a
  pair of parens '()' enclosing a string of one or more XEVA /simple
  glyph/ codes, like "(v)" or "(Sh)" or "(AKPIHO)".

  * The XEVA simple glyph codes include the basic lowercase EVA letters
  [adefik-ty], the two combinations @{Ch} and @{Sh}, and the platform
  gallows @{CKh}, @{CTh}, @{CFh}, and @{CPh}. They also include new
  lowercase codes @b, @g, @u, @j (@e, @a, or @i with plumes or tails),
  @v (the caret), @x (the picnic table), and @z and @w (versions of @f
  and @p with an @e-hook at the end of the horizontal arm); thus
  completing all lowercase letters. They also include @c and capital
  letters [ACHIOQRSY] denoting the same simple glyphs as the lowercase
  versions with a ligature line added at top right; @E, which is an @e
  that can connect to the bottom of the next glyph; and [KTFPZW], which
  are the gallows [ktfpzw] with a stroke forming the floor of the
  platform.

  * The XEVA simple codes also include weirdo codes like &NNN; where
  NNN is a 3-digit number.
  The previous EVMT notation like "s{&123}" or "*{&o'}" is replaced by
  codes "&123" in XEVA that function as simple letters. Thus the line
  " ol*{&ol}.ofaiin=" from EVMT 16e6 would become
  " (o)(ll)(&312).(o)(f)(a)(i)(i)(n)=" in the new EVMT file.

LISTING WEIRDO USES AND DEFINITIONS

  The weirdos and non-basic glyphs are encoded as "&{...}".

  Listing weirdo uses in both files and definitions in the text file:

    ./list_weirdo_uses_and_defs.sh

LISTING LINE-INITIAL AND NON-LINE-INITIAL WORDS

    cat text25e1.evt \
      | egrep -e '^]*[.][^<>]*>

TO DO