Hacking at the Voynich manuscript - Side notes
074 Revising the "U" (Stolfi) transcription
Last edited on 2026-01-20 05:21:51 by stolfi

INTRO

  This note is about preparing a new version of the VMS transcription
  into EVA, to be version 25e1 (meaning 2.5 release 1). It picks up
  from the abandoned attempt to build version 20e1 described in
  Notes/672 and Notes/073.

  This version will be mostly my own (transcriber code ";U"). Until it
  is completed, the holes will be plugged with Rene Zandbergen's IVT
  transcription "RF1b-e.txt", from "https://voynich.nu/data/RF1b-e.txt".

  One reason for this effort is to investigate the theory that one-leg
  gallows with hooks are distinct from those without hooks.

  The main files produced by this note are:

    "text25e1.ivt"  My partial trans, minus the "str" section.
    "star25e1.ivt"  My full trans of the "str" section only.
    "desc25e1.txt"  My verbal descriptions of the pages.
    "full25rz.ivt"  Rene's transcription of the whole VMS.

  See "Note-074-past.txt" for how these files were created. Editing
  continues on the first three.

  The following file is mechanically built from the above:

    "join25e1.ivt"  My trans, with "str", completed from Rene's.

  There are also the following auxiliary files:

    "text25e1-intro.txt"       Introduction to the 25e1 trans.
    "text25e1-weirdos.txt"     List of the weirdo codes I used.
      !! NOTE: my weirdo codes differ from Rene's.
      Mapping will be necessary at some point.
    "loci-evmt16e6-ivtff.tbl"  Table that maps old to new loci.
    "rene-loci-table.tbl"      Table that describes Rene's loci.

  The following files are the full trans "join25e1.ivt" separated by
  main section {sec} ("hea", "heb", "zod", "bio", etc.) and text type
  {txty} ("parags", "labels", "titles", etc.):

    "st_files/{sec}-{txty}.ivt"  Subsets of "join25e1.ivt"

SETUP

    ln -s ../.. work
    ln -s work/Notes
    ln -s work/fnum_to_pnum.tbl
    ln -s work/list_weirdos.sh
    ln -s work/argparser.py
    ln -s work/compare_ivtff_files.py
    ln -s work/ivtff_align.py
    ln -s work/ivtff_format.py
    ln -s work/ivtff_frac_word_counts.py
    ln -s work/process_frac_words.py
    ln -s work/read_table.gawk
    ln -s work/bare_bones_ivt.gawk
    ln -s work/convert_ligatures_in_ivt_file.sed
    ln -s work/validate_25e1_ivt_format.gawk
    ln -s work/extract_section_from_ivt.sh

DIFFERENCES FROM RENE'S IVTFF FORMAT

  The main text encoding differences between the format of the ".ivt"
  files above and Rene's IVTFF format are:

    * Unreadable chars are '?'.
    * No alignment fillers '!' or missing char codes '%'.
    * Ligatures shown by '{}'.
    * Weirdo codes in "U" are "&{NNN}" ({NNN} three digits) without the ';'.
    * Weirdo codes in "Z" are "@{NNN}" ({NNN} three digits) without the ';'.
    * Capitalized EVA means "ligated to the next char" or "has platform".
    * All @I-ligatures like @{Ih}, @{ITh} etc. are replaced by @C-ligatures.
    * All @{i'h} and @{ch'} are replaced by @{Sh}.
    * Implicit ligatures like @'ychcthy' are marked as @'y{Ch}{CTh}y'.

  Other "metadata" differences:

    * Inline comments are avoided, "" when needed.
    * Comment "" denotes intruding figure.
    * Comment "" denotes crease, fold, or other vellum defect.
    * The locus ID does not have Rene's position code (",P0" etc.).
    * Parag and title lines have rail alignment codes [«=»].
    * Prefix "<%>" denotes parag head.
    * Suffix "<$>" denotes parag tail.
    * Prefix "" on "str" parag head denotes star number {NN}.
    * Prefix "" on "str" parag head denotes "no starlet".
    * Prefix "" on "str" line denotes "prev linegap was wide".
    * Suffix "" on "str" line denotes "next linegap is wide".
    * The Rosettes page f-number is "f85v2" instead of "fRos".
    * Rene's page "f101v" is renamed "f101v2".

  Temporary differences:

    * Many "U" lines have "<:>" prefix and "<|>" suffix instead of
      alignment codes [«=»] and parag markers "<%>" and "<$>".
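  Two of the text encoding differences listed above (no '!' fillers,
  and "Z" weirdo codes written without the trailing ';') can be
  sketched in Python as below. This is only an illustration, not the
  procedure actually used to build "full25rz.ivt". The function name
  "normalize_text", the sample string, and the rule that anything
  between '<' and '>' is left untouched are all assumptions made for
  the example.

```python
import re

# Assumed form of a weirdo code on input: "@" + three digits + ";".
WEIRDO_WITH_SEMI = re.compile(r"@(\d{3});")

def normalize_text(text):
    """Drop '!' fillers and the ';' after "@NNN" codes (hypothetical helper).

    Chunks enclosed in '<...>' are copied unchanged, on the assumption
    that they are comments or markers rather than transcribed text.
    """
    parts = re.split(r"(<[^<>]*>)", text)
    out = []
    for part in parts:
        if part.startswith("<"):
            out.append(part)                  # bracketed chunk: keep as is
        else:
            part = part.replace("!", "")      # remove alignment fillers
            out.append(WEIRDO_WITH_SEMI.sub(r"@\1", part))
    return "".join(out)

# Made-up sample, not taken from any transcription file.
print(normalize_text("qo!k@123;y<!note>.d!!aiin"))
```

  The sketch does nothing about '%' codes or locus IDs, since the
  list above does not say how those are rewritten.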
  Localized differences:

    * @'okeeey.qokeeey..okeey;.okeey' from is split off as a separate line .
    * Removed " or?r.m" which seems to be a duplicate of " osaram".
    * Line was moved to the end of line , leaving only a '?' so
      that it is not empty.

  The implicit ligatures like @'ycthhey' were converted to explicit
  ones like @'y{CTHh}ey' on all files through
  "convert_ligatures_in_ivt_file.sed".

SAVING THE CURRENT VERSION

    now="`yyyy-mm-dd-hhmmss`"; echo "now=${now}"
    # now=2025-07-17-140500
    # now=2025-12-05-222721
    # now=2025-12-24-013917
    # now=2026-01-01-074800
    # now=2026-01-07-082054
    # now=2026-01-10-064459
    # now=2026-01-15-191225
    # now=2026-01-16-180738
    mkdir -p SAVE/${now}
    cp -av \
      Note-074.txt \
      text25e1.ivt star25e1.ivt \
      text25e1-weirdos.txt text25e1-intro.txt \
      full25rz.ivt join25e1.ivt \
      SAVE/${now}
    cp -av ../073/desc25e1.txt SAVE/${now}
    chmod a-w SAVE/${now}/*.{ivt,txt}

REBUILDING THE MERGED FILE AND SEC-TXTY FILES

  Rebuilding:

    do_note_074.sh

      4485 .join-js.ivt
       902 .join-rz.ivt
      5387 join25e1.ivt

LISTING PUFFS IN STARRED PARAGS FILE

    ./list_one_gallows_loci.sh star25e1.ivt > .fplocs

COMPARING STOLFI AND RENE VERSIONS

  Wrote a python3 program "compare_ivtff_files.py" to compare two
  files, line by line, using an optimal alignment algorithm.

  Using it to compare the Starred Parag versions first:

    make -f compare_star_Z_U.make

  First run:

    ??? read 2414 lines from file 0 = starps-Z.eva
    ??? read 1313 lines from file 1 = starps-U.eva
    ??? there were 587 loci from file0 missing in file1
    ???
    ??? read 2414 lines from file 0 = starps-Z.eva
    ??? read 1655 lines from file 1 = starps-H.eva
    ??? there were 1 loci from file0 missing in file1

  >>> STOPPED HERE 2026-01-01

TO DO

  ??? REPLACE <|> <:> BY [=»«]

LISTING ALL LOCATORS WITH ONE-LEG GALLOWS

  Making a list of all line locators in the full EVT that contain [fp]
  gallows (that may need to be converted to [zw]):

    ./list_one_gallows_loci_from_Z.sh text20e1-30.evt text25e1-51.evt 0

CONVERTING MORE

  First stab at converting the old EVMT to Rene's IVTFF format:

    ./convert_evmt_20e1_to_evmt_25e1.sh

  After many ad-hoc tweaks in the input "text20e1-50.evt", we got the
  above script to run without errors.

    * Each /glyph/ occurrence in the text is defined as a maximal set
      of strokes that are (or presumably were intended to be) connected
      by contact or ligatures. In the XEVT format, each glyph is
      encoded as a pair of parens '()' enclosing a string of one or
      more XEVA /simple glyph/ codes, like "(v)" or "(Sh)" or
      "(AKPIHO)".

    * The XEVA simple glyph codes include the basic lowercase EVA
      letters [adefik-ty] and the two combinations @{Ch}, @{Sh}, and
      the platform gallows @{CKh}, @{CTh}, @{CFh}, and @{CPh}. They
      also include new lowercase codes @b, @g, @u, @j (@e, @a, or @i
      with plumes or tails), @v (the caret), @x (the picnic table), and
      @z and @w (versions of @f and @p with an @e-hook at the end of
      the horizontal arm); thus completing all lowercase letters. They
      also include @c and capital letters [ACHIOQRSY] denoting the same
      simple glyphs as the lower case versions with a ligature line
      added at top right; @E which is an @e that can connect to the
      bottom of the next glyph; and [KTFPZW] which are the gallows
      [ktfpzw] with a stroke forming the floor of the platform.

    * The XEVA simple codes also include weirdo codes like &NNN; where
      NNN is a 3-digit number. The previous EVMT notation like
      "s{&123}" or "*{&o'}" is replaced by codes "&123" in XEVA that
      function as simple letters. Thus the line " ol*{&ol}.ofaiin="
      from EVMT 16e6 would become " (o)(ll)(&312).(o)(f)(a)(i)(i)(n)="
      in the new EVMT file.

LISTING WEIRDO USES AND DEFINITIONS

  The weirdos and non-basic glyphs are encoded as "&{...}".
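  As a rough illustration of the glyph-group notation and the weirdo
  codes described above, the Python sketch below splits an XEVT text
  string into its parenthesized glyphs and counts the weirdo codes in
  it. It is not the code of "list_weirdo_uses_and_defs.sh". The
  function names are hypothetical, and the sketch assumes that a
  weirdo code is '&' followed by exactly three digits, as in the
  "&312" of the example line.

```python
import re
from collections import Counter

# One glyph: a pair of parens enclosing one or more simple glyph codes.
GLYPH_RE = re.compile(r"\(([^()]+)\)")
# Assumed weirdo code shape: "&" followed by three digits.
WEIRDO_RE = re.compile(r"&\d{3}")

def split_glyphs(text):
    """Return the contents of each '(...)' glyph group, in reading order."""
    return GLYPH_RE.findall(text)

def count_weirdos(text):
    """Return a Counter mapping each weirdo code to its number of uses."""
    return Counter(WEIRDO_RE.findall(text))

# Text part of the example line from the note (locus omitted).
sample = "(o)(ll)(&312).(o)(f)(a)(i)(i)(n)="
print(split_glyphs(sample))
print(count_weirdos(sample))
```

  Word separators such as '.' and the trailing '=' are simply skipped
  by "split_glyphs"; a fuller tool would need to keep them in order to
  recover word boundaries.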
  Listing weirdo uses in both files and definitions in the text file:

    ./list_weirdo_uses_and_defs.sh

LISTING LINE-INITIAL AND NON-LINE-INITIAL WORDS

    cat text25e1.evt \
      | egrep -e '^]*[.][^<>]*>

TO DO