Hacking at the Voynich manuscript - Side notes
063 Merging the EVMT files

Last edited on 2004-09-10 22:59:19 by stolfi

  On September 10, 2004, at 7:10, Gabriel Landini sent me a copy of
  several files that he and Rene Zandbergen prepared as a result of
  the EVMT transcription effort. I had offered help with
  merging/transcribing them.

LINK SETUP

  ln -s /home/staff/stolfi/voynich/work
  ln -s /home/staff/stolfi/voynich/docs
  ln -s docs/GabrielLandini/evmt-files
  mkdir merged

TRANSCRIBER CODE ASSIGNMENT

  Transcriber codes A-Z are already assigned, except for E, M, and S.
  While working on the interlinear I will need more codes, so let's
  use lowercase letters. This will require fixing many scripts...

  Text to be added to the transcriber code list (file f0.I of the
  interlinear):

    # These lower-case codes were reserved for the EVMT project:
    #
    #   a: EVMT official transcription
    #   b: Second choice from [|] in EVMT
    #   c: Third choice from [|] in EVMT
    #   g: Readings by Gabriel Landini for the EVMT project
    #   r: Readings by Rene Zandbergen for the EVMT project

PLANNING

  Rough plan, to be thought over:

  In alpha03.txt:

    * Map EVMT locators to Stolfi locators.
    * Unfold [|] into separate readings, adding version codes.
    * Move page-specific comments to separate files.
    * Mark pre-comments and post-comments somehow.
    * Attach sequence numbers to each line (especially to comments)
      according to the desired order of units and lines.
    * Sort the file according to the new order.

  In the interlinear:

    * Convert my ad-hoc weirdo codes to the new codes.
    * Move page-specific comments to separate files.
    * Attach sequence numbers to each line according to the desired
      order of units and lines.

  Both:

    * Merge the text from alpha03 and the interlinear using the line
      sequence numbers.
    * Manually merge the page-specific comments.
    * Uniformize the conventions for inline comments, "-", etc.
    * Align mechanically with the interlinear.
    * Inspect the alignment and fix it manually.
    * Compute consensus and majority versions.
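
  The "unfold [|]" step of the alpha03.txt plan could look roughly
  like the sketch below. It is only an illustration under several
  assumptions that these notes do not settle: that alternatives are
  written as [x|y|z] with no nesting, that the first choice belongs
  to version "a", the second to "b", the third to "c" (following the
  code list above), and that a version with no corresponding choice
  in some bracket falls back to the first choice. The name unfold is
  hypothetical.

```python
import re

# Matches a bracket group holding at least one "|", e.g. [k|t] or [d|ch|s].
# Assumption: groups are not nested and plain [x] without "|" is left alone.
ALT = re.compile(r"\[([^\[\]|]*(?:\|[^\[\]|]*)+)\]")


def unfold(text, codes="abc"):
    """Split one line with [|] alternatives into one reading per version code.

    Version codes[i] takes the (i+1)-th choice of every bracket group.
    Assumption: if a group has fewer choices, the first choice is used.
    """
    readings = {}
    for index, code in enumerate(codes):
        def pick(match, index=index):
            choices = match.group(1).split("|")
            return choices[index] if index < len(choices) else choices[0]

        readings[code] = ALT.sub(pick, text)
    return readings


if __name__ == "__main__":
    # Made-up sample text, not taken from alpha03.txt.
    for code, reading in unfold("qo[k|t]eedy.[d|ch|s]aiin").items():
        print(code, reading)
```

  Whether the "b" and "c" lines should repeat the first choice or
  carry some "no reading" filler where a bracket has fewer
  alternatives is a decision still to be made; the fallback above is
  just the simplest option.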
  Verification:

    * Extract the EVMT version from the new interlinear.
    * Convert this file to the order and format of the original
      alpha03.txt.
    * Diff it against the original alpha03.txt and show the result to
      Rene and Gabriel.

ASSIGNING SEQ NUMBERS

  Sequence numbers can be PPP.LLLL.CCC, where PPP is the pnum, LLLL is
  the contents line number within the page, and CCC is a sequence
  number for comments (001-499 for pre-comments, 500 for the contents
  line itself, 501-999 for post-comments).
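
  As a sanity check of the PPP.LLLL.CCC scheme, here is a small
  sketch of how such keys could be built. It assumes that the pnum
  and the line number are plain integers, that the fields are
  zero-padded to 3, 4, and 3 digits, and that the k-th pre-comment
  gets slot k while the k-th post-comment gets slot 500+k. The
  function names are hypothetical. With zero padding, an ordinary
  text sort of the keys gives the intended order.

```python
def seq_number(pnum, line, slot=500):
    """Build a PPP.LLLL.CCC key; slot 500 denotes the contents line itself."""
    if not (0 <= pnum <= 999 and 0 <= line <= 9999 and 1 <= slot <= 999):
        raise ValueError("field out of range for PPP.LLLL.CCC")
    return f"{pnum:03d}.{line:04d}.{slot:03d}"


def pre_comment(pnum, line, k):
    """Key for the k-th comment placed before a contents line (k = 1..499)."""
    if not 1 <= k <= 499:
        raise ValueError("pre-comment index must be in 1..499")
    return seq_number(pnum, line, k)


def post_comment(pnum, line, k):
    """Key for the k-th comment placed after a contents line (k = 1..499)."""
    if not 1 <= k <= 499:
        raise ValueError("post-comment index must be in 1..499")
    return seq_number(pnum, line, 500 + k)


if __name__ == "__main__":
    keys = [
        post_comment(12, 3, 1),
        seq_number(12, 4),
        seq_number(12, 3),
        pre_comment(12, 3, 1),
    ]
    # Zero-padded keys sort correctly as plain strings.
    for key in sorted(keys):
        print(key)
```

  The same keys would work with the Unix sort command, since every
  field has a fixed width.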