Rene warned me that Gabriel released version 1.7 of the interlinear file a few months ago. I fetched it (intrln17.zip) and will try to merge my changes into his file.

Perhaps I should also split the text into one unit per paragraph (preserving line numbers as much as possible). And I may use a less insane structure than one file per unit.

It is time that I check out Rene's VTT...

    unzip intrln17.zip
    cat INTRLN17.EVT \
      | tr -d '\015' \
      > landini-intrln17.evt

    dicio-wc landini-*.evt

     lines   words     bytes file
    ------ ------- --------- ------------
      9454   22765    523013 landini-intrln16.evt
      9499   24015    532608 landini-intrln17.evt

    mkdir L17-eva
    cp -p landini-intrln17.evt L17-eva/intrln17.evt
    cd L17-eva

    cat intrln17.evt \
      | egrep -e '^' \
      > pages.vars

Massaged pages.vars into a list of vtt extraction commands:

    vtt +QA +PA < intrln17.evt > f001a.evt
    vtt +QA +PB < intrln17.evt > f001b.evt
    vtt +QA +PC < intrln17.evt > f002a.evt
    ...
    vtt +QT +PV < intrln17.evt > f115b.evt
    vtt +QT +PW < intrln17.evt > f116a.evt
    vtt +QT +PX < intrln17.evt > f116b.evt

However, I decided to first do all the edits to the single file, and split only at the end.
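The massaging of pages.vars into vtt commands was done by hand. A scripted version might look like the sketch below. It assumes (this is not confirmed by the notes) that each line of pages.vars starts with a locator such as `<f1r>` followed by a variable list containing `$Q=` and `$P=` with one-letter values; the function name `gen_vtt_commands` is made up for illustration. Folios with extra panel suffixes would need additional handling.

```shell
#!/bin/sh
# Hypothetical helper: turn page-variable lines into vtt extraction commands.
# Assumed input line format (not taken from the notes): <f1r>  {$Q=A $P=A ...}
gen_vtt_commands() {
  # $1 = name of the interlinear file that the generated commands will read
  awk -v src="$1" '
    /^<f[0-9]+[rv]>/ && /[$]Q=/ && /[$]P=/ {
      # Folio number and side from the locator, e.g. "<f1r>" gives 1 and "r".
      match($0, /^<f[0-9]+/)
      num  = substr($0, 3, RLENGTH - 2)
      side = substr($0, RLENGTH + 1, 1)
      # One-letter quire and page codes from the variable list.
      q = substr($0, index($0, "$Q=") + 3, 1)
      p = substr($0, index($0, "$P=") + 3, 1)
      # Recto maps to "a", verso to "b", as in the file names listed above.
      printf "vtt +Q%s +P%s < %s > f%03d%s.evt\n", \
        q, p, src, num, (side == "r" ? "a" : "b")
    }
  '
}

# Usage with two made-up sample lines:
printf '%s\n' '<f1r>    {$Q=A $P=A}' '<f116v>  {$Q=T $P=X}' \
  | gen_vtt_commands intrln17.evt
# vtt +QA +PA < intrln17.evt > f001a.evt
# vtt +QT +PX < intrln17.evt > f116b.evt
```

The commands are only printed, not executed, so the list can be inspected (or piped to a shell) once the single-file edits are finished.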
    cat intrln17.evt | unfold-alternatives > intrln17-u.evt
    diff intrln17.evt intrln17-u.evt > .diff

      198,199c198,200
      < [O82|*]OM.OHCCG.OHCOR.2OEOHG.HZ*AR.8AM.ODAM.OE.ODAL-
      < !!!!!*!OM.OHCCG.OHCAR.ROEOHG.HZAAR.8AM.ODAM.OR.ODAL-
      ---
      > O82OM.OHCCG.OHCOR.2OEOHG.HZ*AR.8AM.ODAM.OE.ODAL-
      > *OM.OHCCG.OHCOR.2OEOHG.HZ*AR.8AM.ODAM.OE.ODAL-
      > *OM.OHCCG.OHCAR.ROEOHG.HZAAR.8AM.ODAM.OR.ODAL-
      265,268c266,271
      < 4O.OE%TO[I|C]C2.TCOE.8OE.HZCG.GDOE.8OE.8OEO.GDOE.8OE%T[IC|*]O8G-
      < 4O.OE.TO!I!!!C2.TCOE.8OE.HZCG.GDOE.8OE.8OEO.GDOE.8OE.T!%%!!!O8G-
      < [O|A]DOE%SOE.DOE%DCT[G|A].TOE%DG.TOE.HZOE.TO8G.TOE.8AM-
      < !O!!!DOE.SOE.DOE.DCT!G!!!.TOE.DG.TOE.HZOE.TO8G.TOE.8AM-
      ---
      > 4O.OE%TOIC2.TCOE.8OE.HZCG.GDOE.8OE.8OEO.GDOE.8OE%TICO8G-
      > 4O.OE%TOCC2.TCOE.8OE.HZCG.GDOE.8OE.8OEO.GDOE.8OE%T*O8G-
      > 4O.OE.TOIC2.TCOE.8OE.HZCG.GDOE.8OE.8OEO.GDOE.8OE.T%%O8G-
      > ODOE%SOE.DOE%DCTG.TOE%DG.TOE.HZOE.TO8G.TOE.8AM-
      > ADOE%SOE.DOE%DCTA.TOE%DG.TOE.HZOE.TO8G.TOE.8AM-
      > ODOE.SOE.DOE.DCTG.TOE.DG.TOE.HZOE.TO8G.TOE.8AM-

and so on.

    cat intrln17-u.evt | fsg2eva > intrln17-ue.evt

Adding unit code "P" to anonymous locations, and neutralizing bare location codes:

    cat intrln17-ue.evt \
      | gawk \
          ' /^) /, "\\1P.\\2", "g", $0); \
              $0 = gensub(/^()/, "## \\1", "g", $0); \
              print; next; \
            } \
            /./ {print; next;} \
          ' \
      > intrln17-uep.evt

Validating format:

    cat intrln17-uep.evt \
      | validate-new-evt-format \
          -v chars='aoeilmnrchtpkfsqgjdvxy' \
      >>& .bugs

Removing location codes and line numbers for comparison with interln16e3:

    cat interln16e3s.evt \
      | sed \
          -e '1,/Start of synchronised versions/d' \
          -e 's/## //' \
          -e 's/ *{[$]/ {$/' \
          -e '/;[UV]>/d' \
          -e '/^# *$/d' \
      | gawk \
          ' /^) */, "\\1\\2", "g", $0); \
              print; next; \
            } \
            /./ {print; next;} \
          ' \
      > interln16e3s-uepx.evt

    cat intrln17-uep.evt \
      | sed \
          -e '1,/Start of synchronised versions/d' \
          -e 's/## //' \
          -e 's/ *{[$]/ {$/' \
          -e '/^# *$/d' \
      | gawk \
          ' /^) */, "\\1\\2", "g", $0); \
              print; next; \
            } \
            /./ {print; next;} \
          ' \
      > intrln17-uepx.evt

    diff interln16e3s-uepx.evt intrln17-uepx.evt \
      | prettify-diff-output \
      > .diff

Comparing the page variable lines of the two versions:

    grep '{[$]' interln16e3s.evt \
      | sed \
          -e 's/## *//' \
          -e 's/  */ /g' \
          -e 's/ *$//' \
      > vars-16e3.txt

    grep '{[$]' intrln17.evt \
      | sed \
          -e 's/  */ /g' \
          -e 's/ *$//' \
      > vars-17.txt

    diff vars-{16e3,17}.txt \
      > .diff

It seems that the only changes from intrln16.evt to intrln17.evt were comments and the inclusion of Rene's VTT page variables.
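For reference, the effect of the unfolding step, as far as it can be read off the first diff above, is roughly this: each `[x|y]` group is expanded into one output line per alternative, and the `!` alignment fillers are dropped. The sketch below is a stand-in written from that diff only; it is not the actual unfold-alternatives script, and the function name and the handling of groups with fewer alternatives (the last one is reused) are my assumptions.

```shell
#!/bin/sh
# Hypothetical stand-in for unfold-alternatives, inferred from the diff output.
unfold_alternatives() {
  awk '
    # Return reading number k of a line: every [a|b|...] group is replaced
    # by its k-th alternative (or its last one, if it has fewer than k),
    # and "!" fillers are deleted. NALT records the largest group size seen.
    function reading(line, k,    out, i, j, n, alt) {
      out = ""
      while ((i = index(line, "[")) > 0) {
        j = index(substr(line, i), "]")
        if (j == 0) break                 # unbalanced bracket: copy as is
        n = split(substr(line, i + 1, j - 2), alt, "|")
        if (n > NALT) NALT = n
        out = out substr(line, 1, i - 1) alt[(k <= n) ? k : n]
        line = substr(line, i + j)
      }
      out = out line
      gsub(/!/, "", out)
      return out
    }
    {
      NALT = 1
      print reading($0, 1)
      for (k = 2; k <= NALT; k++) print reading($0, k)
    }
  '
}

# Usage on a shortened version of one line pair from the diff:
printf '%s\n' 'TO[I|C]C2.T[IC|*]O8G-' 'TO!I!!!C2.T!%%!!!O8G-' \
  | unfold_alternatives
# TOIC2.TICO8G-
# TOCC2.T*O8G-
# TOIC2.T%%O8G-
```

Lines without bracket groups pass through with only the fillers removed, which is why the diff shows three output lines for each two-line input pair.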