Hacking at the Voynich manuscript - Side notes
100 Preparing a clean Voynichese sample for analysis

Last edited on 2023-05-14 15:54:42 by stolfi

SUMMARY

  We prepare clean Voynichese samples of prose and labels (without
  weirdos, unreadable characters, or contentious readings) for the
  statistical analyses that will go into the "word structure"
  technical report.

SETTING UP THE ENVIRONMENT

  Links:

    ln -s ../tr-stats/dat
    ln -s ../tr-stats/exp
    ln -s ../../../work
    ln -s work/basify-weirdos
    ln -s work/compute-cum-cum-freqs
    ln -s work/compute-cum-freqs
    ln -s work/compute-freqs
    ln -s work/combine-counts
    ln -s work/totalize-fields
    ln -s work/select-units
    ln -s work/words-from-evt
    ln -s work/format-counts-packed
    ln -s work/update-paper-include

REFERENCE DATA

  The source data will be the interlinear release 1.6e6, or the
  majority edition derived from it, already chopped into sections.

DIRECTORY STRUCTURE

  The data files (text, word counts, tables, etc.) for each sample
  text will live in the subdirectory dat/LANG/BUK/DIV.K/, where

    LANG
      is the sample's language. Two samples should have the same
      LANG only if they use the same spelling for shared words.
      Thus, English and Italian are different LANGs. Different
      encodings of Chinese (pinyin, GR, RomanNum) are different
      LANGs. Medieval French and Modern French are different LANGs.
      The Bible (with modern spelling) and War of the Worlds are the
      same LANG. After much analysis, it seems that we can assign a
      single LANG ("voyn") to all parts of the VMS.

    BUK
      is the book. Two samples with the same LANG and BUK should be
      by the same author and part of the same book. For Voynichese,
      BUK is

        "tak" - the whole text, Takeshi's version.
        "maj" - the whole text, majority vote version.
        "prs" - running prose (parags, lines, etc.) from "maj".
        "lab" - labels, titles, word lists, etc. from "maj".
        "ini" - first word after each break, from "prs".
        "fin" - last word before each break, from "prs".
        "mid" - all of "prs" except "ini" and "fin".
      For the other languages, BUK is a book tag (e.g. "wow" for War
      of the Worlds, "ptt" for the Pentateuch).

    DIV
      is the major division within the book. The divisions must be
      disjoint. Partitioning the book into divisions is worth the
      trouble only if the usage of common words is expected to vary
      significantly between divisions (due to differences in subject
      matter and/or style), and those differences are considered
      relevant for the analysis. For the VMS, each classical section
      (Biological, Pharmaceutical, etc.) is a separate division,
      except that we split the Herbal section into two divisions,
      "hea" and "heb". In the Culpeper herbal, the preamble, plant
      descriptions, and recipes could be three separate divisions.
      In the Pentateuch, we could let each of the five books be a
      separate division. And so on.

    K
      is the sub-division of DIV. For Voynichese, a sub-division is
      a maximal string of *consecutive* pages that belong to the
      same DIV; e.g. the Herbal-A division consists of two separate
      sets of pages, "hea.1" and "hea.2". For other languages, we
      usually don't need more than one sub-division.

  In this note, each sub-division will be called a "section".
  Whether a sample is partitioned into sections or not, it always
  has a section "tot.1" which is the entire sample (hence the union
  of all other sections).

EXTRACTING THE RAW EVT-FORMATTED TEXT

  List of Voynichese "books":

    lang=voyn
    books=( maj prs lab ini mid fin tak )

  Let DIR be the directory "dat/LANG/BUK/DIV.K".

  The file "DIR/raw.evt" contains the text of that section extracted
  from the source EVT file, with each weirdo converted to an
  equivalent basic EVA char, or to "*" if impossible. (This file is
  not generated for the "ini", "mid", and "fin" books, which use the
  same EVT source as the "prs" book.)
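  The definition of sub-divisions given above (maximal runs of
  consecutive pages in the same DIV) and the DIR naming scheme can
  be illustrated with a small sketch. This is not the procedure
  actually used to build the section lists; the function names, the
  "PAGE DIV" input format, and the page names p1..p4 are all
  hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch only: number the sub-divisions (K) of each division (DIV).
# Input (stdin): one "PAGE DIV" pair per line, in page order
# (a hypothetical format, not a file produced by this note).
# Output: "PAGE DIV.K", where K counts the maximal runs of
# consecutive pages seen so far for that DIV.
number_subdivisions() {
  awk '
    NF >= 2 {
      d = $2
      if (d != prev) { run[d]++ }   # a new run of this DIV starts here
      printf "%s %s.%d\n", $1, d, run[d]
      prev = d
    }
  '
}

# Directory of a section, following the dat/LANG/BUK/DIV.K layout.
section_dir() {
  local lang="$1" book="$2" sec="$3"
  printf 'dat/%s/%s/%s\n' "$lang" "$book" "$sec"
}

# Usage with made-up pages: "hea" is interrupted by "heb",
# so its second run becomes "hea.2".
printf 'p1 hea\np2 hea\np3 heb\np4 hea\n' | number_subdivisions
section_dir voyn maj hea.2
```

  With this input the division "hea" occurs in two separate runs,
  so the last page is assigned to "hea.2" rather than "hea.1",
  which matches the Herbal-A example in the text.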
  The file "DIR/raw.tlw" contains the raw sequence of tokens and
  paragraph delimiters, one per line, in the format TYPE LOC STRING,
  where LOC is the line location code and TYPE is the type of STRING
  ("p" = punctuation, "s" = symbol, "a" = alpha word).

  The file "DIR/raw.wfr" contains the corresponding words with
  occurrence counts and relative frequencies.

REMOVING BAD WORDS

  Next we separate the "bad" words: those with unreadable
  characters, weirdos, or combinations that are considered "invalid"
  for some reason. The excluded words are saved in "DIR/bad.wfr",
  and the balance is saved in "DIR/gud.wfr". Most other files are
  derived from "gud.wfr", the frequency file for good words.

  Weirdos are defined as characters and combinations that are not
  part of the basic glyph set

    e i a o q y d l r s n m k t f p ch sh ckh cth cfh cph

  Note that we exclude { g j u v x z } as well as any { c h } that
  are not part of the compound glyphs listed above.

  We believe that this selection will not introduce a significant
  bias in the grammar-fitting percentages. Tokens that contain
  weirdos are probably abbreviations or symbols, which should not be
  counted in the totals; or embellished words, which are likely to
  be chosen for embellishment independently of whether they fit the
  grammar. As for tokens that have discrepant readings, the
  divergence should not be strongly correlated with their fitness to
  the grammar.

DO IT

  Run the whole pipeline. (See the output at the end of this note.)

    make -f vms-samples.make data

  Create links DIR/whole.tlw and DIR/trunc.tlw to DIR/raw.tlw for
  the benefit of other notes, e.g.
  see ../../Notes/110/Note-110.txt

    for book in ${books[@]}; do
      for sec in `cat dat/${lang}/${book}/sections.tags` tot.1 ; do
        smpsec="${lang}/${book}/${sec}"
        create-whole-trunc-tlw-files.sh ${smpsec}
      done
    done

  Create files DIR/{raw,gud,bad}.wdf with the words in the
  corresponding .tlw file, without type and location, formatted as
  running text:

    for book in ${books[@]}; do
      for sec in `cat dat/${lang}/${book}/sections.tags` tot.1 ; do
        smpsec="${lang}/${book}/${sec}"
        create-wdf-files.sh ${smpsec}
      done
    done

  Exporting TeX files:

    make -f vms-samples.make export

  Computing the fraction of bad words/tokens that were excluded
  because of not-so-weird weirdos like g, j, u, v, x, z, or
  nonstandard uses of c and h:

    for book in ${books[@]}; do
      for sec in tot.1 ; do
        printf "\n%s" "voyn/${book}/${sec}"
        # All rejected words: count of words and (tokens).
        cat dat/voyn/${book}/${sec}/bad.wfr \
          | gawk '/./{w++;t+=$1;} END{printf " %5d%-7s", w, sprintf("(%d)",t);}'
        # Words containing "*" or "?".
        cat dat/voyn/${book}/${sec}/bad.wfr \
          | gawk '($3 ~ /[*?]/){ print; }' \
          | gawk '/./{w++;t+=$1;} END{printf " %5d%-7s", w, sprintf("(%d)",t);}'
        # Words made only of lowercase letters.
        cat dat/voyn/${book}/${sec}/bad.wfr \
          | gawk '($3 ~ /^[a-z]*$/){ print; }' \
          | gawk '/./{w++;t+=$1;} END{printf " %5d%-7s", w, sprintf("(%d)",t);}'
        # Words with an uncertain character inside a final "iin"-type group.
        cat dat/voyn/${book}/${sec}/bad.wfr \
          | gawk '($3 ~ /^[a-z]*[ao][i]*([?][i]|[i][?])[i]*[n]$/){ print; }' \
          | gawk '/./{w++;t+=$1;} END{printf " %5d%-7s", w, sprintf("(%d)",t);}'
        printf "\n"
      done
    done

  Here are the numbers. Column "nbad" is the count of rejected
  words(tokens). Column "[?]" is the number of words(tokens) that
  are unreadable, contentious, or contain non-basic weirdos
  (non-lowercase EVA characters). Column "[bchv...]" is the number
  of words(tokens) that were rejected only because they contain some
  of the forbidden characters above. Column "[ai?n]" is the number
  of words(tokens) that were rejected mainly because of disagreement
  of the "iin/iiin" type.

    type           nbad         [?]          [bchv...]    [ai?n]
    -------------- ------------ ------------ ------------ ------------
    voyn/maj/tot.1  1708(2526)   1612(2407)     96(119)     114(396)
    voyn/prs/tot.1  1580(2358)   1501(2257)     79(101)     114(396)
    voyn/lab/tot.1   161(168)     143(150)      18(18)        0(0)
    voyn/ini/tot.1   246(282)     241(277)       5(5)        29(44)
    voyn/mid/tot.1  1147(1698)   1100(1646)     47(52)       86(316)
    voyn/fin/tot.1   294(335)     267(306)      27(29)       26(36)
    voyn/tak/tot.1   497(626)     127(154)     370(472)       2(2)

  Listing the bad words that were rejected only because of lowercase
  weirdos:

    for book in prs lab ; do
      for sec in tot.1 ; do
        printf "\nFrom %s:\n\n" "voyn/${book}/${sec}.wfr"
        cat dat/voyn/${book}/${sec}/bad.wfr \
          | gawk '($3 ~ /^[a-z]*$/){print $1, $3; }' \
          | sort -b -k1,1nr -k2,2 \
          | format-counts-packed \
          | sed -e 's/^/ /'
      done
    done

  From voyp/vms/tot.1.wfr:

    v(7) x(7) c(4) cheg(3) xar(3) amg(2) cto(2) g(2) aikhckhy(1)
    aithy(1) arg(1) arxor(1) axor(1) chckshy(1) chcpar(1) chcs(1)
    checta(1) chepchx(1) chocty(1) chodalg(1) choekchcey(1)
    choikhy(1) chokolg(1) cholxy(1) chxar(1) ckcho(1) ckchol(1)
    ckshy(1) cky(1) coy(1) cpheeg(1) cseo(1) ctar(1) ctchy(1)
    ctechy(1) ctoiin(1) ctos(1) dag(1) daing(1) dchog(1) dkeeeg(1)
    docodal(1) doithy(1) gaiin(1) kedarxy(1) lxor(1) ockey(1)
    ockhh(1) oetalchg(1) ogam(1) olgy(1) org(1) oxar(1) oxor(1)
    oxy(1) pchocty(1) qocky(1) qodaikhy(1) qokeefcy(1) qokg(1)
    rokaix(1) salxar(1) sarg(1) shecphhedy(1) shhy(1) shokog(1)
    shxam(1) soleeg(1) teyteg(1) todashx(1) vo(1) vr(1) vs(1)
    xoiin(1) xol(1) yhal(1) ykceol(1) ypcheg(1) ytcharg(1)

  From voyl/vms/tot.1.wfr:

    cfhhy(1) chockhhy(1) chodalg(1) ddsschx(1) docfhhy(1) gy(1)
    oalcheg(1) ocsesy(1) oecs(1) ofacfom(1) okaramog(1) okeeog(1)
    opalg(1) opchaldg(1) oteedyg(1) soshxar(1) ydashgarain(1)
    yskhy(1)

OUTPUT OF "MAKE"

  Sample voyn/maj:

    lines   words     bytes file
  ------- ------- --------- ------------
     1066    2132     64512 dat/voyn/maj/hea.1/raw.evt
      134     268      8660 dat/voyn/maj/hea.2/raw.evt
      316     632     24711 dat/voyn/maj/heb.1/raw.evt
       61     122      4644 dat/voyn/maj/heb.2/raw.evt
       13      26      1132 dat/voyn/maj/cos.1/raw.evt
      393     786     19115 dat/voyn/maj/cos.2/raw.evt
      186     372      9994 dat/voyn/maj/cos.3/raw.evt
      902    1804     62353 dat/voyn/maj/bio.1/raw.evt
      335     670     15343 dat/voyn/maj/zod.1/raw.evt
      174     348     10021 dat/voyn/maj/pha.1/raw.evt
      284     568     15718 dat/voyn/maj/pha.2/raw.evt
       80     160      6158 dat/voyn/maj/str.1/raw.evt
     1084    2168     90650 dat/voyn/maj/str.2/raw.evt
       28      56      1835 dat/voyn/maj/unk.1/raw.evt
       26      52      1801 dat/voyn/maj/unk.2/raw.evt
        7      14       461 dat/voyn/maj/unk.3/raw.evt
       48      96      2972 dat/voyn/maj/unk.4/raw.evt
       35      70      2844 dat/voyn/maj/unk.5/raw.evt
       45      90      3845 dat/voyn/maj/unk.6/raw.evt
       39      78      3002 dat/voyn/maj/unk.7/raw.evt
        1       2        67 dat/voyn/maj/unk.8/raw.evt
     5514   11901    360159 dat/voyn/maj/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
     7047   20961    164869 dat/voyn/maj/hea.1/raw.tlw
      882    2632     21493 dat/voyn/maj/hea.2/raw.tlw
     2959    8819     70279 dat/voyn/maj/heb.1/raw.tlw
      570    1697     13835 dat/voyn/maj/heb.2/raw.tlw
      205     605      4403 dat/voyn/maj/cos.1/raw.tlw
     2032    5810     44962 dat/voyn/maj/cos.2/raw.tlw
     1123    3252     26380 dat/voyn/maj/cos.3/raw.tlw
     7171   21317    174100 dat/voyn/maj/bio.1/raw.tlw
     1674    4718     36687 dat/voyn/maj/zod.1/raw.tlw
     1123    3269     26971 dat/voyn/maj/pha.1/raw.tlw
     1763    5114     42370 dat/voyn/maj/pha.2/raw.tlw
      763    2281     19088 dat/voyn/maj/str.1/raw.tlw
    11056   32880    283328 dat/voyn/maj/str.2/raw.tlw
      220     653      5215 dat/voyn/maj/unk.1/raw.tlw
      142     424      3534 dat/voyn/maj/unk.2/raw.tlw
       49     145      1129 dat/voyn/maj/unk.3/raw.tlw
      337     991      7977 dat/voyn/maj/unk.4/raw.tlw
      351    1044      8939 dat/voyn/maj/unk.5/raw.tlw
      492    1473     12624 dat/voyn/maj/unk.6/raw.tlw
      392    1171      9897 dat/voyn/maj/unk.7/raw.tlw
        2       6        49 dat/voyn/maj/unk.8/raw.tlw
    40372  119300    978182 dat/voyn/maj/tot.1/raw.tlw

    lines file
  ------- ------------
     2132 dat/voyn/maj/hea.1/raw.wfr
      554 dat/voyn/maj/hea.2/raw.wfr
     1189 dat/voyn/maj/heb.1/raw.wfr
      331 dat/voyn/maj/heb.2/raw.wfr
       83 dat/voyn/maj/cos.1/raw.wfr
     1019 dat/voyn/maj/cos.2/raw.wfr
      620 dat/voyn/maj/cos.3/raw.wfr
     1597 dat/voyn/maj/bio.1/raw.wfr
      884 dat/voyn/maj/zod.1/raw.wfr
      561 dat/voyn/maj/pha.1/raw.wfr
      808 dat/voyn/maj/pha.2/raw.wfr
      483 dat/voyn/maj/str.1/raw.wfr
     3225 dat/voyn/maj/str.2/raw.wfr
      162 dat/voyn/maj/unk.1/raw.wfr
      103 dat/voyn/maj/unk.2/raw.wfr
       46 dat/voyn/maj/unk.3/raw.wfr
      239 dat/voyn/maj/unk.4/raw.wfr
      246 dat/voyn/maj/unk.5/raw.wfr
      297 dat/voyn/maj/unk.6/raw.wfr
      235 dat/voyn/maj/unk.7/raw.wfr
        2 dat/voyn/maj/unk.8/raw.wfr
     8591 dat/voyn/maj/tot.1/raw.wfr

    lines file
  ------- ------------
     1981 dat/voyn/maj/hea.1/gud.wfr
      509 dat/voyn/maj/hea.2/gud.wfr
     1111 dat/voyn/maj/heb.1/gud.wfr
      288 dat/voyn/maj/heb.2/gud.wfr
       72 dat/voyn/maj/cos.1/gud.wfr
      868 dat/voyn/maj/cos.2/gud.wfr
      429 dat/voyn/maj/cos.3/gud.wfr
     1382 dat/voyn/maj/bio.1/gud.wfr
      555 dat/voyn/maj/zod.1/gud.wfr
      483 dat/voyn/maj/pha.1/gud.wfr
      694 dat/voyn/maj/pha.2/gud.wfr
      402 dat/voyn/maj/str.1/gud.wfr
     2779 dat/voyn/maj/str.2/gud.wfr
      153 dat/voyn/maj/unk.1/gud.wfr
       97 dat/voyn/maj/unk.2/gud.wfr
       43 dat/voyn/maj/unk.3/gud.wfr
      228 dat/voyn/maj/unk.4/gud.wfr
      214 dat/voyn/maj/unk.5/gud.wfr
      247 dat/voyn/maj/unk.6/gud.wfr
      208 dat/voyn/maj/unk.7/gud.wfr
        2 dat/voyn/maj/unk.8/gud.wfr
     6883 dat/voyn/maj/tot.1/gud.wfr

    lines file
  ------- ------------
      151 dat/voyn/maj/hea.1/bad.wfr
       45 dat/voyn/maj/hea.2/bad.wfr
       78 dat/voyn/maj/heb.1/bad.wfr
       43 dat/voyn/maj/heb.2/bad.wfr
       11 dat/voyn/maj/cos.1/bad.wfr
      151 dat/voyn/maj/cos.2/bad.wfr
      191 dat/voyn/maj/cos.3/bad.wfr
      215 dat/voyn/maj/bio.1/bad.wfr
      329 dat/voyn/maj/zod.1/bad.wfr
       78 dat/voyn/maj/pha.1/bad.wfr
      114 dat/voyn/maj/pha.2/bad.wfr
       81 dat/voyn/maj/str.1/bad.wfr
      446 dat/voyn/maj/str.2/bad.wfr
        9 dat/voyn/maj/unk.1/bad.wfr
        6 dat/voyn/maj/unk.2/bad.wfr
        3 dat/voyn/maj/unk.3/bad.wfr
       11 dat/voyn/maj/unk.4/bad.wfr
       32 dat/voyn/maj/unk.5/bad.wfr
       50 dat/voyn/maj/unk.6/bad.wfr
       27 dat/voyn/maj/unk.7/bad.wfr
        0 dat/voyn/maj/unk.8/bad.wfr
     1708 dat/voyn/maj/tot.1/bad.wfr

  Good/bad statistics for voyn/maj:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    hea.1   6867  6704  976   163   23  2132  1981  928   151   70
    hea.2    868   823  947    45   51   554   509  917    45   81
    heb.1   2901  2820  971    81   27  1189  1111  933    78   65
    heb.2    557   510  913    47   84   331   288  867    43  129
    cos.1    195   155  790    40  204    83    72  857    11  130
    cos.2   1746  1590  910   156   89  1019   868  850   151  148
    cos.3   1006   795  789   211  209   620   429  690   191  307
    bio.1   6975  6697  960   278   39  1597  1382  864   215  134
    zod.1   1370   988  720   382  278   884   555  627   329  371
    pha.1   1023   944  921    79   77   561   483  859    78  138
    pha.2   1588  1452  913   136   85   808   694  857   114  140
    str.1    755   670  886    85  112   483   402  830    81  167
    str.2  10768 10097  937   671   62  3225  2779  861   446  138
    unk.1    213   202  943    11   51   162   153  938     9   55
    unk.2    140   134  950     6   42   103    97  932     6   57
    unk.3     47    44  916     3   62    46    43  914     3   63
    unk.4    317   306  962    11   34   239   228  950    11   45
    unk.5    342   309  900    33   96   246   214  866    32  129
    unk.6    489   431  879    58  118   297   247  828    50  167
    unk.7    387   357  920    30   77   235   208  881    27  114
    unk.8      2     2  666     0    0     2     2  666     0    0
    tot.1  38556 36030  934  2526   65  8591  6883  801  1708  198

  Sample voyn/prs:

    lines   words     bytes file
  ------- ------- --------- ------------
     1065    2130     64485 dat/voyn/prs/hea.1/raw.evt
      134     268      8660 dat/voyn/prs/hea.2/raw.evt
      316     632     24711 dat/voyn/prs/heb.1/raw.evt
       61     122      4644 dat/voyn/prs/heb.2/raw.evt
        4       8       870 dat/voyn/prs/cos.1/raw.evt
      206     412     13662 dat/voyn/prs/cos.2/raw.evt
       85     170      7150 dat/voyn/prs/cos.3/raw.evt
      775    1550     58885 dat/voyn/prs/bio.1/raw.evt
       36      72      6945 dat/voyn/prs/zod.1/raw.evt
       89     178      7635 dat/voyn/prs/pha.1/raw.evt
      135     270     11650 dat/voyn/prs/pha.2/raw.evt
       80     160      6158 dat/voyn/prs/str.1/raw.evt
     1084    2168     90650 dat/voyn/prs/str.2/raw.evt
       28      56      1835 dat/voyn/prs/unk.1/raw.evt
       26      52      1801 dat/voyn/prs/unk.2/raw.evt
        7      14       461 dat/voyn/prs/unk.3/raw.evt
       33      66      2563 dat/voyn/prs/unk.4/raw.evt
       35      70      2844 dat/voyn/prs/unk.5/raw.evt
       45      90      3845 dat/voyn/prs/unk.6/raw.evt
       39      78      3002 dat/voyn/prs/unk.7/raw.evt
        0       0         0 dat/voyn/prs/unk.8/raw.evt
     4540    9953    332777 dat/voyn/prs/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
     7045   20956    164841 dat/voyn/prs/hea.1/raw.tlw
      882    2632     21493 dat/voyn/prs/hea.2/raw.tlw
     2959    8819     70279 dat/voyn/prs/heb.1/raw.tlw
      570    1697     13835 dat/voyn/prs/heb.2/raw.tlw
      186     557      4105 dat/voyn/prs/cos.1/raw.tlw
     1606    4703     37794 dat/voyn/prs/cos.2/raw.tlw
      904    2692     22690 dat/voyn/prs/cos.3/raw.tlw
     6915   20658    170013 dat/voyn/prs/bio.1/raw.tlw
     1015    3040     25966 dat/voyn/prs/zod.1/raw.tlw
      942    2810     24107 dat/voyn/prs/pha.1/raw.tlw
     1455    4336     37509 dat/voyn/prs/pha.2/raw.tlw
      763    2281     19088 dat/voyn/prs/str.1/raw.tlw
    11056   32880    283328 dat/voyn/prs/str.2/raw.tlw
      220     653      5215 dat/voyn/prs/unk.1/raw.tlw
      142     424      3534 dat/voyn/prs/unk.2/raw.tlw
       49     145      1129 dat/voyn/prs/unk.3/raw.tlw
      307     916      7559 dat/voyn/prs/unk.4/raw.tlw
      351    1044      8939 dat/voyn/prs/unk.5/raw.tlw
      492    1473     12624 dat/voyn/prs/unk.6/raw.tlw
      392    1171      9897 dat/voyn/prs/unk.7/raw.tlw
        0       0         0 dat/voyn/prs/unk.8/raw.tlw
    38269  113923    943994 dat/voyn/prs/tot.1/raw.tlw

    lines file
  ------- ------------
     2131 dat/voyn/prs/hea.1/raw.wfr
      554 dat/voyn/prs/hea.2/raw.wfr
     1189 dat/voyn/prs/heb.1/raw.wfr
      331 dat/voyn/prs/heb.2/raw.wfr
       73 dat/voyn/prs/cos.1/raw.wfr
      868 dat/voyn/prs/cos.2/raw.wfr
      533 dat/voyn/prs/cos.3/raw.wfr
     1536 dat/voyn/prs/bio.1/raw.wfr
      641 dat/voyn/prs/zod.1/raw.wfr
      485 dat/voyn/prs/pha.1/raw.wfr
      684 dat/voyn/prs/pha.2/raw.wfr
      483 dat/voyn/prs/str.1/raw.wfr
     3225 dat/voyn/prs/str.2/raw.wfr
      162 dat/voyn/prs/unk.1/raw.wfr
      103 dat/voyn/prs/unk.2/raw.wfr
       46 dat/voyn/prs/unk.3/raw.wfr
      226 dat/voyn/prs/unk.4/raw.wfr
      246 dat/voyn/prs/unk.5/raw.wfr
      297 dat/voyn/prs/unk.6/raw.wfr
      235 dat/voyn/prs/unk.7/raw.wfr
        0 dat/voyn/prs/unk.8/raw.wfr
     8105 dat/voyn/prs/tot.1/raw.wfr

    lines file
  ------- ------------
     1980 dat/voyn/prs/hea.1/gud.wfr
      509 dat/voyn/prs/hea.2/gud.wfr
     1111 dat/voyn/prs/heb.1/gud.wfr
      288 dat/voyn/prs/heb.2/gud.wfr
       63 dat/voyn/prs/cos.1/gud.wfr
      733 dat/voyn/prs/cos.2/gud.wfr
      380 dat/voyn/prs/cos.3/gud.wfr
     1325 dat/voyn/prs/bio.1/gud.wfr
      379 dat/voyn/prs/zod.1/gud.wfr
      418 dat/voyn/prs/pha.1/gud.wfr
      587 dat/voyn/prs/pha.2/gud.wfr
      402 dat/voyn/prs/str.1/gud.wfr
     2779 dat/voyn/prs/str.2/gud.wfr
      153 dat/voyn/prs/unk.1/gud.wfr
       97 dat/voyn/prs/unk.2/gud.wfr
       43 dat/voyn/prs/unk.3/gud.wfr
      216 dat/voyn/prs/unk.4/gud.wfr
      214 dat/voyn/prs/unk.5/gud.wfr
      247 dat/voyn/prs/unk.6/gud.wfr
      208 dat/voyn/prs/unk.7/gud.wfr
        0 dat/voyn/prs/unk.8/gud.wfr
     6525 dat/voyn/prs/tot.1/gud.wfr

    lines file
  ------- ------------
      151 dat/voyn/prs/hea.1/bad.wfr
       45 dat/voyn/prs/hea.2/bad.wfr
       78 dat/voyn/prs/heb.1/bad.wfr
       43 dat/voyn/prs/heb.2/bad.wfr
       10 dat/voyn/prs/cos.1/bad.wfr
      135 dat/voyn/prs/cos.2/bad.wfr
      153 dat/voyn/prs/cos.3/bad.wfr
      211 dat/voyn/prs/bio.1/bad.wfr
      262 dat/voyn/prs/zod.1/bad.wfr
       67 dat/voyn/prs/pha.1/bad.wfr
       97 dat/voyn/prs/pha.2/bad.wfr
       81 dat/voyn/prs/str.1/bad.wfr
      446 dat/voyn/prs/str.2/bad.wfr
        9 dat/voyn/prs/unk.1/bad.wfr
        6 dat/voyn/prs/unk.2/bad.wfr
        3 dat/voyn/prs/unk.3/bad.wfr
       10 dat/voyn/prs/unk.4/bad.wfr
       32 dat/voyn/prs/unk.5/bad.wfr
       50 dat/voyn/prs/unk.6/bad.wfr
       27 dat/voyn/prs/unk.7/bad.wfr
        0 dat/voyn/prs/unk.8/bad.wfr
     1580 dat/voyn/prs/tot.1/bad.wfr

  Good/bad statistics for voyn/prs:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    hea.1   6866  6703  976   163   23  2131  1980  928   151   70
    hea.2    868   823  947    45   51   554   509  917    45   81
    heb.1   2901  2820  971    81   27  1189  1111  933    78   65
    heb.2    557   510  913    47   84   331   288  867    43  129
    cos.1    185   146  784    39  209    73    63  851    10  135
    cos.2   1491  1353  906   138   92   868   733  843   135  155
    cos.3    884   713  805   171  193   533   380  711   153  286
    bio.1   6828  6555  959   273   39  1536  1325  862   211  137
    zod.1   1010   701  693   309  305   641   379  590   262  408
    pha.1    926   858  925    68   73   485   418  860    67  137
    pha.2   1426  1309  917   117   81   684   587  856    97  141
    str.1    755   670  886    85  112   483   402  830    81  167
    str.2  10768 10097  937   671   62  3225  2779  861   446  138
    unk.1    213   202  943    11   51   162   153  938     9   55
    unk.2    140   134  950     6   42   103    97  932     6   57
    unk.3     47    44  916     3   62    46    43  914     3   63
    unk.4    302   292  963    10   33   226   216  951    10   44
    unk.5    342   309  900    33   96   246   214  866    32  129
    unk.6    489   431  879    58  118   297   247  828    50  167
    unk.7    387   357  920    30   77   235   208  881    27  114
    unk.8      0     0    0     0    0     0     0    0     0    0
    tot.1  37385 35027  936  2358   63  8105  6525  804  1580  194

  Sample voyn/lab:

    lines   words     bytes file
  ------- ------- --------- ------------
        1       2        27 dat/voyn/lab/hea.1/raw.evt
        0       0         0 dat/voyn/lab/hea.2/raw.evt
        0       0         0 dat/voyn/lab/heb.1/raw.evt
        0       0         0 dat/voyn/lab/heb.2/raw.evt
        9      18       262 dat/voyn/lab/cos.1/raw.evt
      187     374      5453 dat/voyn/lab/cos.2/raw.evt
      101     202      2844 dat/voyn/lab/cos.3/raw.evt
      127     254      3468 dat/voyn/lab/bio.1/raw.evt
      299     598      8398 dat/voyn/lab/zod.1/raw.evt
       85     170      2386 dat/voyn/lab/pha.1/raw.evt
      149     298      4068 dat/voyn/lab/pha.2/raw.evt
        0       0         0 dat/voyn/lab/str.1/raw.evt
        0       0         0 dat/voyn/lab/str.2/raw.evt
        0       0         0 dat/voyn/lab/unk.1/raw.evt
        0       0         0 dat/voyn/lab/unk.2/raw.evt
        0       0         0 dat/voyn/lab/unk.3/raw.evt
       15      30       409 dat/voyn/lab/unk.4/raw.evt
        0       0         0 dat/voyn/lab/unk.5/raw.evt
        0       0         0 dat/voyn/lab/unk.6/raw.evt
        0       0         0 dat/voyn/lab/unk.7/raw.evt
        1       2        67 dat/voyn/lab/unk.8/raw.evt
     1231    3335     37703 dat/voyn/lab/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
        1       3        24 dat/voyn/lab/hea.1/raw.tlw
        0       0         0 dat/voyn/lab/hea.2/raw.tlw
        0       0         0 dat/voyn/lab/heb.1/raw.tlw
        0       0         0 dat/voyn/lab/heb.2/raw.tlw
       18      46       294 dat/voyn/lab/cos.1/raw.tlw
      425    1105      7164 dat/voyn/lab/cos.2/raw.tlw
      218     558      3687 dat/voyn/lab/cos.3/raw.tlw
      255     657      4084 dat/voyn/lab/bio.1/raw.tlw
      658    1676     10717 dat/voyn/lab/zod.1/raw.tlw
      180     457      2862 dat/voyn/lab/pha.1/raw.tlw
      307     776      4859 dat/voyn/lab/pha.2/raw.tlw
        0       0         0 dat/voyn/lab/str.1/raw.tlw
        0       0         0 dat/voyn/lab/str.2/raw.tlw
        0       0         0 dat/voyn/lab/unk.1/raw.tlw
        0       0         0 dat/voyn/lab/unk.2/raw.tlw
        0       0         0 dat/voyn/lab/unk.3/raw.tlw
       29      73       415 dat/voyn/lab/unk.4/raw.tlw
        0       0         0 dat/voyn/lab/unk.5/raw.tlw
        0       0         0 dat/voyn/lab/unk.6/raw.tlw
        0       0         0 dat/voyn/lab/unk.7/raw.tlw
        2       6        49 dat/voyn/lab/unk.8/raw.tlw
     2102    5375     34187 dat/voyn/lab/tot.1/raw.tlw

    lines file
  ------- ------------
        1 dat/voyn/lab/hea.1/raw.wfr
        0 dat/voyn/lab/hea.2/raw.wfr
        0 dat/voyn/lab/heb.1/raw.wfr
        0 dat/voyn/lab/heb.2/raw.wfr
       10 dat/voyn/lab/cos.1/raw.wfr
      225 dat/voyn/lab/cos.2/raw.wfr
      112 dat/voyn/lab/cos.3/raw.wfr
      127 dat/voyn/lab/bio.1/raw.wfr
      303 dat/voyn/lab/zod.1/raw.wfr
       92 dat/voyn/lab/pha.1/raw.wfr
      155 dat/voyn/lab/pha.2/raw.wfr
        0 dat/voyn/lab/str.1/raw.wfr
        0 dat/voyn/lab/str.2/raw.wfr
        0 dat/voyn/lab/unk.1/raw.wfr
        0 dat/voyn/lab/unk.2/raw.wfr
        0 dat/voyn/lab/unk.3/raw.wfr
       15 dat/voyn/lab/unk.4/raw.wfr
        0 dat/voyn/lab/unk.5/raw.wfr
        0 dat/voyn/lab/unk.6/raw.wfr
        0 dat/voyn/lab/unk.7/raw.wfr
        2 dat/voyn/lab/unk.8/raw.wfr
      882 dat/voyn/lab/tot.1/raw.wfr

    lines file
  ------- ------------
        1 dat/voyn/lab/hea.1/gud.wfr
        0 dat/voyn/lab/hea.2/gud.wfr
        0 dat/voyn/lab/heb.1/gud.wfr
        0 dat/voyn/lab/heb.2/gud.wfr
        9 dat/voyn/lab/cos.1/gud.wfr
      208 dat/voyn/lab/cos.2/gud.wfr
       72 dat/voyn/lab/cos.3/gud.wfr
      122 dat/voyn/lab/bio.1/gud.wfr
      233 dat/voyn/lab/zod.1/gud.wfr
       81 dat/voyn/lab/pha.1/gud.wfr
      136 dat/voyn/lab/pha.2/gud.wfr
        0 dat/voyn/lab/str.1/gud.wfr
        0 dat/voyn/lab/str.2/gud.wfr
        0 dat/voyn/lab/unk.1/gud.wfr
        0 dat/voyn/lab/unk.2/gud.wfr
        0 dat/voyn/lab/unk.3/gud.wfr
       14 dat/voyn/lab/unk.4/gud.wfr
        0 dat/voyn/lab/unk.5/gud.wfr
        0 dat/voyn/lab/unk.6/gud.wfr
        0 dat/voyn/lab/unk.7/gud.wfr
        2 dat/voyn/lab/unk.8/gud.wfr
      721 dat/voyn/lab/tot.1/gud.wfr

    lines file
  ------- ------------
        0 dat/voyn/lab/hea.1/bad.wfr
        0 dat/voyn/lab/hea.2/bad.wfr
        0 dat/voyn/lab/heb.1/bad.wfr
        0 dat/voyn/lab/heb.2/bad.wfr
        1 dat/voyn/lab/cos.1/bad.wfr
       17 dat/voyn/lab/cos.2/bad.wfr
       40 dat/voyn/lab/cos.3/bad.wfr
        5 dat/voyn/lab/bio.1/bad.wfr
       70 dat/voyn/lab/zod.1/bad.wfr
       11 dat/voyn/lab/pha.1/bad.wfr
       19 dat/voyn/lab/pha.2/bad.wfr
        0 dat/voyn/lab/str.1/bad.wfr
        0 dat/voyn/lab/str.2/bad.wfr
        0 dat/voyn/lab/unk.1/bad.wfr
        0 dat/voyn/lab/unk.2/bad.wfr
        0 dat/voyn/lab/unk.3/bad.wfr
        1 dat/voyn/lab/unk.4/bad.wfr
        0 dat/voyn/lab/unk.5/bad.wfr
        0 dat/voyn/lab/unk.6/bad.wfr
        0 dat/voyn/lab/unk.7/bad.wfr
        0 dat/voyn/lab/unk.8/bad.wfr
      161 dat/voyn/lab/tot.1/bad.wfr

  Good/bad statistics for voyn/lab:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    hea.1      1     1  500     0    0     1     1  500     0    0
    hea.2      0     0    0     0    0     0     0    0     0    0
    heb.1      0     0    0     0    0     0     0    0     0    0
    heb.2      0     0    0     0    0     0     0    0     0    0
    cos.1     10     9  818     1   90    10     9  818     1   90
    cos.2    255   237  925    18   70   225   208  920    17   75
    cos.3    122    82  666    40  325   112    72  637    40  353
    bio.1    147   142  959     5   33   127   122  953     5   39
    zod.1    360   287  795    73  202   303   233  766    70  230
    pha.1     97    86  877    11  112    92    81  870    11  118
    pha.2    162   143  877    19  116   155   136  871    19  121
    str.1      0     0    0     0    0     0     0    0     0    0
    str.2      0     0    0     0    0     0     0    0     0    0
    unk.1      0     0    0     0    0     0     0    0     0    0
    unk.2      0     0    0     0    0     0     0    0     0    0
    unk.3      0     0    0     0    0     0     0    0     0    0
    unk.4     15    14  875     1   62    15    14  875     1   62
    unk.5      0     0    0     0    0     0     0    0     0    0
    unk.6      0     0    0     0    0     0     0    0     0    0
    unk.7      0     0    0     0    0     0     0    0     0    0
    unk.8      2     2  666     0    0     2     2  666     0    0
    tot.1   1171  1003  855   168  143   882   721  816   161  182

  Statistics for voyn/tak:

    lines   words     bytes file
  ------- ------- --------- ------------
     5391   11713    361531 dat/voyn/tak/tot.1/raw.evt

    lines   words     bytes file
  ------- ------- --------- ------------
    39548  116936    960572 dat/voyn/tak/tot.1/raw.tlw

    lines file
  ------- ------------
     8150 dat/voyn/tak/tot.1/raw.wfr

    lines file
  ------- ------------
     7653 dat/voyn/tak/tot.1/gud.wfr

    lines file
  ------- ------------
      497 dat/voyn/tak/tot.1/bad.wfr

  Good/bad statistics for voyn/tak:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    tot.1  37840 37214  983   626   16  8150  7653  938   497   60

  Statistics for voyn/ini:

    lines   words     bytes file
  ------- ------- --------- ------------

    lines   words     bytes file
  ------- ------- --------- ------------
     5726   16301    126670 dat/voyn/ini/tot.1/raw.tlw

    lines file
  ------- ------------
     2159 dat/voyn/ini/tot.1/raw.wfr

    lines file
  ------- ------------
     1913 dat/voyn/ini/tot.1/gud.wfr

    lines file
  ------- ------------
      246 dat/voyn/ini/tot.1/bad.wfr

  Good/bad statistics for voyn/ini:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    tot.1   4849  4567  941   282   58  2159  1913  885   246  113

  Statistics for voyn/fin:

    lines   words     bytes file
  ------- ------- --------- ------------

    lines   words     bytes file
  ------- ------- --------- ------------
     5726   16301    122721 dat/voyn/fin/tot.1/raw.tlw

    lines file
  ------- ------------
     2042 dat/voyn/fin/tot.1/raw.wfr

    lines file
  ------- ------------
     1748 dat/voyn/fin/tot.1/gud.wfr

    lines file
  ------- ------------
      294 dat/voyn/fin/tot.1/bad.wfr

  Good/bad statistics for voyn/fin:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    tot.1   4849  4514  930   335   69  2042  1748  855   294  143

  Statistics for voyn/mid:

    lines   words     bytes file
  ------- ------- --------- ------------

    lines   words     bytes file
  ------- ------- --------- ------------
    28240   83880    694625 dat/voyn/mid/tot.1/raw.tlw

    lines file
  ------- ------------
     5633 dat/voyn/mid/tot.1/raw.wfr

    lines file
  ------- ------------
     4486 dat/voyn/mid/tot.1/gud.wfr

    lines file
  ------- ------------
     1147 dat/voyn/mid/tot.1/bad.wfr

  Good/bad statistics for voyn/mid:

  #                  tokens                       words
  #        --------------------------- ---------------------------
  # sec      raw   gud  ppt   bad  ppt   raw   gud  ppt   bad  ppt
  # ------ ----- ----- ---- ----- ---- ----- ----- ---- ----- ----
    tot.1  27400 25702  937  1698   61  5633  4486  796  1147  203

  voyn/{prs,lab}/hea.1/raw.wfr: 6867
  voyn/maj/hea.1/raw.wfr: 6867
  voyn/{prs,lab}/hea.2/raw.wfr: 868
  voyn/maj/hea.2/raw.wfr: 868
  voyn/{prs,lab}/heb.1/raw.wfr: 2901
  voyn/maj/heb.1/raw.wfr: 2901
  voyn/{prs,lab}/heb.2/raw.wfr: 557
  voyn/maj/heb.2/raw.wfr: 557
  voyn/{prs,lab}/cos.1/raw.wfr: 195
  voyn/maj/cos.1/raw.wfr: 195
  voyn/{prs,lab}/cos.2/raw.wfr: 1746
  voyn/maj/cos.2/raw.wfr: 1746
  voyn/{prs,lab}/cos.3/raw.wfr: 1006
  voyn/maj/cos.3/raw.wfr: 1006
  voyn/{prs,lab}/bio.1/raw.wfr: 6975
  voyn/maj/bio.1/raw.wfr: 6975
  voyn/{prs,lab}/zod.1/raw.wfr: 1370
  voyn/maj/zod.1/raw.wfr: 1370
  voyn/{prs,lab}/pha.1/raw.wfr: 1023
  voyn/maj/pha.1/raw.wfr: 1023
  voyn/{prs,lab}/pha.2/raw.wfr: 1588
  voyn/maj/pha.2/raw.wfr: 1588
  voyn/{prs,lab}/str.1/raw.wfr: 755
  voyn/maj/str.1/raw.wfr: 755
  voyn/{prs,lab}/str.2/raw.wfr: 10768
  voyn/maj/str.2/raw.wfr: 10768
  voyn/{prs,lab}/unk.1/raw.wfr: 213
  voyn/maj/unk.1/raw.wfr: 213
  voyn/{prs,lab}/unk.2/raw.wfr: 140
  voyn/maj/unk.2/raw.wfr: 140
  voyn/{prs,lab}/unk.3/raw.wfr: 47
  voyn/maj/unk.3/raw.wfr: 47
  voyn/{prs,lab}/unk.4/raw.wfr: 317
  voyn/maj/unk.4/raw.wfr: 317
  voyn/{prs,lab}/unk.5/raw.wfr: 342
  voyn/maj/unk.5/raw.wfr: 342
  voyn/{prs,lab}/unk.6/raw.wfr: 489
  voyn/maj/unk.6/raw.wfr: 489
  voyn/{prs,lab}/unk.7/raw.wfr: 387
  voyn/maj/unk.7/raw.wfr: 387
  voyn/{prs,lab}/unk.8/raw.wfr: 2
  voyn/maj/unk.8/raw.wfr: 2

# END