Hacking at the Voynich manuscript - Side notes
052 Reordering the biological pages by shared key words

Last edited on 1999-07-21 08:31:21 by stolfi

INTRODUCTION

In this note we try to determine the original page order of the
biological section, based on the sharing of key words.

WHY DOUBT THE CURRENT ORDER?

It has long been noted that pages f78v and f81r, which belong to the
same face of bifolio bM4, are spanned by a single picture. Therefore
the present position of that bifolio is incorrect; it should
originally have been at the center of its quire.

If that bifolio is misplaced, we cannot assume that the other bifolios
are in the correct order, or even that they were nested into a single
quire. Indeed, the VMS may be what remains of one or more larger
works, parts of which were lost or destroyed.

HOW COULD WE DISCOVER THE ORIGINAL ORDER?

First, a caveat: there may not be a "correct" order. It is quite
possible that the biological pages (and the rest of the VMS) were
created incrementally in random order, possibly with local
replacements (such as seem to have happened to the first pages of the
Zodiac section).

On the other hand, the text of the biological section looks very much
like flowing prose (unlike that of other sections, where the text is
generally secondary to the drawings, or seems to be a bunch of items).
It is not unreasonable to hope that the text is more or less
continuous across the whole section, and is meant to be read in a
definite sequence.

With that assumption, we may be able to recover the original order by
looking at the distribution of words. Specifically, if a word occurs
several times on two different folios, and not elsewhere, there is
some chance that it is a "key word" that occurs in only one place of
the text. Therefore, such a word is statistical evidence that those
two folios were adjacent in the original order.
(Of course the evidence is not conclusive, since a key word may
occasionally be mentioned several times at two different places in the
text.)

Another useful clue (that provides a consistency check) is that
bifolios bM3 (f77+f82) and bM4 (f78+f81) seem to have the same shape,
dimensions, and curved outlines, so that they are almost congruent
when the book is opened with one folio over the other. It seems
therefore that those two bifolios, at least, are correctly nested.
(Of course, the two pages may have been re-trimmed after binding, so
this evidence is not very strong.)

BIFOLIO NUMBERING

To identify a VMS bifolio we generally use a "b-number" like bM5,
where the "b" is constant, the "M" is Rene's quire code, and the "5"
is the nesting order of the bifolio in the quire (1 = outermost).

To save space, here we will denote the five bifolios of the
biological section with letters A-E, according to the following
table:

    A  bM5  f79+f80
    B  bM3  f77+f82
    C  bM2  f76+f83
    D  bM1  f75+f84
    E  bM4  f78+f81

Bifolio E (bM4) is the one that has the figure spanning two pages
(f78v and f81r), and thus cannot have any other bifolio nested within
it. By the same token, it must be folded just as in the VMS, with f78
before f81.

Similarly, we will denote the biological folios by letters "ABCDE"
for the low-numbered half of the bifolio, and "abcde" for the
high-numbered half.

Note that a permutation of "ABCDEabcde" completely specifies the
folio order, and hence the binding. The order is physically possible
only if the pairs "Aa", "Bb", etc. are properly nested, excluding
patterns like "ABab".

The number of physically possible bindings for n bifolios is 2^n (the
number of front/back flips) times C(n) (the number of properly nested
sequences of n pairs of parentheses, i.e. the n-th Catalan number)
times n! (for the ways to assign the bifolios to the pairs of
parentheses). Tabulated for n=1..5:

    n   2^n * n! * C(n)   total    nesting patterns
    -   ---------------   ------   --------------------------------------
    1   2^1 * 1! *  1   =      2   ()
    2   2^2 * 2! *  2   =     16   (()), ()()
    3   2^3 * 3! *  5   =    240   ((())), (())(), ()(()), (()()), ()()()
    4   2^4 * 4! * 14   =   5376   (((()))), ((()))(), ...
    5   2^5 * 5! * 42   = 161280   ((((())))), (((())))(), ...

GABRIEL'S DENDROGRAM

Gabriel generated a dendrogram with SPSS for all VMS pages, based on
Rene's digraph counts. The Biological pages ended up in a single
branch (with a lone intruder from the Stars section), as shown below:

     79v  MJ  Av  -+
     82v  MP  bv  -+
     83v  MR  cv  -+
     75v  MB  Dv  -+
     77r  ME  Br  -+--+
     77v  MF  Bv  -+ |
     82r  MO  br  -+ |
     75r  MA  Dr  -+ |
     76r  MC  Cr  -+ | | |
     76v  MD  Cv  -+ |
     83r  MQ  cr  -+--+--+
    108r  TK  *r  -+ | | | | | |
     78v  MH  Ev  -+ | |
     84v  MT  dv  -+ | |
     81r  MM  er  -+--+ |
     78r  MG  Er  -+ |
     84r  MS  dr  -+ +---
     81v  MN  ev  -+ | | |
     79r  MI  Ar  -+ |
     80v  ML  av  -+-----+
     80r  MK  ar  -+

BIFOLIO ORDERING BASED ON SHARED KEY WORDS

Let's extract from the word occurrence maps for the biological
section (note 051) the set of all words that occur in only two of the
five bifolios, but at least twice in each of them. This last
requirement tries to distinguish topic-specific "key words" from
words that are uniformly distributed but very rare.

Here are those words, and their counts on each bifolio:

    ---------------- ----- -----
                           bbbbb
                           MMMMM
                           53214
    ---------------- ----- -----
    word(s)          totct ABCDE
    ---------------- ----- -----
    shar                 7 52...
    pcheol               5 23...
    sheeol               4 22...
    chear                5 2.3..
    orol                 6 4..2.
    olkey                4 2..2.
    shek                 4 2...2
    chedaiin             5 .32..
    lkedy                7 .25..
    chees                5 .23..
    cheal                4 .22..
    chedar               4 .2..2
    qokeor               6 ..33.
    pchedar              4 ..22.
    otor                 4 ..2.2
    ytedy                7 ...25
    am                   5 ...23
    ykedy                5 ...23
    soiin                4 ...22
    ---------------- ----- -----

The following matrix (bifolio-0.matrix) summarizes this data. Each
entry is the number of words from the above list that occur in both
of the two corresponding bifolios.

        A B C D E
        - - - - -
      A - 3 1 2 1
      B 3 - 4 . 1
      C 1 4 - 2 1
      D 2 . 2 - 4
      E 1 1 1 4 -

      S0 =  6*
      S1 = 11*
      S2 = 23

The numbers at the bottom are numerical scores that we use to
evaluate proposed linear orderings of the bifolios, given their
pairwise similarities.
The indicator Sk, for some non-negative number k, is defined as

    Sk = sum over j>i of M[i,j]*(j-i-1)^k

where i and j are positions in the proposed ordering, and 0^0 is
taken to be 0, so that adjacent pairs (j = i+1) never contribute. In
particular, S0 is just the number of key words shared by non-adjacent
bifolios. We will mark a score "*" if it is the minimum score over
all permutations of the bifolios.

Here are some other orderings, and the respective scores:

    Alternate      Found with     Found with     Current
    ordering       computer       computer       ordering

      E D A B C      B C A D E      B C A E D      D C B E A
      - - - - -      - - - - -      - - - - -      - - - - -
    E - 4 1 1 1    B - 4 3 . 1    B - 4 3 1 .    D - 2 . 4 2
    D 4 - 2 . 2    C 4 - 1 2 1    C 4 - 1 1 2    C 2 - 4 1 1
    A 1 2 - 3 1    A 3 1 - 2 1    A 3 1 - 1 2    B . 4 - 1 3
    B 1 . 3 - 4    D . 2 2 - 4    E 1 1 1 - 4    E 4 1 1 - 1
    C 1 2 1 4 -    E 1 1 1 4 -    D . 2 2 4 -    A 2 1 3 1 -

    S0 =  6*       S0 =  8        S0 =  9        S0 = 11
    S1 = 11*       S1 = 11*       S1 = 12        S1 = 20
    S2 = 23        S2 = 19        S2 = 18*       S2 = 42

Let's relax the requirements, and consider also words that occur four
or more times in only two bifolios, but only once in one of the
bifolios. Here they are:

    ---------------- ----- -----
                           bbbbb
                           MMMMM
                           53214
    ---------------- ----- -----
    word(s)          totct ABCDE
    ---------------- ----- -----
    rchey                7 61...
    tshedy               4 31...
    ysheey               4 31...
    o                    5 4.1..
    qo                  10 1.9..
    sy                   5 1.4..
    opchey               4 1.3..
    aror                 4 3..1.
    tor                  4 3..1.
    shedaiin             4 .31..
    shedal               8 .17..
    lsheedy              5 .14..
    lr                   4 .1..3
    shdy                 7 ..6.1
    ---------------- ----- -----

Counting these pairs too, the matrix of shared key words
(bifolio-1.matrix) becomes

    Proposed       Better         Current
    ordering       ordering       ordering

      A B C D E      E D A B C      D C B E A
      - - - - -      - - - - -      - - - - -
    A - 6 5 4 1    E - 4 1 2 2    D - 2 . 4 4
    B 6 - 7 . 2    D 4 - 4 . 2    C 2 - 7 2 5
    C 5 7 - 2 2    A 1 4 - 6 5    B . 7 - 2 6
    D 4 . 2 - 4    B 2 . 6 - 7    E 4 2 2 - 1
    E 1 2 2 4 -    C 2 2 5 7 -    A 4 5 6 1 -

    S0 = 14        S0 = 12*       S0 = 21
    S1 = 22        S1 = 20        S1 = 38
    S2 = 40        S2 = 40        S2 = 80

Note that the second solution above is slightly better than ABCDE, by
the S0 criterion.

The first two solutions below, which optimize S1 and S2 respectively,
were found by exhaustive search over all 5! = 120 permutations of
ABCDE. The third matrix is a slight variation of the second one.
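
For readers who want to reproduce such scores, here is a minimal sketch of the Sk computation and of an exhaustive search, applied to bifolio-0.matrix. It is not the program that was actually used for this note; the function names are made up for illustration, and the explicit skipping of adjacent pairs is an assumption that matches the S0 values quoted for that matrix.

```python
from itertools import permutations

# bifolio-0.matrix from this note (rows and columns in the order A..E).
LABELS = "ABCDE"
M0 = [
    [0, 3, 1, 2, 1],
    [3, 0, 4, 0, 1],
    [1, 4, 0, 2, 1],
    [2, 0, 2, 0, 4],
    [1, 1, 1, 4, 0],
]

def indices(order_string):
    """Turn an ordering like "BCADE" into a list of matrix indices."""
    return [LABELS.index(c) for c in order_string]

def score(matrix, order, k):
    """Sk for `order`, a sequence of row/column indices of `matrix`."""
    total = 0
    for i in range(len(order)):
        for j in range(i + 1, len(order)):
            gap = j - i - 1              # items lying between the two
            if gap > 0:                  # adjacent pairs are not counted
                total += matrix[order[i]][order[j]] * gap ** k
    return total

def best_orders(matrix, k):
    """Minimum Sk over all permutations, and every ordering reaching it."""
    scored = [(score(matrix, p, k), p)
              for p in permutations(range(len(matrix)))]
    best = min(s for s, _ in scored)
    return best, [p for s, p in scored if s == best]

if __name__ == "__main__":
    for k in (0, 1, 2):
        print("S%d(ABCDE) =" % k, score(M0, indices("ABCDE"), k))
    print("minimum S0 =", best_orders(M0, 0)[0])
```

Each ordering and its reversal always get the same score, so the list returned by `best_orders` contains both.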
    Found with     Found with     Close call
    computer       computer

      B C A D E      B C A E D      B A C E D
      - - - - -      - - - - -      - - - - -
    B - 7 6 . 2    B - 7 6 2 .    B - 6 7 2 .
    C 7 - 5 2 2    C 7 - 5 2 2    A 6 - 5 1 4
    A 6 5 - 4 1    A 6 5 - 1 4    C 7 5 - 2 2
    D . 2 4 - 4    E 2 2 1 - 4    E 2 1 2 - 4
    E 2 2 1 4 -    D . 2 4 4 -    D . 4 2 4 -

    S0 = 13        S0 = 16        S0 = 16
    S1 = 19*       S1 = 20        S1 = 22
    S2 = 35        S2 = 28*       S2 = 34

Note that the pairs B-D and A-E are the most distant ones, but A and
E are roughly equidistant from B, C, and D. There is however a slight
bias: A is closer to B, and E is closer to D. Also A-B, B-C, C-D, and
D-E are closer than B-D.

I read this data as suggesting the following relationship diagram:

    A
    |\\
    | \ \
    |  \  \
    |   \   \
    B -- C -- D
     .    .   |
        .  .  |
          . . |
            ..|
              E

These relationships could be explained, for example, by assuming that
the five bifolios are not nested, but have the following structure:

    A  bM5  f79-80  introduction (survey of topic, enumeration of contents)
    B  bM3  f77-82  section 1
    C  bM2  f76-83  section 2
    D  bM1  f75-84  section 3
    E  bM4  f78-81  conclusion (partly on section 3)

Of course one could also have the reverse structure, with E as the
"introduction" (continuing into "d") and A as the conclusion and
summary.

In any case, note that this ordering separates B (bM3) from E (bM4),
contradicting their physical congruence. But this seems inevitable in
this approach, since the vocabularies of B and E seem quite
different. Gabriel's dendrogram too keeps them separate.

FOLIO ORDER FROM KEY WORDS

The following table lists the analogous data for folios, rather than
bifolios. First, let's look at the words that occur on only two
folios, but at least twice on each folio.

    ---------------- ----- -- -- -- -- --
                           bb bb bb bb bb
                           MM MM MM MM MM
                           55 33 22 11 44
                           ff ff ff ff ff
                           78 78 78 78 78
                           90 72 63 54 81
    ---------------- ----- -- -- -- -- --
    word(s)          totct Aa Bb Cc Dd Ee
    ---------------- ----- -- -- -- -- --
    qokl                 4 22 .. .. .. ..
    sheeol               4 2. 2. .. .. ..
    shek                 4 2. .. .. .. .2
    chees                5 .. 2. 3. .. ..
    cheal                4 .. .2 .2 .. ..
    pchedar              4 .. .. .2 2. ..
    otor                 4 .. .. .2 .. 2.
    ---------------- ----- -- -- -- -- --

Note that folios B and C share a word, and so do c and b; that
suggests the two bifolios may be nested like BCcb or CBbc (or their
reversals).

Let's put the data in matrix form (folio-0.matrix).

    Proposed order   Alternate order  Alternate order  Current order
    (unnested)       (unnested)       (B C D nested)   (all nested)

      AaBbCcDdEe       EeDdAaBCcb       AaBCDdcEeb       DCBEAaebcd
      ----------       ----------       ----------       ----------
    A -11......1     E -.......1.     A -11.....1.     D -.......1.
    a 1-........     e .-..1.....     a 1-........     C .-1.......
    B 1.-.1.....     D ..-.....1.     B 1.-1......     B .1-.1.....
    b ...-.1....     d ...-......     C ..1-......     E ...-....1.
    C ..1.-.....     A .1..-11...     D ....-.1...     A ..1.-11...
    c ...1.-1.1.     a ....1-....     d .....-....     a ....1-....
    D .....1-...     B ....1.-1..     c ....1.-1.1     e ....1.-...
    d .......-..     C ......1-..     E ......1-..     b .......-1.
    E .....1..-.     c 1.1.....-1     e 1.......-.     c 1..1...1-.
    e 1........-     b ........1-     b ......1..-     d .........-

    S0 = 5           S0 = 4           S0 = 4           S0 = 4
    S1 =             S1 =             S1 =             S1 =
    S2 = 71          S2 = 79          S2 = 55          S2 = 67

After fixing the matrix to force Ee to be adjacent, the computer
finds 200 solutions with optimal S0 score = 2.
Here are the 100 solutions which have "Ee" rather than "eE" BCEeAadDcb BCdDcEeAab BCdDcbEeAa BCdEeAaDcb CBAaDdbcEe CBAaEebcDd CBAabcDEed CBAabcDdEe CBAabcEeDd CBAabcEedD CBAabdDcEe CBAadDbcEe CBDdbcEeAa CBEeAabcDd CBbcDEeAad CBbcDdEeAa CBbcEeAaDd CBbcEeAadD CBbdDcEeAa CBdDbcEeAa DCBAabcEed DCBbcEeAad DbcEeAaCBd DbcEeCBAad DcEeAaCBbd DcEeAabBCd DcEeCBAabd DcEebaABCd DcbBCEeAad DcbBCdEeAa DcbEeAaBCd DcbEeaABCd DcbaABCEed DcbaABCdEe DcbaEeABCd DdCBAabcEe DdCBbcEeAa DdbcEeAaCB DdbcEeCBAa EeABCdDcba EeAaBCdDcb EeAaCBbcDd EeAaDcbBCd EeAabcDdCB EeAadCBbcD EeAadDcbBC EeCBAabcDd EeDcbaABCd EeaABCdDcb EebcDdCBAa EedCBAabcD EedDcbaABC aABCEedDcb aABCdDcEeb aABCdDcbEe aABCdEeDcb aEeABCdDcb bBCdDcEeAa baABCdDcEe bcDEeAadCB bcDEedCBAa bcDdCBAaEe bcDdCBEeAa bcDdEeAaCB bcDdEeCBAa bcEeAaCBDd bcEeAaCBdD bcEeAaDdCB bcEeAadDCB bcEeCBAaDd bcEeCBAadD bcEeDdCBAa bcEedDCBAa bdDcEeAaCB bdDcEeCBAa dCBAaEebcD dCBAabcDEe dCBAabcEeD dCBEeAabcD dCBbcDEeAa dCBbcEeAaD dDCBAabcEe dDCBbcEeAa dDbcEeAaCB dDbcEeCBAa dDcEeAaCBb dDcEeAabBC dDcEeCBAab dDcEebaABC dDcbBCEeAa dDcbEeAaBC dDcbEeaABC dDcbaABCEe dDcbaEeABC dEeAaCBbcD dEeAaDcbBC dEeCBAabcD dEeDcbaABC dbcEeAaCBD dbcEeCBAaD For S2, the computer finds 8 optimum solutions, of which 4 have bifolio E folded as "Ee" rather than "eE": DcbEeAaBCd DcbEeaABCd dDcbEeAaBC dDcbEeaABC (((()()))) (((()()))) ()((()())) ()((()())) It seems that all of these solutions have B and C nested within each other. Here are the matrices of these permutations. The one on the left is S0-optimal and has all bifolios folded in the "official" way. The one on the right is both S0- and S2-optimal. Found with Found with computer computer EeAaCBbcDd DcbEeAaBCd ()()(())() (((()()))) ---------- ---------- E -......1.. D -1........ e .-1....... c 1-11...... A .1-1.1.... b .1-....... a ..1-...... E .1.-...... C ....-1.... e ....-1.... B ..1.1-.... A ....1-11.. b ......-1.. a .....1-... c 1.....1-1. B .....1.-1. D .......1-. C .......1-. 
d .........- d .........- S0 = 2* S0 = 2* S1 = S1 = 2 S2 = 40 S2 = 2* Again these solutions have bifolios B and C nested within each other. Moreover the last solution above reveals that folios D,c,b,E and e,A,a,B,C are unrelated by this data, and d is not related to either group. The solution AaBCdDcbEe should therefore be Now let's extend the list with words that have three or more occurrences, restricted to only two folios, but which occur only once in one of them: ---------------- ----- -- -- -- -- -- bb bb bb bb bb MM MM MM MM MM 55 33 22 11 44 ff ff ff ff ff 78 78 78 78 78 90 72 63 54 81 ---------------- ----- -- -- -- -- -- word(s) totct Aa Bb Cc Dd Ee ---------------- ----- -- -- -- -- -- o 5 4. .. 1. .. .. aror 4 3. .. .. .1 .. tor 4 .3 .. .. .1 .. qokeeey 4 .. 13 .. .. .. lr 4 .. .1 .. .. 3. ---------------- ----- -- -- -- -- -- We get the matrix folio-1.matrix on the left below. Proposed order Optimal (unnested) solution AaBbCcDdEe DcbEeBCAad ()()()()() (((()))()) ---------- ---------- A -11.1..1.1 D -1........ a 1-.....1.. c 1-11...... B 1.-11..... b .1-1.1.... b ..1-.1..1. E .11-...... C 1.1.-..... e ....-..1.. c ...1.-1.1. B ..1..-11.. D .....1-... C .....1-1.. d 11.....-.. A ....111-11 E ...1.1..-. a .......1-1 e 1........- d .......11- S0 = 9 S0 = 5* S1 = 31 S1 = S2 = 157 S2 = 11* The matrix on the right is the S2-optimal arrangement. 
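
The parenthesis strings printed under the folio orders can be derived mechanically from the order itself, and the same pass tells whether a binding is physically possible. Here is a minimal sketch, assuming each order is a string over "ABCDEabcde" in which every letter occurs exactly once; the function name is ours, chosen only for illustration.

```python
def nesting_pattern(order):
    """Return the parenthesis pattern of a folio order such as
    "DcbEeAaBCd", or None if the bifolios are not properly nested
    (as in "ABab").  Assumes every folio letter occurs exactly once."""
    stack = []        # bifolios already opened, innermost one last
    pattern = []
    for folio in order:
        bifolio = folio.upper()
        if stack and stack[-1] == bifolio:
            stack.pop()               # second half of the innermost bifolio
            pattern.append(")")
        elif bifolio in stack:
            return None               # its partner is open but not innermost
        else:
            stack.append(bifolio)     # first half of a new bifolio
            pattern.append("(")
    return "".join(pattern) if not stack else None

if __name__ == "__main__":
    for order in ("EeAaCBbcDd", "DcbEeAaBCd", "ABab"):
        print(order, nesting_pattern(order))
```

Filtering the output of an unconstrained permutation search through such a test would be one way to keep only the physically possible bindings.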
There were 22 S0-optimal solutions (S0 = 5), including that one: BCAadDcbEe ((()()))() bBCAadDcEe ()(()())() CBbcDEeAad (())(()()) CBbcEeAadD (())()()() CBbEeAadDc (()()()()) cDdaACBbEe (()())()() DdaACBbcEe ()()(())() DCBbcEeAad ((())()()) DcCBbEeAad (()()()()) DcbBCAadEe ((())())() DcbBCEeAad ((())()()) DcbEeBCAad (((()))()) DcEebBCAad ((()())()) daACBbcDEe (()(()))() daACBbcEeD (()(())()) daACBbEecD (()(()())) daEeACBbcD ((())(())) EeAadDcbBC ()()()(()) EeAadCBbcD ()()((())) EeACBbcDda ()((())()) EeDcbBCAad ()((())()) EedaACBbcD ()(()(())) Let's now add the words that occur three times in two different folios: ---------------- ----- -- -- -- -- -- bb bb bb bb bb MM MM MM MM MM 55 33 22 11 44 ff ff ff ff ff 78 78 78 78 78 90 72 63 54 81 ---------------- ----- -- -- -- -- -- word(s) totct Aa Bb Cc Dd Ee ---------------- ----- -- -- -- -- -- olkeeey 3 21 .. .. .. .. solchey 3 12 .. .. .. .. tshey 3 2. .1 .. .. .. keeedy 3 1. .2 .. .. .. yteey 3 2. .. 1. .. .. otshdy 3 2. .. .1 .. .. dshey 3 1. .. .. .2 .. lcheol 3 2. .. .. .. 1. yteedy 3 1. .. .. .. 2. lom 3 .2 1. .. .. .. qopshedy 3 .2 1. .. .. .. qotchy 3 .2 1. .. .. .. dcheol 3 .1 2. .. .. .. ty 3 .2 .. .. 1. .. tchdy 3 .2 .. .. .1 .. oroly 3 .1 .. .. .2 .. ytal 3 .. 1. 2. .. .. lched 3 .. .2 .1 .. .. lcheckhy 3 .. .2 .. .. 1. sair 3 .. .. 21 .. .. lal 3 .. .. 12 .. .. cthedy 3 .. .. 2. 1. .. sshey 3 .. .. 1. 2. .. okedar 3 .. .. 1. .2 .. keed 3 .. .. .. 12 .. kchdy 3 .. .. .. .2 1. cthdy 3 .. .. .. .. 21 ---------------- ----- -- -- -- -- -- We get the matrix folio-2.matrix shown below left. Proposed order Alternate order Alternate order Current order (unnested) (unnested) (B C D nested) (all nested) AaBbCcDdEe EeDdAaBbCc AaBCDdcEeb DCBEAaebcd ()()()()() ()()()()() ()((())()) ((((())))) ---------- ---------- ---------- ---------- A -31221.221 E -1.12..2.1 A -312.21212 D -2...1..11 a 3-4...13.. e 1-..1..... a 3-4.13.... C 2-2.2...21 B 14-12..... D ..-1.1..21 B 14-2.....1 B .2-.14.1.. b 2.1-.2..2. d 1.1-23..1. 
C 2.2-212... E ...-2.1211 C 2.2.-221.. A 21.2-31221 D .1.2-11... A .212-31212 c 1..22-1.1. a ..133-4... d 23.11-.1.. a 1.4.3-...3 D .1..21-1.. B ....14-12. c 1..21.-1.2 e ...11.-... d 23..1.1-1. b 2...2.1-.2 E 2....11-12 b ..122..-2. E 2..2.1.1-1 C ..212.2.-2 e 1......1-. c 12.11..2-. e 1.......1- c 1.1.1..22- b 2.1...22.- d 11.123...- S0 = 25 S0 = 25 S0 = 25 S0 = 10026 S1 = S1 = S1 = S1 = S2 = 414 S2 = 318 S2 = 403 S2 = 40423 The computer found only one S2-optimal solution: BCaDdAcbEe "(((())))()" (and its reverse). Here it is: Found with computer BCaDdAcbEe (((())))() ---------- B -24..1.1.. C 2-.2122... a 4.-133.... D .21-1.1... d .131-2..1. A 123.2-1221 c .2.1.1-21. b 1....22-2. E ....1212-1 e .....1..1- S0 = S1 = S2 = 160 Let's add also the words that occur just twice, in different folios. Most of them may be noise (non-specific words of low-frequency which happened to occur twice only), but all together they may contribute some useful information. ---------------- ----- -- -- -- -- -- bb bb bb bb bb MM MM MM MM MM 55 33 22 11 44 ff ff ff ff ff 78 78 78 78 78 90 72 63 54 81 ---------------- ----- -- -- -- -- -- word(s) totct Aa Bb Cc Dd Ee ---------------- ----- -- -- -- -- -- ockhy 2 11 .. .. .. .. olteedy 2 11 .. .. .. .. cham 2 1. 1. .. .. .. dchy 2 1. 1. .. .. .. qotas 2 1. 1. .. .. .. chcthey 2 1. .1 .. .. .. lkchedy 2 1. .1 .. .. .. olkair 2 1. .1 .. .. .. oroiiin 2 1. .1 .. .. .. qotshedy 2 1. .1 .. .. .. shoky 2 1. .1 .. .. .. chedchey 2 1. .. 1. .. .. ldaiin 2 1. .. 1. .. .. okchy 2 1. .. 1. .. .. opar 2 1. .. 1. .. .. qokaldy 2 1. .. 1. .. .. olcheey 2 1. .. .1 .. .. pchor 2 1. .. .1 .. .. sheckhdy 2 1. .. .1 .. .. dykeedy 2 1. .. .. 1. .. olsheey 2 1. .. .. 1. .. por 2 1. .. .. 1. .. oram 2 1. .. .. .1 .. qokees 2 1. .. .. .1 .. lshdy 2 1. .. .. .. 1. otaldy 2 1. .. .. .. 1. dchol 2 1. .. .. .. .1 okchedy 2 1. .. .. .. .1 qoteesy 2 1. .. .. .. .1 cheety 2 .1 1. .. .. .. ldol 2 .1 1. .. .. .. olcheedy 2 .1 1. .. .. .. oloky 2 .1 1. .. .. .. 
rchy 2 .1 1. .. .. .. cheoky 2 .1 .1 .. .. .. cthor 2 .1 .1 .. .. .. dolshedy 2 .1 .1 .. .. .. tcheol 2 .1 .1 .. .. .. lkl 2 .1 .. 1. .. .. oraiin 2 .1 .. 1. .. .. ot 2 .1 .. 1. .. .. psheoldy 2 .1 .. 1. .. .. qokas 2 .1 .. 1. .. .. shees 2 .1 .. 1. .. .. shl 2 .1 .. 1. .. .. soin 2 .1 .. 1. .. .. olal 2 .1 .. .1 .. .. key 2 .1 .. .. 1. .. lsheckhy 2 .1 .. .. 1. .. okalol 2 .1 .. .. 1. .. olom 2 .1 .. .. 1. .. orchey 2 .1 .. .. 1. .. otam 2 .1 .. .. 1. .. qolshey 2 .1 .. .. 1. .. shepchy 2 .1 .. .. 1. .. chetain 2 .1 .. .. .1 .. olar 2 .1 .. .. .1 .. olkchy 2 .1 .. .. .1 .. pshol 2 .1 .. .. .1 .. rcheky 2 .1 .. .. .1 .. keol 2 .1 .. .. .. 1. otalor 2 .1 .. .. .. 1. kair 2 .1 .. .. .. .1 sh 2 .. 11 .. .. .. chedal 2 .. 1. 1. .. .. cphey 2 .. 1. 1. .. .. qeedy 2 .. 1. 1. .. .. qolal 2 .. 1. 1. .. .. sheolol 2 .. 1. 1. .. .. cheeol 2 .. 1. .1 .. .. daiiin 2 .. 1. .1 .. .. qokear 2 .. 1. .1 .. .. solkeey 2 .. 1. .1 .. .. chealy 2 .. 1. .. 1. .. kshey 2 .. 1. .. 1. .. oqokain 2 .. 1. .. 1. .. ycheedy 2 .. 1. .. 1. .. chetey 2 .. 1. .. .1 .. ycheey 2 .. 1. .. .1 .. qolshedy 2 .. 1. .. .. 1. oltedy 2 .. 1. .. .. .1 lcheedy 2 .. .1 1. .. .. lshed 2 .. .1 1. .. .. lteedy 2 .. .1 1. .. .. qolaiin 2 .. .1 1. .. .. cthol 2 .. .1 .1 .. .. okair 2 .. .1 .1 .. .. qokedal 2 .. .1 .1 .. .. chsdy 2 .. .1 .. 1. .. ckhey 2 .. .1 .. 1. .. olkol 2 .. .1 .. 1. .. qokeol 2 .. .1 .. .1 .. olfchedy 2 .. .1 .. .. 1. tchal 2 .. .1 .. .. 1. teeol 2 .. .1 .. .. 1. chetedy 2 .. .1 .. .. .1 chtedy 2 .. .1 .. .. .1 qokechedy 2 .. .1 .. .. .1 chedain 2 .. .. 11 .. .. ollchy 2 .. .. 11 .. .. chcthedy 2 .. .. 1. 1. .. dedy 2 .. .. 1. 1. .. ytain 2 .. .. 1. 1. .. alol 2 .. .. 1. .1 .. oro 2 .. .. 1. .1 .. qoteed 2 .. .. 1. .1 .. shtal 2 .. .. 1. .1 .. da 2 .. .. 1. .. 1. deedy 2 .. .. 1. .. 1. okeed 2 .. .. 1. .. 1. roiin 2 .. .. 1. .. 1. ykal 2 .. .. 1. .. 1. lshety 2 .. .. 1. .. .1 olsheol 2 .. .. 1. .. .1 shecphy 2 .. .. 1. .. .1 qotchdy 2 .. .. .1 1. .. chkeedy 2 .. .. .1 .1 .. dsheey 2 .. .. 
.1 .1 .. oqol 2 .. .. .1 .1 .. otchdy 2 .. .. .1 .1 .. soiiin 2 .. .. .1 .1 .. chckhal 2 .. .. .1 .. 1. ytaiin 2 .. .. .1 .. 1. dsheol 2 .. .. .1 .. .1 sheckhal 2 .. .. .1 .. .1 chekar 2 .. .. .. 11 .. kol 2 .. .. .. 11 .. olshed 2 .. .. .. 11 .. qokydy 2 .. .. .. 11 .. kary 2 .. .. .. 1. 1. ltedy 2 .. .. .. 1. .1 aldy 2 .. .. .. .1 1. dkedy 2 .. .. .. .1 1. okeshy 2 .. .. .. .1 1. shekedy 2 .. .. .. .1 1. ykain 2 .. .. .. .1 1. dchey 2 .. .. .. .1 .1 ykaiin 2 .. .. .. .1 .1 ---------------- ----- -- -- -- -- -- It turns out that all entries are strictly positive. So let's subtract 1 from everybody; that will subtract the same constant from the score, for all permutations. The result is folio-3.matrix below left: Proposed order Alternate order Alternate order Current order (unnested) (unnested) (B C D nested) (all nested) AaBbCcDdEe EeDdAaBbCc AaBCDdcEeb DCBEAaebcd ()()()()() ()()()()() ()((())()) ((((())))) ---------- ---------- ---------- ---------- A -437632343 E -..541.442 A -436233437 D -43.28.214 a 4-837.871. e .-.13..221 a 4-8787.1.3 C 4-74672334 B 38-17331.. D ..-4283241 B 38-7313..1 B 37-.38.131 b 731-342.42 d 514-371.44 C 677-443423 E .4.-41.425 C 6773-34442 A 4323-43763 D 2834-41..2 A 2634-43733 c 3.343-1421 a 1.874-837. d 37144-451. a 87814-.3.7 D 283241-4.. B ..3138-173 c 3.3314-214 e .2..3.-211 d 371.444-51 b 422.731-34 E 41.4.52-.4 b 2314732-4. E 41.442.5-. C 42446773-3 e 3..2.11.-2 c 13323.14-4 e 3..221.1.- c 21143.343- b 73132.442- d 4415371.4- Hm, this is still not good... Here is the computer's solution for this data, BDaCcAdbEe: BDaCcAdbEe (((())))() ---------- B -3873311.. D 3-841242.. a 88-7.4731. C 747-364342 c 31.3-34421 A 32463-3743 d 147443-.51 b 123347.-42 E ..142454-* e ...21312*- S2 = 742* PARTIAL FOLIO ORDER If we ask the computer to solve the problem for bifolios B,C,D,E only, without A, it finds the solution DdCcBbEe for folio-2.matrix. 
and BDCcdbEe for folio-3.matrix:

    Found with computer    Found with computer
    folio-2.matrix         folio-3.matrix

      DdCcBbEe               BDCcdbEe
      ()()()()               ((()))()
      --------               --------
    D -121....             B -37311..
    d 1-1...1.             D 3-4142..
    C 21-22...             C 74-34342
    c 1.2-.21.             c 313-4421
    B ..2.-1..             d 1444-.51
    b ...21-2.             b 1234.-42
    E .1.1.2-*             E ..4254-*
    e ......*-             e ..2112*-

    S2 = 30*               S2 = 195*

Let's make another attempt with bifolios BCD only. The computer found
dDCcBb with both folio-2.matrix and folio-3.matrix:

    Computer found         Computer found
    tbl parts 1-3          tbl parts 1-4

      d D C c B b            d D C c B b
      - - - - - -            - - - - - -
    d - 1 1 . . .          d - 4 4 4 1 .
    D 1 - 2 1 . .          D 4 - 4 1 3 2
    C 1 2 - 2 2 .          C 4 4 - 3 7 3
    c . 1 2 - . 2          c 4 1 3 - 3 4
    B . . 2 . - 1          B 1 3 7 3 - 1
    b . . . 2 1 -          b . 2 3 4 1 -

    S2 = 6*                S2 = 83*

FOLIO ORDER FROM THREE-FOLIO WORDS

Let's now look at the words that occur four or more times, but
confined to three bifolios (tight equivalence):

                           bbbbb
                           MMMMM
                           53214
    ---------------- ----- -----
    word(s)          totct ABCDE
    ---------------- ----- -----
    tain                 5 311..
    rchedy               6 222..
    lkaiin               5 221..
    lsheey               5 221..
    shear                5 221..
    shckhey              5 113..
    shed                 5 113..
    ---------------- ----- -----
    qotol                9 53.1.
    ory                  7 51.1.
    qoly                 7 41.2.
    qokam                6 31.2.
    ---------------- ----- -----
    kaiin                8 23..3
    orain                5 21..2
    arol                 5 12..2
    ---------------- ----- -----
    otol                10 4.15.
    shal                 6 4.11.
    teey                 6 4.11.
    checthy             17 9.44.
    qokshedy             8 3.32.
    qolkeedy             6 3.21.
    otchedy              5 2.21.
    lkar                 5 1.31.
    ---------------- ----- -----
    ---------------- ----- -----
    ykeey                7 4..12
    olkal                5 3..11
    teol                 5 3..11
    kar                  7 2..23
    pchey                5 2..21
    ---------------- ----- -----
    char                 5 .221.
    aly                  5 .212.
    olain                8 .152.
    lo                   6 .132.
    ---------------- ----- -----
    rain                 6 .41.1
    qopchedy             8 .23.3
    lkeedy               5 .13.1
    ---------------- ----- -----
    sheety               6 .2.22
    ---------------- ----- -----
    shckhedy             6 ..312
    qokeed               6 ..231
    chckhedy             5 ..221
    lchdy                5 ..221
    dl                   5 ..122
    ---------------- ----- -----

Note that, if we delete the first and last column from this table
(i.e. we ignore the bifolios A and E), we get mostly occurrences in
B-C or C-D, and comparatively few in B-D:

         B   C   D
        --  --  --
    B    -  14   5
    C   14   -  17
    D    5  17   -

COLORIZING THE KEY WORDS

I collected manually the words (in tight equivalence) that occur four
or more times in the biological section, but are confined to only two
bifolios, separately for each paragraph, and saved them to
bio-04-02-words.txt, in the reading order as conjectured above.

Next I colorized them with colors whose hue varies gradually from
bifolio A to bifolio E:

    cat bio-04-02-words.txt \
      | ../050/colorize-text -f ../../eva2erg.gawk \
          -v indent=2 \
          -v colorTable=../050/color-tables/bio-04-02-words.cdic \
          -v commentColor="5588bb" \
          -v headers=1 \
          -v comments=1 \
      > bio-04-02-words.html
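
The manual selection described above could in principle be automated. What follows is only a sketch under stated assumptions: the input layout (a dictionary from bifolio letter to the list of its words, already mapped to the chosen equivalence) and the function name are hypothetical, and this is not the procedure that produced bio-04-02-words.txt. The toy counts are taken from the bifolio tables earlier in this note.

```python
from collections import Counter

def confined_words(words_by_bifolio, n_units=2, min_total=4, min_each=1):
    """Words found in exactly `n_units` bifolios, at least `min_each`
    times in each of them and at least `min_total` times overall.
    Returns a dictionary {word: {bifolio: count}}."""
    counts = {b: Counter(ws) for b, ws in words_by_bifolio.items()}
    vocabulary = set().union(*counts.values())
    selected = {}
    for word in sorted(vocabulary):
        where = {b: c[word] for b, c in counts.items() if c[word] > 0}
        if (len(where) == n_units
                and sum(where.values()) >= min_total
                and min(where.values()) >= min_each):
            selected[word] = where
    return selected

# Toy input: three words with the per-bifolio counts listed in this note.
sample = {
    "A": ["shar"] * 5 + ["rchey"] * 6 + ["tain"] * 3,
    "B": ["shar"] * 2 + ["rchey"] * 1 + ["tain"] * 1,
    "C": ["tain"] * 1,
    "D": [],
    "E": [],
}

if __name__ == "__main__":
    print(confined_words(sample))               # relaxed rule
    print(confined_words(sample, min_each=2))   # at least twice in each
    print(confined_words(sample, n_units=3))    # confined to three bifolios
```

With `min_each=2` the function applies the stricter rule of the first bifolio table, and with `n_units=3` the rule of the three-bifolio table.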