Notes on the Voynich Manuscript - Part 17 [1992 February 5]
-----------------------------------------

Where have all the Consonants gone?

On the basis of our current hypotheses, the Voynich text has too few consonant occurrences and too few consonantal letters. Here are the statistics, for the entire Currier text:

         count  permill  C/V ??
    O:   10435    156    V
    9:    8186    122    V
    C:    7188    107    V
    8:    6001     90    C
    A:    5115     76    V
    S:    5058     75    C
    E:    4362     65    V
    F:    4090     61
    R:    2686     40
    4:    2564     38
    P:    2394     35
    Z:    2270     34
    M:    1717     25
    2:    1005     15
    N:     749     11
    Q:     603      9
    B:     503      7
    X:     463      6
    J:     350      5
    V:     156      2
    T:     134      2
    W:     122      1
    D:      98      1
    3:      82      1
    ----------------
    U:      56      0
    I:      55      0
    6:      53      0
    Y:      33      0
    K:      16      0
    7:      13      0
    G:      13      0
    0:       7      0
    H:       7      0
    L:       7      0
    5:       2      0

I've inserted a cutoff line at 0.1% - everything below it occurs less than one time in 1000 characters.

Now, based on our current guess, that gives as % of text:

    52.6% vowels
    47.4% consonants

And that's assuming everything not identified is a consonant.

For a sample of English text, I found

         count  permill  C/V ??
    e:    2730    127    V
    o:    1717     80    V
    a:    1683     78    V
    t:    1681     78
    n:    1504     70
    r:    1469     68
    s:    1409     65
    h:    1401     65
    i:    1315     61    V
    l:    1010     47
    d:     901     42
    f:     743     34
    u:     558     26    V
    c:     519     24
    w:     469     21
    m:     449     20
    b:     413     19
    y:     344     16    V/2
    g:     333     15
    p:     260     12
    v:     221     10
    k:      84      3
    j:      45      2
    x:      36      1
    ----------------
    q:      17      0
    z:       8      0

And that gives

    38.0% vowels
    62.0% consonants

Yes, but English has all those consonant clusters. True, but an analysis of some mediaeval Latin gave:

    43.6% vowels
    56.4% consonants

[Note: the Latin in question was most of the text of the Mass, simply concatenated. It was all I could find at the time.]

Now, when you look at supposed romanised Voynich, the cause is clear: an awful lot - a humungous lot - of vowel clusters. Whether they are long vowels, diphthongs, or just EIEIO chanting is unclear. Why should that be? I have a speculation, maybe for another note.

[Note: the vowel/consonant ratio is a property of three things: the language, the script, and the rules of orthography.
For example, if the language is Russian, you get a much higher vowel ratio with the Cyrillic script than with romanised text, because the former uses single letters where the latter must use clusters of 2, 3 or even 4 consonants. As another example, Sanskrit in conventional Devanagari script gives a low vowel ratio, because the most common vowel ('a') is usually elided in traditional orthography. Really weird ratios can occur when a script devised for one language is used to transcribe a very different language - a point discussed in a later note.]

But a bigger trouble, to my mind, is that there aren't enough symbols to make up a decent set of consonants. If we set the cutoff at 1 per mill, there are just 24 Voynich letters worth counting. We suspect at least 5 to be vowels, and 4 more (the cXXt) to be some form of fusion, and that leaves only 15 genuine consonants. At the same cutoff, English has 19 (counting Y), and Latin has 17 (all its consonants occur at least at the 0.2% level), or 18 if you recall that C does double duty as kappa and chi.

Again, one explanation is that many consonantal sounds are written with two letters. In English, for example, we could use ph, zh, bh, ks, kw for f, j, v, x, qu, so reducing the alphabet to 14 consonantal letters. But I don't see much sign of that in Voynich; apart from 8, S(ct) and Z(c't), few consonants seem to like to cluster. And the sheer frequency counts suggest that letter pairs do not form single consonants - the reason English scores so low on vowels is precisely that a lot of our consonantal sounds take several letters; with Voynich the inference is the opposite, that their consonants are single letters but their vowels are multiple letters.

So, if this is a European language, where have those consonants gone? (Plump nymphs picked them every one?)

Robert

[Note: and, again - if this is a European language, which language might the phenomena imply?
Yet again, the lack of a large sample of machine-readable texts in diverse languages brought the work to a halt.]
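
The arithmetic behind the Voynich figures in this note can be rechecked with a short script. What follows is only a sketch: the counts are copied from the Currier table above, the vowel set is the five letters marked V there (a working guess, not an established reading), and the function names `permill`, `vowel_percent` and `letters_at_cutoff` are made up for illustration. It computes the vowel share two ways, once by summing the truncated per-mill column and once from the raw counts, and it counts the letters that reach the 1 per mill cutoff.

```python
# Raw letter counts copied from the Currier-text table in this note.
VOYNICH = {
    "O": 10435, "9": 8186, "C": 7188, "8": 6001, "A": 5115, "S": 5058,
    "E": 4362, "F": 4090, "R": 2686, "4": 2564, "P": 2394, "Z": 2270,
    "M": 1717, "2": 1005, "N": 749, "Q": 603, "B": 503, "X": 463,
    "J": 350, "V": 156, "T": 134, "W": 122, "D": 98, "3": 82,
    "U": 56, "I": 55, "6": 53, "Y": 33, "K": 16, "7": 13, "G": 13,
    "0": 7, "H": 7, "L": 7, "5": 2,
}

# Letters marked V in the table: an assumption of the current hypothesis.
VOYNICH_VOWELS = {"O", "9", "C", "A", "E"}


def permill(counts):
    """Per-mill frequency of each letter, truncated to an integer."""
    total = sum(counts.values())
    return {letter: n * 1000 // total for letter, n in counts.items()}


def vowel_percent(counts, vowels):
    """Vowel share of the text in percent, computed from the raw counts."""
    total = sum(counts.values())
    return 100.0 * sum(counts[v] for v in vowels) / total


def letters_at_cutoff(counts, cutoff=1):
    """Letters whose truncated per-mill frequency reaches the cutoff."""
    return [letter for letter, pm in permill(counts).items() if pm >= cutoff]


pm = permill(VOYNICH)
vowel_permill = sum(pm[v] for v in VOYNICH_VOWELS)

print("vowels, summed per-mill column: %.1f%%" % (vowel_permill / 10))
print("vowels, from raw counts:        %.1f%%" % vowel_percent(VOYNICH, VOYNICH_VOWELS))
print("letters at or above 1 per mill:", len(letters_at_cutoff(VOYNICH)))
```

Summing the per-mill column gives 526 per mill, which is the 52.6% quoted above; working from the raw counts gives a slightly higher figure, because each per-mill entry is truncated before it is added. Either way the conclusion is unchanged. The cutoff count comes out at the 24 letters mentioned in the text.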