Last edited on 1998-09-26 15:35:54 by stolfi
Last major edits on 97-11-23 by stolfi.

The Generalized Chinese Theory

A while ago Jacques Guy proposed, as a remote possibility, that the VMs might be written in Chinese or some related tonal language. (Guesses in that general direction were also made by Denis Mardle and others.)

I think my prefix-midfix-suffix decomposition of Voynich words makes Jacques Guy's Chinese Theory look better than before. With the eyes half-closed, and a generous dose of optimism, I can almost believe that the statistics are vaguely suggestive of a possible fit. (Note to new readers: In Voynich circles, this could be considered a bold and assertive statement.)

[ See also the restatement of the prefix-midfix-suffix decomposition in terms of the OKOKOKO fine structure paradigm. ]

Basics of Chinese morphology

I have fetched from the archives an old message by Jacques Guy that describes some features of the Chinese language that seem pertinent to the VMs.

Briefly, in Chinese the basic language unit is the syllable. A Chinese syllable consists of an initial consonant (24 choices in modern Mandarin) and a final part (some 30 choices) consisting basically of vowels and nasals. In addition, the pitch of the final part can vary in one of four ways (flat, rising, falling, and falling-then-rising). For example, the word "chuang4" (in the modern pinyin notation) has initial "ch", final "uang", and tone "4".

The final part can be further broken down into an optional medial vowel or glide (three choices), a main vowel (three or four choices), an optional secondary vowel (two choices), and an optional terminator ("n", "ng", or "r"); but only 30 or so combinations of these elements are valid.

All these numbers are valid only for the Mandarin dialect in recent times. From the little I have read, I believe they may have changed a lot in the last 500 years.
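The initial/final/tone split described above can be illustrated with a small Python sketch. The list of initials is an assumption of the sketch: it follows pinyin spelling (and treats "y" and "w" as initials), so it is not the 24-way phonological classification cited above, and the name split_syllable is purely illustrative.

```python
import re

# Pinyin spellings of the initials. Two-letter initials come first so that
# "ch" is not read as "c" followed by "h". This inventory is an assumption
# of the sketch, not the 24-way classification mentioned in the text.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

# optional initial, then the final (letters), then an optional tone digit 1-4
SYLLABLE = re.compile(r"^(%s)?([a-z]+)([1-4])?$" % "|".join(INITIALS))

def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final, tone).

    The initial is "" when absent; the tone is None for unmarked
    (unstressed) syllables such as "de".
    """
    match = SYLLABLE.match(syllable)
    if match is None:
        raise ValueError("not a pinyin syllable: %r" % syllable)
    initial, final, tone = match.groups()
    return (initial or "", final, int(tone) if tone else None)

print(split_syllable("chuang4"))  # ('ch', 'uang', 4)
print(split_syllable("de"))       # ('d', 'e', None)
```

The sketch only separates the three components; it does not check that the final is one of the 30 or so valid combinations.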

Here is a sample of modern Mandarin Chinese (in pinyin) that I stole from somewhere in Netland:

  lu3 xun4 shi4 zhong1 guo2 jin4 dai4 shi3 shang4 zui4 you3 ying3 xiang3 li4
  de wen2 xue2 jia1 gen1 pi1 ping2 jia1 zhi1 yi1 yi1 ba1 ba1 yi1 nian2
  chu1 sheng1 zai4 zhe4 jiang1 shao4 xing1 yi2 ge xiang1 dang1 fu4 yu4 de
  jia1 ting2 li3 tong2 nian2 de shi2 hou yin1 wei4 zu3 fu4 ru4 yu4 fu4 qin
  sheng1 bing4 jia1 ting2 de jing1 ji4 qing2 kuang4 tu1 ran2 bian4 de hen3
  qiong2 kun4 zhe4 zhong3 you2 fu4 yu4 bian4 dao4 qiong2 kun4 de jing1 li4
  rang4 lu3 xun4 ti3 yan4 le bu4 tong2 de sheng1 huo2 zhe4 dui4 ta1 yi3 hou4
  de wen2 xue2 chuang4 zuo4 you3 hen3 da4 de ying3 xiang3 ta1 tong2 nian2 de
  sheng1 huo2 he2 hui2 yi4 dou1 cheng2 le ta1 xie3 zuo4 zui4 hao3 de cai2
  liao4 yin1 wei4 ta1 fu4 qin jing1 chang2 sheng bing4 lu3 xun4 cong2 xiao3
  jiu4 ren4 shi le bu4 shao3 zhong1 yi1

All together there are about 1200 valid syllables. These are the basic "simple words" of Chinese, the smallest units of text that have a recognizable meaning. The typical syllable usually has a very broad and ambiguous meaning, or several unrelated meanings (much as the English "get"). Therefore, syllables are often used as parts of two- or three-syllable compounds---such as "dian4 hua4" = "electric-speech" for "telephone"---whose meaning is established by convention and cannot be entirely deduced from the meanings of the individual syllables. (Compare with the English "get even" or "big deal"). In fact, since the set of syllables is essentially fixed, new concepts are usually named with new compounds of existing words.
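To put the figure of 1200 in perspective, the round numbers quoted earlier (24 initials, 30 finals, 4 tones) can be multiplied out. This is only back-of-the-envelope arithmetic on those approximate counts; the function name is illustrative.

```python
def syllable_space(initials, finals, tones):
    """Number of initial/final/tone combinations if every one were allowed."""
    return initials * finals * tones

possible = syllable_space(24, 30, 4)   # round figures quoted in the text
valid = 1200                           # approximate count of valid syllables

print(possible)                    # 2880
print(round(valid / possible, 2))  # 0.42
```

So, on these rough figures, well under half of the conceivable combinations are actually used.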

All elements of the syllable are semantically significant, including the tone: so "ba1" is "eight", "ba2" is "pull out", "ba3" is "grasp", and "ba4" is "dam". As the example shows, similar sounds do not imply related meanings: those four words are just as unrelated as the English words "cat", "kit", "cot" and "cut".

In fact, Chinese has virtually no inflections or derivation in the Indo-European sense. There is no grammatical gender. Nouns do not change depending on their function in the sentence. Plural is indicated only when really needed, by a separate word (such as "all", "several", "some", etc.). Verbs do not change with person or tense; rather, time and other aspects are indicated, as needed, by separate adverbial words. In fact, given this lack of inflection, the traditional Western distinction between nouns, verbs, adjectives and adverbs is hardly applicable: many Chinese words can play several of these roles.

Voynichese as Chinese

The "Chinese Theory" proposes that the VMs is written in Chinese, with an original phonetic or semi-phonetic alphabet.

The Chinese Theory seems to be a good fit to my prefix-midfix-suffix decomposition. The prefixes could denote tones: assuming EVA "y" and "o" are equivalent there, we get roughly four significant prefixes. The midfixes could be the 24 initial consonants, and the suffixes would be the final parts.

The consonants are a bit of a problem, though. The "fine structure" of the midfixes suggests that a nontrivial fraction of them consist of two or three consonants; whereas in Chinese every syllable begins with a single consonant. Perhaps those multiletter groups are just a verbose encoding of a single consonant. Or perhaps my decomposition rule is assigning the medial vowel to the midfix instead of the suffix.

The same problems apply to the suffixes. While there are only a dozen or so suffixes with frequencies over 1%, there is a long list of suffixes that occur only a few times in the whole manuscript. We could explain some of them as the combination of a verbose encoding (where groups like "or" and "aiin" represent single vowels), meaningless calligraphic variation (such as, perhaps, the writing of "ol" for "al"), and transcription errors (some "-o" and "-a" endings may be misreadings of "-y").

Another problem is posed by the all-soft words ("unifixes" in my notebooks), because the initial consonant is mandatory in the Chinese syllable. So, if Voynichese is Chinese, it means that some of my "soft" letters must be hard, at least sometimes.

One possibility is the "n" sound, which in modern standard Chinese can be both an initial consonant and a syllable terminator. If "n" had been represented with EVA "d" in both cases, then many of my "unifixes" can be re-parsed as ordinary words with an "-n-" midfix. For example, the most frequent unifix "daiin" would be split as "- -d- -aiin", all three components being already very common.

Another possible explanation for the unifixes, within the Chinese theory, is that they are certain common unstressed syllables (like Chinese "de" = English "'s") that function like our prepositions and tend to be phonetically attached to the previous word. Indeed, some of the longer rare suffixes mentioned above can be parsed as a common suffix attached to a common unifix.

And where are the numbers?

One vexing question about the VMS is why it contains no recognizable numbers, or any distinctive symbols or "words" that might be numbers. Even if the author had good reasons to encrypt the main text, why encrypt the numbers---all of them? We would expect at least a few numbers in the astro/cosmo diagrams; if they are there, they are encoded or spelled out---why?

The Chinese Theory may provide a reasonable explanation. Note the repetitions "yi1 yi1 ba1 ba1 yi1" on line 2 of the Mandarin sample above. There is a sentence boundary break between the first and second "yi1"; and the remaining "yi1 ba1 ba1 yi1" is a year, namely "1881". Note that, except for the chance repetition of digits, the numeral does not look different from the surrounding text.
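The point can be checked mechanically. The Python sketch below scans line 2 of the Mandarin sample for runs of syllables that happen to be digit names; the digit table is standard modern pinyin, while the function name and the minimum run length are arbitrary choices made for the illustration.

```python
# Pinyin readings of the digits 0-9 in modern standard Mandarin.
DIGITS = {"ling2": 0, "yi1": 1, "er4": 2, "san1": 3, "si4": 4,
          "wu3": 5, "liu4": 6, "qi1": 7, "ba1": 8, "jiu3": 9}

def digit_runs(text, min_len=2):
    """Return maximal runs of consecutive digit syllables in a pinyin text."""
    runs, current = [], []
    for syllable in text.split():
        if syllable in DIGITS:
            current.append(syllable)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = []
    if len(current) >= min_len:   # a run may end at the end of the text
        runs.append(current)
    return runs

# Line 2 of the Mandarin sample quoted earlier.
line2 = "de wen2 xue2 jia1 gen1 pi1 ping2 jia1 zhi1 yi1 yi1 ba1 ba1 yi1 nian2"
print(digit_runs(line2))
# [['yi1', 'yi1', 'ba1', 'ba1', 'yi1']]
```

The run found has five syllables, not four: the scan cannot tell that the first "yi1" belongs to the preceding sentence and is not part of the year. Even a reader who already knows the digit names cannot delimit the numeral without understanding the text.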

This situation is very different from what happens in Western languages, where the distinction between "words" and "symbols" (including digits) is very clear-cut. Even in classic Latin texts, numbers like "MDCCCLXXXI" are clearly not words. Moreover, the reason for using symbols is that a phonetic transcription of the numbers ("one thousand, eight hundred and eighty one") would be extremely prolix and cumbersome. Note that even though we write numbers in positional notation, we still read them with the archaic position-dependent words, with a mini-grammar all of their own ("forty seven" instead of "four seven", "quatre-vingt-treize" instead of "neuf trois", "quinhentos" instead of "cinco zero zero").

This distinction is rather meaningless in Chinese, because of its monosyllabic and non-inflecting character, and its syllable-based script. Each Chinese digit is a single syllable and a single character, and its reading does not depend on its position within the number. Even when a large number is written in decimal positional notation, the "digits" are still ordinary words, and are read as such. (As far as I know, this was true even before the introduction of positional notation).

So, if we assume that the VMs was written in phonetic Chinese, either by or for Chinese speakers, we should expect its numerals to be spelled out, and look just like normal text. (We might be able to detect them by careful analysis of word correlations, but I won't bet on it.) Conversely, if we try to explain the VMs within a purely European context, we are stuck with the problem of explaining why there are no number-like or symbol-like "words" of any sort.

Feature list

In summary, I see an impressive list of features in the Chinese language that, within the limits of my ignorance, seem to match what we see in the VMs:

Most common words consist of a single syllable.
The standard script has no punctuation.
Spaces delimit syllables, not compound words.
Line breaks can occur between any two syllables.
The syllables have a narrow range of lengths.
There are only 400 or so phonetically distinct syllables (about 1200 if tones are counted).
Very similar words generally have unrelated meanings.
The same word enters in many compounds, with different meanings.
Repeated words are relatively common.
There are no inflections.
Numbers look like ordinary words.
Syllables have a rigid internal structure.
The syllable has three main phonetic components.
The components have roughly 4, 25, and 30 choices, respectively.

Historical possibility

There were quite a few contacts between China and Europe in the 200(?) years from Marco Polo to John Dee. Among these there was at least one Catholic missionary who spent several years at the Chinese Emperor's court (I owe you the names and dates). Thus I can think of several possible scenarios that would end up with the VMs in the hands of John Dee.

As Jacques suggested, the author may have been a Chinese traveller who came to Europe, by himself or as part of a returning western expedition. However, it is not clear why he would write a book like the VMs. (And one would expect him to use at least some Chinese characters.)

Another possibility is that the book was written by a missionary in China. Inventing phonetic alphabets for foreign languages was a typical activity of Christian priests, and in fact that is the origin of the present writing systems for many languages---from Russian to Vietnamese. I imagine that the first missionaries to China would have thought it impossible to master the 50,000 characters of learned Chinese dictionaries, and would have thought it necessary to develop a phonetic script.

Late-breaking news: While browsing through a book on the Chinese language (Jerry Norman, "Chinese", Cambridge U. Press) I tripped, and almost fell over, the following sentence:

The idea of writing Chinese in an alphabetic form was not new. ... In the late Ming dynasty [1368-1644], Matteo Ricci, the famous Jesuit missionary, devised a scheme for romanizing Chinese intended mainly as an aid for Europeans in the learning of Chinese.

Now there is a lead that we must follow...

The VMs would be the product of such an effort: both an exercise to test and practice the author's "medieval pinyin", and a useful handbook that would motivate the Chinese to learn the system. But soon the priests must have realized that learning the standard Chinese writing was hard but not impossible, and that the Chinese would never think of giving up their script for any other system. So the VMs went back to Europe with its author, as a useless souvenir; and eventually it was sold to Dee as a Bacon original.

One problem with this sub-theory is that the subject matter of the book does not seem proper for a Catholic priest---quite the opposite. Also, one must explain the complete lack of any symbols other than the Voynich characters.

A variant of this scenario is that the book was written for a European missionary, by his Chinese disciples or friends. They wrote in Chinese, in the priest's invented script, because that was the only way they could communicate. The subject matter of the VMs seems just the sort of things that a Catholic missionary would want to know, record, and report back to Rome: their herbal recipes, medical theories, calendars, cosmology, etc. (I believe there were several known instances of this scenario in the New World, starting from the 16th century.)

This last scenario would be consistent with the "alien" subject of the VMs drawings, the lack of western symbols, and the good material quality of the book. And, finally, it would vindicate Rene's "1058 Supernova" theory...

The "Chinese Theory" and the Currier languages

The Chinese theory could also provide a natural explanation for the N hands and two languages that can be discerned in the VMs (first observed by Currier).

While the issue of different calligraphic hands is still open, there are many strong statistical differences between the so-called "A" and "B" pages. These differences cannot be explained by transcription errors, since they are equally evident in the Currier and Friedman versions.

The prefix-midfix-suffix paradigm makes it easier to describe the differences, as reported in a separate page. As far as I can tell, the differences seem consistent with the thesis that A and B are two dialects of Chinese.

From what I have read it seems that all Chinese dialects have the same general structure: monosyllabic words, consisting of an initial consonant and a tail of vowels and auxiliary consonants, the latter pronounced with a specific tone (pitch pattern). But different dialects have different sets of alternatives for each of these three components.

Most dialects have four or five distinct and semantically meaningful tones (a few have as many as eight). The other two components are more variable: both the set of sounds and their frequencies can be quite different. For example, Cantonese apparently has only 20 initial consonants, instead of Mandarin's 24; but on the other hand, it has six possible terminals instead of three. (My reference is not very clear, but it seems that they are -k -t -p -n -ng and null, the last three being shared with Mandarin.)

It seems also that differences between Chinese dialects cannot be described as a simple phonemic substitution. That is, one cannot make two replacement tables, one of consonants and one of finals, that together will transform a Mandarin word to its Cantonese equivalent. Translation depends on the whole syllable.
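What "two replacement tables" would mean can be made concrete with a short Python sketch. It takes pairs of corresponding syllables, each already split into (initial, final), and checks whether one table of initials plus one table of finals could account for all of them. The syllable pairs below are invented placeholders for two imaginary dialects, not real Mandarin or Cantonese data, and the function name is illustrative.

```python
def factorizes(pairs):
    """True if a table of initials and a table of finals explain all pairs.

    pairs: iterable of ((initial_a, final_a), (initial_b, final_b)),
    one entry per word, giving its form in dialect A and in dialect B.
    """
    initial_table, final_table = {}, {}
    for (init_a, fin_a), (init_b, fin_b) in pairs:
        # setdefault records the first mapping seen; a later conflicting
        # mapping means no single replacement table can exist.
        if initial_table.setdefault(init_a, init_b) != init_b:
            return False
        if final_table.setdefault(fin_a, fin_b) != fin_b:
            return False
    return True

# Invented placeholder data (NOT real dialect forms).
regular = [(("b", "a"), ("p", "o")),
           (("b", "i"), ("p", "e")),
           (("m", "a"), ("n", "o"))]
irregular = regular + [(("m", "i"), ("n", "u"))]  # final "i" -> "e" and "u"

print(factorizes(regular))    # True
print(factorizes(irregular))  # False
```

The claim in the text corresponds to the second case: the check fails as soon as the outcome for one component depends on what the rest of the syllable is.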

These differences are surprisingly similar to the differences we observe between Voynichese-A and Voynichese-B, when analyzed in terms of the prefix-midfix-suffix decomposition. What we see is that

the basic structure of the word is the same;
the prefix distributions are basically the same;
the midfix and suffix distributions show many large differences;
the tail (midfix+suffix) distributions are very different.
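One way to put numbers on statements like "basically the same" or "very different" is to compute a distance between the two frequency distributions. The Python sketch below uses the total variation distance, which is a choice made here for illustration and not necessarily the measure behind the observations above. The counts in the example are hypothetical placeholders, not taken from any transcription.

```python
def total_variation(counts_a, counts_b):
    """Total variation distance between two count tables.

    Returns 0.0 for identical relative frequencies and 1.0 when the two
    tables share no items at all.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    items = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a.get(item, 0) / total_a -
                         counts_b.get(item, 0) / total_b)
                     for item in items)

# Hypothetical placeholder counts of some component in A and B pages.
component_a = {"x": 1, "y": 1}
component_b = {"x": 1, "y": 3}
print(total_variation(component_a, component_b))  # 0.25
```

Applied separately to the prefix, midfix, suffix and tail counts of the A and B pages, such a measure would give one number per component that can be compared across components.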

The "Extended Chinese Theory"

Of course, Chinese is not the only language that would fit the general characteristics of Voynichese. Many other languages of East Asia have non-inflecting monosyllabic words, tones, repetitions, etc. Jacques made a case for Vietnamese (which, by the way, has syllables with two consonants, like "tr"); but there are several other candidates.

Historical support for a non-Chinese East Asian connection is a bit weaker than for China itself---but not much. Of the Western travelers and missionaries who visited China before 1500, many also visited Vietnam, Malaysia, Indonesia, and other countries in Southeast Asia. If a missionary stayed at some of those places long enough to learn the language (as several did in China), he would probably have interacted with local scholars, and possibly taken local students. I think that any of those people is a serious candidate for being the VMs author.
