Zipf's law plot (word frequency as a function of frequency rank) for various texts.

The languages, texts and the word frequency files are:

Synthetic text imitating [[Vietnamese language|Vietnamese]]. Text created by a software implementation of the 'grille' method proposed by Gordon Rugg for the Voynich Manuscript. The three columns of the method's table are obtained from 520 actual Vietnamese words, split into initial consonants, vowels, and final consonants.

* Whole generated text. Sample: ''<nowiki>cho cu?a ca?i ngu+o+`i va` nha^.m co^ng vie^.c cu?a tay ngu+o+`i la`m</nowiki>'' [...] ''<nowiki>tha`m co+i no+?c su?a nhu+o+ing ho+? lu+o+'i qo' nha't cay mo+`it tro+`i</nowiki>''. File viep/grs/tot.1/gud.wfr (31200 words, ''N'' = 7760 distinct).
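
The column-based generation described above can be illustrated with a much simplified Python sketch. It is not the software used for this file: the three miniature columns, the name <code>generate_words</code>, and the uniform random choice (standing in for a grille moved over the table) are all illustrative assumptions.

```python
import random

# Hypothetical miniature columns (assumption); the table described above
# was built from 520 actual Vietnamese words.
INITIALS = ["ch", "c", "ng", "v", "nh", "l", "t", ""]
VOWELS = ["o", "u?a", "a?i", "u+o+`i", "a`", "a^.", "o^"]
FINALS = ["", "m", "ng", "c", "n"]

def generate_words(n, seed=0):
    """Return n pseudo-words, each built as initial + vowel + final."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [rng.choice(INITIALS) + rng.choice(VOWELS) + rng.choice(FINALS)
            for _ in range(n)]

print(" ".join(generate_words(10)))
```

Because the three parts are drawn independently, such a generator yields many distinct word forms, including ones that do not occur in the source vocabulary.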

[[Vietnamese language|Vietnamese]]. The first five books (the ''Pentateuch'') from the [[Cadman Vietnamese Bible]] (1934), probably translated from the English [[King James Bible]]. The text is in the ASCII VIQR encoding, mapped to lowercase, without hyphens.

* All five books. Sample: ''<nowiki>ban dda^`u ddu+'c chu'a tro+`i du+.ng ne^n tro+`i dda^'t va? dda^'t la`</nowiki>'' [...] ''<nowiki>da.y la.i cho dde^? ca'c ngu+o+i la`m theo no' trong xu+' ma` ca'c</nowiki>''. File viet/ptt/tot.1/gud.wfr (original 169480 words, truncated/filtered to 35027 words, ''N'' = 1631 distinct).

Synthetic text imitating [[Vietnamese language|Vietnamese]]. Text created by a [[Markov chain]] of order 3, trained on the Cadman Vietnamese Pentateuch.

* Whole generated text. Sample: ''<nowiki>ddo' no+i cha(`ng ra(`ng mi`nh xo^'p da^~ng ddi dde^` ca'ch tro+`i</nowiki>'' [...] ''<nowiki>dda(.c</nowiki>''. File viep/mky/tot.1/gud.wfr (original 39293 words, truncated/filtered to 35027 words, ''N'' = 3341 distinct).
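
A minimal Python sketch of an order-3 Markov text generator follows. The description does not say whether the chain runs over characters or words; this sketch assumes a character-level chain, and the names <code>train</code> and <code>generate</code> and the tiny training string are illustrative only.

```python
import random
from collections import defaultdict

ORDER = 3  # number of preceding characters that condition the next one

def train(text, order=ORDER):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, length, seed=0, order=ORDER):
    """Extend a random trained context until `length` characters are
    reached or the current context has no known continuation."""
    rng = random.Random(seed)
    out = rng.choice(sorted(model))
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # context seen only at the end of the training text
            break
        out += rng.choice(followers)
    return out

# Toy training text (assumption): the opening of the Pentateuch sample above.
training = "ban dda^`u ddu+'c chu'a tro+`i du+.ng ne^n tro+`i dda^'t va? dda^'t la` "
model = train(training)
print(generate(model, 60))
```

Storing followers in a list with repetitions makes a uniform random choice reproduce the observed transition frequencies without computing probabilities explicitly.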

[[English language|English]]. Text of [[H. G. Wells]]'s novel ''[[The War of the Worlds]]'' (1898), excluding numbers and mapped to lowercase, then encrypted by replacing each distinct word, in order of decreasing frequency, with a distinct Vietnamese syllable or two-syllable compound from the Vietnamese Bible, also in order of decreasing frequency; for example, 'the' ⟶ 'ngu+o+i', 'and' ⟶ 'va`', 'dead' ⟶ 'le^~-chuo^.c'.

* Whole text. Sample: ''<nowiki>ddo`n ddu+'c cha thue^' no^ ra le^~ con lo`ng le'c ra phe^n va` van dda~</nowiki>'' [...] ''<nowiki>ddo^` ddu+a pha'n va` thu+o+ng dda(.ng cho.c co' ba cha(n co' ca'c da^u</nowiki>''. File envt/wow/tot.1/gud.wfr (original 42098 words, truncated/filtered to 35027 words, ''N'' = 1650 distinct).
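
The rank-for-rank substitution described above can be sketched in Python as follows. The function names, the toy word lists, and the tie-breaking rule (equally frequent words keep their order of first appearance) are assumptions, not details taken from the original software.

```python
from collections import Counter

def rank_order(words):
    """Distinct words by decreasing frequency (ties: first seen comes first)."""
    return [w for w, _ in Counter(words).most_common()]

def encrypt_by_rank(plain_words, code_words):
    """Replace the k-th most frequent plain word with the k-th most
    frequent code word; return the encrypted text and the mapping."""
    plain_ranked = rank_order(plain_words)
    code_ranked = rank_order(code_words)
    if len(code_ranked) < len(plain_ranked):
        raise ValueError("not enough distinct code words")
    table = dict(zip(plain_ranked, code_ranked))
    return [table[w] for w in plain_words], table

# Toy inputs (assumption), far smaller than the texts used for the plot.
plain = "the war and the worlds and the men of the earth".split()
code = "ngu+o+i va` ngu+o+i la` ngu+o+i va` tro+`i dda^'t ngu+o+i va` no' co' ca'c".split()
cipher, table = encrypt_by_rank(plain, code)
print(" ".join(cipher))
```

Since the mapping is one-to-one, the encrypted text has exactly the same rank-frequency profile as the plain text.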

[[English language|English]]. Text of [[H. G. Wells]]'s novel ''[[The War of the Worlds]]'' (1898), excluding numbers, mapped to lowercase.

* Whole text. Sample: ''<nowiki>no one would have believed in the last years of the nineteenth century</nowiki>'' [...] ''<nowiki>there were already a couple of score of passengers aboard some of</nowiki>''. File engl/wow/tot.1/gud.wfr (original 60293 words, truncated/filtered to 35027 words, ''N'' = 4869 distinct).

The word frequency files '*/*/*/gud.wfr' are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts, before truncation/filtering, are in the companion files '*/*/org/main.src'. The truncated/filtered texts (one word per line, without punctuation) are in '*/*/*/gud.tlw'.
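
As a rough illustration, the rank and frequency of each word can be recomputed from a one-word-per-line file such as 'gud.tlw' with a few lines of Python. The function name <code>rank_frequency</code> and the in-memory sample are assumptions; the layout of the '.wfr' files is not described here, so the sketch does not read them.

```python
import io
from collections import Counter

def rank_frequency(lines):
    """Return (rank, count, word) triples; rank 1 is the most frequent word."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return [(rank, count, word)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

# Stand-in for open("gud.tlw", encoding="ascii"); the local file name is an assumption.
sample = io.StringIO("the\nwar\nof\nthe\nworlds\nthe\nof\n")
table = rank_frequency(sample)
for rank, count, word in table:
    print(rank, count, word)
```

The number of distinct words ''N'' is then <code>len(table)</code>, and plotting count against rank on logarithmic axes gives a Zipf plot of the kind shown here.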