Zipf's law plot (word frequency as a function of frequency rank) for various texts.
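A rank-frequency curve of this kind can be computed from any word list by counting occurrences and sorting the counts in decreasing order, so that rank 1 is the most frequent word. The minimal Python sketch below assumes the input is already split into words. The function names are hypothetical. The log-log least-squares estimate of the Zipf exponent ''s'' (frequency ∝ rank<sup>−s</sup>) is an added illustration, not necessarily how this plot was produced.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return (rank, frequency) pairs, most frequent word first (rank 1).

    Words with equal counts get consecutive ranks in arbitrary order;
    this does not change the frequencies that get plotted.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(pairs):
    """Estimate s in frequency ~ C / rank**s.

    Fits log(frequency) against log(rank) by ordinary least squares;
    Zipf's law predicts a straight line of slope -s, so s is the
    negated slope. (Illustrative fit, assumed here, not the author's.)
    """
    if len(pairs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    words = "a b a c a b d a b c".split()
    pairs = rank_frequency(words)
    print(pairs)  # [(1, 4), (2, 3), (3, 2), (4, 1)]
    print(f"fitted exponent s = {zipf_exponent(pairs):.3f}")
```

Plotting the pairs on logarithmic axes gives a curve of the kind shown in this file. The fitted exponent is only a rough summary, since real texts deviate from a pure power law at both ends of the rank range.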

The languages, texts, and word frequency files are:

[[Hebrew language|Hebrew]]. The first five books (''[[Torah]]'', ''Pentateuch'') of the Hebrew Bible (''Tanak''), taken from the 10th-century version (the [[Masoretic text]]) of the original, which was probably composed mainly around 500 BCE from earlier texts. Obtained from the ''Sacred Texts'' site, maintained by John B. Hare. Transcribed in an ad-hoc single-byte encoding designed to look vaguely phonetic when displayed in an ISO Latin-1 (ISO 8859-1) font. '''With''' vowel points but '''without''' cantillation marks.

* Book 1, ''Bereis'' (''Genesis''). Sample: ''<nowiki>b¤°rë¡s¹ïy± b¤ârâ¡ ¡°êlöhïym ¡ë± häs¤¹âmäyïm w°¡ë± hâ¡ârêþ w°hâ¡ârêþ</nowiki>'' [...] ''<nowiki>wäy¤äçän°tw¤ ¡ö±wö wäy¤ïys²êm b¤â¡ârwön b¤°mïþ°râyïm</nowiki>''. File hebr/tav/gen.1/gud.wfr (17211 words, ''N'' = 7212 distinct).


* Book 2, ''Shmot'' (''Exodus''). Sample: ''<nowiki>w°¡ël¤êh s¹°mwö± b¤°nëy yïs²°râ¡ël häb¤â¡ïym mïþ°rây°mâh ¡ë± yä¿°äqöb</nowiki>'' [...] ''<nowiki>b¤wöl°¿ëynëy kâlb¤ëy±yïs²°râ¡ël b¤°kâlmäs°¿ëyhêm</nowiki>''. File hebr/tav/exo.1/gud.wfr (13870 words, ''N'' = 5711 distinct).


* Book 4, ''Bamidbar'' (''Numbers''). Sample: ''<nowiki>wäy°däb¤ër y°hwâh ¡êlmös¹êh b¤°mïd°b¤är sïynäy b¤°¡öhêl mwö¿ëd b¤°¡êçâd</nowiki>'' [...] ''<nowiki>b¤°¿är°bö± mwö¡âb ¿äl yär°d¤ën y°rëçwö</nowiki>''. File hebr/tav/num.1/gud.wfr (13573 words, ''N'' = 5306 distinct).


* Book 3, ''Vaykra'' (''Leviticus''). Sample: ''<nowiki>wäy¤ïq°râ ¡êlmös¹êh wäy°däb¤ër y°hwâh ¡ëlâyw më¡öhêl mwö¿ëd lë¡mör</nowiki>'' [...] ''<nowiki>¡ê±mös¹êh¡êlb¤°nëy yïs²°râ¡ël b¤°här sïynây</nowiki>''. File hebr/tav/lev.1/gud.wfr (9650 words, ''N'' = 3860 distinct).


* Book 5, ''Devarim'' (''Deuteronomy''). Sample: ''<nowiki>¡ël¤êh häd¤°bârïym ¡°äs¹êr d¤ïb¤êr mös¹êh ¡êlk¤âlyïs²°râ¡ël b¤°¿ëbêr</nowiki>'' [...] ''<nowiki>häm¤wörâ¡ häg¤âdwöl ¡°äs¹êr ¿âs²âh mös¹êh l°¿ëynëy k¤âlyïs²°râ¡ël</nowiki>''. File hebr/tav/deu.1/gud.wfr (12007 words, ''N'' = 5455 distinct).

The word frequency files (*/*/*/gud.wfr) are available at the [https://www.ic.unicamp.br/~stolfi/EXPORT/projects/voynich/Notes/tr-stats/dat/ UNICAMP website]. The original annotated full texts are in the companion files */*/org/main.src, and the extracted texts (one word per line, without punctuation) are in */*/*/gud.tlw.
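Because the extracted texts are stored one word per line in a single-byte encoding, the word and distinct-word counts can be recomputed directly from a gud.tlw file. The minimal Python sketch below assumes that each non-blank line holds exactly one word and that decoding as ISO Latin-1 is appropriate (every byte maps to one character, so nothing is lost). The format of the .wfr files is not described here, so the sketch does not read them. The function names and the script name are hypothetical. The results may differ from the figures listed above if the author's counting rules differ.

```python
import sys
from collections import Counter


def load_tlw(path, encoding="latin-1"):
    """Read a one-word-per-line text file, skipping blank lines.

    Latin-1 is assumed because the transcription uses a single-byte
    encoding; it decodes every byte value without error.
    """
    with open(path, encoding=encoding) as fh:
        return [line.strip() for line in fh if line.strip()]


def summarize(words):
    """Return (total word tokens, number of distinct words N, Counter)."""
    counts = Counter(words)
    return len(words), len(counts), counts


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python tlw_stats.py FILE.tlw")
    else:
        total, distinct, counts = summarize(load_tlw(sys.argv[1]))
        print(f"{total} words, N = {distinct} distinct")
        # The ten most frequent words, highest count first.
        for word, freq in counts.most_common(10):
            print(freq, word)
```

For example, `python tlw_stats.py hebr/tav/gen.1/gud.tlw` would report the totals for Genesis, assuming the files keep the directory layout given above.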