It may be that Herbal-B is a mixture of the other two languages, so the words that we see in the list above are mostly the intersection of the two vocabularies. To see what I mean, consider mixing equal parts of two hypothetical languages X and Y with the following (Zipf-like) word distributions:

    Lang.X         Lang.Y         Mixture
    -----------    -----------    -----------
    0.44 xuxa      0.44 yoyo      0.22 xuxa
    0.22 xenon     0.22 young     0.22 yoyo
    0.14 xiang     0.14 yaks      0.12 foobar
    0.12 foobar    0.12 foobar    0.11 xenon
    0.08 xerox     0.08 yield     0.11 young
                                  0.07 xiang
                                  0.07 yaks
                                  0.04 xerox
                                  0.04 yield

Note that mixing caused the shared word "foobar" to become more popular than "xenon" and "young", which outranked it in the original languages. Of the non-shared words, only the front-runners "xuxa" and "yoyo" retained their leading positions, because of their large Zipf advantage.
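To make the arithmetic concrete, here is a small Python sketch that forms the 50/50 mixture of the two hypothetical distributions and ranks the result. Each word's mixed frequency is just the weighted average of its frequencies in X and Y, so a word present in only one language is halved, while a shared word like "foobar" keeps its full weight. The `mix` helper and its weight parameter `w` are illustrative names chosen here, not part of any existing tool.

```python
# Hypothetical word frequencies from the example (each sums to 1.0).
lang_x = {"xuxa": 0.44, "xenon": 0.22, "xiang": 0.14, "foobar": 0.12, "xerox": 0.08}
lang_y = {"yoyo": 0.44, "young": 0.22, "yaks": 0.14, "foobar": 0.12, "yield": 0.08}


def mix(p, q, w=0.5):
    """Return the mixture w*p + (1-w)*q of two word distributions.

    Words missing from one distribution count as frequency 0 there.
    The weight parameter w is an illustrative assumption (0.5 = equal parts).
    """
    words = set(p) | set(q)
    return {word: w * p.get(word, 0.0) + (1 - w) * q.get(word, 0.0) for word in words}


mixture = mix(lang_x, lang_y)

# Rank by descending frequency, breaking ties alphabetically.
ranked = sorted(mixture.items(), key=lambda kv: (-kv[1], kv[0]))
for word, freq in ranked:
    print(f"{freq:.2f} {word}")
```

Trying other values of `w` (say 0.7) shows how an unequal mix lets the dominant language's mid-ranked words climb back above the shared ones.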