Last edited on 1998-01-27 11:52:48 by stolfi

The names of the plants

Description

Introduction

The herbal section has one plant drawing per page. It is reasonable to expect that the plant's name is mentioned somewhere on that page. Moreover, it is rather unlikely that the name is mentioned on other herbal pages (although it may be mentioned in other sections).

So, we can hope to narrow down the candidates for plant names by looking for page-specific words: words that occur on only one herbal page. That is what I did, and here is what I got.
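The idea of a page-specific word can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Unix commands used for this study: the function name page_specific_words, the page labels, and the toy words are all made up.

```python
from collections import defaultdict

def page_specific_words(pages):
    """Map each page to the sorted words that occur on that page and no other.

    `pages` maps a page label to the list of words transcribed on it.
    (Hypothetical helper, for illustration only.)
    """
    # For every word, collect the set of pages on which it occurs.
    pages_of = defaultdict(set)
    for page, words in pages.items():
        for word in words:
            pages_of[word].add(page)

    # A word is page-specific when that set has exactly one member.
    result = {page: [] for page in pages}
    for word, where in pages_of.items():
        if len(where) == 1:
            (page,) = where
            result[page].append(word)
    return {page: sorted(words) for page, words in result.items()}

# Toy data: made-up labels and words, not Vms text.
toy_pages = {
    "p1": ["aaa", "bbb", "ccc"],
    "p2": ["bbb", "ddd", "ddd"],
    "p3": ["ccc", "bbb", "eee"],
}
specific = page_specific_words(toy_pages)
print(specific)
```

Note that "ddd" still counts as page-specific even though it occurs twice, because both occurrences are on the same page.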

Word equivalence

Actually, I did something a bit more complicated. First, when counting the occurrences of a word, I ignored certain differences that I believe are either not significant or due to transcription errors.

Specifically, I counted two words as being the same if they were identical after the following substitutions:

Note that by using equivalence instead of identity we can only remove words from the page-specific set, never add them.
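The "only remove, never add" property can be checked on a toy example. The sketch below assumes a purely hypothetical equivalence (treating the letters "x" and "y" as the same); it is not the substitution list used in this study, and the words are not Vms text.

```python
from collections import defaultdict

def page_specific(pages, key=lambda word: word):
    """Return the set of words whose equivalence class occurs on one page only.

    `key` maps a word to the representative of its equivalence class;
    the default (identity) means plain string comparison.
    """
    pages_of = defaultdict(set)
    for page, words in pages.items():
        for word in words:
            pages_of[key(word)].add(page)
    return {
        word
        for words in pages.values()
        for word in words
        if len(pages_of[key(word)]) == 1
    }

def toy_key(word):
    # HYPOTHETICAL equivalence, for illustration only: "y" counts as "x".
    return word.replace("y", "x")

# Toy data: made-up labels and words.
toy_pages = {"p1": ["axa", "bbb"], "p2": ["aya", "ccc"]}

strict = page_specific(toy_pages)           # identity
loose = page_specific(toy_pages, toy_key)   # looser equivalence

print(sorted(strict))
print(sorted(loose))
print(loose <= strict)
```

Under identity, "axa" and "aya" are distinct and each is page-specific. Under the looser equivalence they merge into one class that occurs on two pages, so both drop out. A word that already occurs on two pages can never become page-specific by merging it with more words.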

Broken words

Also, I detected and counted an occurrence of a word even when it was broken up by spaces. Thus, for example, I counted the sequence char.dy as one occurrence each of char, dy, and chardy.
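A minimal sketch of this counting rule follows. It assumes that the period in char.dy marks a word space in the transcription, and that only runs of adjacent words are joined; the function name and the max_parts parameter are hypothetical, not taken from the actual scripts.

```python
def occurrences_with_joins(line, sep=".", max_parts=2):
    """List the words of `line`, then every join of adjacent words.

    `sep` is the assumed word-space marker; `max_parts` is the assumed
    largest number of adjacent words glued back into one candidate word.
    """
    words = [w for w in line.split(sep) if w]
    found = list(words)  # each word counts as itself
    for n in range(2, max_parts + 1):
        for i in range(len(words) - n + 1):
            # Join n consecutive words, as if the spaces were spurious.
            found.append("".join(words[i:i + n]))
    return found

print(occurrences_with_joins("char.dy"))  # ['char', 'dy', 'chardy']
```

With max_parts=3, a three-word sequence would also yield the join of all three words, in addition to the two pairwise joins.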

Input data

For this processing, I used the interlinear transcription of the Vms, created by G. Landini from various files originally compiled by Jim Reeds, and converted to EVA by me.

Actually, I used only the "F" (Friedman/First Study Group) part of the interlinear file. That meant excluding page f65v, which is available only in the Currier transcription. Page f65r is also omitted, because it contains no running text, only a "title" or label.

It should be kept in mind that all the versions contained in the interlinear file have many transcription errors by the original authors, which may be the origin of some of the page-specific words.

Procedures

If you are curious, you can browse my Notebook with the actual Unix commands I used to generate these files, and the directory with all the relevant scripts and files.

Results

Colorized pages

Based on these counts, I prepared colorized versions of the herbal pages, where each page-specific word is highlighted in purple or red: purple if it occurs only once, red if it occurs twice or more (but always on a single page).

Also, words that occur on two or more pages, but have at least half of those occurrences concentrated on one single page, are painted blue (on all pages where they occur, even on those where they are in the minority).
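The coloring rules above can be written as a small decision function. This is a sketch only: the function name and the input format (a per-page count for a single word) are assumptions, not the format of the actual files.

```python
def highlight_color(counts_by_page):
    """Pick the highlight color for one word, or None for no highlight.

    `counts_by_page` maps a page label to the number of occurrences of
    the word on that page (hypothetical input format).
    """
    counts = [c for c in counts_by_page.values() if c > 0]
    if not counts:
        return None
    total = sum(counts)
    if len(counts) == 1:
        # Page-specific: purple for a single occurrence, red for more.
        return "purple" if total == 1 else "red"
    if 2 * max(counts) >= total:
        # At least half of all occurrences are on one single page.
        return "blue"
    return None

print(highlight_color({"p1": 1}))                    # purple
print(highlight_color({"p1": 3}))                    # red
print(highlight_color({"p1": 2, "p2": 1, "p3": 1}))  # blue
print(highlight_color({"p1": 1, "p2": 1, "p3": 1}))  # None
```

One consequence of the "at least half" rule, read literally, is that any word occurring on exactly two pages is painted blue, since one of the two pages always holds at least half of its occurrences.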

List of "plant names"

Even with the loose equivalence above, there are typically 5 to 20 page-specific words on each page; so we can't identify the plant names by that criterion alone.

However, it turns out that the first word of each page is almost always page-specific. I take this fact as a sign that, as a rule, the first word of the page is the plant's name.

Moreover, most exceptions to this rule seem to be due to the name being broken by a questionable word space. In that case, we can usually obtain a page-specific word by joining the first two (or three) words of the page. Based on this principle, I have prepared a list of likely names for the plant on most Vms pages.
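The joining principle can be sketched as follows. This is an illustration under assumptions, not the procedure actually used to build the list: the function name, the is_page_specific predicate, and the toy words are hypothetical.

```python
def candidate_name(first_words, is_page_specific, max_parts=3):
    """Return the shortest join of the leading words that is page-specific.

    `first_words` holds the first few words of a page, in order.
    `is_page_specific` is a caller-supplied test (hypothetical interface).
    Returns None when no join of up to `max_parts` words qualifies.
    """
    for n in range(1, min(max_parts, len(first_words)) + 1):
        joined = "".join(first_words[:n])
        if is_page_specific(joined):
            return joined
    return None

# Toy data: made-up words, with "abcde" as the only page-specific word.
toy_specific = {"abcde"}

print(candidate_name(["abc", "de", "fgh"], toy_specific.__contains__))  # abcde
print(candidate_name(["xyz"], toy_specific.__contains__))               # None
```

Trying the shortest join first means the first word alone is preferred whenever it is already page-specific, which matches the rule that the first word is normally the name.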