Last edited on 1998-07-16 06:47:07 by stolfi
Romance languages
- Spanish
Miguel de Cervantes's Don Quijote
[ full ]
[ page ]
The entropy h3
computed for this sample (2.64 bits/character) is a trifle
lower than that of the English sample (2.68 b/c).
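As a rough illustration of how a figure like h3 could be obtained: the sketch below assumes that it is a conditional entropy, in bits per character, of each character given a few preceding ones. The exact definition and context length are not stated in this section, so the function name and the parameter k are hypothetical, and the probabilities are simply relative frequencies in the sample itself.

```python
from collections import Counter
from math import log2

def conditional_entropy(text, k):
    """Estimate H(char | previous k chars) in bits per character.

    Assumption: probabilities are plain relative frequencies
    counted in `text` itself, with no smoothing.
    """
    context_counts = Counter()   # occurrences of each k-character context
    joint_counts = Counter()     # occurrences of (context, next character)
    for i in range(k, len(text)):
        ctx = text[i - k:i]
        context_counts[ctx] += 1
        joint_counts[(ctx, text[i])] += 1
    total = sum(joint_counts.values())
    h = 0.0
    for (ctx, _ch), n in joint_counts.items():
        # weight by P(context, char), score by P(char | context)
        h -= (n / total) * log2(n / context_counts[ctx])
    return h
```

With k = 0 this reduces to the ordinary single-character entropy; on a short sample the estimate for larger k is biased downwards, because most contexts are seen only a few times.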
As in English, some of the most
conspicuous blue spots are spelling system ``bugs'' such as
qu (which sounds like k or
kw) and ha (the h
here being silent). Other dark spots are some common short
words, and word breaks --- especially after the
word y (English and;
practically the only Spanish word that uses the letter y),
and after the common endings
os, as.
Among the bright spots one notices
several words with diacritics on the first syllable,
e.g. ténganse, óiganme,
águila: those words are stressed on the antepenultimate
syllable, which is rare in Spanish. Other hot spots are the
oblique pronouns attached to verbs, like the
-me in óiganme or the
-nos in póngannos, and
compounds like todopoderoso: although
traditionally written without space, those junctions still
sound discontinuous.
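The blue and bright spots discussed above presumably reflect how predictable each character is from its neighbours. A minimal sketch, assuming that the color of a character encodes -log2 of its estimated probability given the l characters to its left and the r characters to its right (an assumption; the formula is not spelled out here). The function name is hypothetical, and the probabilities are estimated from the same text that is being scored.

```python
from collections import Counter
from math import log2

def surprisal_map(text, l=1, r=0):
    """Return a list of (char, bits) pairs, one per scorable position.

    bits = -log2 P(char | l chars to the left, r chars to the right),
    with P estimated by relative frequency in `text` itself.
    The first l and last r characters are skipped (no full context).
    """
    ctx_counts = Counter()
    joint_counts = Counter()
    positions = range(l, len(text) - r)
    for i in positions:
        ctx = (text[i - l:i], text[i + 1:i + 1 + r])
        ctx_counts[ctx] += 1
        joint_counts[(ctx, text[i])] += 1
    result = []
    for i in positions:
        ctx = (text[i - l:i], text[i + 1:i + 1 + r])
        p = joint_counts[(ctx, text[i])] / ctx_counts[ctx]
        result.append((text[i], -log2(p)))
    return result

# Every u that follows q is fully predictable in this toy sample,
# so it scores zero bits: a "blue spot" in the terms used above.
sample = "que quien quiso queso"
scores = surprisal_map(sample, l=1, r=0)
```

Low values correspond to the blue spots (such as u after q), high values to the bright ones (such as a rare accented initial letter).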
l = 1 r = 0
[ colorized page ]
[ bits per tuple ]
l = 2 r = 0
[ colorized page ]
[ bits per tuple ]
l = 3 r = 0
[ colorized page ]
[ bits per tuple ]
l = 1 r = 1
[ colorized page ]
[ bits per tuple ]
- Portuguese
Raul Pompéia's O Ateneu
[ full ]
[ page ]
This Portuguese sample has a higher
entropy (h3 = 3.034 b/c) than the Don Quijote text,
probably because Portuguese has a bigger alphabet (40
letters here) and a more irregular spelling. Otherwise its
entropy pattern looks a lot like that of Spanish, which may not
be very surprising. Conspicuous low-entropy spots are the
endings -ção and -ente (English
-tion, -ent), and short common
words such as que (that),
de (of), do
(of the), and um
(one, a).
One notable high spot is
à (English to the), which is
practically the only word in Portuguese with a grave accent.
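The remark about alphabet size can be made slightly more concrete: with N equally likely letters, the entropy per letter is log2 N, which is the ceiling for any distribution over N letters. For N = 40 this is about 5.32 bits. The sketch below counts the distinct letters in a sample and reports that ceiling; the function name is hypothetical, and folding upper and lower case together is an assumption that may not match how the 40 letters were counted here.

```python
from math import log2

def alphabet_entropy_bound(text):
    """Return (number of distinct letters, log2 of that number).

    Assumptions: case is folded, and anything that is not a letter
    (spaces, digits, punctuation) is ignored.
    """
    letters = {ch for ch in text.lower() if ch.isalpha()}
    size = len(letters)
    return size, (log2(size) if size else 0.0)
```

A larger alphabet only raises this upper bound; the entropy actually measured also depends on how unevenly the letters are used and on how much the context constrains them.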
l = 1 r = 0
[ colorized page ]
[ bits per tuple ]
l = 2 r = 0
[ colorized page ]
[ bits per tuple ]
l = 3 r = 0
[ colorized page ]
[ bits per tuple ]
l = 1 r = 1
[ colorized page ]
[ bits per tuple ]
- Italian
Alessandro Manzoni's I Promessi Sposi
[ full ]
[ page ]
The entropy pattern seems more even
than in Spanish and Portuguese. There are expected blue spots in
the u after q, and in the
-ment- suffix, as in the other languages, but
also on the n after word-initial
u (because of the indefinite articles
un, una, etc.), and
e after ch (because of the
common che, English that).
Note that, as in Spanish, there is
usually a bright spot where an oblique pronoun is attached to a
verb, e.g. the si in
ristringersi; and where a final vowel has been
suppressed for euphony, as in insegnavan.
The very bright spots on
è (is) are intriguing, since
that's a very common word in Italian. A possible explanation
is that this historical novel is narrated mostly in the past
tense --- except in this opening paragraph, where the author
describes the setting as it looked in his day.
l = 1 r = 0
[ colorized page ]
[ bits per tuple ]
l = 2 r = 0
[ colorized page ]
[ bits per tuple ]
l = 3 r = 0
[ colorized page ]
[ bits per tuple ]
l = 1 r = 1
[ colorized page ]
[ bits per tuple ]
- French
Voltaire's Micromégas
[ full ]
[ page ]
Again we see the blue spots on
u after q. Typically French
« bluisms » are the e after ll and
d, and l after
i.
l = 1 r = 0
[ colorized page ]
[ bits per tuple ]
l = 2 r = 0
[ colorized page ]
[ bits per tuple ]
l = 3 r = 0
[ colorized page ]
[ bits per tuple ]
l = 1 r = 1
[ colorized page ]
[ bits per tuple ]