Hacking at the Voynich manuscript - Side notes
023 Plotting QOKOKOKO element frequencies per page

Last edited on 1999-01-31 06:42:31 by stolfi

1998-06-20 stolfi
=================

  [ First version done on 1998-05-04, now redone with fresher data. ]
  [ Removed element counting part to Notes/022 on 1999-01-31. ]

  In Note 021, I tried to classify pages according to the frequencies
  of certain keywords. John Grove pointed out that the transcription
  I used (Friedman's) has inconsistencies that may masquerade as
  language differences, e.g. "dain" in place of "daiin" or vice versa.
  Also, spacing (word division) seems to be quite inconsistent.

  Some of these problems were removed on 1999-01-30 by redoing the
  analysis with the majority edition instead of a best pick.

  Still, to reduce the problem further, I thought of using, instead
  of words, the "elements" of the QOKOKOKO paradigm. See Notes 017
  and 018.

I. EXTRACTING AND COUNTING ELEMENTS

  I will use the majority edition of the interlinear, release 1.6e6:

    ln -s ../045/pages-m text-pages
    ln -s ../045/sections-m text-sections

  I will borrow the element counts computed in Notes/022:

    mkdir -p RAW EQV
    ( cd RAW && ln -s ../../022/RAW/efreqs )
    ( cd EQV && ln -s ../../022/EQV/efreqs )

II. PAGE SCATTER-PLOTS

  See Notes/021 for explanation of these plots.
  Let's now compute the frequencies of these keywords in each page
  and section:

    foreach dic ( vald )
      foreach etag ( RAW EQV )
        foreach utype ( pages sections )
          set frdir = "${etag}/efreqs/${utype}"
          set ptdir = "${etag}/plots/${dic}/${utype}"
          echo "${frdir}" "${ptdir}"
          /bin/rm -rf ${ptdir}
          mkdir -p ${ptdir}
          cp -p ${frdir}/all.names ${ptdir}
          foreach fnum ( tot `cat ${frdir}/all.names` )
            printf "%30s/%-7s " "${ptdir}" "${fnum}:"
            cat ${frdir}/${fnum}.frq \
              | gawk '/./{print $1, $3;}' \
              | est-dic-probs -v dic=${etag}/plots/${dic}/keys.dic \
              > ${ptdir}/${fnum}.pos
          end
        end
      end
    end

  Let's plot them:

    set sys = "tot-hea"
    foreach dic ( vald )
      foreach etag ( RAW EQV )
        set ptdir = "${etag}/plots/${dic}/pages"
        set scdir = "${etag}/plots/${dic}/sections"
        set fgdir = "${etag}/plots/${dic}/${sys}"
        /bin/rm -rf ${fgdir}
        mkdir -p ${fgdir}
        cp -p ${ptdir}/all.names ${fgdir}/all.names
        make-3d-scatter-plots \
          ${ptdir} \
          ${fgdir} \
          ${scdir}/{tot,tot,hea,heb,bio}.pos
      end
    end

  Again, trying to separate Herbal-A from Pharma:

    set sys = "hea-pha"
    foreach dic ( vald )
      foreach etag ( RAW EQV )
        set ptdir = "${etag}/plots/${dic}/pages"
        set scdir = "${etag}/plots/${dic}/sections"
        set fgdir = "${etag}/plots/${dic}/${sys}"
        /bin/rm -rf ${fgdir}
        mkdir -p ${fgdir}
        cp -p ${ptdir}/all.names ${fgdir}/all.names
        make-3d-scatter-plots \
          ${ptdir} \
          ${fgdir} \
          ${scdir}/{tot,hea,pha,heb,bio}.pos
      end
    end

  The scatter plots made with collapsed letters still show the main
  sections as separate clusters, but touching each other.
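
  For reference, the gawk filter in the frequency loop above keeps
  fields 1 and 3 of every non-empty line before handing the data to
  est-dic-probs. The following is only a minimal sketch of that one
  step. The sample data and its column layout (count, relative
  frequency, element) are assumptions made for illustration, not the
  actual contents of the .frq files, and plain awk stands in for gawk:

```shell
# Hypothetical sample of a ".frq" file. The column layout is an assumption:
# field 1 = count, field 2 = relative frequency, field 3 = element.
sample='  120 0.400 o
   90 0.300 k

   90 0.300 q'

# Same filter as in the frequency loop: the pattern /./ skips empty lines,
# and the action prints fields 1 and 3 separated by a single space.
filtered=$(printf '%s\n' "$sample" | awk '/./{print $1, $3;}')

printf '%s\n' "$filtered"
```

  With this sample the blank line is dropped and each remaining line
  is reduced to a count followed by an element.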