Hacking at the Voynich manuscript - Side notes
063 Correlations between first and last letters across spaces

(See Notes/663 for the former obsolete Notes/063)

  This note investigates the frequencies of first and last letters of
  consecutive words in the Starred Parags section (SPS) of the VMS and
  in the modern Mandarin pinyin version of the Shennong Bencao Jing
  (SBJ).

SETUP

  ln -s ../.. work
  ln -s work/convert_pinyin_to_numeric.py
  ln -s ../077
  ln -s 077/convert_starps_raw_to_lin_ivt.py
  ln -s 077/convert_starps_lin_to_par_ivt.py
  ln -s ../074

THE SPS FILE

  The SPS file "in/2026-06-29-starps-wc.ivp" is derived from my own
  transcription "../074/star25e1.ivt" as of 2026-03-15 15:59:31.

  It is all lowercase with no ligature brackets {}, no inline
  comments, no locus IDs, no alignment markers [-«»], no parag markers
  <%> <$>, and with all weirdos and strange characters converted to
  '?'. It has one parag per line, with all word separators [-,.]
  converted to single blanks, and one blank at the start and end of
  each line. The encoding is nominally Unicode in UTF-8, but it
  actually uses only ASCII characters.

  There is another version "in/2026-06-29-starps-wp.ivp" that is
  obtained in the same way but with commas deleted, so that only [-.]
  become word breaks. However, most statistics are done on the "-wc"
  version.

    create_starps_raw_file.sh 2026-06-29

    utype "wc" 330 parags 11205 tokens 2850 lexemes
    entropy (min ct = 1):   2850  11205   9.5603 1
    entropy (min ct = 2):    941   9296   8.4917 1
    entropy (min ct = 3):    608   8630   8.0996 1

    utype "wp" 330 parags 9892 tokens 3323 lexemes
    entropy (min ct = 1):   3323   9892  10.0786 1
    entropy (min ct = 2):    923   7492   8.6547 1
    entropy (min ct = 3):    567   6780   8.1729 1

THE SBJ FILE

  The SBJ file "in/2026-06-27-bencao-py.utf" is derived from two files
  obtained from the internet (one from the Chinese Text Project, one
  from the Chinese Wikisource), with many corrections, converted to
  modern Mandarin pinyin by Google Translate.
  (It is not the right version as would be seen in the white-on-black
  text of the Zhenghe Bencao in 1400 CE. In particular, it usually has
  [zhǔ zhì] instead of just [zhǔ].)

  The vowels in this file are "[aeiouü]". Tones are indicated by
  diacritics (acute, grave, macron, and caron) on vowels, including on
  "ü". The encoding is Unicode in UTF-8.

    ofile="in/2026-06-27-bencao-py.utf"
    wfile="in/.wfile-sbj"
    cat ${ofile} \
      | egrep -v -e '^[ ]*([#]|$)' \
      | tr ' ' '\012' | egrep -e '.' \
      | sort | uniq -c \
      > ${wfile}
    for n in 1 2 3 ; do
      printf "entropy (min ct = ${n}): " 1>&2
      cat ${wfile} \
        | gawk -v n=${n} '//{ if ($1 >= n) { printf "%6d 1 %s\n", $1, $2 }}' \
        | work/compute_cond_entropy.gawk \
        1>&2
    done

    entropy (min ct = 1):  13266   7.5340 1
    entropy (min ct = 2):  13134   7.4577 1
    entropy (min ct = 3):  12982   7.3797 1

SCRIPTS

  The following scripts do the counting of initial-final letters:

    compute_abs_counts.sh
    compute_cond_counts.sh

  To run them all, run "do_note_063.sh".
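
  For readers without access to the scripts, here is a minimal Python
  sketch of the kind of counting this note is about. It is NOT the
  code in compute_abs_counts.sh or compute_cond_counts.sh, whose
  contents are not shown here. The function names, the choice to
  count pairs only within a parag (never across line breaks), and the
  sample words are all assumptions made for illustration. The input is
  assumed to be in the one-parag-per-line, blank-separated format
  described above.

```python
from collections import Counter


def boundary_pair_counts(lines):
    """Count (last letter of a word, first letter of the next word) pairs.

    Each element of `lines` is one parag with blank-separated words.
    Assumption: pairs are counted only inside a parag, not across parags.
    """
    counts = Counter()
    for line in lines:
        words = line.split()  # also drops the leading and trailing blank
        for prev, nxt in zip(words, words[1:]):
            counts[(prev[-1], nxt[0])] += 1
    return counts


def conditional_freqs(counts):
    """Convert absolute pair counts into the relative frequency of each
    first letter, given the last letter of the preceding word."""
    totals = Counter()
    for (last, _first), ct in counts.items():
        totals[last] += ct
    return {pair: ct / totals[pair[0]] for pair, ct in counts.items()}


if __name__ == "__main__":
    # Toy input for illustration only; not taken from the SPS file.
    sample = [" daiin chedy qokeedy ", " shedy daiin okaiin "]
    abs_counts = boundary_pair_counts(sample)
    cond = conditional_freqs(abs_counts)
    for (last, first), ct in sorted(abs_counts.items()):
        print(f"{last} {first} {ct:6d} {cond[(last, first)]:8.4f}")
```

  For the SBJ file the "letters" at word ends are often vowels with
  tone diacritics. If those are stored as precomposed Unicode
  characters, the sketch treats each toned vowel as its own letter;
  whether to fold tones together is a separate decision that this
  sketch does not make.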