Discovered that the smooth gradient in Denis's page counts is not
surprising: since two of the counts dominate, and my routine
normalizes them to unit sum, the data is inherently unidimensional.

Here is an attempt to reorder the stars pages by hand so as to make
the ratio count(-iin)/(count(-iin) + count(-in)) more uniform. (This
is the quantity tabulated in the "ratio" column; e.g. for f103r,
33/(33+41) = 0.446.)

  page     words -iiin  -iin   -in    -n ratio
  -------- ----- ----- ----- ----- ----- -----
  f103r TA   526     0    33    41     2 0.446
  f103v TB   454     1    34    37     4 0.479
  f108r TK   494     0    39    22     1 0.639
  f108v TL   581     0    52    39     1 0.571
  f104r TC   448     1    66    17     1 0.795
  f104v TD   477     3    59    24     0 0.711
  f107r TI   487     4    93    30     1 0.756
  f107v TJ   462     1    84    43     0 0.661
  f114r TS   460     5    91    23     0 0.798
  f114v TT   376     2    68    23     0 0.747
  f106r TG   432     1    65    24     0 0.730
  f106v TH   444     1    67    23     0 0.744
  f113r TQ   528     4    79    21     0 0.790
  f113v TR   502     5    84    20     0 0.808
  f105r TE   379     6    48     1     1 0.980
  f105v TF   399     5    85     4     0 0.955
  f112r TO   401     3    32    21     0 0.604
  f112v TP   420     7    60    33     1 0.645
  f115r TU   461     1    40    21     2 0.656
  f115v TV   410     2    32    33     0 0.492
  f111r TM   623     1    44    51     0 0.463
  f111v TN   568     1    41   113     6 0.266
  f116r TW   554     1    25    90     8 0.217
  f116v TW     0     0     0     0     0 0.000

Creating a picture of this sorted data:

  sort-distr -s 18 -n 4 -d -p - -r 0 | pnmscale 8 | ppmtogif > .stars-bh-dist.gif
  xv .stars-bh-dist.gif

Another attempt:

  page     words -iiin  -iin   -in    -n ratio
  -------- ----- ----- ----- ----- ----- -----
  f103r TA   526     0    33    41     2 0.446
  f103v TB   454     1    34    37     4 0.479
  f108r TK   494     0    39    22     1 0.639
  f108v TL   581     0    52    39     1 0.571
  f104r TC   448     1    66    17     1 0.795
  f104v TD   477     3    59    24     0 0.711
  f107r TI   487     4    93    30     1 0.756
  f107v TJ   462     1    84    43     0 0.661
  f113r TQ   528     4    79    21     0 0.790
  f113v TR   502     5    84    20     0 0.808
  f105r TE   379     6    48     1     1 0.980
  f105v TF   399     5    85     4     0 0.955
  f114r TS   460     5    91    23     0 0.798
  f114v TT   376     2    68    23     0 0.747
  f106r TG   432     1    65    24     0 0.730
  f106v TH   444     1    67    23     0 0.744
  f112r TO   401     3    32    21     0 0.604
  f112v TP   420     7    60    33     1 0.645
  f115r TU   461     1    40    21     2 0.656
  f115v TV   410     2    32    33     0 0.492
  f111r TM   623     1    44    51     0 0.463
  f111v TN   568     1    41   113     6 0.266
  f116r TW   554     1    25    90     8 0.217
  f116v TW     0     0     0     0     0 0.000

  sort-distr -s 18 -n 4 -d -p - -r 0 | pnmscale 8 | ppmtogif > .stars-h2-dist.gif
  xv .stars-h2-dist.gif

Yet another attempt:

  page     words -iiin  -iin   -in    -n ratio
  -------- ----- ----- ----- ----- ----- -----
  f103r TA   526     0    33    41     2 0.446
  f103v TB   454     1    34    37     4 0.479
  f108r TK   494     0    39    22     1 0.639
  f108v TL   581     0    52    39     1 0.571
  f104r TC   448     1    66    17     1 0.795
  f104v TD   477     3    59    24     0 0.711
  f112r TO   401     3    32    21     0 0.604
  f112v TP   420     7    60    33     1 0.645
  f113r TQ   528     4    79    21     0 0.790
  f113v TR   502     5    84    20     0 0.808
  f105r TE   379     6    48     1     1 0.980
  f105v TF   399     5    85     4     0 0.955
  f114r TS   460     5    91    23     0 0.798
  f114v TT   376     2    68    23     0 0.747
  f106r TG   432     1    65    24     0 0.730
  f106v TH   444     1    67    23     0 0.744
  f107r TI   487     4    93    30     1 0.756
  f107v TJ   462     1    84    43     0 0.661
  f115r TU   461     1    40    21     2 0.656
  f115v TV   410     2    32    33     0 0.492
  f111r TM   623     1    44    51     0 0.463
  f111v TN   568     1    41   113     6 0.266
  f116r TW   554     1    25    90     8 0.217
  f116v TW     0     0     0     0     0 0.000

  sort-distr -s 18 -n 4 -d -p - -r 0 | pnmscale 8 | ppmtogif > .stars-h3-dist.gif
  xv .stars-h3-dist.gif

Let's look at f58r/f58v too:

  foreach s ( n in iin iiin )
    cat L16-eva/f58r.P | egrep '[^i]'"$s"'[-,. =]' > .f58r-$s.evt
  end

  page     words -iiin  -iin   -in    -n ratio
  -------- ----- ----- ----- ----- ----- -----
  f058r HB   362     0    29     1     0 0.967

Let's have a closer look at the occurrences of "daiin" in the stars
section:

  rm -f .daiin-stars.occs
  foreach f ( L16-eva/f{103,104,105,106,107,108,111,112,113,114,115,116}{r,v}.P* )
    echo $f
    echo '# '$f >> .daiin-stars.occs
    cat $f | egrep '[-= ,.]daiin|^#' >> .daiin-stars.occs
  end

Edited .daiin-stars.occs by hand, removing/adding adjacent words until
each occurrence of "daiin" is on a separate line with 2 words on
either side.

Result: 208 occurrences of "daiin" in the stars section.
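The hand-editing step above (one "daiin" per line, with 2 words on either side) could in principle be approximated mechanically. What follows is only a minimal Python sketch, not the procedure actually used for these notes. It assumes that words are separated by the delimiter characters that appear in the egrep patterns above (hyphen, equals sign, space, comma, period), that only exact matches of the word are wanted, and that context does not cross line boundaries. The function name context_windows and the sample line are hypothetical.

```python
import re

# Assumption: word delimiters are the characters used in the egrep
# patterns of these notes (hyphen, equals, space, comma, period).
DELIMS = re.compile(r"[-= ,.]+")

def context_windows(line, target="daiin", width=2):
    """Return one (before, target, after) tuple per exact occurrence of target.

    before/after hold up to `width` neighbouring words from the same line;
    they are shorter near the start or end of the line.
    """
    words = [w for w in DELIMS.split(line.strip()) if w]
    windows = []
    for i, w in enumerate(words):
        if w == target:
            before = words[max(0, i - width):i]
            after = words[i + 1:i + 1 + width]
            windows.append((before, w, after))
    return windows

# Made-up sample line (not a transcription of any page).
sample = "qokeey.chedy.daiin.shey.okaiin-daiin.ar"
for before, w, after in context_windows(sample):
    print(" ".join(before), "[" + w + "]", " ".join(after))
```

Because the windows stop at line ends, occurrences at the beginning or end of a line would still need the kind of manual adjustment described above.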
Let's look also at "saiin":

  rm -f .saiin-stars.occs
  foreach f ( L16-eva/f{103,104,105,106,107,108,111,112,113,114,115,116}{r,v}.P* )
    echo $f
    echo '# '$f >> .saiin-stars.occs
    cat $f | egrep '[-= ,.]saiin|^#' >> .saiin-stars.occs
  end

Many of the "daiin" and "saiin" occurrences are at the beginning of a
line (but not the first line of the paragraph). Some of them are at
the end of a paragraph.

These are the words that occur near "daiin":

  ct rfreq cfreq word        ct rfreq cfreq word        ct rfreq cfreq word
  -- ----- ----- ----------- -- ----- ----- ----------- -- ----- ----- -----------
   7 0.034 0.034 cheo         8 0.040 0.040 chedy        6 0.029 0.029 chedy
   5 0.024 0.058 chedy        7 0.035 0.075 chey         5 0.024 0.053 okeey
   5 0.024 0.082 qokeey       5 0.025 0.100 cheey        5 0.024 0.077 qokeey
   3 0.014 0.096 oteo         5 0.025 0.125 shey         4 0.019 0.096 daiin
   3 0.014 0.111 sheeo        3 0.015 0.140 al           3 0.014 0.111 ar
   2 0.010 0.120 chckhy       3 0.015 0.155 ar           3 0.014 0.125 chol
   2 0.010 0.130 chdy         3 0.015 0.170 daiin        3 0.014 0.139 lchedy
   2 0.010 0.139 cheeo        3 0.015 0.185 sheol        3 0.014 0.154 lshey
   2 0.010 0.149 chey         2 0.010 0.195 char         3 0.014 0.168 okaiin
   2 0.010 0.159 daiin        2 0.010 0.205 chedar       3 0.014 0.183 qokaiin
   2 0.010 0.168 dal          2 0.010 0.215 cheol        3 0.014 0.197 qokal
   2 0.010 0.178 dalam        2 0.010 0.225 chl          3 0.014 0.212 qokeedy
   2 0.010 0.188 keeo         2 0.010 0.235 okar         3 0.014 0.226 qotchedy
   2 0.010 0.197 llchey       2 0.010 0.245 okeey        2 0.010 0.236 aiin
   2 0.010 0.207 okal         2 0.010 0.255 ol           2 0.010 0.245 chedal
   2 0.010 0.216 ol           2 0.010 0.265 otal         2 0.010 0.255 chodaiin
   2 0.010 0.226 otal         2 0.010 0.275 otaral       2 0.010 0.264 dal
   2 0.010 0.236 qokchedy     2 0.010 0.285 otedy        2 0.010 0.274 lkeey
   2 0.010 0.245 qokeeal      2 0.010 0.295 oteey        2 0.010 0.284 lshedy
   2 0.010 0.255 qokeeo       2 0.010 0.305 qokchdy      2 0.010 0.293 oky
   2 0.010 0.264 qopchedy     2 0.010 0.315 qotalal      2 0.010 0.303 otar
   2 0.010 0.274 sheey        2 0.010 0.325 shaiin       2 0.010 0.312 otedy
   2 0.010 0.284 shockhy      2 0.010 0.335 shedy        2 0.010 0.322 oteey
   2 0.010 0.293 ycheo        2 0.010 0.345 sheed        2 0.010 0.332 oteody
   1 0.005 0.298 acthy        2 0.010 0.355 sheey        2 0.010 0.341 qodaiin
   1 0.005 0.303 aiin         2 0.010 0.365 shek         2 0.010 0.351 qokar
   1 0.005 0.308 ainkam       2 0.010 0.375 sheody       2 0.010 0.361 qokedy
   1 0.005 0.312 al           2 0.010 0.385 shody        2 0.010 0.370 qokeol
   1 0.005 0.317 alky         2 0.010 0.395 shol         2 0.010 0.380 qoky
   1 0.005 0.322 alol         1 0.005 0.400 aiin         2 0.010 0.389 qoty
   1 0.005 0.327 am           1 0.005 0.405 airols       2 0.010 0.399 saiin
   1 0.005 0.332 ar           1 0.005 0.410 aky          2 0.010 0.409 sheey
   1 0.005 0.337 aralary      1 0.005 0.415 alaiin       2 0.010 0.418 tchedy
   1 0.005 0.341 archcthy     1 0.005 0.420 alal         2 0.010 0.428 teeedy
   1 0.005 0.346 chcphydy     1 0.005 0.425 aldair       1 0.005 0.433 *asor
   1 0.005 0.351 chdaly       1 0.005 0.430 alsar        1 0.005 0.438 akaiin
   1 0.005 0.356 chea         1 0.005 0.435 aral         1 0.005 0.442 chckhaiin
   1 0.005 0.361 chedaiin     1 0.005 0.440 aroteey      1 0.005 0.447 chcphedy
   1 0.005 0.365 chedal       1 0.005 0.445 chckhy       1 0.005 0.452 chdar
   1 0.005 0.370 chedyrl      1 0.005 0.450 chcthar      1 0.005 0.457 chdor
   1 0.005 0.375 cheeey       1 0.005 0.455 chcthdy      1 0.005 0.462 chdy
   1 0.005 0.380 cheey        1 0.005 0.460 chcthed      1 0.005 0.466 cheal
   1 0.005 0.385 cheky        1 0.005 0.465 chcthy       1 0.005 0.471 chear
   1 0.005 0.389 cheoda*      1 0.005 0.470 cheaiin      1 0.005 0.476 checkhey
   1 0.005 0.394 cheody       1 0.005 0.475 cheal        1 0.005 0.481 checkhy
   1 0.005 0.399 cheol        1 0.005 0.480 checkhy      1 0.005 0.486 cheeal
   1 0.005 0.404 cheot        1 0.005 0.485 checthal     1 0.005 0.490 cheedy
   1 0.005 0.409 chllkeey     1 0.005 0.490 ched         1 0.005 0.495 cheeky
   1 0.005 0.413 cho          1 0.005 0.495 chedaiin     1 0.005 0.500 cheocthy
   1 0.005 0.418 chockhey     1 0.005 0.500 chedal       1 0.005 0.505 cheodaiin
   1 0.005 0.423 chodeeal     1 0.005 0.505 cheedy       1 0.005 0.510 chey
   1 0.005 0.428 chody        1 0.005 0.510 cheeeo       1 0.005 0.514 chocthy
   1 0.005 0.433 chotam       1 0.005 0.515 cheeir       1 0.005 0.519 chody
   1 0.005 0.438 chotchedy    1 0.005 0.520 cheeteey     1 0.005 0.524 chokedair
   1 0.005 0.442 chy          1 0.005 0.525 chekeek      1 0.005 0.529 choty
   1 0.005 0.447 cphaiin      1 0.005 0.530 cheo         1 0.005 0.534 dchedy
   1 0.005 0.452 dail         1 0.005 0.535 cheocthy     1 0.005 0.538 deeedy
   1 0.005 0.457 dala         1 0.005 0.540 cheodaiin    1 0.005 0.543 dol
   1 0.005 0.462 dched        1 0.005 0.545 cheodar      1 0.005 0.548 dsheeo
   1 0.005 0.466 dcheo        1 0.005 0.550 cheolor      1 0.005 0.553 eedol
   1 0.005 0.471 dchol        1 0.005 0.555 chkaiin      1 0.005 0.558 eeykeody
   1 0.005 0.476 decthdy      1 0.005 0.560 choaiin      1 0.005 0.562 kair
   1 0.005 0.481 eedy         1 0.005 0.565 chocfhdy     1 0.005 0.567 kal
   1 0.005 0.486 kar          1 0.005 0.570 chody        1 0.005 0.572 kchdy
   1 0.005 0.490 kchedy       1 0.005 0.575 chol         1 0.005 0.577 kchedy
   1 0.005 0.495 kcheo        1 0.005 0.580 cholchey     1 0.005 0.582 keedal
   1 0.005 0.500 keesho       1 0.005 0.585 chopchy      1 0.005 0.587 keeo
   1 0.005 0.505 keol         1 0.005 0.590 chotaiin     1 0.005 0.591 kolkair
   1 0.005 0.510 ky           1 0.005 0.595 chsd         1 0.005 0.596 lcheeol
   1 0.005 0.514 l            1 0.005 0.600 ckheol       1 0.005 0.601 lechody
   1 0.005 0.519 larorol      1 0.005 0.605 dal          1 0.005 0.606 lkar
   1 0.005 0.524 lchedam      1 0.005 0.610 dam          1 0.005 0.611 lkeeol
   1 0.005 0.529 lkaiiir      1 0.005 0.615 dar          1 0.005 0.615 lkol
   1 0.005 0.534 lkal         1 0.005 0.620 daram        1 0.005 0.620 lky
   1 0.005 0.538 lkam         1 0.005 0.625 daryom       1 0.005 0.625 oain
   1 0.005 0.543 lkeeeady     1 0.005 0.630 dchdos       1 0.005 0.630 oar
   1 0.005 0.548 lkeo         1 0.005 0.635 dchedar      1 0.005 0.635 ocheey
   1 0.005 0.553 lkeol        1 0.005 0.640 dckhy        1 0.005 0.639 octhd
   1 0.005 0.558 lklor        1 0.005 0.645 dshedal      1 0.005 0.644 odair
   1 0.005 0.562 llod         1 0.005 0.650 lkchedy      1 0.005 0.649 okchey
   1 0.005 0.567 lm           1 0.005 0.655 lor          1 0.005 0.654 okechey
   1 0.005 0.572 lteedy       1 0.005 0.660 ochedaiin    1 0.005 0.659 okedy
   1 0.005 0.577 ochedaiin    1 0.005 0.665 ockhedy      1 0.005 0.663 okeedaiin
   1 0.005 0.582 ochedal      1 0.005 0.670 octhd        1 0.005 0.668 okeedy
   1 0.005 0.587 ocheey       1 0.005 0.675 octhdy       1 0.005 0.673 okeeedy
   1 0.005 0.591 ofam         1 0.005 0.680 octhy        1 0.005 0.678 okeeshy
   1 0.005 0.596 ofar         1 0.005 0.685 ofchedaiin   1 0.005 0.683 okol
   1 0.005 0.601 okaiin       1 0.005 0.690 okaiin       1 0.005 0.688 ol
   1 0.005 0.606 okchedy      1 0.005 0.695 okairdy      1 0.005 0.692 oldaiin
   1 0.005 0.611 okchey       1 0.005 0.700 okal         1 0.005 0.697 olkchey
   1 0.005 0.615 okchy        1 0.005 0.705 okchey       1 0.005 0.702 olkeedaiin
   1 0.005 0.620 okeedy       1 0.005 0.710 okedal       1 0.005 0.707 olkeeey
   1 0.005 0.625 okey         1 0.005 0.715 okedy        1 0.005 0.712 olshy
   1 0.005 0.630 oleedy       1 0.005 0.720 okeedaky     1 0.005 0.716 opailo
   1 0.005 0.635 olkaey       1 0.005 0.725 okeedy       1 0.005 0.721 opchedaiin
   1 0.005 0.639 olky         1 0.005 0.730 okey         1 0.005 0.726 opcheed
   1 0.005 0.644 olr          1 0.005 0.735 olaiin       1 0.005 0.731 oraiin
   1 0.005 0.649 oly          1 0.005 0.740 olam         1 0.005 0.736 otaiin
   1 0.005 0.654 om           1 0.005 0.745 olkaiin      1 0.005 0.740 otair
   1 0.005 0.659 opaiin       1 0.005 0.750 olkaiir      1 0.005 0.745 otarar
   1 0.005 0.663 opaik        1 0.005 0.755 oly          1 0.005 0.750 otchedy
   1 0.005 0.668 opalam       1 0.005 0.760 opairam      1 0.005 0.755 otchod
   1 0.005 0.673 opam         1 0.005 0.765 opal         1 0.005 0.760 otechdy
   1 0.005 0.678 opchdy       1 0.005 0.770 or           1 0.005 0.764 otedal
   1 0.005 0.683 opchy        1 0.005 0.775 oraiin       1 0.005 0.769 oteor
   1 0.005 0.688 or           1 0.005 0.780 otar         1 0.005 0.774 oteoy
   1 0.005 0.692 oram         1 0.005 0.785 oteedaiin    1 0.005 0.779 pcheol
   1 0.005 0.697 ore          1 0.005 0.790 oteedo       1 0.005 0.784 pchor
   1 0.005 0.702 orkchdy      1 0.005 0.795 oteol        1 0.005 0.788 pdal
   1 0.005 0.707 os           1 0.005 0.800 por          1 0.005 0.793 pdaro
   1 0.005 0.712 osh*o        1 0.005 0.805 qkair        1 0.005 0.798 qckheey
   1 0.005 0.716 oshey        1 0.005 0.810 qkeodaiin    1 0.005 0.803 qlky
   1 0.005 0.721 otaiin       1 0.005 0.815 qoair        1 0.005 0.808 qoeedaiin
   1 0.005 0.726 otaiinodaly  1 0.005 0.820 qoeedaiin    1 0.005 0.812 qoek
   1 0.005 0.731 otaik        1 0.005 0.825 qoek         1 0.005 0.817 qokairar
   1 0.005 0.736 otam         1 0.005 0.830 qofchdar     1 0.005 0.822 qokchdy
   1 0.005 0.740 otar         1 0.005 0.835 qokaiin      1 0.005 0.827 qokchey
   1 0.005 0.745 otary        1 0.005 0.840 qokchedy     1 0.005 0.832 qokechy
   1 0.005 0.750 otaryly      1 0.005 0.845 qokchey      1 0.005 0.837 qokedar
   1 0.005 0.755 otcham       1 0.005 0.850 qokeeo       1 0.005 0.841 qokeeey
   1 0.005 0.760 otcheo       1 0.005 0.855 qokeeody     1 0.005 0.846 qokeeo
   1 0.005 0.764 otchey       1 0.005 0.860 qokeey       1 0.005 0.851 qotaiin
   1 0.005 0.769 oteeey       1 0.005 0.865 qopol        1 0.005 0.856 qotal
   1 0.005 0.774 oteey        1 0.005 0.870 saiin        1 0.005 0.861 qotar
   1 0.005 0.779 oteol        1 0.005 0.875 shal         1 0.005 0.865 qotchy
   1 0.005 0.784 otey         1 0.005 0.880 shechy       1 0.005 0.870 qotear
   1 0.005 0.788 oto          1 0.005 0.885 sheckhy      1 0.005 0.875 qoteey
   1 0.005 0.793 pcha         1 0.005 0.890 shecthey     1 0.005 0.880 qoteody
   1 0.005 0.798 pchal        1 0.005 0.895 shecthy      1 0.005 0.885 qoteol
   1 0.005 0.803 pcheo        1 0.005 0.900 shedaiin     1 0.005 0.889 r
   1 0.005 0.808 qckhey       1 0.005 0.905 sheeal       1 0.005 0.894 raiin
   1 0.005 0.812 qekor        1 0.005 0.910 sheedy       1 0.005 0.899 rain
   1 0.005 0.817 qodaiin      1 0.005 0.915 sheekchy     1 0.005 0.904 ralom
   1 0.005 0.822 qokairy      1 0.005 0.920 sheeky       1 0.005 0.909 sair
   1 0.005 0.827 qokam        1 0.005 0.925 sheet        1 0.005 0.913 sar
   1 0.005 0.832 qokaram      1 0.005 0.930 sheor        1 0.005 0.918 saraiin
   1 0.005 0.837 qokchy       1 0.005 0.935 shl          1 0.005 0.923 sheckhy
   1 0.005 0.841 qokeedaram   1 0.005 0.940 tair         1 0.005 0.928 shedar
   1 0.005 0.846 qokol        1 0.005 0.945 tchar        1 0.005 0.933 sheed
   1 0.005 0.851 qopchdy      1 0.005 0.950 teodaiin     1 0.005 0.938 sheedy
   1 0.005 0.856 qotaiin      1 0.005 0.955 ychedal      1 0.005 0.942 sheeky
   1 0.005 0.861 qotam        1 0.005 0.960 ycheeo       1 0.005 0.947 sheeodar
   1 0.005 0.865 qotar        1 0.005 0.965 ydaiin       1 0.005 0.952 sheeol
   1 0.005 0.870 qotchy       1 0.005 0.970 ykchedy      1 0.005 0.957 solpchd
   1 0.005 0.875 qotedar      1 0.005 0.975 ykeedan      1 0.005 0.962 tchar
   1 0.005 0.880 qoteody      1 0.005 0.980 ykeedy       1 0.005 0.966 teeoar
   1 0.005 0.885 qotey        1 0.005 0.985 yokoey       1 0.005 0.971 teody
   1 0.005 0.889 qoty         1 0.005 0.990 ytam         1 0.005 0.976 ty
   1 0.005 0.894 r            1 0.005 0.995 ytar         1 0.005 0.981 ykchedy
   1 0.005 0.899 raiin        1 0.005 1.000 yteedy       1 0.005 0.986 ykeey
   1 0.005 0.904 rodam                                   1 0.005 0.990 yteedy
   1 0.005 0.909 rol                                     1 0.005 0.995 yteeody
   1 0.005 0.913 ry                                      1 0.005 1.000 yteody
   1 0.005 0.918 sham
   1 0.005 0.923 shchy
   1 0.005 0.928 sheal
   1 0.005 0.933 sheoked
   1 0.005 0.938 shey
   1 0.005 0.942 shod
   1 0.005 0.947 ssheo
   1 0.005 0.952 tchedaiin
   1 0.005 0.957 tedam
   1 0.005 0.962 teeo
   1 0.005 0.966 tolpchy
   1 0.005 0.971 tsho
   1 0.005 0.976 ycheeo
   1 0.005 0.981 yka*om
   1 0.005 0.986 ykcheo
   1 0.005 0.990 ykeeo
   1 0.005 0.995 ykeo
   1 0.005 1.000 ysheo

Second word before "daiin", sorted by "shape":

  -- ----------------------------------------------------------------
   7 aiin
   6 okaiin
   6 qokaiin
   2 lkaiin
   6 chedy
   5 lchedy
   3 cheey
   2 lfchedy
   4 otar
   3 qotar
   4 oteedy
   3 qokey
   2 okeey
   3 daiin
   2 dair
   3 dar
   3 otchedy

  first 10 words account for 23% of all "daiin"s
  first 20 words account for 34% of all "daiin"s

First word before "daiin", sorted by "shape":

  -- ----------------------------------------------------------------
  23 cheo sheeo cheeo chedy chdy chey llchey
   9 qokeey qokeeo keeo
   3 oteo
   2 daiin
   2 qokeeal
   2 okal
   2 otal
   2 qokchedy

  first 10 words account for 15% of all "daiin"s
  first 20 words account for 25% of all "daiin"s

First word after "daiin", sorted by "shape":

  -- ----------------------------------------------------------------
  40 chedy chey cheey shey shedy sheed sheey sheody shody sheol cheol
   8 al ar ol
   3 daiin

  first 10 words account for 20% of all "daiin"s
  first 20 words account for 30% of all "daiin"s

Second word after "daiin", sorted by "shape":

  -- ----------------------------------------------------------------
  30 okeey qokeey qokeedy qotchedy qokedy qokeol lkeey otedy oteey oteody teeedy
  16 chedy chedal lchedy lshey lshedy
   7 qokal qokar otar
   6 daiin qodaiin
   6 okaiin qokaiin
   6 oky qoky qoty
   3 ar
   3 chol
   2 saiin

  first 10 words account for 18% of all "daiin"s
  first 20 words account for 30% of all "daiin"s

These words are tentative members of the "daiin" constellation:

  chedy chey shey cheo cheey okeey oteey qokeey qokeedy
  otedy chedar sheol chol okaiin qokaiin

And these may be associates:

  al ar ol
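
The ct/rfreq/cfreq listings and the "first N words account for X%" summaries above can be reproduced from a plain list of neighbouring words. The following Python sketch shows one way to do it. It assumes that rfreq is the count divided by the total number of occurrences, that cfreq is the running sum of rfreq, and that rows are ordered by decreasing count and then alphabetically, which is what the numbers in the listing suggest. The function names frequency_table and top_share are hypothetical, and the sample input is made up.

```python
from collections import Counter

def frequency_table(words):
    """Return rows (ct, rfreq, cfreq, word), by decreasing count, then by word."""
    counts = Counter(words)
    total = sum(counts.values())
    rows, cum = [], 0
    for word, ct in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cum += ct
        rows.append((ct, ct / total, cum / total, word))
    return rows

def top_share(rows, n):
    """Fraction of all occurrences covered by the first n rows of the table."""
    if n <= 0 or not rows:
        return 0.0
    return rows[min(n, len(rows)) - 1][2]

# Made-up input: words seen in one position next to the target word.
neighbours = ["chedy", "chey", "chedy", "al", "chey", "chedy"]
table = frequency_table(neighbours)
for ct, rfreq, cfreq, word in table:
    print("%2d %5.3f %5.3f %s" % (ct, rfreq, cfreq, word))
print("first 2 words account for %.0f%%" % (100 * top_share(table, 2)))
```

Grouping words by "shape", as in the summaries above, was a judgment made by hand and is not attempted here.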