I got the idea that the \s/ plume may be a stress mark (one that shifts when the word is inflected). So I decided to redo the analysis for Portuguese, comparing accented o with plain o, and likewise for a and e. In each run, sed first folds the acute and circumflex forms into a single accented letter:

    cat port.wds \
      | sed -e 's/[óô]/ó/g' \
      | compare-contexts -lctx 0 -rctx 0 \
          '[aeiouáéíóúàâêôü]*[o][aeiouáéíóúàâêôü]*' \
          '[aeiouáéíóúàâêôü]*[ó][aeiouáéíóúàâêôü]*'

     6143 0.93 o          125 0.99 ó
      135 0.02 io           1 0.01 eó
      112 0.02 ou       ----- ---- ----
       75 0.01 oi         126 1.00 TOT
       55 0.01 ao
       36 0.01 eo
       32 0.00 oo
       12 0.00 aio
        5 0.00 eio
        4 0.00 oa
        2 0.00 oá
        2 0.00 oe
    ----- ---- ----
     6613 1.00 TOT

    cat port.wds \
      | sed -e 's/[áâ]/á/g' \
      | compare-contexts -lctx 0 -rctx 0 \
          '[aeiouáéíóúàâêôü]*[a][aeiouáéíóúàâêôü]*' \
          '[aeiouáéíóúàâêôü]*[á][aeiouáéíóúàâêôü]*'

     6878 0.87 a          258 0.75 á
      448 0.06 ia          84 0.24 iá
      224 0.03 ua           2 0.01 uá
      134 0.02 ai           2 0.01 oá
       69 0.01 ea       ----- ---- ----
       55 0.01 ao         346 1.00 TOT
       23 0.00 uai
       23 0.00 au
       12 0.00 aio
        8 0.00 iai
        4 0.00 éia
        4 0.00 oa
        3 0.00 ía
        3 0.00 eia
        3 0.00 eai
        2 0.00 aí
        2 0.00 aue
        2 0.00 ae
    ----- ---- ----
     7897 1.00 TOT

    cat port.wds \
      | sed -e 's/[êé]/é/g' \
      | compare-contexts -lctx 0 -rctx 0 \
          '[aeiouáéíóúàâêôü]*[e][aeiouáéíóúàâêôü]*' \
          '[aeiouáéíóúàâêôü]*[é][aeiouáéíóúàâêôü]*'

     7385 0.89 e          651 0.97 é
      420 0.05 ue           8 0.01 üé
      196 0.02 ie           7 0.01 ié
      117 0.01 ei           4 0.01 éia
       69 0.01 ea           1 0.00 éi
       61 0.01 eu       ----- ---- ----
       36 0.00 eo         671 1.00 TOT
        5 0.00 eio
        3 0.00 eia
        3 0.00 eai
        2 0.00 üe
        2 0.00 oe
        2 0.00 aue
        2 0.00 ae
        1 0.00 eó
        1 0.00 eí
        1 0.00 eiú
        1 0.00 ee
    ----- ---- ----
     8307 1.00 TOT

These tables are skewed by the short words "a", "é", "e", "o", etc.
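The compare-contexts script is not listed here, so the following is only a minimal Python sketch of what the three runs above appear to compute, under my own assumptions: fold the accent variants as the sed step does, count every maximal vowel cluster that contains the given vowel (zero context on either side), and list each cluster with its count and its fraction of the total, largest first. The names collapse_accents, cluster_counts and format_table are hypothetical, the tie-breaking order is a guess, and the real tool prints the two patterns side by side rather than one column at a time.

```python
import re
from collections import Counter

# Vowel class copied from the patterns in the commands above
# (note that it has no ã or õ).
VOWELS = "aeiouáéíóúàâêôü"

def collapse_accents(word, variants, target):
    """Like sed 's/[variants]/target/g': fold accent variants into one letter."""
    return re.sub("[" + variants + "]", target, word)

def cluster_counts(words, vowel):
    """Count maximal vowel clusters containing `vowel` (zero context).

    Assumption: every non-overlapping match in every word counts once,
    so a cluster like 'oá' is counted under both 'o' and 'á'.
    """
    pat = re.compile("[" + VOWELS + "]*" + vowel + "[" + VOWELS + "]*")
    counts = Counter()
    for w in words:
        counts.update(m.group(0) for m in pat.finditer(w))
    return counts

def format_table(counts):
    """One column of 'count fraction cluster' rows plus the TOT line.

    Largest count first; ties in reverse character order (a guess that
    mimics `sort -nr`).
    """
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    lines = [f"{n:5d} {n / total:4.2f} {s}" for s, n in rows]
    lines.append("----- ---- ----")
    lines.append(f"{total:5d} {1.0:4.2f} TOT")
    return lines

if __name__ == "__main__":
    # Hypothetical toy word list standing in for port.wds.
    words = [collapse_accents(w, "óô", "ó") for w in ["ovo", "boi", "pôr", "avó"]]
    for vowel in ("o", "ó"):
        print("\n".join(format_table(cluster_counts(words, vowel))))
        print()
```

With that toy list, cluster_counts counts o twice and oi once for plain o, and ó twice for the accented letter.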
So let's require at least one more letter after the vowel cluster. This time sed also marks the word boundaries with "_", and compare-contexts shows one letter of left context:

    cat port.wds \
      | sed \
          -e 's/[áâ]/á/g' \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
      | compare-contexts -lctx 1 -rctx 0 \
          '[a][aeiouáéíóúàâêôü]*[a-záéíóúàâêôü]' \
          '[á][aeiouáéíóúàâêôü]*[a-záéíóúàâêôü]'

      284 0.06 par         73 0.25 ián
      211 0.04 _ar         36 0.12 _án
      151 0.03 _as         29 0.10 _ár
      148 0.03 cad         15 0.05 tán
      144 0.03 lad         12 0.04 rár
      128 0.03 tas         12 0.04 rám
      116 0.02 dad         11 0.04 sár
      111 0.02 tan         10 0.03 ráf
      110 0.02 _ap         10 0.03 cál
      101 0.02 rad          8 0.03 iáv
       95 0.02 das          7 0.02 mát
       77 0.02 lar          7 0.02 lás
       76 0.02 fac          6 0.02 vár
       74 0.02 as           5 0.02 jáv
       72 0.02 _al          5 0.02 fác
       70 0.01 tam          4 0.01 táv
       68 0.01 nal          4 0.01 ráp
       66 0.01 mais         4 0.01 pán
       61 0.01 ual          4 0.01 nál
       58 0.01 tad          3 0.01 máx
       56 0.01 cas          3 0.01 lát
       55 0.01 ram          3 0.01 iám
       48 0.01 car          3 0.01 cáv
       47 0.01 tar          2 0.01 uár
       46 0.01 nas          2 0.01 tár
       46 0.01 mas          2 0.01 rát
       46 0.01 ias          1 0.00 tág
       46 0.01 _ao          1 0.00 ráv
       45 0.01 cal          1 0.00 oáv
       44 0.01 tal          1 0.00 oác
       43 0.01 ian          1 0.00 nár
       41 0.01 am           1 0.00 láv
       40 0.01 nad          1 0.00 háv
       39 0.01 ran          1 0.00 dáv
       37 0.01 sam          1 0.00 cán
       36 0.01 lag          1 0.00 bás
       36 0.01 ial          1 0.00 _át
       35 0.01 _ad      ----- ---- ----
       34 0.01 val        291 1.00 TOT
       34 0.01 ras
      ... .... .....
        1 0.00 ab
    ----- ---- ----
     4785 1.00 TOT

Too many cases; let's reduce them by looking only at word-initial vowels followed by a consonant:

    cat port.wds \
      | sed \
          -e 's/[áâ]/á/g' \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
      | compare-contexts -lctx 0 -rctx 0 \
          '_[a][aeiouáéíóúàâêôü]*[bcdfghjklmnpqrstvwxyz]' \
          '_[á][aeiouáéíóúàâêôü]*[bcdfghjklmnpqrstvwxyz]'

      211 0.27 _ar         36 0.55 _án
      151 0.19 _as         29 0.44 _ár
      110 0.14 _ap          1 0.02 _át
       72 0.09 _al      ----- ---- ----
       35 0.05 _ad         66 1.00 TOT
       32 0.04 _at
       31 0.04 _an
       28 0.04 _ac
       26 0.03 _ab
       15 0.02 _am
       13 0.02 _aj
        9 0.01 _aos
        8 0.01 _aut
        7 0.01 _ain
        6 0.01 _av
        6 0.01 _aq
        6 0.01 _af
        5 0.01 _aum
        4 0.01 _ag
        2 0.00 _aux
    ----- ---- ----
      777 1.00 TOT

    cat port.wds \
      | sed \
          -e 's/[óô]/ó/g' \
          -e 's/^/_/g' \
          -e 's/$/_/g' \
      | compare-contexts -lctx 0 -rctx 0 \
          '_[o][aeiouáéíóúàâêôü]*[bcdfghjklmnpqrstvwxyz]' \
          '_[ó][aeiouáéíóúàâêôü]*[bcdfghjklmnpqrstvwxyz]'

      154 0.39 _os          7 0.70 _ót
       68 0.17 _or          2 0.20 _ób
       54 0.14 _ob          1 0.10 _ór
       39 0.10 _out     ----- ---- ----
       26 0.07 _ot         10 1.00 TOT
       26 0.07 _on
       16 0.04 _op
        9 0.02 _oc
        2 0.01 _oit
    ----- ---- ----
      394 1.00 TOT

Not very impressive. However, that is not surprising: most stressed syllables (especially those whose stress shifts) are already predicted by the default stress rules, so they carry no written accent.
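The boundary marking and the two later kinds of pattern can be sketched the same way. Again this is only a minimal Python sketch under my own assumptions, in particular that -lctx 1 simply prepends one character of the "_"-marked word to each match; mark_boundaries, context_counts, initial_counts and accented_share are hypothetical names, not part of compare-contexts.

```python
import re
from collections import Counter

# Character classes copied from the commands above.
VOWELS = "aeiouáéíóúàâêôü"
LETTERS = "a-záéíóúàâêôü"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

def mark_boundaries(word):
    """Like sed -e 's/^/_/g' -e 's/$/_/g': wrap the word in '_' markers."""
    return "_" + word + "_"

def context_counts(words, vowel, lctx=1):
    """The -lctx 1 run (assumed semantics): the vowel, any following
    vowels and one more letter, prefixed by `lctx` characters of left
    context taken from the marked word."""
    pat = re.compile(vowel + "[" + VOWELS + "]*[" + LETTERS + "]")
    counts = Counter()
    for w in map(mark_boundaries, words):
        for m in pat.finditer(w):
            counts[w[max(0, m.start() - lctx):m.end()]] += 1
    return counts

def initial_counts(words, vowel):
    """The last two runs: word-initial vowel cluster plus its first consonant."""
    pat = re.compile("_" + vowel + "[" + VOWELS + "]*[" + CONSONANTS + "]")
    counts = Counter()
    for w in map(mark_boundaries, words):
        m = pat.match(w)  # match() anchors at the leading '_' marker
        if m:
            counts[m.group(0)] += 1
    return counts

def accented_share(plain_total, accented_total):
    """Fraction of all matches that carry the accent."""
    return accented_total / (plain_total + accented_total)
```

For example, context_counts(["para", "casa", "mais"], "a") yields one each of par, cas and mais, the same kind of entries as in the -lctx 1 table. Feeding the TOT lines of the last two tables to accented_share gives 66/843 ≈ 0.078 for word-initial a and 10/404 ≈ 0.025 for word-initial o.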