ITRI-04-10

Adam Kilgarriff

How Dominant is the Commonest Sense of a Word?

Also published in TSD 2004: Text, Speech and Dialogue, 7th International Conference, Brno, Czech Republic, September 2004

We present a mathematical model of word sense frequency distributions and use word frequency distributions to set its parameters. The model implies that the expected dominance of the commonest sense rises with the number of corpus instances, and that, particularly for commoner words, highly uneven distributions are to be expected much more often than even ones. The model is compared with the limited evidence available from SEMCOR. The implications for WSD and its evaluation are discussed.