ITRI-03-16

A. Kilgarriff

What computers can and cannot do for lexicography, or Us precision, them recall

Also published in Proceedings of ASIALEX

Computers are good at recall, people are good at precision; that is, computers are good at finding a large set of possibilities, people are good judges of which possibilities are appropriate. Conversely, people are bad at recall and computers are bad at precision; it is hard for people to think, unprompted, of lots of possibilities, and it is hard for computers to work out which candidate answers are good ones. This points to a straightforward division of duties: computer proposes, human disposes. This division of duties is relevant in a number of areas of human-computer interaction, and lexicography is one. For lexicography, the items in question are facts about a word, and they are 'right' if they are the facts that are wanted in the dictionary. A fact about a word may be a collocation, a grammatical pattern, a synonym, an antonym, a set or semi-set phrase, an idiom, a domain, a sense, or a translation. All of these can be (and have been) found by computer, with varying degrees of accuracy and completeness. In this paper I first sketch the history of the corpus as a source of lexicographic evidence and then present "word sketches", which use a corpus to propose a set of facts about a word's grammatical and collocational behaviour. I then outline the work that has been done within computational linguistics towards identifying facts of each of the varieties listed above. I conclude by considering the prospects for the roles of people and computers within a wider socio-cultural perspective.