Lynne Cahill, ITRI
Automatic and semi-automatic lexical development

The manual construction of lexicons for NLP systems is notoriously labour-intensive and therefore expensive. Moreover, lexicons developed for one application are rarely suitable for use in others, so this effort is expended for limited return. There are three obvious solutions to this problem: (1) produce lexicons that are more generally useful across different NLP applications; (2) produce lexicons that can be readily adapted for different uses; (3) produce tools that permit (semi-)automatic generation of new lexicons.

In this talk I shall discuss some possible ways of achieving each of these aims and present an approach to lexicon development that attempts to combine all three. In this approach, lexicons for individual applications are generated automatically from a higher-level representation of the lexical information. This higher-level representation fulfils the first two aims above, being both more general than the lexicons required for individual applications and more easily extended. I shall illustrate the approach with examples from two very different lexicons: one developed to demonstrate the sharing of (primarily phonological) information across closely related languages, and the other developed for use in a multilingual NLG system.
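
The idea of generating application lexicons from a higher-level representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formalism used in the talk: it assumes the higher-level representation is a simple default-inheritance hierarchy, and the names Node, lookup, generate_lexicon and all feature names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node in a hypothetical default-inheritance hierarchy."""
    name: str
    parent: Optional["Node"] = None
    features: dict = field(default_factory=dict)

    def lookup(self, key):
        """Return the nearest value for key, searching this node then its ancestors."""
        node = self
        while node is not None:
            if key in node.features:
                return node.features[key]
            node = node.parent
        return None


def generate_lexicon(entries, wanted):
    """Flatten the hierarchy into a lexicon holding only the wanted features."""
    lexicon = {}
    for entry in entries:
        record = {}
        for key in wanted:
            value = entry.lookup(key)
            if value is not None:
                record[key] = value
        lexicon[entry.name] = record
    return lexicon


# Shared, general information is stated once on an abstract node (assumed data).
noun = Node("NOUN", features={"cat": "n", "plural_suffix": "s"})
dog = Node("dog", parent=noun, features={"stem": "dog"})
# An exceptional entry overrides the inherited default locally.
sheep = Node("sheep", parent=noun, features={"stem": "sheep", "plural_suffix": ""})

# Two application lexicons derived from the same source representation.
parser_lexicon = generate_lexicon([dog, sheep], ["cat"])
generator_lexicon = generate_lexicon([dog, sheep], ["stem", "plural_suffix"])
```

In this sketch, extending the lexicon means adding one node to the hierarchy, and adapting it to a new application means calling generate_lexicon with a different feature list; neither step requires rewriting existing entries.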