Supported by CNPq, the Brazilian Research Council.
This thesis concerns the computational generation of referring expressions, which is one of the key components of any Natural Language Generation system. Unlike other work in this area, we focus on references to elements of a hierarchically ordered domain. For concreteness, we pay particular attention to one such domain, namely a document and its parts (sections, subsections, etc.). These document parts may be referred to for various purposes, for example to relate two different document parts (as in "see also section 7"), and we call these referring expressions instances of Document Deixis. We discuss how to determine the semantic content of document-deictic descriptions and argue that none of the existing algorithms for the generation of referring expressions is directly applicable to the problem, given that it is necessary to use hierarchical information to make such references easier to "resolve". We propose a number of implemented algorithms for this task, some of which add redundant information to make reference resolution easier. We report a psycholinguistic experiment aimed at deciding whether adding logically redundant information is actually appreciated by readers and writers, and how much redundancy is best. We also discuss when to generate document-deictic descriptions in the context of a Natural Language Generation system in which these descriptions are not specified as part of the input. We propose a number of strategies for Document Deixis generation, some of which are implemented and evaluated in a second psycholinguistic experiment. This study allows us to conclude that references in hierarchically ordered domains require the realisation of logically redundant information in order to facilitate the search for the referent. It also illustrates how a Natural Language Generation system can be adapted to the task of generating instances of Document Deixis.
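To make the distinction between a logically minimal and a redundant document-deictic description concrete, the following is a minimal sketch in Python. It is not one of the algorithms proposed in the thesis. The `Part` class, the `describe` function and the `extra_levels` parameter are hypothetical names introduced here for illustration, and the sketch assumes that labels are unique among the parts sharing a parent and that the reader's current position in the document is known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Part:
    """A document part in a hierarchy (hypothetical illustration type)."""
    kind: str                      # e.g. "section", "picture"
    label: str                     # assumed unique among siblings
    parent: Optional["Part"] = None


def ancestors(part: Optional[Part]) -> List[Part]:
    """Return the chain [part, parent, grandparent, ...] up to the root."""
    chain = []
    while part is not None:
        chain.append(part)
        part = part.parent
    return chain


def describe(referent: Part, reader_at: Part, extra_levels: int = 0) -> str:
    """Describe `referent` as seen from `reader_at`.

    With extra_levels=0 only the levels below the lowest common ancestor
    of the two parts are mentioned (minimal under the assumptions above).
    Each extra level adds one further ancestor, which is logically
    redundant but may narrow the reader's search.
    """
    reader_chain = ancestors(reader_at)
    ref_chain = ancestors(referent)

    # Collect the referent's levels that the reader's position does not share.
    needed = []
    for node in ref_chain:
        if node in reader_chain:
            break
        needed.append(node)
    if not needed:                 # referent is the reader's part or an ancestor
        needed = [referent]

    # Optionally add redundant ancestors, skipping the root (the document).
    k = len(needed)
    extra = [n for n in ref_chain[k:k + extra_levels] if n.parent is not None]

    return " in ".join(f"{n.kind} {n.label}" for n in needed + extra)


# Hypothetical example document.
doc = Part("document", "D")
sec1 = Part("section", "1", doc)
sec2 = Part("section", "2", doc)
sub21 = Part("subsection", "2.1", sec2)
pic5 = Part("picture", "5", sec2)

minimal_same_section = describe(pic5, reader_at=sub21)
redundant_same_section = describe(pic5, reader_at=sub21, extra_levels=1)
from_other_section = describe(pic5, reader_at=sec1)
```

In this sketch a reader inside section 2 would receive "picture 5" as the minimal description and "picture 5 in section 2" once one redundant level is added, while a reader in section 1 needs the section to be mentioned in any case. How much redundancy is actually helpful is the empirical question addressed by the experiments reported in the thesis.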