ITRI-03-10
Richard Power, Donia Scott and Nadjet Bouayad-Agha
Document structure
We argue the case for abstract document structure as a separate descriptive level in the analysis and generation of written texts. The purpose of this representation is to mediate between the message of a text (i.e., its discourse structure) and its physical presentation (i.e., its organisation into graphical constituents like sections, paragraphs, sentences, bulleted lists, figures, footnotes and so forth). Abstract document structure can be seen as an extension of Nunberg's 'text grammar'; it is also closely related to logical mark-up in languages like HTML and LaTeX. We show that by using this intermediate representation, several subtasks in language generation and language understanding can be defined more cleanly.