ITRI seminars - Spring 2001

ITRI seminars generally take place at 12 noon on Thursdays in room W107 on the first floor of the Watts Building, University of Brighton (Moulsecoomb site). Occasional deviations from this pattern are indicated below.

Information on how to find W107 is available on our contact page.

25 Jan
Duska Rosenberg, Royal Holloway, University of London
Languages in Multimedia: Common-ground framework for investigating the role of natural language interfaces in computer-mediated communication (abstract below)

8 Feb
Mark Steedman, University of Edinburgh
Uses of Prosody in Spoken Language Processing

22 Feb
Andrea Setzer, University of Sheffield
Annotating Events and Temporal Information in Newswire Texts (abstract below)

15 Mar
Fabio Ciravegna, Department of Computer Science, University of Sheffield
User-driven Adaptive Information Extraction from Internet-related Text (abstract below)

Previous ITRI seminars
See also NLP seminars at COGS, University of Sussex

Abstracts

Duska Rosenberg
Languages in Multimedia: Common-ground framework for investigating the role of natural language interfaces in computer-mediated communication

My current research is a collaboration with architects and urban planners who have developed models for the design of the physical workplace. We're now working on the design of the location-independent workplace, where ICT plays a key role in supporting mobile workers. The first issue we're addressing involves the study of interaction in three types of space: the "cloister", which requires privacy and solitude; the "club", where meetings among selected members take place; and the "cafe", which everyone can join. My own contribution involves the study of language use in different kinds of space and, in particular, of what informational resources are normally available for people to establish common ground in such spaces. The theoretical framework I'm using is based on the notion of common ground developed by Karttunen and Peters and also by Clark, but involves some non-trivial extensions. I'm currently working with Peters and Ginzburg on adapting situation semantics for the study of communication.

Andrea Setzer
Annotating Events and Temporal Information in Newswire Texts

If one is concerned with natural language processing applications such as information extraction (IE), which typically involve extracting information about temporally situated scenarios, the ability to accurately position key events in time is of great importance. To date only minimal work has been done in the IE community on the extraction of temporal information from text, and the importance and difficulty of the task together suggest that a concerted effort be made to analyse how temporal information is actually conveyed in real texts. To this end we have devised a scheme for annotating those features and relations in texts which enable us to determine the relative order and, if possible, the absolute time of the events reported in them. Such a scheme could be used to construct an annotated corpus which would yield the benefits normally associated with the construction of such resources: a better understanding of the phenomena of concern, and a resource for the training and evaluation of adaptive algorithms that automatically identify features and relations of interest. We also describe a framework for evaluating the annotation and compute precision and recall for different responses.

Fabio Ciravegna
User-driven Adaptive Information Extraction from Internet-related Text

In recent years, the increasing importance of the Internet has stressed the central role of texts such as emails, Usenet posts and Web pages. In this context, linguistically intensive approaches such as those used in classical IE systems (e.g. [Hobbs97], [Humphreys98], [Grishman98], [Ciravegna00]) are difficult or unnecessary. Information carried by extralinguistic structures (e.g. HTML tags, document formatting, and stereotypical language) is more relevant and easier to use than deep linguistic knowledge. For this reason a new research stream on adaptive IE has arisen at the convergence of NLP, Information Integration and Machine Learning. The goal is to produce IE algorithms and systems that can be adapted to new Internet-related applications/scenarios using only an analyst's knowledge (i.e. knowledge of the domain/scenario itself) [Kushmerick 1997], [Califf 1998], [Muslea 1998], [Freitag 1999], [Soderland 1999], [Freitag 2000]. Such algorithms are easy to adapt to new applications and very effective when applied to highly structured HTML pages. Unfortunately they tend to be less effective on less structured texts (e.g. free texts). In our opinion this is because most successful algorithms make little (or no) use of NLP, tending to avoid any generalization over the flat word sequence. When they are applied to unstructured texts, data sparseness becomes a problem.

This paper presents LP-2, an adaptive IE algorithm designed within this new stream of research. It makes use of shallow NLP to overcome data sparseness when confronted with NL texts, while remaining effective on highly structured texts.

LP-2 has a considerable success story. From a scientific point of view, experiments report excellent results with respect to the current state of the art on some publicly available corpora. From an application point of view, a successful industrial IE tool has been based on it. Real-world applications have been developed, and licenses have been granted to some commercial companies for building other applications.

In this talk I will first introduce the algorithm, discuss experimental results and show how the algorithm compares successfully with the current state of the art on semi-structured texts. The role and importance of shallow NLP for overcoming data sparseness will also be discussed. Then I will present my experience in designing and delivering LearningPinocchio, an industrial system for adaptive IE based on LP-2. Finally I will describe my research agenda for user-driven adaptive IE for the coming years.


Maintained by Adam Kilgarriff (Adam.Kilgarriff@itri.brighton.ac.uk).
Last updated Tuesday March 13 2001

© Information Technology Research Institute