ITRI seminars - Summer 2002

ITRI seminars generally take place at 12 noon on Thursdays in room W107 on the first floor of the Watts Building, University of Brighton (Moulsecoomb site). Occasional deviations from this pattern are indicated below.

Information on how to find W107 is available on our contact page.

2 May
Kees van Deemter, ITRI, University of Brighton
Generating complex referring expressions: the embarrassment of riches

9 May
Bob Ladd, Department of Theoretical and Applied Linguistics, University of Edinburgh
"Segmental anchoring" of F0 landmarks: implications for speech technology?

16 May
Massimo Zancanaro, Cognitive and Communication Technologies Division, IRST, Istituto per la Ricerca Scientifica e Tecnologica, Italy
Building Adaptive Information Presentations from Existing Information Repositories

23 May
Gabriela Cavaglia, ITRI, University of Brighton
Measuring homogeneity of different language varieties

6 June
Igor Aleksander, Dept. of Electrical and Electronic Engineering, Imperial College
Neuromodelling as a basis for Artificial Intelligence

13 June
Lynne Cahill and Roger Evans, ITRI, University of Brighton
GREG: Developing a multilingual valency lexicon for Georgian, Russian, English and German

20 June
Massimo Poesio, Department of Computer Science, University of Essex
Acquiring Lexical Knowledge for Anaphora Resolution

27 June
Patrick Hanks, Lexicographer
The probable and the possible: Lexicography in the age of the internet

See also NLP seminars at COGS, University of Sussex

Abstracts

Kees van Deemter
Generating complex referring expressions: the embarrassment of riches

Until recently, Generating Referring Expressions meant collecting (atomic) properties that identify an object uniquely when they are conjoined. Recent work has shown how this issue can be generalized if a larger range of referential devices is used (including negation and disjunction, for example), if relations between objects are taken into account, and if the referent can be a set as well as an individual object. But these extensions lead to an 'embarrassment of riches', for not only do they force the generator to choose between a much larger number of noun phrase patterns than before, but the choice between two patterns is not always easy.

This talk will (1) explain some of my recent work in this area, (2) discuss some new questions that this work gives rise to, and (3) sketch some tentative ideas for tackling these questions, which have arisen in collaboration with Emiel Krahmer of Tilburg University.
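As a rough illustration of the classic task described in this abstract (collecting atomic properties whose conjunction identifies the referent uniquely), here is a minimal Python sketch. It is not the speaker's algorithm: the greedy selection strategy, the function name `distinguishing_properties`, and the toy domain of dogs and cats are all assumptions made for the example, and it covers none of the extensions (negation, disjunction, relations, sets) that the talk is about.

```python
def distinguishing_properties(target, domain):
    """Greedily pick properties of `target` until no distractor shares them all.

    `domain` maps object names to sets of atomic properties (hypothetical data).
    Returns a list of properties, or None if the target cannot be identified.
    """
    distractors = {name for name in domain if name != target}
    remaining = set(domain[target])
    chosen = []
    while distractors and remaining:
        # Prefer the property that rules out the most distractors;
        # ties are broken alphabetically so the result is deterministic.
        best = min(
            remaining,
            key=lambda p: (-sum(1 for d in distractors if p not in domain[d]), p),
        )
        ruled_out = {d for d in distractors if best not in domain[d]}
        remaining.discard(best)
        if not ruled_out:
            break  # no remaining property removes any distractor
        chosen.append(best)
        distractors -= ruled_out
    return chosen if not distractors else None


# Toy domain (an assumption for illustration only).
domain = {
    "d1": {"dog", "small", "black"},
    "d2": {"dog", "large", "black"},
    "c1": {"cat", "small", "black"},
}

print(distinguishing_properties("d1", domain))  # ['dog', 'small']
```

Here "the small dog" identifies d1, whereas "black" is never selected because it rules out no distractor. Allowing negated or disjoined properties enlarges the space of candidate descriptions considerably, which is the source of the choice problem the abstract describes.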

Bob Ladd
"Segmental anchoring" of F0 landmarks: implications for speech technology?

I will review a range of work from the past decade showing that F0 landmarks (e.g. local minima and maxima) are quite precisely "anchored" in time to points in the segmental string (e.g. onset of stressed syllable). The details of this anchoring differ in specifiable ways from language to language, and depend on various aspects of phonological structure such as the distinction between long and short vowels. Moreover, the clear conclusion from this work is that pitch accents - local rises and falls at prominent syllables - are multiply anchored (i.e. have at least two anchor points) and therefore do not have fixed slope or duration. These findings are inconsistent with the way pitch accents are normally modelled for speech synthesis (by e.g. Fujisaki or Taylor). However, it is an open question whether more accurate modelling of such phonetic detail would improve prosodic aspects of speech technology.

Massimo Zancanaro
Building Adaptive Information Presentations from Existing Information Repositories

In the literature, a distinction is often made between adaptive and dynamic hypermedia. The former exist prior to and independently of their user; a user model is then employed to hide part of the structure (or to highlight another part) to better support the user in the exploration of the content. Fully dynamic hypermedia, on the other hand, do not exist until the very moment a user explores them; they are created on the fly using automatic text generation techniques.

This talk will introduce an approach, called Macronodes, for the adaptive reuse of existing multimedia repositories. This approach tries to blur the distinction between adaptive and dynamic hypermedia, aiming to find an optimal trade-off between resource reuse and flexibility. The Macronode system dynamically builds a node of the hypermedia by composing previously annotated pieces of data. This composition process exploits discourse strategies and linguistic rules to provide both flexible content selection and control over the linguistic realization.

For some classes of applications, adaptation of existing linguistic or multimedia material is more appropriate than natural language generation. The pros and cons of adaptive hypermedia techniques in the context of information presentations for museums will be discussed with reference to some case studies.

Finally, some attempts to effectively integrate a natural language generation engine in the Macronode architecture will be presented. NLG allows the generation of the most dynamic parts of the presentation (e.g. referring expressions and comparisons), while the Macronode machinery enriches the content and the phrasing while still allowing for personalization.

Gabriela Cavaglia
Measuring homogeneity of different language varieties

With the ever more widespread use of corpora in language research, the need for corpus-profiling methods arises from both a theoretical point of view (describing and comparing corpora) and a practical one (corpus design and the porting problem). Assessing corpus homogeneity and similarity are essential steps in corpus profiling: corpus homogeneity concerns the ability to identify documents that belong to the same language variety, while corpus similarity goes a step further and concerns the ability to determine whether the same language varieties are represented in different corpora.

Producing homogeneity and similarity measures, in a more or less direct way, is no longer difficult: the measures are all variants of a four-step process based only on document-internal linguistic features and inter-document distance, used in various areas of text processing. The main problem with corpus homogeneity and similarity measures is their evaluation, because of the lack of gold-standard judgments with which the measures can be compared. A methodology to evaluate the measures using an NLP application is proposed.
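To make the idea of a measure built from document-internal features and inter-document distance more concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the speaker's four-step process (which the abstract does not spell out): the features are simply relative word frequencies, the distance is half the L1 distance between two frequency profiles, and the function names `profile`, `distance` and `heterogeneity` are invented for this example.

```python
from collections import Counter
from itertools import combinations


def profile(text):
    """Relative frequency of each lowercased, whitespace-separated token."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def distance(p, q):
    """Half the L1 distance between two profiles.

    0.0 means identical distributions; 1.0 means no shared vocabulary.
    """
    vocabulary = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocabulary)


def heterogeneity(documents):
    """Mean pairwise distance; lower values suggest a more homogeneous corpus."""
    if len(documents) < 2:
        raise ValueError("need at least two documents to compare")
    profiles = [profile(d) for d in documents]
    pairs = list(combinations(profiles, 2))
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


# Toy corpora (assumptions for illustration only).
similar = ["the cat sat on the mat", "the cat sat on the rug"]
mixed = ["the cat sat on the mat", "stock prices fell sharply today"]
print(heterogeneity(similar) < heterogeneity(mixed))  # True
```

A score like this is easy to compute. As the abstract notes, the hard part is evaluating it, since there are no gold-standard homogeneity judgments to compare it against.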

Igor Aleksander
Neuromodelling as a basis for Artificial Intelligence

In some areas of cognitive processing (e.g. 'understanding' a visual world) the performance of the brain appears to be ahead of what can be done using conventional AI modelling methods. I shall describe how neuromodelling can provide a new computational route and illustrate this with work done on visual awareness in robots and visual awareness deficits in Parkinson's Disease sufferers. I shall speculate on the application of neuromodelling in visual awareness schemes that involve natural language.

Lynne Cahill and Roger Evans
GREG: Developing a Multilingual Valency Lexicon for Georgian, Russian, English and German

The GREG (Georgian, Russian, English, German) project was a two-year EU project funded under the INTAS-Georgia initiative, which aims to provide research support and technology transfer to academic and commercial organisations in Georgia. The goal of GREG was to develop a multilingual valency lexicon, that is, a lexicon containing subcategorisation and thematic role information for 1000 verbs in each of the four languages. The project primarily funded partners at Tbilisi State University and the Institute of Linguistics, Georgian Academy of Sciences to develop this lexicon, with advisory support and technology transfer provided by the Universities of Stuttgart and Brighton.

This talk will present the project from Brighton's perspective. Our two main roles in the project were to advise on appropriate sampling procedures to establish a balanced set of verbs to encode, and to provide a formal framework and implementation in DATR for the lexicon itself. In the talk we will give an overview of the project as a whole, and then we will describe the development of the framework itself, which turned out to be a more substantial (and interesting) theoretical and practical task than we had originally anticipated.

Massimo Poesio
Acquiring Lexical Knowledge for Anaphora Resolution

The lack of adequate bases of commonsense or even lexical knowledge is perhaps the main obstacle to the development of high-performance, robust tools for semantic interpretation (except for cases like pronoun interpretation, where a lot can be achieved on the basis of syntactic information only). It is also generally accepted that, notwithstanding the increasing availability in recent years of substantial hand-coded lexical resources such as WordNet and EuroWordNet, addressing the commonsense knowledge bottleneck will eventually require the development of effective techniques for acquiring such information automatically, e.g., from corpora. The goal of our research is to improve the performance of anaphora resolution systems by acquiring the commonsense knowledge required to resolve the more complex cases of anaphora, such as bridging references. We also hope to gain insights in the process into what kind of commonsense knowledge is actually needed for the task. In this talk, I will discuss several versions of a system for resolving definite descriptions, and how automatically acquired lexical information has been used.

(Joint work with Tomonori Ishikawa, Sabine Schulte im Walde, and Renata Vieira)

Patrick Hanks
The probable and the possible: Lexicography in the age of the internet

Lexicographers in the Age of the Internet have access to an unparalleled wealth of evidence for words in use. But this wealth brings with it new problems, unsettling the comfortable certainties of received 19th-century wisdom. Corpus evidence presents challenges to received lexicographic wisdom in many ways, e.g.: And, more broadly: The talk addresses at least some of these questions.


Maintained by Paul Piwek (Paul.Piwek@itri.brighton.ac.uk).
Last updated Saturday August 24 2002

© Information Technology Research Institute