




ITRI seminars - Spring 2002

ITRI seminars generally take place at 12 noon on Thursdays in room W107 on the first floor of the Watts Building, University of Brighton (Moulsecoomb site). Occasional deviations from this pattern are indicated below.

Information on how to find W107 is available on our contact page.

17 January
abstract

Ivandre Paraboni ITRI, University of Brighton
Generating references to document parts
22 January
abstract

Ehud Reiter Department of Computing Science, University of Aberdeen
Generating Summaries of Time Series Data
24 January
abstract

Caroline Stern Dept. of Languages & Literature, Ferris State University
Instructional Design and the second generation of online tutorials for information literacy
7 February
abstract

Yvonne Rogers COGS, University of Sussex
Reducing the overload of 'talk' on distributed team communication: augmenting the verbal channel with visual representations
The seminar by Yvonne Rogers has been postponed until further notice.
14 February
abstract

Pat Hall Computing Department, Open University
The linguistic and cultural challenges of software localisation
21 February
abstract

Jane Oakhill Experimental Psychology, University of Sussex
Inferences from single words
7 March
abstract

Anne Deroeck Open University
Some thoughts on Arabic Language Processing - and why it is good for you
14 March
abstract

Diana McCarthy COGS, University of Sussex
Disambiguating Nouns, Verbs and Adjectives Using Automatically Acquired Selectional Preferences
14 March


The 2002 Distinguished Lecture in Artificial Intelligence and Cognitive Science
The Semantic Web by Professor James Hendler of the University of Maryland.
Time: 6:30 pm, Place: Westlain House Lecture Theatre.
For further information click here, or contact admin@itri.brighton.ac.uk (+44 1273 642900).





Previous ITRI seminars
Next Term ITRI seminars


See also NLP seminars at COGS, University of Sussex

Abstracts

Ivandre Paraboni
Generating References to Document Parts

Many documents are organised in hierarchically structured components such as sections, subsections, items etc. These document parts may be referred to within the document itself (e.g., "See part B of section 2") for various purposes, such as stressing the importance of a document part, pointing to additional information etc. We will discuss the generation of references to document parts, a phenomenon we call document deixis (ddx). We will argue that, due to the hierarchical structure of the domain, ddx references may differ from more 'traditional' references to domain entities in a number of ways. For example, in the generation of ddx references it is necessary to provide not only a distinguishing description of the intended referent, but also information on how to find that referent in the hierarchical structure. We will discuss how a number of existing algorithms could be adapted for the task and present the results of an experiment suggesting that a new approach is called for.

Ehud Reiter
Generating Summaries of Time Series Data

The SumTime project at Aberdeen is investigating techniques for generating textual summaries of time-series data. I will give an overview of the project and compare data summarisation to the better-known text summarisation task. I will then discuss in more detail the lexical work we have done in the project, which in particular suggests that different individuals often associate different meanings (denotations) with lexemes; for example, two individuals may agree in abstract terms about what an "oscillation" is but disagree as to what actual signals can be described as an "oscillation". This has troubling implications for NLG: how can we describe numerical data when our readers may not agree on the meanings of the words used in the description?

Caroline Stern
Instructional Design and the second generation of online tutorials for information literacy

Information literacy (IL) is an emergent and important core skill in lifelong learning. Higher education is looking for creative ways to infuse information literacy instruction into an already overloaded curriculum. One way this is increasingly accomplished is through online, self-guided tutorials, some of which are in their second generation of development. While these Web-based teaching units are effective in limited ways, research shows that IL instruction should be cumulative, systematic, interactive, and have consequences for performance. How might the tutorials accomplish this? This seminar will offer a webliography of IL instruction tutorials currently available and briefly analyze the instructional design of second-generation, interactive online tutorials in light of how they have performed in lab-monitored settings. Discussion and resource sharing will be encouraged.

Pat Hall
The Linguistic and Cultural Challenges of Software Localisation

While software is produced mostly to work in English, with most centres of software production themselves working in English, the market for this software is global, and yet only around 5% of the world's population are fluent in English. Software localisation is the process of translating software for use in other cultures using other languages. How can this be made an easy and cheap process to enable many more languages and cultures to benefit from localisation? This talk will focus on how language engineering can be marshalled to solve this problem.

The problems that software localisation addresses will be described, together with the current methods of addressing them. These methods are over 30 years old, and exploit neither current views of software construction nor linguistic knowledge. The potential for linguistic knowledge in addressing localisation problems will be discussed, from the construction of character encodings to human-computer interaction using writing, to speech. This will be described in the context of current models of software architecture. Simple extensions to handle more general cultural variation will be discussed.

Jane Oakhill
Inferences from single words

Our current research aims to give an account of comprehension that incorporates the insights of both minimalism and mental models theory. In particular, we aim to specify which of the potential inferences that do not contribute directly to a coherent interpretation of a text are made.

One class of inferences that may be made even when they are not necessary for establishing coherence, and hence that may be non-minimal, comprises those that are based on the semantic and pragmatic information associated with a single word. The studies to be reported explore inferences from single words in three different contexts: stereotypical gender information, “anaphoric islands” and implicit causality associated with verbs.

Anne Deroeck
Some thoughts on Arabic Language Processing - and why it is good for you

Arabic (like other Semitic languages) presents a particularly interesting challenge for language processing research. It has a complex, interdigitating morphology. Words sharing a root share an aspect of meaning. Arabic does not write vowels. Spelling conventions vary across a broad geographical region. There are no widely available electronic resources, including (until recently) text.

In February 2002, ELRA released our 18 million word Arabic dataset, covering 47,000 articles in 7 domains from the Al-Hayat newspaper. This was the third such dataset to be released within a year, events which bear witness to the burst of interest in the area. The availability of resources, together with a mature language engineering toolkit, may encourage us to believe that building Arabic language processing applications is simply a question of porting what we know to a different set of languages.

But there is reason for some caution. Most techniques have been developed and tested on Western European languages. There is evidence that some of the more successful rough and ready approaches do not fare well when confronted with Arabic.

In this talk, I will report on two aspects of this evidence. The first, briefly, concerns the results of some preliminary experiments with the Al-Hayat dataset. The second concerns our preoccupation with stemming as a successful conflation technique for IR and shows why stemming Arabic is a dangerous thing. I will then motivate and explore an alternative to stemming and present an algorithm for clustering Arabic words sharing a root.

Diana McCarthy
Disambiguating Nouns, Verbs and Adjectives Using Automatically Acquired Selectional Preferences

Selectional preferences are frequently used by word sense disambiguation systems as one source of disambiguating information. We evaluate selectional preferences acquired for adjective and verb semantic classes, in adjective-noun, subject and direct object grammatical relationships, on the SENSEVAL-2 test suite. We acquire these selectional preferences specific to verb or adjective classes, rather than forms, so that the preferences can be used to disambiguate the co-occurring adjectives and verbs, rather than just the nominal argument heads. The one sense per discourse heuristic is used to spread the sense tags to other word forms of the same part of speech occurring in the same document. The preferences perform well when compared to other unsupervised systems on the same test suite; however, the results confirm that for many applications other sources of knowledge would be required to achieve an adequate level of accuracy and coverage. We end by discussing possible ways of extending coverage and increasing precision using selectional preferences and anaphoric links.


Maintained by Paul Piwek (Paul.Piwek@itri.brighton.ac.uk).
Last updated Spring 2002

© Information Technology Research Institute