ITRI-04-04

Thapelo Otlogetswe

The BNC Design as a Model for a Setswana Language Corpus

Proceedings of CLUK'04, Birmingham, UK, pp. 193-198

The design of the BNC provides an attractive model of corpus construction for other languages. In this paper I outline the BNC model and its limitations, and sketch how it could be used for a Setswana corpus. I argue that one way of examining the BNC text types and genres is to study the frequency distribution of lexical items. Frequency distributions are significant because they can allow corpus researchers to make informed generalizations about the language. I then propose a pilot study of frequency lists and show how it can inform the structure of the Setswana corpus.
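The frequency-list approach outlined above can be roughly illustrated with a short Python sketch. It builds a word frequency list for each text type, normalises the counts per million tokens so that text types of different sizes can be compared, and counts how many text types each word occurs in. The genre labels, the short toy strings, the naive regex tokenizer and the function names (`frequency_lists`, `per_million`, `genre_range`) are all hypothetical. They do not come from the BNC or from the pilot study.

```python
import re
from collections import Counter

# Hypothetical toy "corpus": text-type label -> list of raw texts.
# These short strings are illustrative placeholders, not real corpus data.
TOY_CORPUS = {
    "conversation": ["Dumela rra, ke a leboga", "Ke a leboga mma"],
    "news": ["Batho ba tsile", "Ke motho"],
}

# Assumption: a naive tokenizer that treats runs of word characters as tokens.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and return runs of word characters as tokens.
    A real Setswana pipeline would need a tokenizer suited to the
    language's orthography; that is outside this sketch."""
    return TOKEN_RE.findall(text.lower())


def frequency_lists(corpus):
    """Return a Counter of raw token frequencies for each text type."""
    return {
        label: Counter(tok for text in texts for tok in tokenize(text))
        for label, texts in corpus.items()
    }


def per_million(counter):
    """Normalise raw counts to frequency per million tokens."""
    total = sum(counter.values())
    return {word: count * 1_000_000 / total for word, count in counter.items()}


def genre_range(freqs):
    """Count how many text types each word occurs in (a crude dispersion measure)."""
    spread = Counter()
    for counter in freqs.values():
        spread.update(counter.keys())
    return spread


if __name__ == "__main__":
    freqs = frequency_lists(TOY_CORPUS)
    for label, counter in freqs.items():
        print(label, counter.most_common(3))
    print("range of 'ke':", genre_range(freqs)["ke"])
```

Comparing normalised frequencies with the number of text types a word appears in separates words that are frequent across the whole collection from words that are frequent only in one text type. Distinctions of this kind could guide decisions about which text types the Setswana corpus needs to sample.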