ITRI-04-02

Marina Santini

A Shallow Approach To Syntactic Feature Extraction For Genre Classification

Proceedings of the 7th Annual Colloquium for the UK Special Interest Group for Computational Linguistics

In this paper, the shallow and computationally inexpensive approach to syntax suggested by Argamon et al. (1998) is explored, enhanced and applied to ten different genres included in the British National Corpus (BNC). Their approach to syntax uses part-of-speech (POS) trigrams. The rationale behind this choice is that trigrams are large enough to encode useful syntactic information, yet small enough to be computationally manageable. The sets of experiments described in this paper show that features representing syntactic structure have strong discriminating power. The results are extremely encouraging and deserve further investigation.
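
As a rough illustration of what a POS trigram feature might look like, the sketch below slides a window of three over an already-tagged token sequence and records the relative frequency of each tag triple. It is a minimal sketch under stated assumptions, not the feature set used in the paper: the function names `pos_trigrams` and `trigram_features` are hypothetical, the choice of relative frequency as the feature value is an assumption, and the tag sequence is a made-up example written with BNC-style tags.

```python
from collections import Counter


def pos_trigrams(tags):
    """Return every run of three consecutive POS tags in the sequence."""
    return list(zip(tags, tags[1:], tags[2:]))


def trigram_features(tags):
    """Map each POS trigram to its relative frequency in the sequence.

    Assumption: relative frequency is used as the feature value; the
    abstract does not say how trigrams are weighted.
    """
    trigrams = pos_trigrams(tags)
    total = len(trigrams)
    if total == 0:
        return {}  # fewer than three tags: no trigram can be formed
    counts = Counter(trigrams)
    return {trigram: n / total for trigram, n in counts.items()}


# Hypothetical tag sequence (illustrative only, not taken from the corpus).
tags = ["AT0", "NN1", "VVZ", "AT0", "NN1", "VVZ"]
features = trigram_features(tags)
```

A sequence of n tags yields n - 2 trigrams, so the cost of extraction grows linearly with text length, which is consistent with the abstract's point that trigrams remain computationally manageable.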