The East African component of the International Corpus of English

Corpus design

The East African component of the International Corpus of English (ICE-EA) is a database of written and spoken material from Kenya and Tanzania. All material was collected between 1991 and 1996. The spoken component consists of, inter alia, transcriptions of informal conversations, classroom discussions, broadcast discussions and interviews, and monologues. The written component includes newspaper articles, short stories and novels, informal and business letters, academic articles, monographs and other material. In compiling this corpus we followed the ICE stipulations as closely as possible, so that it can be compared with corpora of other varieties. A few minor modifications of categories were nevertheless necessary, for two reasons. First, it was difficult to acquire a sufficient number of texts for some categories, such as natural science, because there are simply not enough monographs of this sort written and published by East Africans. Second, although English radio programmes can be recorded in both countries, there are far fewer listeners in Tanzania because Kiswahili is the preferred language in everyday conversation.

Another important modification of this corpus is its size: whereas the ICE guidelines require only 200 texts (i.e. 400,000 words) in the written component of each subcorpus, we decided to compile parallel written corpora for Kenya and Tanzania, thus producing 800,000 words of written data for East African English. One reason for doing this was that the linguistic situation in Kenya and Tanzania differs to such an extent that we considered it necessary to represent both varieties of written English fully. English is clearly a second-language variety (ESL) in Kenya, but the classification of its position in Tanzania is not clear-cut. In both countries English is prestigious, being the language of secondary and tertiary education and of the High Court, but in Tanzania Kiswahili is the language used in Parliament and government institutions. In Kenya, English serves as a lingua franca between people of various ethnic groups, particularly in the bigger towns and the capital, Nairobi; in Tanzania, Kiswahili fulfils this function.

The accompanying manual provides a comprehensive description of all text categories with special reference to ICE-East Africa, together with lists of all the texts used and, where it could be obtained, background information on the speakers and authors. The manual is published with the corpus on the new ICAME CD-ROM.