3.2.1 The Corpus Header
The corpus header provides the necessary bibliographical information about the corpus, the encoding principles and tags used (in brief), and the taxonomies used. We will only go through these briefly here; for the general set-up of corpus headers, cf. Burnard & Sperberg-McQueen.
- <filedesc>
File description: contains bibliographical information such as title, editor/publisher, source material, place, date, extent, availability, etc.
- <encodingdesc>
Encoding description (tag usage): contains the declaration of general editorial usage and all the tags employed in the corpus.
- <encodingdesc>
Encoding description (taxonomies): includes the classificatory systems used in the corpus, namely (i) the domain classification, (ii) the threefold sub-domain classification, and (iii) the decade structure. These matter because the text headers use the short descriptors introduced here.
Both domains and decades, while easily recognizable via the text identifiers, are additionally encoded in the following way:
- decades: dec1 (1640s), dec2 (1650s), dec3 (1660s), dec4 (1670s), dec5 (1680s), dec6 (1690s), dec7 (1700s), dec8 (1710s), dec9 (1720s), dec10 (1730s).
- domains: dom1 (economy), dom2 (law), dom3 (miscellaneous), dom4 (politics), dom5 (religion), dom6 (science).
- The sub-domain classification is based on the abbreviated domain and the numbers 1-3, e.g. ec1 = domestic economy and trade (cf. 2.3 above for the listing and the explanation).
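The decade and domain descriptors listed above follow a regular pattern, which can be sketched as a small lookup. This is only an illustration, not part of the corpus distribution: the names `DOMAINS` and `decade_label` are hypothetical, and the sketch assumes that dec10 denotes the 1730s, in line with the other decades.

```python
import re

# Domain descriptors as listed in the taxonomy above.
DOMAINS = {
    "dom1": "economy",
    "dom2": "law",
    "dom3": "miscellaneous",
    "dom4": "politics",
    "dom5": "religion",
    "dom6": "science",
}


def decade_label(code: str) -> str:
    """Map a decade descriptor such as 'dec1' to its decade, e.g. '1640s'.

    Hypothetical helper: dec1 is the 1640s and each following
    descriptor advances by ten years, up to dec10.
    """
    match = re.fullmatch(r"dec\s?(\d+)", code.strip())
    if match is None or not 1 <= int(match.group(1)) <= 10:
        raise ValueError(f"unknown decade descriptor: {code!r}")
    start_year = 1640 + 10 * (int(match.group(1)) - 1)
    return f"{start_year}s"


print(decade_label("dec4"))  # 1670s
print(DOMAINS["dom2"])       # law
```

The sub-domain descriptors are left out here because their abbreviations are listed in 2.3 rather than in this section.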
Additionally, the taxonomy section provides space for further, not strictly classificatory, information, such as the number of structural parts per text, the description of these textual parts, the number of authors of a text, as well as topic and genre keywords.
- <profiledesc>
Profile description: contains a list of all the languages used in the Lampeter Corpus.
- <revisiondesc>
Revision description: contains a work report.