3.1.7 Typographical and Layout Features
This part of the mark-up is mostly concerned with representing the layout and appearance of the source text as faithfully as possible. This includes marking typeface changes, page breaks and, where necessary, catchwords, extraordinary line breaks, and foreign passages.
Typeface changes highlight words or passages by setting them visually apart from the surrounding text. This can have a number of functions, such as emphasis, or it can simply be a conventionalized procedure, as with country and nationality names, which are always typographically differentiated from the running text in the Lampeter Corpus. We have tried to avoid attributing a function to typeface changes and have therefore encoded only the typeface used. Typeface changes can be indicated either with a tag element of their own or via the rend attribute in another element, e.g. <head rend="it"> for an italicized headline. Typeface change indicators always have to be seen against the background of their environment, such as the global rend attribute in the <text> element, or rend attributes in surrounding <div>, <p> or similar elements. The following typefaces are found in the corpus:
- <bo> contains text in bold face.
- <go> contains text written in gothic type.
- <it> contains italicized text.
- <ro> contains text in roman typeface.
- <sc> contains text printed in small capitals.

Foreign elements are usually also typographically prominent, either because they co-occur with a typeface change or because they are written in a foreign script. These passages are encoded within the element <foreign>, which takes the following attributes:
- rend, if a typeface change of the above kind occurs.
- lang specifies the language used. Languages found in the Lampeter Corpus (and the abbreviations used in the attribute) are Latin (LAT), Greek (GK), Hebrew (HB), French (FRA), German (GER), Dutch (DUT), Spanish (SPA), Italian (ITA), Swedish (SWE), Turkish (TUR), and Arabic (ARA). With Greek and Hebrew, the passage can either be represented in roman script in the source text, or it can be given in the original Greek or Hebrew letters. In the latter case, it has been transliterated into roman script for the electronic version, but the original situation is indicated by specifying GKGK (i.e. Greek in Greek script) or HBHB (Hebrew in Hebrew script) in the lang attribute.
With the element <foreign> there are of course borderline cases where it can be difficult to decide whether a term is really foreign or already integrated into the English language, especially if the decision is to be made for a linguistic period of the past. In problematic cases, we have therefore resorted to the OED (where words marked with || are regarded as foreign) and accepted its judgment. It should, however, be borne in mind that educated people in the 17th and 18th centuries, who had a better and more "natural" grounding in the classical languages in particular, might not have made the same decisions as we have.
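
As a rough illustration of how these conventions interact, the following Python sketch resolves the typeface in force for each stretch of text by inheriting rend values from the enclosing elements, and expands the lang codes used on <foreign>. It assumes the mark-up has been converted to well-formed XML. The sample fragment is invented for illustration and is not taken from the corpus, and the names TYPEFACE_TAGS, LANG_CODES, describe_lang and text_runs are hypothetical helpers, not part of any corpus tooling.

```python
import xml.etree.ElementTree as ET

# Typeface tags and language codes as listed in this section.
TYPEFACE_TAGS = {"bo", "go", "it", "ro", "sc"}
LANG_CODES = {
    "LAT": "Latin", "GK": "Greek", "HB": "Hebrew", "FRA": "French",
    "GER": "German", "DUT": "Dutch", "SPA": "Spanish", "ITA": "Italian",
    "SWE": "Swedish", "TUR": "Turkish", "ARA": "Arabic",
}
# Doubled codes mark passages printed in the original script in the source.
ORIGINAL_SCRIPT = {"GKGK": "GK", "HBHB": "HB"}


def describe_lang(code):
    """Return (language name, printed_in_original_script) for a lang value."""
    if code in ORIGINAL_SCRIPT:
        return LANG_CODES[ORIGINAL_SCRIPT[code]], True
    return LANG_CODES[code], False


def text_runs(elem, typeface=None, lang=None):
    """Yield (text, typeface, lang) for each stretch of text under elem."""
    if elem.tag in TYPEFACE_TAGS:
        typeface = elem.tag                      # e.g. <it>...</it>
    else:
        typeface = elem.get("rend", typeface)    # own rend, else inherited
    if elem.tag == "foreign":
        lang = elem.get("lang", lang)
    if elem.text and elem.text.strip():
        yield elem.text.strip(), typeface, lang
    for child in elem:
        yield from text_runs(child, typeface, lang)
        # Text following a child belongs to this element's environment.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), typeface, lang


# Invented sample, not corpus text; assumes well-formed XML.
SAMPLE = (
    '<text rend="ro"><p>The <it>French</it> say '
    '<foreign rend="it" lang="FRA">tant pis</foreign> and the Greeks '
    '<foreign lang="GKGK">logos</foreign>.</p></text>'
)

for text, typeface, lang in text_runs(ET.fromstring(SAMPLE)):
    print(text, typeface, describe_lang(lang) if lang else None)
```

In this sketch the second <foreign> element carries no rend of its own, so its text is reported in the roman typeface inherited from <text>, which is the sense in which a typeface indicator has to be read against its environment.
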
Other layout features include the following:
- <pb> marks the page breaks of the original text. It contains the n attribute for supplying the page numbers of the source. If there was no pagination on some pages, we have supplied our own and signalled this by putting it within square brackets, for instance <pb n="[iv]">. We have adopted the same convention when correcting wrong pagination of the original, e.g. <pb n="41[38]">.
- <fw> indicates the catchwords (i.e. the first word of a page printed also at the bottom of the preceding page), which are actually used on every page of the original pamphlets. As stated above, we have usually ignored them in the transcription, but we have included them in those cases when there was a difference (usually in spelling) between the word on the two pages.
- <lb> marks line breaks which interrupt running text in some way and therefore seem typographically intentional. They are especially common on the title pages.
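
The bracket convention for the n attribute of <pb> can be unpacked mechanically. The Python sketch below is a minimal illustration under one stated assumption: it reads the bracketed part as the number supplied by the editors and anything before it as the number printed in the source, which is an interpretation of the convention described above rather than a documented rule. PageNumber and parse_pb_n are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PageNumber(NamedTuple):
    printed: Optional[str]    # number as it appears in the source, if any
    editorial: Optional[str]  # bracketed number, assumed to be the editors'


# An optional unbracketed part followed by an optional bracketed part.
_PB_N = re.compile(r"^(?P<printed>[^\[\]]+)?(?:\[(?P<editorial>[^\[\]]+)\])?$")


def parse_pb_n(value):
    """Split a <pb n="..."> value into its printed and editorial parts."""
    value = value.strip()
    match = _PB_N.match(value)
    if not value or match is None:
        raise ValueError(f"unrecognised page number: {value!r}")
    return PageNumber(match.group("printed"), match.group("editorial"))


print(parse_pb_n("[iv]"))    # unpaginated page, number supplied
print(parse_pb_n("41[38]"))  # printed number with a bracketed correction
print(parse_pb_n("12"))      # ordinary printed page number
```
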