ONCO: Compiling an Old Norse Corpus

Publication at Filozofická fakulta (Faculty of Arts) |
2023

Abstract

Linguistic inquiry in the field of Old Norse/Icelandic is currently beset by manifold challenges: the available databases and corpora either cover limited selections of texts, frequently restricted in both genre and dialectal provenance, or are based only on normalized texts or even on modern Icelandic translations. A comprehensive corpus, akin to the historical corpora available for other languages, such as those at the disposal of scholars of Old and Middle English, is therefore clearly lacking for Old Norse/Icelandic.

Such a corpus, containing a wide variety of extant texts, would naturally facilitate broader generalizations and comparative studies and would allow frequencies of occurrence to be observed. All of these could advance our understanding of the language and perhaps shed additional light on textual transmission and language contact.

A survey of available resources in the field makes it clear that a comprehensive corpus of Old Norse/Icelandic should encompass both normalized and non-normalized texts of different dialectal provenances and should not be restricted to a particular genre. Morpho-syntactic tagging, apart from enhancing searches across the data set, would also allow for the future incorporation of other tools, such as paradigms generated from the corpus data and other visualisation utilities.

In light of these demands, developing a tagger suitable for the task is both a key concern and the main challenge. This presentation discusses the process of creating such a corpus.