Consistency of the MorfFlex morphological dictionary

Publication at Faculty of Mathematics and Physics | 2021

Abstract

Language corpora usually contain, in addition to the texts themselves, various types of annotation. The most common is morphological annotation, which assigns a lemma and a morphological tag to each word form.

Morphological tagging traditionally relies on morphological dictionaries. Our paper presents a new version of MorfFlex, the so-called "Prague" morphological dictionary, which is used for tagging many Czech corpora (in particular the Prague Dependency Treebanks, the corpora published by the Institute of the Czech National Corpus in Prague, and the large Czech web corpora of the Aranea series).
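As a rough illustration of dictionary-based morphological annotation, the following minimal sketch looks up each word form in a toy dictionary and returns its candidate (lemma, tag) pairs. The dictionary contents, the simplified tag notation, and the names `TOY_DICTIONARY` and `analyze` are assumptions made for this example only; they do not reproduce the actual MorfFlex data format or tagset.

```python
from typing import Dict, List, Tuple

# One analysis is a (lemma, tag) pair.
Analysis = Tuple[str, str]

# Hypothetical toy dictionary: word form -> all candidate analyses.
# The tag strings are a simplified, made-up notation, not MorfFlex tags.
TOY_DICTIONARY: Dict[str, List[Analysis]] = {
    "žena": [("žena", "NOUN.fem.sg.nom")],
    "ženy": [
        ("žena", "NOUN.fem.sg.gen"),
        ("žena", "NOUN.fem.pl.nom"),
        ("žena", "NOUN.fem.pl.acc"),
    ],
    "spí": [
        ("spát", "VERB.pres.3.sg"),
        ("spát", "VERB.pres.3.pl"),
    ],
}


def analyze(tokens: List[str]) -> List[Tuple[str, List[Analysis]]]:
    """Pair each token with its candidate analyses (empty list if unknown)."""
    return [(token, TOY_DICTIONARY.get(token.lower(), [])) for token in tokens]


if __name__ == "__main__":
    for token, analyses in analyze(["Ženy", "spí"]):
        print(token, analyses)
```

A lookup of this kind only lists the possible analyses of a word form; choosing the one that fits the context is a separate disambiguation step that the sketch does not cover.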

Three basic principles were used to update the dictionary: the Golden Rule of Morphology, the Principle of Paradigm Unity, and the Principle of Paradigm Uniqueness.