A New Corpus of Czech With an Innovated Annotation

Publication at Faculty of Arts | 2021

Abstract

The paper introduces the SYN2020 corpus. Its design incorporates several substantial new features in the areas of segmentation, lemmatization, and morphological tagging, such as a new treatment of lemma variants, a new system for identifying the morphological categories of verbs, and a new treatment of multiword tokens.