New Parallel Corpora of Baltic and Slavic Languages - Assumptions of Corpus Construction

Publication

Abstract

In this article, we describe the design principles of the ten newly published CLARIN-PL corpora of Slavic and Baltic languages. We highlight the features that distinguish these CLARIN-PL corpora from other non-commercial online corpora: resource selection, preprocessing, manual segmentation at the sentence level, lemmatisation, annotation and metadata.

We also present current and planned work on the development of the CLARIN-PL Balto-Slavic corpora.