
STYX 1.0

Publication

Abstract

The STYX 1.0 corpus is a subset of the Prague Dependency Treebank (PDT, https://ufal.mff.cuni.cz/pdt2.0). A sentence was included in STYX 1.0 if it was suitable for practicing Czech morphology and syntax in elementary schools.

The PDT data are divided into three parts: training data, development test data and evaluation test data (for more information, see https://ufal.mff.cuni.cz/pdt2.0/doc/pdt-guide/cz/html/ch03.html#a-data-purpose). The STYX 1.0 corpus preserves this division (see Data below).

The sentences in STYX are annotated according to both the PDT annotation scheme and the Czech school annotation system (sentence diagramming). The PDT annotation was transformed into the school annotation using manually designed rules; for details, see (Kucera, 2006) and (Hladka, Kucera, 2008).
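The actual conversion rules are described in (Kucera, 2006) and (Hladka, Kucera, 2008) and are not reproduced here. As a rough illustration of what a rule-based mapping from PDT to school annotation can look like, the sketch below assumes a simplified input: a dependency tree whose nodes carry only a word form and a PDT analytical function (`afun`). The label table and the names `Node`, `AFUN_TO_SCHOOL`, `school_label` and `convert` are hypothetical and invented for this example. Real school diagramming also has to handle coordination, prepositional groups and ellipsis, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, simplified table mapping PDT analytical functions (afun)
# to Czech school sentence members. NOT the rule set used for STYX.
AFUN_TO_SCHOOL = {
    "Pred": "predicate",   # přísudek
    "Pnom": "predicate",   # nominal part of a compound predicate
    "AuxV": "predicate",   # auxiliary verb "být", part of the predicate
    "Sb": "subject",       # podmět
    "Obj": "object",       # předmět
    "Adv": "adverbial",    # příslovečné určení
    "Atr": "attribute",    # přívlastek
    "Atv": "complement",   # doplněk
    "AtvV": "complement",  # doplněk
}


@dataclass
class Node:
    """One node of a (simplified) PDT analytical tree."""
    form: str
    afun: str
    children: list[Node] = field(default_factory=list)


def school_label(node: Node) -> Optional[str]:
    """Return the school label of a node, or None for nodes such as
    punctuation or the technical root that are not sentence members here."""
    return AFUN_TO_SCHOOL.get(node.afun)


def convert(root: Node) -> list[tuple[str, str]]:
    """Walk the tree depth-first (pre-order, children left to right)
    and collect (word form, school label) pairs."""
    result = []
    stack = [root]
    while stack:
        node = stack.pop()
        label = school_label(node)
        if label is not None:
            result.append((node.form, label))
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return result
```

For the sentence "Petr čte knihu." the tree below yields `[("čte", "predicate"), ("Petr", "subject"), ("knihu", "object")]`; the final full stop (afun `AuxK`) and the technical root (`AuxS`) receive no school label:

```python
root = Node("#", "AuxS", [
    Node("čte", "Pred", [Node("Petr", "Sb"), Node("knihu", "Obj")]),
    Node(".", "AuxK"),
])
print(convert(root))
```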

In total, there are 11,655 sentences in the STYX corpus.