
FicTree : a manually annotated treebank of Czech fiction



The FicTree treebank is a syntactically annotated corpus of Czech fiction. It consists of 135,000 words (166,000 tokens).

The lemmatization and the morphological and syntactic annotation were all performed manually. The treebank is accessible both as an annotated corpus in the CNC KonText interface and as downloadable shuffled language data, available in both the Prague Dependency Treebank a-layer annotation standard and the Universal Dependencies standard.
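
For readers who want to work with the Universal Dependencies release, the following is a minimal sketch of reading CoNLL-U, the ten-column, tab-separated format used by Universal Dependencies treebanks. The sample sentence, the `read_conllu` helper, and the `FIELDS` tuple are illustrative assumptions and are not taken from FicTree or its tooling. Treating the gap between the word and token counts as punctuation is likewise only an assumption.

```python
# Hypothetical two-token sample in CoNLL-U layout; NOT taken from FicTree.
SAMPLE = (
    "# text = Prší.\n"
    "1\tPrší\tpršet\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
    "\n"
)

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts.

    Comment lines (starting with '#') are skipped and blank lines end a
    sentence. Multiword-token ranges and empty nodes are kept as ordinary
    rows; this sketch does not treat them specially.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):
            current.append(dict(zip(FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences


sentences = read_conllu(SAMPLE)
tokens = [tok for sent in sentences for tok in sent]
# Assumption: "words" means tokens that are not punctuation.
non_punct = [tok for tok in tokens if tok["upos"] != "PUNCT"]
print(len(sentences), len(tokens), len(non_punct))
```

The same function could be applied to the contents of a downloaded `.conllu` file read with `open(path, encoding="utf-8").read()`, where `path` is whatever location the data was saved to.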