Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0)

Publication

Abstract

The Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0, hereafter PDT-C) is a richly annotated and genre-diversified language resource: a consolidated release of the existing PDT corpora of Czech data, uniformly annotated using the standard PDT scheme. The PDT corpora included in PDT-C are: Prague Dependency Treebank (the original PDT contents, written newspaper and journal texts from three genres); the Czech part of Prague Czech-English Dependency Treebank (financial texts translated from English); Prague Dependency Treebank of Spoken Czech (spoken data, including audio, transcripts, and multiple speech reconstruction annotation); and PDT-Faust (user-generated texts).

The differences from the separately published original treebanks can be briefly described as follows: it is published in one package, to allow easier data handling for all the datasets; the data is enhanced with manual linguistic annotation at the morphological layer, and a new version of the morphological dictionary is enclosed; a c