CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered

Publication at Faculty of Mathematics and Physics | 2016

Abstract

We present a new release of the Czech-English parallel corpus CzEng. CzEng 1.6 consists of about 0.5 billion words ("gigaword") in each language.

The corpus is equipped with automatic annotation at a deep syntactic level of representation and, alternatively, in Universal Dependencies. Additionally, we release the complete annotation pipeline as a virtual machine in the Docker virtualization toolkit.