Charles Explorer logo
🇨🇿

Building a multilingual parallel corpus for human users

Publikace na Filozofická fakulta |
2012

Tento text není v aktuálním jazyce dostupný. Zobrazuje se verze "en".Abstrakt

We present the architecture and the current state of InterCorp, a multilingual parallel corpus centered around Czech, intended primarily for human users and consisting of written texts with a focus on fiction. Following an outline of its recent development and a comparison with some other multilingual parallel corpora we give an overview of the data collection procedure that covers text selection criteria, data format, conversion, alignment, lemmatization and tagging.

Finally, we discuss challenges and prospects of the project.