
The InterCorp corpus, release 14

Publication

Abstract

A new version of a large parallel corpus containing translations between 42 languages in total (including Czech). Compared to version 13, the number of words in foreign-language texts has increased to 1,572 million, of which 349 million are in the fiction core and 1,223 million in the freely available collections.

Czech texts contain 207 million words in total, including 118 million in the core and 90 million in the collections. Texts in Upper Sorbian have been added.