InterCorp multilingual parallel corpus and its Croatian component

Publication at Faculty of Arts

2023

Abstract

This paper presents the multilingual parallel corpus InterCorp and highlights its potential contribution to research on Slavic and other languages. After 15 years of development, InterCorp covers 42 languages, including all Slavic languages.

It serves as a data source for theoretical studies, lexicography, student research, language teaching and computational applications, as well as for translators and the general public. The core of the corpus consists of literary works translated into several languages. In recent years, a separate part containing journalistic, administrative and legal texts has also been added.

At present, the Croatian component contains only fiction and film subtitles. By design, the corpus allows researchers not only to investigate structural differences and similarities between languages but also to compare the actual use of individual linguistic phenomena in present-day languages.

Although InterCorp is predominantly a synchronic corpus, it includes fiction from the 1950s to the present, so it can also contribute to the study of linguistic dynamics and of changes in language use over this period.

Keywords

InterCorp, parallel corpus, Slavic languages