Methodological traps of parallel corpora

Publication at Faculty of Arts | 2015

Abstract

In this article, we present methodological problems that arise in the use of parallel corpora, i.e. corpora composed of translations and their respective originals, and we propose principles and rules that help to avoid them. We discuss in turn the size and composition of parallel corpora (section 2), technical factors such as alignment or bilingual POS-tagging (section 3), the tricky question of the equivalence of the parallel versions (4.1), the importance of metadata about both the originals and the translations (4.2), and the particularities of the language of translation (4.3).

The methodological principles are illustrated with specific studies based on the parallel corpus InterCorp.