Methodological traps of parallel corpora and how to avoid them

Publication at Faculty of Arts | 2017

Abstract

In this article, we present methodological problems that arise when working with parallel corpora, i.e. corpora composed of translations and their respective originals, and we propose principles and rules that help to avoid these problems. We discuss in turn the size and composition of parallel corpora (section 2), technical factors such as alignment and bilingual POS tagging (section 3), the tricky question of the equivalence of the parallel versions (4.1), the importance of metadata about the originals as well as the translations (4.2), and the particularities of translated language (4.3).

The methodological principles are illustrated with specific studies based on the InterCorp parallel corpus.