
Morphological Processing for English-Tamil Statistical Machine Translation

Publication at Faculty of Mathematics and Physics, 2012

Abstract

Various experiments reported in the literature suggest that, in statistical machine translation (SMT), applying either preprocessing or postprocessing to morphologically rich languages leads to better translation quality. In this work, we focus on the English-Tamil language pair.

We implement suffix-separation rules for both languages and evaluate the impact of this preprocessing on the translation quality of both the phrase-based and the hierarchical model, using BLEU score and a small manual evaluation. The results confirm that our simple suffix-based morphological processing helps obtain better translation performance.
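The abstract does not spell out the rules themselves. As a rough illustration of what suffix separation as a preprocessing step means, here is a minimal Python sketch. It assumes a hypothetical English suffix list (`SUFFIXES`), a hypothetical minimum stem length (`MIN_STEM`), and a hypothetical `+suffix` marker convention; none of these is taken from the paper, and this is not the authors' rule set.

```python
# Hypothetical suffix list for illustration only; NOT the rule set used in the paper.
# Longer suffixes are listed first so they are tried before shorter ones.
SUFFIXES = ["ing", "ed", "s"]
# Hypothetical: do not split if the remaining stem would be shorter than this.
MIN_STEM = 3


def split_token(token: str) -> list[str]:
    """Split one token into a stem and a marked suffix token if a rule applies."""
    lower = token.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            # Emit the stem and the suffix as two separate tokens.
            return [token[: -len(suffix)], "+" + suffix]
    return [token]


def separate_suffixes(sentence: str) -> str:
    """Apply suffix separation to every whitespace-separated token."""
    out: list[str] = []
    for token in sentence.split():
        out.extend(split_token(token))
    return " ".join(out)


print(separate_suffixes("the boys were playing games"))
# prints: the boy +s were play +ing game +s
```

Separating suffixes into their own tokens reduces the number of distinct word forms the translation model has to learn. A realistic rule set would also need exceptions: the naive rule above would, for example, split "this" into "thi +s".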

A by-product of our efforts is a new parallel corpus of 190k sentence pairs gathered from the web.