English-Hindi Translation: Obtaining Mediocre Results with Bad Data and Fancy Models

Publication at the Faculty of Mathematics and Physics, 2009

Abstract

We describe our attempt to improve on previous English-to-Hindi machine translation results using two open-source statistical MT systems: the phrase-based Moses and the hierarchical phrase-based Joshua. We use several approaches to morphological tagging, ranging from automatic word classes through stem-suffix segmentation to a part-of-speech (POS) tagger.
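Stem-suffix segmentation used as an extra tagging factor can be illustrated with a minimal sketch. It assumes a small, hypothetical suffix list for romanized Hindi and longest-match stripping with an assumed minimum stem length; the paper's actual segmentation method and suffix inventory are not given here. Each token is emitted as `word|stem|suffix`, the `|`-separated layout Moses reads as factored input, with `-` standing in for an empty suffix (also an assumption of this sketch).

```python
# Hypothetical suffix inventory (romanized Hindi), for illustration only.
# Sorting by length (stable) makes longer suffixes win over shorter ones.
SUFFIXES = sorted(["on", "en", "iyan", "iyon", "e", "a", "i"], key=len, reverse=True)
MIN_STEM = 2  # assumed: never strip a word down to fewer than 2 characters


def segment(word: str) -> tuple[str, str]:
    """Split a word into (stem, suffix) by the longest matching suffix.

    Returns an empty suffix when nothing matches or the stem would be too short.
    """
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= MIN_STEM:
            return word[: -len(suf)], suf
    return word, ""


def to_factored(sentence: str) -> str:
    """Render each whitespace token as word|stem|suffix ('-' marks no suffix)."""
    out = []
    for tok in sentence.split():
        stem, suf = segment(tok)
        out.append(f"{tok}|{stem}|{suf or '-'}")
    return " ".join(out)


print(to_factored("larkon ne kitaben padhi"))
```

Running it prints `larkon|lark|on ne|ne|- kitaben|kitab|en padhi|padh|i`. Note that `ne` keeps its form because stripping `e` would leave a stem shorter than `MIN_STEM`; a real system would tune such heuristics or learn the segmentation from data.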

We also experiment with factored language models. We evaluate various combinations of training data sets and other existing English-Hindi resources.
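A factored language model conditions on factors such as POS tags, not only surface words, and can fall back from one factor to another when counts are sparse. The sketch below illustrates only that idea, with one fixed backoff path: previous word, then previous tag, then unigram. It is a simplified, unnormalized "stupid backoff"-style score with an assumed penalty `ALPHA = 0.4`, not the generalized parallel backoff with proper discounting that full factored-LM toolkits (e.g. SRILM) implement; the tiny tagged corpus and tag names are hypothetical.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff penalty, as in "stupid backoff"


def train(tagged_sents):
    """Count (prev word, word) and (prev tag, word) pairs plus unigrams.

    tagged_sents: list of sentences, each a list of (word, tag) pairs.
    """
    word_bi, ctx_w = Counter(), Counter()
    tag_bi, ctx_t = Counter(), Counter()
    uni = Counter()
    for sent in tagged_sents:
        padded = [("<s>", "<s>")] + sent  # sentence-start context
        for (pw, pt), (w, _) in zip(padded, padded[1:]):
            word_bi[pw, w] += 1
            ctx_w[pw] += 1
            tag_bi[pt, w] += 1
            ctx_t[pt] += 1
            uni[w] += 1
    return word_bi, ctx_w, tag_bi, ctx_t, uni


def score(model, prev_word, prev_tag, word):
    """Back off word-bigram -> tag-conditioned -> unigram (unnormalized score)."""
    word_bi, ctx_w, tag_bi, ctx_t, uni = model
    if word_bi[prev_word, word] > 0:
        return word_bi[prev_word, word] / ctx_w[prev_word]
    if tag_bi[prev_tag, word] > 0:
        return ALPHA * tag_bi[prev_tag, word] / ctx_t[prev_tag]
    total = sum(uni.values())
    return ALPHA * ALPHA * uni[word] / total if total else 0.0


# Hypothetical two-sentence tagged corpus.
corpus = [
    [("larke", "NN"), ("khelte", "VB"), ("hain", "AUX")],
    [("larkiyan", "NN"), ("padhti", "VB"), ("hain", "AUX")],
]
model = train(corpus)
print(score(model, "larke", "NN", "khelte"))  # seen word bigram
print(score(model, "larke", "NN", "padhti"))  # unseen bigram, tag context helps
```

The second query shows the point of the extra factor: `larke padhti` never occurs, but a noun followed by `padhti` does, so the tag-level estimate (0.4 × 1/2 = 0.2) is used instead of falling all the way back to the unigram.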

To our knowledge, the BLEU score we obtained is currently the best published result for the IIIT-TIDES dataset.
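BLEU combines clipped n-gram precisions (n = 1 to 4) by a geometric mean and multiplies the result by a brevity penalty that punishes short output. A minimal corpus-level sketch follows, assuming one reference per sentence, pre-tokenized input, uniform weights, and no smoothing. The scoring script behind the published IIIT-TIDES figure is not specified here, so numbers from this sketch should not be compared directly with it.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence and no smoothing.

    hypotheses, references: lists of token lists, aligned by index.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches; the log would be undefined
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if output is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))  # identical output
print(round(corpus_bleu(["the cat sat on the".split()], [ref]), 4))  # too short
```

The first call prints `1.0`. In the second, every n-gram of the shorter hypothesis matches, so all precisions are 1 and the score is just the brevity penalty exp(1 − 6/5) ≈ `0.8187`.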