We use target-side monolingual data to extend the vocabulary of the translation model in statistical machine translation. This method, called “reverse self-training”, improves the decoder’s ability to produce grammatically correct translations into languages with a morphology richer than that of the source language, especially in small-data settings.
We empirically evaluate the gains for several pairs of European languages and discuss approaches to the underlying back-off techniques needed to translate unseen forms of known words. We also describe the systems we submitted to the WMT11 Shared Task.