Bilingual corpora offer a promising bridge between resource-rich and resource-poor languages, enabling the development of natural language processing systems for the latter. English is often selected as the resource-rich language, but another choice might give better performance.
In this paper, we consider the task of unsupervised cross-lingual POS tagging, and construct a model that predicts the best source language for a given target language. In experiments on 9 languages, this model improves on using a single fixed source language.
We then show that further improvements can be made by combining information from multiple source languages.