Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval

Publikace

Abstrakt

Cross-Lingual Information Retrieval (CLIR) enables a user to query in a language which is different from the target documents language. CLIR incorporates a translation technique based on either a manual dictionary or a probabilistic dictionary which is generated from a parallel corpus.

The translation techniques for Hindi language suffer from a translation mis-mapped issue which is due to the morphological richness of Hindi language. In addition, a word may have multiple translations in a dictionary leading to word translation disambiguation issue.

This paper addresses two key findings, i.e., Semantic Morphological Variant Selection (SMVS), and Hybrid Word Translation Disambiguation (HWTD), the former resolves translation mis-mapped issue and the later disambiguates the queries more effectively. The proposed techniques are investigated for FIRE ad-hoc datasets, where SMVS and HWTD at word level achieve better evaluation measures in comparison to the baseline Statistical Machine Translation.

Klíčová slova

Continuous bag of word Cross-lingual information retrieval Disambiguation Parallel corpus Recurrent neural network Translation