Cross-lingual dependency transfer with harmonized Indian language treebanks

Publication at Faculty of Mathematics and Physics | 2014

Abstract

One of the most important aspects of cross-lingual dependency transfer is how differences in annotation style, which often cause parsing accuracy to be underestimated, are handled. The emerging trend is to harmonize the annotation styles of different language treebanks into one style, so that cumbersome manual transformation rules can be avoided.

In this paper, we use harmonized treebanks (the POS tagsets and dependency structures of the original treebanks mapped to a common style) for inducing dependencies in a cross-lingual setting. We transfer dependencies using delexicalized parsers that use harmonized versions of the original treebanks.
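
As a rough illustration of what delexicalization over a harmonized tagset can look like, the sketch below maps treebank-specific POS tags to a common tagset and blanks out the word forms, so that a parser sees only tags and tree structure. The `Token` type, the entries in `TAG_MAP`, the fallback tag `X`, and the toy sentence are all hypothetical and are not taken from the paper. Harmonization of the dependency structures themselves is not shown.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    form: str    # surface word form
    pos: str     # POS tag (treebank-specific before mapping)
    head: int    # 1-based index of the head token, 0 for the root
    deprel: str  # dependency relation label


# Hypothetical mapping from a treebank-specific tagset to a common tagset.
# The tag names are illustrative only.
TAG_MAP: Dict[str, str] = {
    "NN": "NOUN",
    "NNP": "NOUN",
    "VM": "VERB",
    "PSP": "ADP",
}


def harmonize_and_delexicalize(
    sentence: List[Token], tag_map: Dict[str, str]
) -> List[Token]:
    """Map POS tags to the common tagset and replace word forms with '_'."""
    result = []
    for tok in sentence:
        common_pos = tag_map.get(tok.pos, "X")  # unmapped tags fall back to "X"
        result.append(Token("_", common_pos, tok.head, tok.deprel))
    return result


# Toy four-token sentence with made-up forms and relation labels.
toy_sentence = [
    Token("w1", "NNP", 4, "subj"),
    Token("w2", "PSP", 1, "case"),
    Token("w3", "NN", 4, "obj"),
    Token("w4", "VM", 0, "root"),
]

delexicalized = harmonize_and_delexicalize(toy_sentence, TAG_MAP)
for tok in delexicalized:
    print(tok.form, tok.pos, tok.head, tok.deprel, sep="\t")
```

Because the output contains no language-specific word forms, a parser trained on such data from one treebank can in principle be applied to another treebank that shares the same tagset, which is the property delexicalized transfer relies on.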

We apply this approach to five Indian languages (Hindi, Urdu, Telugu, Bengali and Tamil) and show that delexicalized parsing performs best when the transfer takes place from one Indian language (IL) treebank to another IL treebank.