
Reusable Tagset Conversion Using Tagset Drivers

Publication at Faculty of Mathematics and Physics


Part-of-speech or morphological tags are an important means of annotation in a vast number of corpora. However, different corpora use different tagsets, even for the same language.

Tagset conversion is difficult, and solutions tend to be tailored to a particular pair of tagsets. We propose a universal approach that makes the conversion tools reusable.

We also provide an indirect evaluation in the context of a parsing task.
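
The abstract does not spell out how a tagset driver works, so the following is only a minimal sketch of one way the reusable idea could be realized. It assumes that each driver maps its own tags to and from a shared feature structure, so that any two drivers can be combined without a dedicated pairwise converter. The names `make_driver`, `convert`, `TAGSET_A` and `TAGSET_B`, as well as both toy tag inventories, are hypothetical illustrations and are not taken from the publication.

```python
from typing import Callable, Dict, NamedTuple

# Shared intermediate representation (an assumption of this sketch):
# a plain mapping from feature names to values.
Features = Dict[str, str]


class TagsetDriver(NamedTuple):
    decode: Callable[[str], Features]   # tag -> shared features
    encode: Callable[[Features], str]   # shared features -> tag


def make_driver(table: Dict[str, Features], fallback: str) -> TagsetDriver:
    """Build a driver from a tag-to-features table (hypothetical helper)."""

    def decode(tag: str) -> Features:
        # Unknown tags decode to an empty feature set.
        return dict(table.get(tag, {}))

    def encode(features: Features) -> str:
        # Pick the tag that matches the most features without contradicting
        # any of them; use the fallback tag if nothing matches.
        best_tag, best_score = fallback, 0
        for tag, tag_features in table.items():
            contradicts = any(
                features.get(name) not in (None, value)
                for name, value in tag_features.items()
            )
            if contradicts:
                continue
            score = sum(
                1 for name, value in tag_features.items()
                if features.get(name) == value
            )
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    return TagsetDriver(decode, encode)


# Two toy tagsets, invented for illustration only.
TAGSET_A = make_driver(
    {
        "NN": {"pos": "noun", "number": "sing"},
        "NNS": {"pos": "noun", "number": "plur"},
        "VB": {"pos": "verb"},
    },
    fallback="UNK",
)
TAGSET_B = make_driver(
    {
        "N-sg": {"pos": "noun", "number": "sing"},
        "N-pl": {"pos": "noun", "number": "plur"},
        "V": {"pos": "verb"},
    },
    fallback="X",
)


def convert(tag: str, source: TagsetDriver, target: TagsetDriver) -> str:
    """Convert a tag by decoding with one driver and encoding with another."""
    return target.encode(source.decode(tag))
```

Under these assumptions, adding a new tagset means writing one driver rather than one converter per tagset pair: `convert("NNS", TAGSET_A, TAGSET_B)` and `convert("N-sg", TAGSET_B, TAGSET_A)` both reuse the same two drivers.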