Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs

Publication

Abstract

We present our work on aligning two language resources for Romanian, the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs. For each treebank occurrence of a verb that has an entry in the lexicon, a set of valence frames is assigned automatically, then manually validated by two linguists and, when necessary, corrected. Validating a valence frame also means semantically disambiguating the verb in the respective context.

The two linguists validate complementary datasets. However, a subset of verbs was validated by both annotators, and Cohen's κ for this subset is 0.87.
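For readers unfamiliar with the agreement measure, the following is a minimal sketch of the standard definition of Cohen's κ, (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The abstract does not say how the authors computed the statistic; the function name `cohen_kappa` and the frame labels are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical valence frame IDs chosen for six verb occurrences (made-up data).
annotator_1 = ["f1", "f1", "f2", "f2", "f3", "f1"]
annotator_2 = ["f1", "f1", "f2", "f3", "f3", "f1"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Because κ discounts agreement that could arise by chance, it is lower than the raw proportion of matching labels whenever the annotators disagree on any item.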

The alignment also serves as a method of improving the quality of the two resources, since in the process we identify morpho-syntactic annotation mistakes and incomplete or missing valence frames. Information from each resource complements the information from the other, which increases the value of both.

The treebank and the lexicon are freely available, and the links discovered between them are also made available on GitHub.