Universal Derivations 1.0, A Growing Collection of Harmonised Word-Formation Resources

Publication at Faculty of Mathematics and Physics

2020

Abstract

The paper describes the harmonisation of existing data resources containing word-formation features: the resources are converted into a common file format and their annotation schemas are partially aligned. We summarise the similarities and differences between the resources and describe the individual steps of the harmonisation procedure, including manual annotation and the application of machine learning techniques.

The resulting 'Universal Derivations 1.0' collection contains 27 harmonised resources covering 20 languages. It is publicly available in the LINDAT/CLARIAH-CZ repository and can be queried via the DeriSearch tool.

Keywords

universal derivations growing collection harmonised word formation resources