Universal Segmentations 1.0 (UniSegments 1.0)

Publication

Abstract

Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations, harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that store a word and its morphological segmentations, together with information about the word and the segmented units, e.g., part-of-speech categories, types of morphs/morphemes, etc.
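The abstract says only that each line stores a word and its segmentation in tab-separated columns. It does not list the exact columns of the UniSegments 1.0 files. The sketch below therefore assumes a hypothetical three-column layout: word form, part-of-speech tag, and morphs separated by single spaces. The names `SegmentedWord`, `parse_segmentations` and `is_concatenative`, as well as the sample rows, are illustrative only and are not part of the released data or any official tooling.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SegmentedWord:
    word: str          # surface word form
    pos: str           # part-of-speech category
    morphs: list[str]  # segmented units, in order


# Hypothetical sample in an ASSUMED 3-column layout: word <TAB> POS <TAB> morphs.
SAMPLE = "unhappiness\tNOUN\tun happi ness\nwalked\tVERB\twalk ed\n"


def parse_segmentations(text: str) -> list[SegmentedWord]:
    """Parse tab-separated lines in the assumed word/POS/morphs layout."""
    entries = []
    # QUOTE_NONE: treat quote characters as ordinary text, not CSV quoting.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        word, pos, segmentation = row[0], row[1], row[2]
        entries.append(SegmentedWord(word, pos, segmentation.split(" ")))
    return entries


def is_concatenative(entry: SegmentedWord) -> bool:
    """Check whether the morphs, joined together, spell the word exactly."""
    return "".join(entry.morphs) == entry.word


for entry in parse_segmentations(SAMPLE):
    print(entry.word, entry.pos, entry.morphs, is_concatenative(entry))
```

The concatenation check is only a sanity test under this assumed format. Real segmentation data may contain non-concatenative processes or allomorph normalisation, where the morphs do not simply spell out the word.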

The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.