Using Unsupervised Paradigm Acquisition for Prefixes

Publication at Faculty of Mathematics and Physics | 2008

Abstract

We describe a simple method of unsupervised morpheme segmentation of words in an unknown language. All that is needed is a raw text corpus (or a list of words) in the given language.

The algorithm identifies word parts that occur in many words and interprets them as morpheme candidates (prefixes, stems and suffixes). A new treatment of prefixes is the main innovation over Zeman (2007).
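The abstract does not spell out the algorithm itself. As a rough illustration only, the Python sketch below assumes a paradigm-style approach suggested by the title: cut every distinct word into a candidate stem and suffix at each position, collect the set of suffixes each stem occurs with, and keep suffix sets (paradigms) shared by several stems. The function names and the parameters `max_suffix_len`, `min_stem_len` and `min_stems` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical sketch of paradigm-style candidate extraction;
# not the paper's exact algorithm.


def stem_suffix_splits(words, max_suffix_len=2, min_stem_len=2):
    """Map each candidate stem to the set of suffixes observed after it.

    Every distinct word is cut at every position that leaves a stem of at
    least min_stem_len characters and a suffix of at most max_suffix_len
    characters (the empty suffix included).
    """
    stem_to_suffixes = defaultdict(set)
    for w in set(words):  # work on word types, not tokens
        for i in range(max(min_stem_len, len(w) - max_suffix_len), len(w) + 1):
            stem_to_suffixes[w[:i]].add(w[i:])
    return stem_to_suffixes


def paradigms(words, min_stems=2, **split_args):
    """Group stems by their exact suffix set and keep suffix sets that have
    at least two members and are shared by at least min_stems stems."""
    by_suffix_set = defaultdict(set)
    for stem, sufs in stem_suffix_splits(words, **split_args).items():
        if len(sufs) >= 2:
            by_suffix_set[frozenset(sufs)].add(stem)
    return {p: stems for p, stems in by_suffix_set.items() if len(stems) >= min_stems}


words = ["walk", "walks", "walked", "talk", "talks", "talked", "jump"]
for suffix_set, stems in paradigms(words).items():
    print(sorted(suffix_set), sorted(stems))
```

On this toy list the sketch finds the expected `{'', 's', 'ed'}` paradigm for `walk`/`talk`, but also a spurious `{'k', 'ks'}` paradigm for `wal`/`tal`, which is the kind of hypothesis that has to be filtered out. Prefix candidates could be collected the same way by running the procedure on reversed words; this symmetric trick is only an illustration and is not the paper's new treatment of prefixes, which the abstract does not describe.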

After spurious hypotheses are filtered out, the list of morphemes is used to segment the input words. We report the official Morpho Challenge 2008 evaluation, along with some additional experiments evaluated unofficially.
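The abstract does not say how the filtered morpheme list is applied to a word. The sketch below assumes a simple greedy longest-match strategy: strip the longest listed prefix, then the longest listed suffix, while leaving a stem of at least `min_stem` characters. The helper names `longest_match` and `segment` and the `min_stem` threshold are hypothetical.

```python
# Hypothetical greedy segmentation with a given morpheme list;
# the paper's actual segmentation procedure may differ.


def longest_match(candidates, fits):
    """Return the longest candidate accepted by fits(), or '' if none is."""
    return max((c for c in candidates if fits(c)), key=len, default="")


def segment(word, prefixes, suffixes, min_stem=3):
    """Strip the longest listed prefix, then the longest listed suffix,
    keeping at least min_stem characters for the stem."""
    pre = longest_match(
        prefixes, lambda p: word.startswith(p) and len(word) - len(p) >= min_stem
    )
    rest = word[len(pre):]
    suf = longest_match(
        suffixes, lambda s: bool(s) and rest.endswith(s) and len(rest) - len(s) >= min_stem
    )
    stem = rest[: len(rest) - len(suf)]  # avoids the rest[:-0] pitfall when suf == ''
    return [m for m in (pre, stem, suf) if m]


print(segment("unwalked", {"un", "re"}, {"ed", "s", "ing"}))
print(segment("sing", {"un"}, {"ing", "s"}))
```

The first call yields `['un', 'walk', 'ed']`; the second leaves `sing` unsegmented because removing `ing` would leave a one-letter stem. A minimum stem length is one cheap guard against over-segmentation, though it will also block some legitimate splits of short words.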

We also analyze and discuss errors with respect to the evaluation method.