Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Publication

Abstract

Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision – e.g., cross-lingual transfer, type-level supervision, or a combination thereof – have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly only evaluated on languages that are very different from truly low-resource languages, and the taggers use sources of information, like high-coverage and almost error-free dictionaries, which are likely not available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model gets only less than half of the words right. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.