Korektor – A System for Contextual Spell-checking and Diacritics Completion

Publication at Faculty of Mathematics and Physics, Faculty of Arts, 2012

Abstract

We present Korektor, a flexible and powerful, purely statistical text correction tool for Czech that goes beyond a traditional spell checker. We combine several language models with an error model to rank correction proposals and to detect errors that simple spell checkers miss, namely misspellings that happen to be homographs of existing word forms (real-word errors).
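The combination of language models and an error model can be illustrated with the common noisy-channel formulation: a candidate correction c for an observed word o is scored by log P(c | context) + log P(o | c). The sketch below is a toy illustration under stated assumptions, not Korektor's actual scoring. The bigram table `BIGRAM`, the smoothing constant `FLOOR`, the edit-based error model `error_logprob` with its parameter `p_edit`, and the function `rank` are all hypothetical. It shows how a real-word error such as "bít" (to beat) in place of "být" (to be) can be flagged even though both are valid Czech words.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy bigram language model P(word | previous word).
# A real system would estimate several such models from large corpora.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("chci", "být"): 0.20,   # "chci být" = "I want to be"
    ("chci", "bít"): 0.001,  # "chci bít" = "I want to beat"
}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen bigrams (assumption)


def lm_logprob(prev: str, word: str) -> float:
    """Log-probability of `word` following `prev` under the toy bigram model."""
    return math.log(BIGRAM.get((prev, word), FLOOR))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions cost 1)."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def error_logprob(observed: str, intended: str, p_edit: float = 0.05) -> float:
    """Hypothetical error model log P(observed | intended): each edit costs p_edit."""
    d = edit_distance(observed, intended)
    return math.log(1.0 - p_edit) if d == 0 else d * math.log(p_edit)


def rank(prev: str, observed: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Sort candidates by log P(candidate | prev) + log P(observed | candidate)."""
    scored = [(c, lm_logprob(prev, c) + error_logprob(observed, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# "bít" is a valid Czech word, so a plain dictionary lookup would accept it;
# in the context "chci ..." the combined score prefers "být".
print(rank("chci", "bít", ["bít", "být"])[0][0])  # být
```

Summing log-probabilities corresponds to multiplying the language-model and error-model probabilities, so the error model acts as a penalty for every edit while the context decides whether that penalty is worth paying.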

Without any adaptation, the system also works as a diacritics generator, achieving the best results reported for Czech text. Apart from the trained statistical models, Korektor's design contains no language-specific parts, which makes it well suited to being trained for other languages for which the necessary resources are available.
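One way to see why a spell checker can double as a diacritics generator is to treat text without diacritics as a noisy version of the correct text: every vocabulary form whose diacritics-free spelling matches the input word is a candidate, and the language model picks among them. The sketch below assumes exactly that view. The vocabulary `VOCAB`, the table `BIGRAM`, and the function `restore` are hypothetical, and greedy left-to-right decoding is a simplification (a Viterbi or beam search over whole sentences would be more typical). Note that "byt" (flat) and "být" (to be) are both valid Czech words, so context must decide.

```python
import math
import unicodedata
from collections import defaultdict
from typing import Dict, List, Tuple


def strip_diacritics(word: str) -> str:
    """Remove combining marks after Unicode NFD decomposition (e.g. 'být' -> 'byt')."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical toy vocabulary and bigram probabilities P(word | previous word).
VOCAB = ["chci", "být", "byt", "doma"]
BIGRAM: Dict[Tuple[str, str], float] = {
    ("chci", "být"): 0.20,
    ("chci", "byt"): 0.01,
    ("být", "doma"): 0.10,
    ("byt", "doma"): 0.001,
}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen bigrams (assumption)

# Index every vocabulary form by its diacritics-free spelling.
CANDIDATES: Dict[str, List[str]] = defaultdict(list)
for form in VOCAB:
    CANDIDATES[strip_diacritics(form)].append(form)


def restore(tokens: List[str]) -> List[str]:
    """Greedily choose, for each token, the candidate form the bigram model prefers."""
    out: List[str] = []
    prev = "<s>"  # sentence-start marker
    for tok in tokens:
        options = CANDIDATES.get(tok, [tok])  # unknown words pass through unchanged
        best = max(options, key=lambda w: math.log(BIGRAM.get((prev, w), FLOOR)))
        out.append(best)
        prev = best
    return out


print(" ".join(restore("chci byt doma".split())))  # chci být doma
```

Because candidate generation here depends only on Unicode decomposition and the trained tables, nothing in the code is specific to Czech, which mirrors the language-independence claim above.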

The evaluation shows that the system is a state-of-the-art tool for Czech, both as a spell checker and as a diacritics generator. We also show that, combined, these two functions could assist in the error annotation of a learner corpus of Czech.