Improved Spelling Error Detection and Correction for Arabic

Publication at Faculty of Mathematics and Physics | 2012

Abstract

A spelling error detection and correction application rests on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has gone to the language model, we show that improvements in any of the three components can lead to significant cumulative gains in the overall performance of the system.
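The abstract names the three components but not how they are combined. One common way to combine them is noisy-channel scoring, and the sketch below assumes that approach: the dictionary decides whether a word is flagged and supplies candidates, a toy error model penalises each candidate in proportion to its Levenshtein distance, and a unigram language model scores candidate frequency. The dictionary, the counts, `max_dist` and `alpha` are all hypothetical, and Arabic words are written in Buckwalter transliteration for readability. This is an illustration, not the authors' system.

```python
import math

# Toy stand-ins for the three components (hypothetical data, Buckwalter transliteration).
DICTIONARY = {"ktAb", "ktb", "kAtb", "mktb"}                  # reference word list
LM_COUNTS = {"ktAb": 50, "ktb": 30, "kAtb": 15, "mktb": 5}   # unigram language-model counts

def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def correct(word: str, max_dist: int = 2, alpha: float = 1.0) -> str:
    """Return the best dictionary candidate for word, or word itself if none is close."""
    if word in DICTIONARY:          # detection: dictionary words are accepted as correct
        return word
    total = sum(LM_COUNTS.values())
    best, best_score = word, -math.inf
    for cand in sorted(DICTIONARY):
        d = levenshtein(word, cand)
        if d > max_dist:            # candidate generation: keep only nearby words
            continue
        # Error model (assumed): log P(word | cand) ~ -alpha * distance.
        # Language model: unigram log-probability of the candidate.
        score = -alpha * d + math.log(LM_COUNTS[cand] / total)
        if score > best_score:
            best, best_score = cand, score
    return best

print(correct("ktAp"))
```

With these counts, `ktA` is one edit away from both `ktAb` and `ktb`, so the error model alone cannot choose; the language model breaks the tie in favour of the more frequent `ktAb`.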

We build our dictionary of 9.3 million fully inflected Arabic words from a morphological transducer and a large corpus; the resulting word list is cross-validated and manually revised. We improve the error model by analysing error types and creating an edit-distance re-ranker.
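The abstract does not list the error types or the costs used by the re-ranker. As a hypothetical illustration, a re-ranker can order candidates by a weighted edit distance in which substitutions between letters that Arabic writers often confuse cost less than other edits. The confusion classes below (hamza forms of alif, taa marbuta vs. haa, alif maqsura vs. yaa, in Buckwalter transliteration) and the reduced cost of 0.5 are assumptions, not values from the paper.

```python
# Hypothetical confusion classes (Buckwalter): hamza forms of alif,
# taa marbuta vs. haa, alif maqsura vs. yaa.
CONFUSION_CLASSES = [set("A><|"), {"p", "h"}, {"Y", "y"}]
CHEAP_SUB = 0.5  # assumed cost of a substitution within one confusion class

def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 if equal, CHEAP_SUB if commonly confused, else 1."""
    if a == b:
        return 0.0
    for cls in CONFUSION_CLASSES:
        if a in cls and b in cls:
            return CHEAP_SUB
    return 1.0

def weighted_distance(a: str, b: str) -> float:
    """Edit distance with unit insert/delete costs and weighted substitutions."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1.0,                     # delete ca
                           cur[j - 1] + 1.0,                  # insert cb
                           prev[j - 1] + sub_cost(ca, cb)))   # (weighted) substitute
        prev = cur
    return prev[-1]

def rerank(word: str, candidates: list[str]) -> list[str]:
    """Order candidates by weighted distance; ties broken alphabetically."""
    return sorted(candidates, key=lambda c: (weighted_distance(word, c), c))

print(rerank("AslAm", ["slAm", "<slAm"]))
```

Plain Levenshtein distance ties `<slAm` and `slAm` at 1 for the input `AslAm`; the weighted distance breaks the tie in favour of the hamza variant, which is the kind of error-type knowledge a re-ranker can encode.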

We also improve the language model by analysing the level of noise in different data sources and selecting an appropriate subset on which to train the system. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2010, OpenOffice Ayaspell and Google Docs.
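The abstract does not say how noise was measured. One simple proxy, assumed here, is the share of tokens in a source that are missing from the reference word list; sources above a threshold are dropped before language-model counts are collected. The source names, the 0.05 threshold and the unigram model are hypothetical.

```python
from collections import Counter

# Hypothetical reference word list and corpus sources (Buckwalter transliteration).
DICTIONARY = {"ktAb", "ktb", "kAtb"}
SOURCES = {
    "news": ["ktAb", "ktb", "ktAb", "kAtb"],     # edited text, assumed clean
    "forums": ["ktAb", "ktAp", "kAtp", "ktb"],   # user-generated text, assumed noisy
}

def noise_rate(tokens: list[str], dictionary: set[str]) -> float:
    """Fraction of tokens missing from the word list, used here as a proxy for noise."""
    if not tokens:
        return 0.0  # an empty source contributes no counts either way
    return sum(t not in dictionary for t in tokens) / len(tokens)

def select_sources(sources: dict[str, list[str]], dictionary: set[str],
                   max_noise: float = 0.05) -> list[str]:
    """Return the names of sources whose estimated noise rate is at most max_noise."""
    return [name for name, toks in sources.items()
            if noise_rate(toks, dictionary) <= max_noise]

def train_unigram_counts(sources: dict[str, list[str]], selected: list[str]) -> Counter:
    """Collect unigram counts from the selected sources only."""
    counts = Counter()
    for name in selected:
        counts.update(sources[name])
    return counts

selected = select_sources(SOURCES, DICTIONARY)
lm_counts = train_unigram_counts(SOURCES, selected)
print(selected, lm_counts.most_common(1))
```

Filtering this way keeps misspellings such as `ktAp` out of the language model, so they are never proposed as corrections. An out-of-vocabulary rate also counts rare but valid words as noise, so a real system would need a more careful measure.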