
Authorship and Time Attribution of Arabic Texts Using JGAAP

Publication at Faculty of Arts, 2018

Abstract

One basic task in natural language processing is text classification, such as sorting documents by their content. A less well-known variant of this task is classifying documents by inferred metadata, such as a document's language, date of composition, or authorship.

Authorship attribution is a well-studied problem, but most of the work has been done on major European languages such as English. (Notable exceptions that have studied Arabic in particular include …) We present a study of texts selected from a new corpus (CLAUDia) containing nearly half a billion words of Arabic text, using a standard authorship analysis tool (JGAAP) to examine how author, genre, and time of composition affect writing style and, by extension, classification.

We selected a subcorpus balanced to permit comparisons between genres as well as between time periods, in order to see how the best-performing methods change with genre and time. We also analyze a wider variety of feature sets than has previously been studied for Arabic.
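To make the notion of a "feature set" concrete, the following is a minimal sketch of one common style-based classification setup: character trigram counts as features, compared by cosine distance to each candidate class's pooled profile. It is a generic Python illustration under stated assumptions, not JGAAP's implementation or API, and not the specific configuration evaluated in this study. The function names and the toy labels are hypothetical; the labels may stand for authors, genres, or time periods, since the same procedure applies to any of these metadata targets.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams: one possible feature set."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def attribute(unknown: str, known: dict[str, list[str]], n: int = 3) -> str:
    """Assign the label (hypothetical author/genre/period) whose pooled
    training profile is nearest to the unknown text."""
    target = char_ngrams(unknown, n)
    best_label, best_dist = None, float("inf")
    for label, docs in known.items():
        profile = Counter()
        for doc in docs:
            profile.update(char_ngrams(doc, n))
        dist = cosine_distance(target, profile)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Toy English data for readability; Python strings handle Arabic text the same way.
    training = {
        "author_a": ["the cat sat on the mat"],
        "author_b": ["quantum fields interact weakly"],
    }
    print(attribute("a cat sat on a mat", training))
```

Swapping `char_ngrams` for another extractor (word n-grams, function-word counts, and so on) changes the feature set while leaving the rest of the pipeline intact, which is the kind of comparison across feature sets the abstract describes.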