Item characteristics such as difficulty or discriminating power are typically estimated from data. When little or no data are available at the pretest stage, test developers rely on their experience of how item content and wording influence these characteristics.
In this work, we explore various item features, extracted by text analysis of item wording, as predictors of item difficulty. We illustrate the methods on the English language test of the Czech matura exam.
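To make the idea of text-derived item features concrete, the following minimal Python sketch computes a few surface features of an item's wording. The function name `text_features`, the particular features (word count, sentence count, mean word length, type-token ratio), and the example item are illustrative assumptions, not the feature set or data used in this work.

```python
import re


def text_features(item_text):
    """Compute a few surface features of an item's wording.

    Hypothetical feature set chosen for illustration only.
    """
    # Lower-cased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", item_text.lower())
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", item_text) if s.strip()]
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "mean_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Share of distinct words among all words (lexical diversity).
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }


# Made-up example item, not taken from the matura exam.
example_item = "The cat sat on the mat. Where did the cat sit?"
features = text_features(example_item)
```

Features of this kind could then serve as predictors in a regression model of item difficulty once difficulty estimates are available for a set of calibrated items.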