When compiling a list of headwords, every lexicographer encounters words whose representative dictionary form is not attested in the data. This study focuses on how to distinguish cases where this form is missing merely because of insufficient data from cases where its absence has systemic or linguistic causes.
Based on research carried out on Czech written corpora, we have formulated lexicographic recommendations for different types of such lacunas. As a prerequisite, we calculated a frequency threshold above which a word can be expected to have its representative form attested in the data.
Based on a manual analysis of 2,700 nouns, adjectives and verbs that should, but do not, have their representative form attested, we drew up a classification of lacunas. A missing dictionary form is often associated with limited collocability and with a non-preference for the grammatical category that the representative form expresses.
Our findings on unattested word forms also have significant implications for the study of language potentiality.