Corpus as a Means for Study of Lexical Usage Changes

Publication at Faculty of Mathematics and Physics |
2008

Abstract

The paper presents a corpus-based method for obtaining ranked wordlists that characterise changes in lexical usage. The method is evaluated on two representatively balanced 100-million-word corpora of contemporary written Czech that cover two consecutive time periods.

Despite the similar overall design of the corpora, lexical frequencies must first be normalised to make them comparable. Furthermore, dispersion information is used to reduce the number of domain-specific items, since their frequencies depend heavily on which particular texts are included in the corpus.
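The two preparatory steps can be illustrated with a minimal sketch. The abstract does not say which normalisation or dispersion measure the paper uses, so the choices here are assumptions made for illustration only: frequencies are scaled to instances per million tokens, and dispersion is approximated by the share of texts in which a word occurs. The function names and the `min_range` threshold are hypothetical.

```python
def per_million(count, corpus_size):
    """Relative frequency expressed as instances per million tokens."""
    return count * 1_000_000 / corpus_size


def text_range(counts_per_text):
    """Share of texts in which the word occurs at least once (0.0 to 1.0).

    Assumption: a simple range measure stands in for whatever dispersion
    statistic the paper actually uses.
    """
    if not counts_per_text:
        return 0.0
    return sum(1 for c in counts_per_text if c > 0) / len(counts_per_text)


def is_widely_dispersed(counts_per_text, min_range=0.5):
    """Keep a word only if it appears in at least `min_range` of the texts.

    `min_range` is a hypothetical threshold, not a value from the paper.
    """
    return text_range(counts_per_text) >= min_range
```

A word that occurs 40 times in a single text out of four would be dropped under this filter, even though its raw frequency is high, which is the kind of domain-specific item the abstract describes.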

Finally, statistical significance measures are used to evaluate frequency differences between individual items in the two corpora. The method is shown to rank the resulting wordlists appropriately, and several limitations of the approach are also discussed.
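As a rough illustration of ranking by a significance measure, the sketch below uses the simplified two-term log-likelihood statistic that is common in corpus comparison. This is an assumption: the abstract does not name the measures the paper applies. The function names and the input layout (a dictionary mapping each word to its counts in the two corpora) are hypothetical.

```python
import math


def log_likelihood(count_a, size_a, count_b, size_b):
    """Simplified two-term log-likelihood for one word in two corpora.

    Assumption: this measure is chosen for illustration; the paper may
    use a different one.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def rank_by_change(counts, size_a, size_b):
    """Return (word, score) pairs, largest frequency difference first.

    `counts` maps each word to a (count_in_a, count_in_b) tuple.
    """
    scored = [
        (word, log_likelihood(a, size_a, b, size_b))
        for word, (a, b) in counts.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the statistic takes the corpus sizes into account, it can be computed directly from raw counts; a word with identical relative frequency in both corpora scores zero and falls to the bottom of the list.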

The influence of corpus composition cannot be completely eliminated, and the comparability of the corpora is shown to play a key role. There