Contrastive analysis of frequency peaks of written and spoken lexicon

Publication at Faculty of Mathematics and Physics, Faculty of Arts

2022

Abstract

This work presents a set of partial analyses of semantically and formally-grammatically defined groups of lexemes (adverbs, deictics, numerals, proper nouns, diminutives, female gender counterparts, and univerbized forms) that belong to the frequency peak of spoken or written Czech. The dataset consists of the 3,000 most frequent lemmas in the spoken corpora ORAL v1 and ORTOFON v1 and in the written corpus SYN2015. The differences between the two sets can be seen as a manifestation of diglossia in Czech at the lexical level.

Keywords

frequency peak, written language, spoken language, diglossia, adverbs, deictics, numerals, proper names, diminutives, female gender counterparts, univerbized forms