A set of partial analyses of semantically and grammatically defined groups of lexemes (adverbs, deictics, numerals, proper nouns, diminutives, feminine counterparts of masculine nouns, and univerbated forms) that belong to the most frequent vocabulary of spoken or written Czech. The dataset covers the 3000 most frequent lemmas in the spoken corpora ORAL v1 and ORTOFON v1 on the one hand and in the written corpus SYN2015 on the other; the differences between the two sets can be seen as a manifestation of Czech diglossia at the lexical level.