GramatiKat

Publication

Abstract

GramatiKat provides information on the grammatical categories within a part of speech (e.g. which case is used most frequently for nouns) as well as for individual lemmas (grammatical profiles). The tool is designed primarily for research into grammatical categories and for lexicological and lexicographic exploration, but it can also be useful for other purposes, e.g. teaching Czech as a second language.

At the moment, only information on Czech nouns is available; we plan to add adjectives and verbs in the future. The data come from the Czech National Corpus, namely the SYN2015 and ORALv4 corpora.

We only take into account nouns with a frequency of 100 or higher. The summary of word form distribution within a part of speech is based on the distribution of the word forms of each lemma (each lemma has equal weight in the calculations, regardless of frequency).

This ensures that extremely frequent lemmas do not distort the overall results.
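
The equal-weight calculation described above can be illustrated with a minimal sketch. It is not GramatiKat's actual implementation: the function names, the lemma labels, and the counts are all invented for illustration, and only the frequency threshold of 100 is taken from the abstract. Each lemma's case counts are first turned into proportions, and the proportions are then averaged across lemmas; a pooled calculation over raw counts is included for contrast.

```python
MIN_FREQ = 100  # frequency threshold stated in the abstract


def lemma_profile(case_counts):
    """Relative frequency of each case for a single lemma."""
    total = sum(case_counts.values())
    return {case: n / total for case, n in case_counts.items()}


def equal_weight_summary(counts_by_lemma, min_freq=MIN_FREQ):
    """Average the per-lemma profiles so that every lemma counts once."""
    profiles = [
        lemma_profile(counts)
        for counts in counts_by_lemma.values()
        if sum(counts.values()) >= min_freq  # drop low-frequency lemmas
    ]
    if not profiles:
        return {}
    cases = sorted({case for p in profiles for case in p})
    return {
        case: sum(p.get(case, 0.0) for p in profiles) / len(profiles)
        for case in cases
    }


def pooled_summary(counts_by_lemma, min_freq=MIN_FREQ):
    """For contrast: pool raw counts, so frequent lemmas dominate."""
    pooled = {}
    for counts in counts_by_lemma.values():
        if sum(counts.values()) >= min_freq:
            for case, n in counts.items():
                pooled[case] = pooled.get(case, 0) + n
    return lemma_profile(pooled) if pooled else {}


# Invented toy counts (hypothetical, not corpus data).
toy_counts = {
    "lemma_a": {"nom": 9000, "gen": 1000},  # very frequent lemma
    "lemma_b": {"nom": 20, "gen": 180},     # moderately frequent lemma
    "lemma_c": {"nom": 5, "gen": 5},        # below the threshold, excluded
}

equal = equal_weight_summary(toy_counts)
pooled = pooled_summary(toy_counts)
print("equal weight:", equal)
print("pooled:      ", pooled)
```

In this toy data the very frequent lemma strongly prefers the nominative and the other prefers the genitive. The equal-weight summary gives both cases roughly the same share, whereas the pooled summary is dominated by the frequent lemma.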