
Lists: Frequency lists browser


Abstract

The Frequency lists browser tool lets users browse frequency lists of several types of units (lemma, word, and lc, i.e. the lowercased word form) in the representative corpora of written Czech (SYN2000, SYN2005, SYN2010, SYN2015) and in Oral v1, a corpus of spontaneous spoken Czech. For each written corpus, users can access not only the overall results but also frequency information for its three sub-corpora (fiction, non-fiction, and journalistic texts).

Frequency lists contain only units that consist of alphabetic characters and hyphens and that have a non-zero frequency in each of the written corpora (SYN2000, SYN2005, SYN2010, and SYN2015) or, in the case of Oral, a non-zero frequency in that corpus. When the list is browsed by corpora (first tab), each unit comes with four types of frequency information: absolute frequency, relative frequency in instances per million words (IPM), average reduced frequency (ARF), and average reduced frequency normalized per million words (ARFn).
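The selection rule and the four measures can be illustrated with a minimal sketch. It assumes that a corpus is a single sequence of N tokens indexed from 0, that "alphabetic" means what Python's `str.isalpha()` accepts, and that ARF follows the usual definition by Savický and Hlaváčová, in which gaps between occurrences are measured on a circular corpus. The function names (`is_listed`, `ipm`, `arf`, `arfn`) and the per-corpus frequency mapping are hypothetical and are not part of the Frequency lists browser itself.

```python
from collections.abc import Mapping, Sequence


def is_listed(unit: str, freqs: Mapping[str, int]) -> bool:
    """Hypothetical selection rule: letters and hyphens only, and a
    non-zero frequency in every corpus in `freqs` (assumption: isalpha()
    stands in for the tool's notion of 'alphabetic')."""
    letters_and_hyphens = bool(unit) and all(ch.isalpha() or ch == "-" for ch in unit)
    return letters_and_hyphens and all(f > 0 for f in freqs.values())


def ipm(freq: int, corpus_size: int) -> float:
    """Relative frequency in instances per million tokens."""
    return freq * 1_000_000 / corpus_size


def arf(positions: Sequence[int], corpus_size: int) -> float:
    """Average reduced frequency; `positions` are 0-based token indices
    of the unit's occurrences, sorted in ascending order."""
    f = len(positions)
    if f == 0:
        return 0.0
    v = corpus_size / f  # gap length if the f occurrences were evenly spread
    # The corpus is treated as circular: the first gap wraps around from
    # the last occurrence back to the first one, so all gaps sum to N.
    gaps = [positions[0] + corpus_size - positions[-1]]
    gaps += [b - a for a, b in zip(positions, positions[1:])]
    # Each gap contributes at most v, so clustered occurrences count less.
    return sum(min(d, v) for d in gaps) / v


def arfn(positions: Sequence[int], corpus_size: int) -> float:
    """ARF normalized per million tokens, analogous to IPM."""
    return arf(positions, corpus_size) * 1_000_000 / corpus_size


if __name__ == "__main__":
    n = 100  # toy corpus of 100 tokens
    print(arf([0, 25, 50, 75], n))  # evenly spread: ARF equals f, prints 4.0
    print(arf([0, 1, 2, 3], n))     # clustered: prints 1.12
```

The example shows the intended behaviour of ARF: four evenly spread occurrences keep an ARF equal to their absolute frequency, while four adjacent occurrences are reduced to roughly one, which is why ARF (and ARFn) is less sensitive than IPM to words that are frequent in only a few texts.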