The article describes the Czech section of the crowdsourced audio dictionary hosted on the forvo.com portal (2008-2021), which is remarkable for its scope, reach, and linguistic diversity and, not least, for the unique variability of the pronunciations recorded there. We compare the website with several other open multilingual databases of audio recordings and touch on the partly contradictory relationship between the intended concept of the website and its actual form.
We also briefly characterize the Czech list of entries and summarize the strengths and weaknesses of the available data for scientific purposes. Finally, we consider the typical user of the portal: either an audio data provider (speaker), whose speech behaviour is evidently influenced by the specific speech situation during recording, or a non-native lay recipient (listener), who depends entirely on trusting that the specific pronunciation variants are representative.
To conclude, we define the term representativeness, which will serve in our follow-up article as an evaluation framework for the phonetic analysis of the recordings.