Differences in Spoken Language Processing in General Corpora (ORAL, ORTOFON) and in a Specialized Corpus (DIALEKT) and Their Reflection in the Mapka Application

Publication at Faculty of Arts | 2023

Abstract

ORAL and ORTOFON, general corpora of the spoken Czech language, capture authentic and prototypical informal spoken language. DIALEKT, a specialized corpus, represents traditional regional dialects of the Czech language.

Since the corpora's goals and the nature of the captured language data differ, different data collection methods were required. This concerns not only the choice of speakers but also the whole communication situation.

Samples chosen from these three corpora are included in the Mapka application and reflect the distinct character of the corpora. The ORAL and ORTOFON samples show general spoken language in various informal situations and capture a wide range of speakers.

The DIALEKT samples represent traditional regional dialects spoken by selected types of speakers in the semiformal setting of a guided interview.