Creating a sociologically balanced spoken corpus

Publication at Faculty of Arts | 2019

Abstract

The article presents publicly accessible corpora of spoken Czech that were created for linguistic research. Because these corpora capture private, spontaneous dialogues, they were compiled according to the sociological characteristics of each speaker.

From the outset, these corpora were balanced in the categories of gender, age and highest level of education attained, with each category divided into two values. The dialect region in which the speaker spent their childhood was later added as a further criterion.

Combining these criteria is quite difficult when recording longer interviews. Full balancing across all categories has been achieved in the ORTOFON corpus.