Corpus of spontaneous spoken Czech ORAL2013

Publication at Faculty of Arts | 2015

Abstract

The paper presents ORAL2013, a corpus of spontaneous spoken Czech, together with its design principles and the practical solutions adopted during data collection. The corpus is designed to represent contemporary spontaneous spoken language as used in informal, real-life situations throughout the Czech Republic.

The corpus consists of audio recordings and their transcriptions aligned with time stamps; it features manual annotation and broad regional coverage with a large variety of speakers. ORAL2013 contains 835 recordings made between 2008 and 2011 with 2,544 speakers (of whom 1,297 are unique). The total length of the audio is almost 300 hours, and the total size of the transcriptions exceeds 3.28 million tokens.

ORAL2013 is publicly available within the framework of the Czech National Corpus at http://www.korpus.cz/.