Digitisation and Automatic Alignment of the DIALOG Corpus: Prosodically Annotated Corpus of Czech Television Debates

Publication at Faculty of Mathematics and Physics | 2007

Abstract

This article describes the development and automatic processing of the audio-visual DIALOG corpus. The DIALOG corpus is a prosodically annotated corpus of Czech television debates that has been recorded and annotated at the Czech Language Institute of the Academy of Sciences of the Czech Republic.

The corpus has recently grown to more than 400 four-hour VHS tapes and 375 transcribed TV debates. The digitisation process and automatic alignment described here provide an easily accessible, user-friendly research environment that supports the exploration, analysis and modelling of Czech prosody.

This project has been carried out in cooperation with the Institute of Formal and Applied Linguistics of the Faculty of Mathematics and Physics, Charles University, Prague. The first version of the DIALOG corpus (version 0.1, http://ujc.dialogy.cz) is currently available to the public.

It includes 10 selected and revised hour-long talk shows.