Spoken Language Corpora


Class at Faculty of Arts | AMLV00042

1 person, 43 study programmes

Syllabus

Topics covered:

1) Spoken language and its specific features

2) Building spoken corpora

3) Types of spoken language corpora; corpora of spoken Czech

4) Making recordings, selecting speakers, anonymization

5) Transcription issues (transcription programs: Transcriber, ELAN, EXMARaLDA)

6) Morphological annotation of spoken Czech

7) Spoken language corpora from around the world

8) Working with spoken corpora (ORAL, ORTOFON, DIALEKT, using the KonText corpus manager)

9) Spoken language in NLP

10) Specific phenomena of spoken Czech

Annotation

The course focuses on spoken corpora: how they are designed, built, and used. Participants will become acquainted with spoken corpora of Czech and other languages and will learn about methods of data collection, transcription, and transcription programs. They will learn how to query the corpora of spoken Czech (ORAL, ORTOFON, DIALEKT, DIALOG) in the KonText interface, on the website dialogy.net, and using the SyD tool.

In the hands-on part, we will examine specific features of spoken language based on corpus data.

Discussion of selected research articles is an integral part of the course.

The course is taught mainly in Czech, so sufficient proficiency in Czech is required to attend.