
Multi-tier Transcription of Informal Spoken Czech: The ORTOFON Corpus Approach

Publication at Faculty of Arts, 2014

Abstract

The spoken corpus ORTOFON is currently at the stage of data collection and annotation. It will feature two main tiers of transcription: the ort layer, which is broadly orthographic, and the fon layer, which contains a simplified phonetic transcript. The recordings target prototypical spoken language as found in informal conversations among people who know each other and are in their usual environment.

Like previous spoken corpora, ORTOFON will be balanced with respect to several sociolinguistic categories of the included speakers: gender, age, education and dialect region of childhood residence. By offering a detailed multi-tier transcript (including orthographic, phonetic and metalinguistic layers), we aim to capture interactions comprehensively, within the context of the given communicative situation.

Examples will illustrate the specifics of our transcription guidelines.