A learner corpus of Czech: current state and future directions

Publication at Faculty of Mathematics and Physics, Faculty of Arts | 2013

Abstract

The paper describes CzeSL, a learner corpus of Czech as a Second Language, together with its design properties. We start with a brief introduction to the project within the context of AKCES, a programme addressing Acquisition Corpora of Czech. In connection with the programme, we also discuss the groups of respondents, including differences due to their L1. We then comment on the choice of sociocultural metadata recorded with each text, which relate both to the learner and to the text production task.

Next we describe the intended uses of CzeSL. The core of the paper deals with transcription and annotation.

We explain issues involved in the transcription of handwritten texts and present the concept of a multi-level annotation scheme including a taxonomy of captured errors. We conclude by mentioning results from an evaluation of the error annotation and presenting plans for future research.