Universal Dependencies and Non-Native Czech

Publication at Faculty of Mathematics and Physics | 2018

Abstract

CzeSL is a learner corpus of texts produced by non-native speakers of Czech. Such corpora are a great source of information about specific features of learners' language, helping language teachers and researchers in the area of second language acquisition.

In our project, we have focused on syntactic annotation of the non-native text within the framework of Universal Dependencies. As far as we know, this is the first project annotating a richly inflectional non-native language.

Our ideal goal has been to annotate according to the non-native grammar in the mind of the author, not according to the standard grammar. However, this brings many challenges.

First, we do not have enough data to get reliable insights into the grammar of each author. Second, many phenomena are far more complicated than they are in native languages.

We believe that the most important result of this project is not the actual annotation, but the guidelines and principles that can be used as a basis for other non-native languages.