The study is based on two learner corpora of Czech as a foreign language: MERLIN (Multilingual Platform for the European Reference Levels: Interlanguage Exploration in Context) and CzeSL-SGT (Czech as a Second Language).
The MERLIN corpus contains 442 Czech texts (64,490 words) written by non-native speakers; the texts were collected during certified language exams.
The CzeSL-SGT corpus contains 12,388 texts (960,000 words). Both corpora carry linguistic and error annotation, and the error annotation operates with two target hypotheses.
From these texts, 5,785 occurrences of the verb JÍT 'go' were extracted and analysed for (i) valency patterns; (ii) tense; (iii) mood; (iv) negation; (v) presence of a prefix; (vi) presence of a perfectivizing prefix; (vii) collocation; and (viii) lexical use. The analyses also include an error analysis that covers aspects of non-nativeness in grammatically correct sentences.