The article provides an overview of the motivation, evolution and major principles of the international project Merlin. The main output of this project is a unique trilingual learner corpus consisting of German, Italian and Czech texts.
The corpus will be available as an online platform that illustrates the Common European Framework of Reference for Languages (CEFR) with authentic learner data and enables users to explore written learner productions and related metadata (e.g. the learner's age and first language). Each text in the corpus is linguistically analysed in the course of a multiphase error annotation.
This process raises several problems stemming from the specific character of Czech as a Slavic language. The article summarizes some of these problems and their possible solutions.