Working together towards an ideal infrastructure for language learner corpora

Publication at the Faculty of Arts

2019

Abstract

In this article, we provide an overview of first-hand experiences and vantage points for best practices from projects in seven European countries dedicated to learner corpus research (LCR) and the creation of language learner corpora. The corpora and tools involved in LCR are becoming increasingly important, as are their careful preparation, easy retrieval and reusability.

However, many aspects of LCR still lack commonly agreed-upon solutions, and interoperability between learner corpora and the exchange of data between different learner corpus projects remain a challenge. We show how concepts like metadata, anonymization, error taxonomies and linguistic annotations, as well as tools, toolchains and data formats, can each be challenging and how these challenges can be solved.

Keywords

learner corpus, linguistic annotation, metadata, error taxonomy, infrastructure, interoperability, reusability