
Corpus LINK

Publication

Abstract

LINK (originally LIngvistův Narozeninový Korpus, i.e. Linguist’s Birthday Corpus, created on the occasion of Professor František Čermák’s birthday) is a corpus comprising exclusively linguistic texts.

It is therefore designed primarily for research into the specifics of academic language (the study of terminology, the language of linguistics, etc.). The corpus contains 2,353,748 positions in total, which corresponds to approximately 1.8 million tokens (excluding punctuation).

The corpus is lemmatized and morphologically tagged in the same way as the corpora of the SYN series (the lemmatization and tagging are roughly at the same level as in the SYN2009PUB corpus). The LINK corpus consists of 258 linguistic texts from 1985 to 2010, the vast majority of which come from the turn of the millennium.

The corpus includes both major linguistic studies (monographs, proceedings) and articles in scholarly periodicals and journals (especially Slovo a slovesnost and Naše řeč).