
Working with Language Corpora

Class at Faculty of Arts


1. 3.10. What is a corpus? What is it used for? How can it be used? Basic types of corpora. Czech and German corpora: a comparison (DeReKo, ČNK, InterCorp) in the form of a presentation. Setting up the prerequisites for working with corpora (laptops, internet access, registration in COSMAS II and ČNK).
2. 10.10. Basic orientation in the corpus search engine environment (COSMAS II): selecting an archive and a corpus, formulating the simplest queries.
3. 17.10. DeReKo (IDS Mannheim) and COSMAS II: basic functions of the corpus search engine; practicing basic search in COSMAS II.
4. 24.10. Advanced search and its practice in COSMAS II; multi-word expressions.
5. 31.10. Advanced search and its practice in COSMAS II; multi-word expressions II; regular characters; setting options (Optionen).
6. 7.11. Extended functions offered by COSMAS II; working with tagged corpora.
7. 14.11. COSMAS II: practicing CQL (Corpus Query Language); associated applications (CCDB, SOM, etc.); DWDS.
8. 21.11. Co-occurrence analysis: how to enter a query and evaluate the generated data; idioms and collocations as a special task for corpus linguistics.
9. 28.11. Czech National Corpus: basic information, basic search.
10. 5.12. Czech National Corpus: search practice, explanation of extended functions.
11. 12.12. InterCorp; ČNK applications.
12. 19.12. Practical training based on the focus and interests of the course participants.
13. 2.1. Credit test.


Working with language corpora is a course in which undergraduate students are introduced to language corpora and the possibilities of their use in linguistic practice. The course focuses on practical work and precedes the Seminar on Corpus Linguistics, which follows in the follow-up master's (NMgr.) program.

In addition to introducing the basic types and properties of corpora, the course emphasizes their practical usability in a linguist's everyday work. It concentrates on the corpora of Czech (ČNK) and German (DeReKo and DWDS), including associated applications such as Kookkurrenzanalyse, CCDB, and SOM on the German side and Treq, SyD, Morfio, KWords, and WaG on the Czech side.

As regards usability, the course takes into account students' needs in their research and in writing seminar papers and final theses in other disciplines.
