Czech Language and Corpus

Class at Faculty of Arts

AMLV00008

Syllabus

Topics

The course covers the following topics. Each lecturer takes an individual approach, so the order of the topics and the emphasis placed on each may vary.

What is a corpus; corpora of the Czech National Corpus (CNC)

Corpus linguistics

Representativeness of written and spoken corpora, register variation

Corpus annotation and structure

Corpus querying and interpretation of a concordance

Frequency analysis

Regular expressions and advanced CQL queries

Collocation, colligation and semantic prosody

Corpus material in the research of individual language layers

Basics of data processing (MS Excel, tables and figures)

Basic statistics for working with corpora

Corpus tools: SyD, Morfio, KWords

Specialized corpora (Diakorp, InterCorp, author corpora)

Designing and carrying out linguistic research based on corpus data

Annotation (course description)

The course is aimed primarily at students of Czech studies. Students will become familiar with the language corpora available from the Czech National Corpus and learn how to use them in their own research. They will also learn to work with the KonText query interface and other web applications in order to query the corpora and to find and interpret language phenomena.

Credit requirements: active participation, a test, and an analysis of a language phenomenon using corpus-linguistic methods.