List of topics

1. Introduction: Historical outline, corpus-based and corpus-driven approaches, types of corpora

2. Collocation: First queries with the BNC: running the query, KWIC and Sentence view, ordering the results, viewing a larger context and bibliographical information, restricting the query

3. The simple query: Simple query syntax, words and phrases, variation in phrases, using wildcards, proximity queries

4. Distribution and sorting: Comparing results, normalized frequencies, statistical significance, dispersion and file-frequency extremes

5. Collocations: Making statistical claims, association measures

6. Colligation, pattern grammar: Queries based on part-of-speech and headword/lemma, tagging and parsing

7. Keywords and frequency lists: Text-type and word lists, using keywords in stylistic analysis

8. Corpora of spoken language: Problems of transcription, metadata, speakers’ characteristics

9. Corpora of academic spoken English: Representativeness; units of meaning in spoken corpora, lexical bundles, n-grams

10. Issues in corpus design: Purpose, size and representativeness, criteria of text selection, sampling, balance, homogeneity; working with self-designed corpora, AntConc, tagging

11. Corpora in contrastive research: Varieties of English, parallel and comparable corpora

12. Leaving the corpus: Extracting query results to an external database, presenting the results

PLUS: Three lectures by invited corpus linguists


The seminar focuses mainly on corpus linguistics as a method. Its aim is to introduce students to the use (and, in part, the building) of electronic language corpora and corpus tools.

Students will work with the British National Corpus (BNC) and the parallel translation corpus InterCorp, as well as with specialized corpora of academic English and with web corpora.