
Multilingual Natural Language Processing

Class at the Faculty of Mathematics and Physics


- Introduction to multilinguality (what it is, why it is hard to deal with, what it is good for, the World Atlas of Language Structures (WALS))

- Plain text (alphabets, transliteration, tokenization, language identification, language similarity)

- Machine translation for multilingual processing (Apertium, OPUS, Bible, Watchtower, alignment algorithms, multilingual machine translation)

- Morphology (morphological variability of languages, morphological annotation, Universal POS tags, Universal features, tagset conversions, cross-lingual tagging)

- Syntax (syntactic variability of languages, harmonization of treebank annotations, Universal Dependencies; multilingual parsing, cross-lingual parsing)

- Word embeddings (multilingual embeddings, contextual vector representations)


The course focuses on multilingual aspects of natural language processing. It explains both the issues and the benefits of doing NLP in a multilingual setting and presents possible approaches. We will address both the handling of multilingual variety when monolingual methods are applied to multiple languages and truly multilingual and cross-lingual approaches that use resources in multiple languages at once. We will review and work with a range of freely available multilingual resources, both plain-text and annotated.

The course takes the form of a practical seminar in a computer lab.