Morphological tagging and lemmatization in the Czech National Corpus

Publication at Faculty of Arts

2007

Abstract

This paper presents the methods by which three large textual corpora of the Czech National Corpus (SYN2000, SYN2005 and SYN2006PUB) were tagged and lemmatized. Processing ran in several phases: tokenization and segmentation, morphological analysis, and disambiguation.
