Charles Explorer logo
🇬🇧

Lexical Association Measures: Collocation Extraction

Publication at Faculty of Mathematics and Physics |
2009

Abstract

This publication is devoted to an empirical study of lexical association measures and their application to collocation extraction. It presents a comprehensive inventory of lexical association measures and their evaluation on four reference data sets of collocation candidates: Czech dependency bigrams from the Prague Dependency Treebank, surface bigrams from the same source, instances of the latter from the Czech National Corpus, and Swedish distance verb-noun combinations obtained from the PAROLE corpus.

The collocation candidates in the reference data sets were manually annotated and labeled as collocations or non-collocations by expert linguists. The evaluation scheme applied in this work is based on measuring the quality of ranking collocation candidates according to their chance to form collocations.

The methods are compared by precision-recall curves, mean average precision scores, and appropriate tests of statistical significance. Further, the study focuses on