Querying Multiword Expressions Annotation with NoSke

Publication at Faculty of Mathematics and Physics | 2017

Abstract

This paper demonstrates one possible way to represent and query corpora with multiword expression (MWE) annotation. We use the multilingual corpus of 18 languages with verbal multiword expression (VMWE) annotation, created under the PARSEME project.

VMWEs include categories such as idioms, light verb constructions, verb-particle constructions, inherently reflexive verbs, and others. The corpus has so far been used mainly to train predictive models; little linguistic research has been based on this data.

We discuss how linguists can query for MWEs in a simple user interface using the Corpus Query Language (CQL) within the NoSke corpus management and concordance system. Despite its limited ability to represent challenging cases such as discontinuous, coordinated, or embedded VMWEs, CQL can be sufficient for basic analysis of MWE-annotated data in corpus-based studies.
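
To give a rough idea of what such a query might look like, the following Python sketch assembles a CQL string for a light verb construction whose verb and noun are separated by a few intervening tokens. The attribute names `mwe` and `tag`, the category label `LVC`, and both helper functions are assumptions made for illustration only; the attributes and labels actually available depend on how a given corpus was encoded for NoSke.

```python
def cql_token(**attrs: str) -> str:
    """Build one CQL token expression, e.g. [mwe="LVC" & tag="VERB"].

    Attribute names are passed through unchanged; `mwe` and `tag` used
    below are hypothetical and must match the corpus configuration.
    """
    if not attrs:
        return "[]"  # CQL for "any token"
    conditions = " & ".join(f'{name}="{value}"' for name, value in attrs.items())
    return f"[{conditions}]"


def cql_with_gap(first: str, second: str, max_gap: int) -> str:
    """Join two token expressions, allowing up to max_gap tokens between them."""
    return f"{first} []{{0,{max_gap}}} {second}"


# Hypothetical query: a verb and a noun both annotated as part of a light
# verb construction, with at most three tokens in between.
query = cql_with_gap(
    cql_token(mwe="LVC.*", tag="VERB"),
    cql_token(mwe="LVC.*", tag="NOUN"),
    3,
)
print(query)
```

A query built this way only checks that both tokens carry a matching annotation value; it cannot guarantee that they belong to the same VMWE instance, which is one aspect of the limitation with discontinuous expressions mentioned above.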