Token-based typology and word order entropy: A study based on Universal Dependencies

Publication

Abstract

The present paper discusses the benefits and challenges of token-based typology, which takes into account the frequencies of words and constructions in language use. This approach makes it possible to introduce new criteria for language classification, which would be difficult or impossible to achieve with the traditional, type-based approach.

This point is illustrated by several quantitative studies of word order variation, which can be measured as entropy at different levels of granularity. I argue that this variation can be explained by general functional mechanisms and pressures, which manifest themselves in language use, such as optimization of processing (including avoidance of ambiguity) and grammaticalization of predictable units occurring in chunks.
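As a rough illustration of how word order variation might be quantified as entropy, the sketch below computes Shannon entropy (in bits) over the observed orders of a single construction. The function name `order_entropy`, the labels "VO" and "OV", and the toy counts are hypothetical and not taken from the paper; the paper's own measures and levels of granularity may differ.

```python
import math
from collections import Counter


def order_entropy(orders):
    """Shannon entropy (bits) of a sequence of observed word order labels.

    Illustrative only: each element of `orders` is one corpus token of a
    construction, labelled with the order it shows (e.g. "VO" or "OV").
    """
    counts = Counter(orders)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the relative frequency p of each order
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy data, not real corpus counts
rigid = ["VO"] * 10                  # a single order throughout
flexible = ["VO"] * 5 + ["OV"] * 5   # two orders, equally frequent

print(order_entropy(rigid))
print(order_entropy(flexible))
```

Under this measure, a construction that always appears in one order has zero entropy, while two equally frequent orders give the maximum of one bit. Because the input is token frequencies rather than a single categorical label per language, the measure is token-based in the sense described above.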

The case studies are based on multilingual corpora, which have been parsed using the Universal Dependencies annotation scheme.