
Subjects tend to be coded only once: Corpus-based and grammar-based evidence for an efficiency-driven trade-off

Publication

Abstract

Using data from the World Atlas of Language Structures and the Universal Dependencies treebanks, we provide converging evidence from linguistic typology and comparative corpus linguistics for an efficiency-based trade-off in the encoding of referentially accessible subjects. Specifically, when familiar subjects are marked as bound elements attaching to the verb, the likelihood that a language has obligatory independent subject pronouns decreases significantly across the world's languages. At the same time, there is a trend against not encoding the subject at all. This leads us to postulate an overall tendency to encode familiar subjects once and only once in a neutral topic-comment utterance. This tendency is mirrored in more fine-grained corpus data from Slavic. East Slavic languages, in contrast to the other members of the genus, have past forms without verbal subject encoding, and it is precisely with these (former participle) forms that independent subject pronouns are used significantly more often than with other, non-participial verb forms. By contrast, the rate of independent subject pronouns does not differ across verb forms in the other Slavic languages, none of which has lost verbal subject encoding.
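The corpus comparison described above comes down to counting how often a verb has an overt pronominal subject, separately for each class of verb form. The following is a minimal Python sketch of that count over CoNLL-U data, the file format of the Universal Dependencies treebanks. It is not the authors' procedure, and several of its choices are assumptions:

- The rule for identifying the former participle forms (past tense with no `Person` feature) is a hypothetical heuristic.
- Only tokens tagged `UPOS=VERB` are counted.
- The two toy sentences are invented for illustration and are not corpus data.

```python
from collections import Counter

# Indices of the 10 tab-separated CoNLL-U columns.
ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC = range(10)


def parse_feats(field):
    """Turn 'Tense=Past|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def sentences(lines):
    """Yield sentences as lists of token rows, skipping comments,
    multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '3.1')."""
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if sent:
                yield sent
            sent = []
        elif line.startswith("#"):
            continue
        else:
            cols = line.split("\t")
            if "-" in cols[ID] or "." in cols[ID]:
                continue
            sent.append(cols)
    if sent:
        yield sent


def is_former_participle(feats):
    # ASSUMPTION (hypothetical heuristic): past-tense forms lacking a
    # Person feature stand in for the East Slavic l-participle pasts.
    return feats.get("Tense") == "Past" and "Person" not in feats


def pronoun_rates(lines):
    """Share of VERB tokens that govern a pronominal nsubj, per verb class."""
    verbs = Counter()
    with_pron = Counter()
    for sent in sentences(lines):
        # Heads of pronominal subjects (nsubj and subtypes such as nsubj:pass).
        pron_heads = {
            row[HEAD] for row in sent
            if row[DEPREL].split(":")[0] == "nsubj" and row[UPOS] == "PRON"
        }
        for row in sent:
            if row[UPOS] != "VERB":
                continue
            feats = parse_feats(row[FEATS])
            cls = "participle" if is_former_participle(feats) else "other"
            verbs[cls] += 1
            if row[ID] in pron_heads:
                with_pron[cls] += 1
    return {cls: with_pron[cls] / verbs[cls] for cls in verbs}


# Toy input: two invented Russian sentences, "Он читал." and "Читаю."
SAMPLE = [
    "\t".join(["1", "Он", "он", "PRON", "_", "Case=Nom", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "читал", "читать", "VERB", "_",
               "Gender=Masc|Number=Sing|Tense=Past|VerbForm=Fin", "0", "root", "_", "_"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
    "\t".join(["1", "Читаю", "читать", "VERB", "_",
               "Number=Sing|Person=1|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"]),
    "\t".join(["2", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
]

rates = pronoun_rates(SAMPLE)
print(rates)  # {'participle': 1.0, 'other': 0.0}
```

On real treebanks, the raw rates would still need a significance test and a per-language comparison before they could support a claim like the one above. Auxiliary and copula constructions, which this sketch ignores, would also need handling.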