Measuring Syntheticity

Publication at Faculty of Arts | 2018

Abstract

Language typology is a common tool for describing, analyzing and comparing languages. The concept is often used rather intuitively, especially when contrasting languages and their structures, and it is difficult to quantify the degree to which a given language type is present in the language under scrutiny.

The poster presents a verification, or falsification, of the methodology for measuring morphological syntheticity proposed in "Measuring typological syntheticity of English diachronically with the use of corpora" (Tichý, Čermák, 2014). The methodology is based on the morphological behaviour of high-frequency nouns, adjectives and verbs, the basic assumption being that the more varied the inflectional system of a language, the more synthetic its nature.

Based on the distribution of morphological markers across paradigms, the authors computed syntheticity indices for these parts of speech in a given period (Old English, Middle English, Early Modern English, Present-Day English) and an overall index for the language of that period. A morphological marker is understood as a formal expression; it is not defined on a functional basis.

The index expresses the choice a speaker has in creating forms of a word, that is, how well the markers are distributed throughout the system. To compute this index, the authors used information entropy, which captures the degree of randomness in the system underlying a speaker's choice.
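As a minimal sketch of this idea, assume the index is the plain Shannon entropy of the relative frequencies of formal markers. The abstract does not give the exact formula used in the original study, so the function name `marker_entropy`, the choice of bits as the unit, and the toy counts below are illustrative assumptions only.

```python
import math
from collections import Counter


def marker_entropy(marker_counts):
    """Shannon entropy (in bits) of a frequency distribution of formal markers.

    Assumption: the index is computed from plain relative frequencies;
    the original study may weight or normalise the counts differently.
    """
    total = sum(marker_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in marker_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical toy counts, not taken from any corpus.
# Four markers used equally often: the speaker's choice is maximally open.
uniform = Counter({"-a": 25, "-u": 25, "-e": 25, "-ou": 25})
# One marker dominates: the speaker has very little real choice.
skewed = Counter({"-0": 97, "-s": 3})

print(marker_entropy(uniform))
print(marker_entropy(skewed))
```

Under this assumption, a system whose markers are spread over many forms scores higher than one in which a single form dominates, which matches the intuition that a more varied inflectional system is more synthetic.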

If this methodology is applicable to different languages and yields comparable results, the description of English and its development can be considered functional and reliable. To see whether this is the case, the authors replicate the research on Modern Czech, using Czech corpora (SYN series, Institute of the Czech National Corpus).

Czech is commonly described as a synthetic language; in terms of typology it is thus closer to Old English than to Present-Day English. In the first step, only probes of high-frequency words are carried out, in a way similar to the original study.

If the syntheticity index for Czech is closer to the value for Old English than to the values for later stages of English, it is reasonable to assume that the index represents the same property of the two language systems and to consider the methodology valid. The second step attempts to assess whether the fact that the calculation of the syntheticity index is based on a probe, i.e. on a limited set of data, may in itself skew the results and the value of the index.

Taking advantage of a fully lemmatized and tagged corpus, the whole procedure is repeated for the entire corpus and the final values are compared with those obtained in the first step. Similarly, the procedure is repeated for the BNC so that the results can be contrasted with those from the original study.
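The comparison between a probe and the entire corpus could be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the corpus is available as (lemma, marker) pairs, pools all parts of speech together, and uses plain Shannon entropy as the index. The names `entropy_bits` and `probe_vs_full` and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (in bits) of a Counter of marker frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def probe_vs_full(tokens, top_n):
    """Return (probe_index, full_index) for an iterable of (lemma, marker) pairs.

    The probe is restricted to tokens of the top_n most frequent lemmas;
    the full index uses every token.
    """
    tokens = list(tokens)
    lemma_freq = Counter(lemma for lemma, _ in tokens)
    probe_lemmas = {lemma for lemma, _ in lemma_freq.most_common(top_n)}
    probe_counts = Counter(m for lemma, m in tokens if lemma in probe_lemmas)
    full_counts = Counter(m for _, m in tokens)
    return entropy_bits(probe_counts), entropy_bits(full_counts)


# Hypothetical toy tokens: one frequent lemma with a single marker,
# two rarer lemmas that each contribute a different marker.
toy_tokens = [("a", "-x")] * 4 + [("b", "-y")] * 2 + [("c", "-z")] * 2

probe_index, full_index = probe_vs_full(toy_tokens, top_n=1)
print(probe_index, full_index)
```

In this toy case the probe sees only one marker and so underestimates the variety found in the full data, which is the kind of skew the second step is meant to detect.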