
Phonotactic Probability Effects on Pseudoword Perception: A Wordlikeness task on Czech

Publication at the Faculty of Arts, 2022

Abstract

Phonotactic probability refers to the frequency with which phonological segments and sequences of phonological segments occur in the words of a given language (Vitevitch & Luce, 2004). Phonotactic probability has been shown to play an important role in both language processing and language acquisition (Jusczyk, Luce & Charles-Luce, 1994; Mattys & Jusczyk, 2001; Pitt & McQueen, 1998).

For example, native speakers recognize words with high phonotactic probability faster in lexical decision tasks (Luce & Large, 2001), and adults judge pseudowords with high phonotactic probability to be more word-like (Vitevitch, Luce, Charles-Luce & Kemmerer, 1997). The first widely available phonotactic probability calculator for English was developed by Vitevitch and Luce (2004). It quickly became a standard reference in the field and has been used as a factor in hundreds of studies with English speakers.

No comparable reference exists, however, for Slavic languages. In this paper, we present a soon-to-be-published script implementing a phonotactic probability calculator for Czech, together with an experiment demonstrating the importance of phonotactic probability in the processing of Czech pseudowords.

We created a Python script that estimates phonotactic probability from word-token frequencies in either a synchronic reference corpus of contemporary written Czech (Křen & Cvrček et al., 2020) or a spoken corpus of informal conversations (Kopřivová & Lukeš et al., 2017). The words are automatically transcribed into IPA using the CorPy library (Lukeš, 2016).

Following Vitevitch and Luce (2004), phonotactic probability is estimated with two measures: positional segment frequency and position-specific bisegment frequency. The script accepts as input any existing or non-existing word, or an entire list of (non-)words, and outputs the corresponding phonotactic probability estimates.
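The two measures can be illustrated with a short Python sketch. This is a hypothetical illustration, not the authors' script, and it rests on several assumptions. First, the lexicon is assumed to be already transcribed into tuples of segments, so the CorPy transcription step is not reproduced here. Second, each word is assumed to be weighted by log10(token frequency + 1). Vitevitch and Luce (2004) weight words by log frequency, and the +1 is an added assumption so that words occurring only once still contribute. A segment's probability at position i is then the summed weight of words with that segment at i, divided by the summed weight of all words long enough to have any segment at i. Bisegments are handled the same way for positions i and i+1. The function names `build_tables`, `phonotactic_probability` and `score_list` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Word = Tuple[str, ...]  # a transcribed word as a sequence of segments (e.g. IPA phones)
Tables = Tuple[Dict, Dict, Dict, Dict]


def log_weight(freq: int) -> float:
    """Assumed weighting: log10(frequency + 1), so words seen once still count."""
    return math.log10(freq + 1)


def build_tables(lexicon: Dict[Word, int],
                 weight: Callable[[int], float] = log_weight) -> Tables:
    """Accumulate frequency-weighted counts per position from a transcribed lexicon."""
    seg: Dict[Tuple[int, str], float] = defaultdict(float)
    seg_total: Dict[int, float] = defaultdict(float)
    bi: Dict[Tuple[int, str, str], float] = defaultdict(float)
    bi_total: Dict[int, float] = defaultdict(float)
    for word, freq in lexicon.items():
        w = weight(freq)
        for i, s in enumerate(word):
            seg[(i, s)] += w   # weight of words with segment s at position i
            seg_total[i] += w  # weight of words with any segment at position i
        for i in range(len(word) - 1):
            bi[(i, word[i], word[i + 1])] += w  # bisegment starting at position i
            bi_total[i] += w                    # words with any bisegment at i
    return seg, seg_total, bi, bi_total


def phonotactic_probability(word: Word, tables: Tables) -> Tuple[float, float]:
    """Return (sum of positional segment probs, sum of positional bisegment probs)."""
    seg, seg_total, bi, bi_total = tables

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # position never seen in the lexicon -> 0

    seg_sum = sum(ratio(seg.get((i, s), 0.0), seg_total.get(i, 0.0))
                  for i, s in enumerate(word))
    bi_sum = sum(ratio(bi.get((i, word[i], word[i + 1]), 0.0), bi_total.get(i, 0.0))
                 for i in range(len(word) - 1))
    return seg_sum, bi_sum


def score_list(words: Iterable[Word], tables: Tables) -> List[Tuple[float, float]]:
    """Score a whole list of (non-)words at once."""
    return [phonotactic_probability(w, tables) for w in words]


if __name__ == "__main__":
    # Toy lexicon with made-up frequencies, for illustration only.
    toy = {("p", "a", "s"): 3, ("p", "ɛ", "s"): 1, ("l", "a", "k"): 2}
    tables = build_tables(toy)
    print(score_list([("p", "a", "k"), ("l", "ɛ", "s")], tables))
```

The sketch takes segment tuples rather than plain strings, so multi-character IPA segments such as affricates or long vowels count as single units. The abstract does not say how the authors' script segments the CorPy output.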

For the experiment, we created a list of 40 pseudowords that follow the phonotactic rules of Czech and assessed their phonotactic probability. We then built an online experiment in the PCIbex environment (Zehr & Schwarz, 2018) in which each pseudoword was rated on a seven-point Likert scale. Eighty-eight native speakers of Czech rated the pseudowords for wordlikeness.

The data were analyzed with a mixed-effects model in which phonotactic probability and neighbourhood density served as predictors of the wordlikeness rating. Phonotactic probability was a significant predictor (p < 0.001), whereas neighbourhood density was not.
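A model of the kind described can be written schematically as shown below. The abstract does not state the random-effects structure. By-participant and by-item random intercepts are assumed here purely for illustration, and the rating is treated as a continuous outcome, which is also an assumption. In the notation, i indexes pseudowords, j indexes participants, PP is phonotactic probability and ND is neighbourhood density.

```latex
\begin{aligned}
\text{rating}_{ij} &= \beta_0 + \beta_1\,\mathrm{PP}_i + \beta_2\,\mathrm{ND}_i + u_j + w_i + \varepsilon_{ij},\\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \quad w_i \sim \mathcal{N}(0, \sigma_w^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

In this notation, the reported finding is that the estimate of $\beta_1$ differed significantly from zero while that of $\beta_2$ did not.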