CoCzeFLA: Corpora of Czech as the First Language in Acquisition: A descriptive analysis of early use of být 'be'

Publication at the Central Library, Faculty of Arts

2022

Abstract

The presentation introduces a new longitudinal corpus of spontaneous child Czech and a published analysis of the development of verb forms of být 'be' in the corpus.

The corpus

In CHILDES, the representative database of first language acquisition, six Slavic languages are covered, though mostly to a limited extent.

Among longitudinal corpora covering a year of life or more, there are data for 4 Bulgarian children, 3 Croatian children, 1 Polish child, 1 Russian child, and 8 Serbian children. Only the Serbian corpus seems extensive enough to support general conclusions about spontaneous production in acquisition.

We add another Slavic corpus of this type to the public pool. Our corpus (Chroma) consists of transcripts of audio recordings of 7 Czech children between the ages of 1;7 and 3;9.

For each child, between 11 and 27 months of life are covered, with a density of 28-63 recorded minutes per month (mostly around 30 minutes). We are currently working on a follow-up corpus (ChroMat) of transcripts of video recordings of another 6-7 children in a similar age range, with a somewhat higher recording density.

The development of být 'be'

'Be' is usually the most frequent verb, and its acquisition is thus an important milestone in children's language development. Consequently, it might be an important marker of the developmental level in children.

First, we describe the early use of the verb as a copula or auxiliary, and describe the growth in usage (token frequency) and diversity (type frequency) for copulas and auxiliaries. Second, we examine whether the usage or diversity of 'be' might serve as a marker of grammatical development.
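The distinction between usage (token frequency) and diversity (type frequency) can be illustrated with a minimal sketch. It assumes whitespace-tokenized utterances and a small, incomplete set of forms of být chosen only for illustration; the form set, the function name `be_frequencies`, and the sample utterances are hypothetical and do not reproduce the authors' coding scheme. Separating copular from auxiliary uses would additionally require grammatical annotation, which this sketch does not attempt.

```python
from collections import Counter

# Illustrative, incomplete set of forms of být 'be' (an assumption for this
# sketch, not the coding scheme used for the corpus).
BE_FORMS = {
    "být", "jsem", "jsi", "je", "jsme", "jste", "jsou",
    "není", "byl", "byla", "bylo", "byli", "budu", "bude",
}


def be_frequencies(utterances):
    """Return (token_frequency, type_frequency) of 'be' forms.

    Token frequency counts every occurrence of a listed form;
    type frequency counts how many distinct listed forms occur.
    """
    counts = Counter(
        word
        for utterance in utterances
        for word in utterance.lower().split()
        if word in BE_FORMS
    )
    return sum(counts.values()), len(counts)


# Invented sample utterances, for illustration only.
sample = ["to je pes", "kde je máma", "já jsem tady", "pes byl venku"]
print(be_frequencies(sample))  # (4, 3): tokens je, je, jsem, byl; types je, jsem, byl
```

Here the form je occurs twice, so it adds two tokens but only one type.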

To address this question, we use cross-lagged mixed regression analyses examining whether the token or type frequency of 'be' predicts mean length of utterance (MLU) in subsequent transcripts, or vice versa. The results show that while MLU is a significant predictor of both the token and the type frequency of 'be', the reverse does not hold.
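The cross-lagged design described above pairs each transcript with the next transcript of the same child, so that a measure at one session can be tested as a predictor of another measure at the following session. The sketch below shows only this data-preparation step, under assumptions: the record fields (`child`, `age_months`, `mlu`, `be_tokens`), the function name `cross_lagged_pairs`, and all values are hypothetical, and the mixed regression itself (which would be fitted on the resulting rows with a statistics package) is not shown.

```python
def cross_lagged_pairs(sessions):
    """Pair each session with the same child's next session.

    `sessions` is a list of dicts with the hypothetical keys
    'child', 'age_months', 'mlu', and 'be_tokens'.
    """
    by_child = {}
    for session in sessions:
        by_child.setdefault(session["child"], []).append(session)

    rows = []
    for child, items in by_child.items():
        # Order each child's sessions chronologically before pairing.
        items = sorted(items, key=lambda s: s["age_months"])
        for prev, nxt in zip(items, items[1:]):
            rows.append({
                "child": child,
                "mlu_prev": prev["mlu"],
                "be_tokens_prev": prev["be_tokens"],
                "mlu_next": nxt["mlu"],
                "be_tokens_next": nxt["be_tokens"],
            })
    return rows


# Made-up values for illustration only; not data from the corpus.
toy_sessions = [
    {"child": "A", "age_months": 24, "mlu": 1.8, "be_tokens": 5},
    {"child": "A", "age_months": 23, "mlu": 1.5, "be_tokens": 2},
    {"child": "A", "age_months": 25, "mlu": 2.1, "be_tokens": 9},
    {"child": "B", "age_months": 30, "mlu": 2.6, "be_tokens": 12},
    {"child": "B", "age_months": 31, "mlu": 2.9, "be_tokens": 11},
]
pairs = cross_lagged_pairs(toy_sessions)
```

Each resulting row supports both directions of the question: `be_tokens_prev` as a predictor of `mlu_next`, and `mlu_prev` as a predictor of `be_tokens_next`, with `child` available as a grouping factor.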

However, the models identified significant interactions indicating that while usage and diversity of 'be' may predict higher subsequent MLU early in development, they are related to lower MLU at more advanced stages. This indicates that mastery of the 'be' system is an early developmental achievement that is followed by development in other domains of the grammatical system.

In the presentation, the already published data will be supplemented by an analysis of data from two further children whose transcripts were recently completed.

Keywords

longitudinal corpus; first language acquisition; copula; auxiliary; morphological development