Repeats in native and learner English

Publication at the Faculty of Arts

2019

Abstract

Spontaneous speech production is notoriously interspersed with different types of disfluencies, of which repeats are among the most frequent. This applies to both native speakers and language learners.

But do they produce repeats in a similar way? In our study, we compared recordings of 50 native speakers of English, 50 Czech and 50 Taiwanese advanced learners of English. We addressed the frequency of repeats, the types of repeated words and their individual frequencies, the length of the repeated segments, between-speaker variability within the individual groups, the nature of any differences established between the groups, and correlations with other fluency variables as well as with the learners' proficiency. The data for our analysis were derived from the spoken parallel corpora LINDSEI (namely its Czech and Taiwanese subcorpora) and LOCNEC.

Each of the 150 interviews is approximately 15 minutes long. The total number of tokens is approximately 300,000.

The recordings have been orthographically transcribed, and all instances of repeats have been identified and tagged using a semi-automatic computer script which the author of the study developed for the purpose. Any instances of repetitions which had a semantic function (e.g. intensification) were removed.

The tagging system we designed differentiates repeats by length (e.g. one- or two-word repeats), number of repetitions, word class and discourse function. The tagged corpora were then processed using a concordancer, which facilitated the quantification of results and the sorting into categories for deeper analyses.
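The automatic part of the identification step described above can be illustrated with a minimal sketch. It is not the author's script, whose details are not given here: the function name `find_repeats`, the tokenisation rule and the `max_len` limit are all assumptions made for illustration. The sketch finds immediately repeated stretches of one to three words and records how many extra copies follow the first occurrence.

```python
import re


def find_repeats(text, max_len=3):
    """Return (token_index, repeated_segment, extra_copies) for each immediate repeat.

    Hypothetical helper: shorter segments are tried first, so "the the the"
    counts as a one-word repeat with two extra copies.
    """
    # Assumed tokenisation: lower-cased runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(1, max_len + 1):
            segment = tokens[i:i + n]
            if len(segment) < n:
                break  # not enough tokens left for a segment of this length
            if segment == tokens[i + n:i + 2 * n]:
                copies = 1
                # Count any further consecutive copies of the same segment.
                while tokens[i + (copies + 1) * n:i + (copies + 2) * n] == segment:
                    copies += 1
                found.append((i, " ".join(segment), copies))
                i += (copies + 1) * n  # skip past the whole repeated stretch
                matched = True
                break
        if not matched:
            i += 1
    return found


sample = "I I think that the the the problem is in the in the school"
print(find_repeats(sample))
```

A purely automatic pass like this cannot tell a disfluent repeat from a repetition with a semantic function (e.g. intensification), which is presumably why the procedure in the study was semi-automatic and such cases were removed afterwards.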

The three corpora contain 5,253 instances of repeats, 77% of which are represented by one-word repeats, 19% by two-word repeats, and 4% by repeated stretches of three or more words. Whilst these proportions are almost identical for the native and non-native corpora, differences appear in the mean repeat rate: the native speakers produce 1.5 (SD=0.87) repeats per hundred words (phw), the Czech learners 1.9 (SD=1.18) repeats phw, and the Taiwanese learners 2.15 (SD=1.47) repeats phw.
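To make the "repeats per hundred words" measure concrete, the following sketch computes a per-speaker rate and then the group mean and standard deviation. The speaker counts are invented for illustration and the function name `repeats_phw` is hypothetical; the abstract does not state whether a sample or population SD was reported, so the sample SD is assumed here. Only the last line uses figures from the text (5,253 repeats and roughly 300,000 tokens).

```python
from statistics import mean, stdev


def repeats_phw(n_repeats, n_words):
    """Repeats per hundred words for one speaker or one corpus."""
    return 100 * n_repeats / n_words


# Hypothetical (repeats, words) counts for three speakers; not data from the study.
speakers = [(30, 2000), (45, 1800), (12, 2400)]
rates = [repeats_phw(r, w) for r, w in speakers]

group_mean = mean(rates)   # mean of the per-speaker rates
group_sd = stdev(rates)    # sample standard deviation (an assumption)
print(rates, group_mean, group_sd)

# Pooled rate over all three corpora, using the approximate totals given in the text.
pooled = repeats_phw(5253, 300_000)
print(round(pooled, 2))
```

The pooled figure of about 1.75 repeats phw is not the same quantity as the group means reported above, which average the individual speakers' rates, but it falls within the range those means span.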

Between-speaker variability is greater in the Czech and Taiwanese subcorpora, which contain a larger number of speakers whose repeat rate exceeds the mean observed in the respective subcorpus. This might indicate that the learners feel a greater need to use repeats to maintain fluency.

It remains to be established whether this finding correlates with proficiency (the corpus is currently being rated for proficiency), that is, whether less proficient users show a greater reliance on repeats or whether, on the contrary, more proficient learners use repeats in a more native-like manner. As regards the most frequently repeated word classes, all speakers repeat especially pronouns, prepositions, articles and conjunctions.

The repeated pronouns consist mainly of subject pronouns (especially I, we, he, they) and possessives (esp. my). The most frequently repeated conjunctions are and, but and if.

The most frequently repeated prepositions include in, of, for, to and about. The learners significantly (p < 0.001) overuse repeats of practically all word classes except the indefinite article and contracted forms.

The most underused repeats of contractions are I've, I'll, I'd, he's and it's.

No correlations have been found between the use of repeats and other fluency variables, namely speech rate and the frequency of filled and unfilled pauses. Nor does there appear to be any correlation between the learners' accuracy (operationalised as the rate of grammatical, lexical and lexico-grammatical errors) and their use of repeats.
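The abstract does not say which correlation measure was used. As one possible illustration only, the sketch below computes Pearson's r between per-speaker repeat rates and another fluency variable. The function name `pearson_r` and all values are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical per-speaker values: repeats phw and speech rate (words per minute).
repeat_rates = [1.2, 2.4, 0.8, 1.9, 3.1]
speech_rates = [150, 132, 141, 160, 138]
print(pearson_r(repeat_rates, speech_rates))
```

A coefficient close to zero would be in line with the absence of correlation reported above. Whether it differs significantly from zero would additionally require a significance test, which this sketch omits.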

A further examination of those learners who have an especially high repeat rate has revealed no typical pattern regarding the measures of these phenomena. We have yet to examine possible correlation with proficiency, which in the Czech subcorpus is likely to range between the CEFR levels B2 and C2, and in the Taiwanese subcorpus between B1 and C1 (based on preliminary ratings).

The results show that the learners have internalised the native strategy of producing repeats of primarily function words at the beginning of utterances, clauses or constituents, as described in Maclay and Osgood (1959), Clark and Wasow (1998), Biber et al. (1999) and Kjellmer (2008). The results may be compared to Götz's (2013) study of the same phenomena in the German subcorpus of LINDSEI.

As was true for our learners, Götz also found a significant underuse of repeats of articles and contractions by the learners. However, she found an overall significant underuse of repeats by the German learners, which contrasts with our finding that the Czech and Taiwanese learners overuse repeats.

A close inspection of some of the individual speakers revealed idiosyncrasies in the use of repeats and considerable between-speaker variation. This might be an indication of the potential of the phenomenon for forensic science.

The comparison of the three corpora has revealed that the learners have successfully adopted a frequent native strategy of producing repeats to buy time for planning speech and for resolving arising problems. However, they mostly produce these with a higher frequency (except contractions and indefinite articles).

Further work is to be carried out investigating possible correlations between our findings and the learners' proficiency.

Available from: https://www.researchgate.net/publication/324438307_Repeats_in_native_and_learner_English [accessed Nov 07 2019].

Keywords

fluency, learner English, repeats, disfluency, learner corpus research