In recent years, the attention of forensic phoneticians has turned, to some extent at least, to the idiosyncrasy of higher-order features found in longer stretches of speech. These are mainly disfluency phenomena such as hesitation markers, repetitions, or interruptions (Duckworth & McDougall, 2013).
Disfluencies occur frequently in spontaneous conversations. They are considered to be a subconscious process resulting from continuous verbal planning (McDougall et al., 2015; Hughes et al., 2016).
Since they constitute a behavioural phenomenon, speakers cannot easily alter their production of disfluency features or eliminate them; in other words, the use of filled pauses, repetitions, prolongations and other types of disfluencies is less likely to change under voice disguise. The stability of hesitation markers within a speaker, in both quality and quantity, is what makes them a promising source of forensic parameters (Braun & Rosin, 2015).
This study examines hesitation markers for their idiosyncratic potential in Czech. More specifically, our aim is to find out whether filled pauses can be used to discriminate between speakers and to classify speakers of the two main regional dialects spoken in the Czech Republic, and whether recordings featured in the spoken part of the Czech National Corpus are suitable for studies of speaker specificity.
Our data were drawn from ORTOFON, a spoken corpus of Czech built by the Institute of the Czech National Corpus (Kopřivová et al., 2017). We selected 33 native male speakers of Czech, aged between 20 and 54 years.
In addition, all of them belong to a lower socioeconomic class, at least judging by their attained education, having completed only primary or vocational school. Seventeen of the speakers come from Bohemia, the western part of the Czech Republic, while 16 come from Moravia, the eastern part of the country. The recordings for ORTOFON were obtained under various conditions, as is the case for most forensic recordings.
Of the 272 instances of filled pauses, as many as 231 (85%) correspond to the "uh" realization (i.e., they are purely vocalic in nature), while 38 tokens correspond to the "mh" realization (purely nasal in nature) and only 3 to the "um" type (a sequence of a vocalic and a nasal element). The presentation will therefore focus on the vocalic filled pauses only, and will include values of the first three formants obtained at three points within the filled pause.
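The distribution of filled-pause types reported above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the token counts come from the text, while the variable and function names (`counts`, `shares`) are our own and not part of any tool used in the study.

```python
# Token counts per filled-pause type, as reported in the text.
counts = {"uh": 231, "mh": 38, "um": 3}

# Total number of filled pauses in the sample.
total = sum(counts.values())


def shares(type_counts):
    """Return each type's share of all tokens, in percent."""
    n_all = sum(type_counts.values())
    return {label: 100 * n / n_all for label, n in type_counts.items()}


percentages = shares(counts)

# Print one summary line per type, e.g. count and percentage.
for label, pct in percentages.items():
    print(f"{label}: {counts[label]:>3} tokens ({pct:.1f}%)")
print(f"total: {total} tokens")
```

With these counts, the vocalic type accounts for about 84.9% of all tokens, the nasal type for about 14.0%, and the vocalic-nasal sequence for about 1.1%.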