Increasing speed and consistency of phonetic transcription of spoken corpora using ASR technology

Publication at the Faculty of Arts

2015

Abstract

As William Labov has amply demonstrated, phonetic variables can pattern in informative ways with respect to sociolinguistic factors. Similarly, a quantitative analysis of the connected-speech processes in a language might help us understand which kinds of sound change it is prone to undergo, and whether they are perhaps lexically constrained (lexical diffusion).

Adding phonetic transcription to sociolinguistically diverse spoken corpora should therefore be a natural choice, all the more so since such corpora tend to be fairly small and many phonetic phenomena necessarily recur more often than word-level phenomena. Yet producing such a transcription is time-consuming and costly: it requires a considerable amount of manual work from human experts.

By the same token, it is also error-prone and potentially inconsistent in large projects: it is hard to maintain consistency over a span of several years, in spite of stringent quality control. The spoken corpus currently in development at the Institute of the Czech National Corpus, called ORTOFON, will include a phonetic layer in addition to the basic transcript.

Manual work on transcribing is well under way, and we are now exploring ways of automating the process to alleviate the issues sketched out above.

Keywords

spoken corpora, phonetic transcription, automation, ASR