Algorithms in Speech Recognition

Class at Faculty of Mathematics and Physics | NPFL079

Syllabus

Overview of speech technologies

- wonders of speech recognition,

- main applications and their architectures,

- theories and models overview,

- software toolkits and libraries,

- speech processing books and magazines.

Acoustic Modelling (SPO C8-C9 | JEL C2-C3 | PSU C5.3 | DLA C3+C6, partly a repetition of NPFL038)

- definition and parameters of the hidden Markov model (HMM),

- evaluation of an HMM (Forward algorithm),

- training of an HMM (Baum-Welch algorithm),

- extracting speech features, scoring acoustic features (MFCC, Gaussian mixtures, parameter clustering),

- adaptive techniques (MAP, MLLR),

- confidence measures,

- software toolkits for speech recognition.
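
As an illustration of the HMM evaluation topic above, the following is a minimal sketch of the Forward algorithm for a discrete-observation HMM. The function name, the argument layout (`init`, `trans`, `emit` as plain lists) and the toy numbers are assumptions made for this example; they are not taken from the course materials, and real acoustic models score continuous features (e.g. with Gaussian mixtures) and work in the log domain.

```python
def forward(init, trans, emit, observations):
    """Return P(observations | model) for a discrete HMM.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[j][o]  : probability of emitting symbol o in state j
    """
    if not observations:
        return 1.0  # the empty sequence has probability 1
    n_states = len(init)
    # alpha[j] = P(o_1 .. o_t, state at time t = j)
    alpha = [init[j] * emit[j][observations[0]] for j in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state, two-symbol toy model.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.5], [0.1, 0.9]]

likelihood = forward(INIT, TRANS, EMIT, [0, 1, 1])
```

Summing the state-wise `alpha` values at the last frame gives the total likelihood; replacing the inner `sum` with `max` turns the same recursion into the Viterbi score.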

Language Modelling (NPFL067 | JEL C4 | SPO C11 | PSU 5.4)

- methods of language modelling,

- n-gram models, smoothing (Good-Turing, Katz), adaptive language models,

- structured language models (PCFG),

- specifics of spoken and written language modelling,

- transducers and software tools for language modelling.
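
To make the smoothing topic above concrete, here is a small sketch of the basic Good-Turing estimate, which replaces an observed count c by c* = (c + 1) N_{c+1} / N_c, where N_c is the number of distinct events seen exactly c times, and reserves the mass N_1 / N for unseen events. The function names and the toy counts are assumptions for this example. Falling back to the raw count when N_{c+1} is zero is a simplification chosen here; practical toolkits smooth the N_c values instead.

```python
from collections import Counter


def good_turing_counts(counts):
    """Map each event to its adjusted count c* = (c + 1) * N_{c+1} / N_c."""
    # freq_of_freq[c] = N_c, the number of events observed exactly c times
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for event, c in counts.items():
        n_c = freq_of_freq[c]
        n_c1 = freq_of_freq.get(c + 1, 0)
        # Assumption of this sketch: keep the raw count when N_{c+1} is zero.
        adjusted[event] = (c + 1) * n_c1 / n_c if n_c1 > 0 else float(c)
    return adjusted


def unseen_mass(counts):
    """Probability mass reserved for unseen events: N_1 / N."""
    total = sum(counts.values())
    n_1 = sum(1 for c in counts.values() if c == 1)
    return n_1 / total if total else 0.0


# Hypothetical bigram counts.
toy_counts = {"a b": 1, "b c": 1, "c d": 1, "d e": 2, "e f": 2, "f g": 3}
adjusted = good_turing_counts(toy_counts)
reserved = unseen_mass(toy_counts)
```

Katz back-off builds on the same adjusted counts: the mass removed from seen n-grams is redistributed to unseen ones according to a lower-order model.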

Basic decoding techniques (SPO C12 | JEL C5-C6 | PSU C6)

- search algorithms (search space and heuristics, A*),

- combining acoustic and language models (uni-, bi-, trigrams),

- time-synchronous search (Viterbi, beam, tree lexicon),

- state-synchronous search.
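
The time-synchronous search listed above can be sketched as Viterbi decoding with optional beam pruning over a small discrete HMM. This is a minimal illustration under stated assumptions: the function name, the `beam` parameter (a log-probability width) and the toy model are invented for this example, and a real recogniser would search a lexicon tree combined with a language model rather than a flat state set.

```python
import math


def viterbi(init, trans, emit, observations, beam=None):
    """Return (best_log_prob, best_state_path) for a discrete HMM.

    beam: optional log-probability width; at each frame, states scoring
    more than `beam` below the current best are pruned.
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n = len(init)
    score = [log(init[j]) + log(emit[j][observations[0]]) for j in range(n)]
    backpointers = []
    for obs in observations[1:]:
        if beam is not None:
            cutoff = max(score) - beam
            score = [s if s >= cutoff else -math.inf for s in score]
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor of state j at this frame
            best_i = max(range(n), key=lambda i: score[i] + log(trans[i][j]))
            new_score.append(score[best_i] + log(trans[best_i][j]) + log(emit[j][obs]))
            pointers.append(best_i)
        backpointers.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final state
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Hypothetical two-state, two-symbol toy model.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.5], [0.1, 0.9]]

full_score, full_path = viterbi(INIT, TRANS, EMIT, [0, 1, 1])
pruned_score, pruned_path = viterbi(INIT, TRANS, EMIT, [0, 1, 1], beam=0.0)
```

With `beam=0.0` only the single best state survives each frame, so the search becomes greedy and can miss the globally best path; widening the beam trades speed for accuracy.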

Large vocabulary search algorithms (SPO C13 | JEL C5-C6 | PSU 6.7.3, 6.7.5, 6.10)

- efficient manipulation of the tree lexicon,

- N-best and multipass search strategies.
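
One common form of the multipass strategy above is N-best rescoring: a first pass produces a short list of hypotheses with acoustic scores, and a second pass re-ranks them with a stronger language model. The sketch below assumes the usual log-linear combination (acoustic log-probability plus a weighted language model log-probability plus a per-word penalty). The function name, the default weights and the toy language model are hypothetical and chosen only for illustration.

```python
def rescore_nbest(hypotheses, lm_logprob, lm_weight=10.0, word_penalty=0.0):
    """Re-rank first-pass hypotheses with a second-pass language model.

    hypotheses : list of (words, acoustic_logprob) pairs
    lm_logprob : callable mapping a list of words to a log-probability
    Returns a list of (total_score, words), best hypothesis first.
    """
    rescored = []
    for words, acoustic in hypotheses:
        total = acoustic + lm_weight * lm_logprob(words) + word_penalty * len(words)
        rescored.append((total, words))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored


def toy_lm(words):
    """Hypothetical stand-in language model: a flat cost of 2.0 per word."""
    return -2.0 * len(words)


nbest = [
    (["recognize", "speech"], -100.0),
    (["wreck", "a", "nice", "beach"], -98.0),
]
ranked = rescore_nbest(nbest, toy_lm, lm_weight=1.0)
```

Here the second hypothesis has the better acoustic score, but the language model term moves the shorter hypothesis to the top of the list.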

Automatic dialogue systems (SPO C17 | PSU C11)

- characteristics of spontaneous dialogues,

- prosody and structure of dialogues,

- semantic representation,

- dialogue management, emotion detection,

- VoiceXML.

Speaker identification (PSU C9)

- identification systems overview,

- selected speech features for speaker identification,

- basic methods.

This course can be preceded by NPFL038 and combined with NPFL067, NPFL068, NPFL123.

The software tools and libraries will be introduced and practised in the practical part of the course.

Annotation

The course presents recent methodologies and software toolkits for speech recognition. Students will learn how to develop systems for automatic speech recognition and transcription, computer dialogue systems and speaker identification. The course covers the principles, preparation and decoding algorithms of statistical acoustic and language models (HMM, n-gram and structured language models, finite-state transducers, graphical models, Viterbi dynamic programming, heuristic hypothesis search strategies, stack decoder, neural networks).