Fundamentals of Speech Recognition and Generation

Class at Faculty of Mathematics and Physics | NPFL038

Syllabus

Introduction to Speech Production and Perception.

General Principles of Automatic Speech Recognition (HMM)

- Isolated Word Recognition,

- Output Probability Specification,

- Baum-Welch Re-Estimation,

- Recognition and Viterbi Decoding,

- Continuous Speech Recognition,

- Speaker Adaptation.
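
As an illustration of the Viterbi decoding topic listed above, here is a minimal sketch in Python for a discrete-output HMM. The two-state model ("sil" and "speech"), the two observation symbols ("low" and "high" frame energy), and all probabilities are made-up values for illustration only; they do not come from the course materials, and a real recogniser would use continuous output densities over acoustic feature vectors.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-output HMM."""
    # delta[s]: best log-probability of any state path ending in s at the current frame
    delta = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []  # one dict per frame after the first: state -> best predecessor

    for obs in observations[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into state s
            best_prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            pointers[s] = best_prev
        delta = new_delta
        backpointers.append(pointers)

    # Trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


def to_log(table):
    """Convert a (possibly nested) dict of non-zero probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}


# Hypothetical toy model: all numbers below are assumptions chosen for illustration
states = ["sil", "speech"]
log_start = to_log({"sil": 0.8, "speech": 0.2})
log_trans = to_log({"sil": {"sil": 0.7, "speech": 0.3},
                    "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = to_log({"sil": {"low": 0.9, "high": 0.1},
                   "speech": {"low": 0.2, "high": 0.8}})

best_log_prob, best_path = viterbi(["low", "high", "high", "low"],
                                   states, log_start, log_trans, log_emit)
print(best_path)  # ['sil', 'speech', 'speech', 'sil']
```

Working in the log domain turns products of probabilities into sums, which avoids numerical underflow on long utterances.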

HTK Tools Description

- Data Preparation Tools,

- Training Tools,

- Recognition Tools,

- Analysis Tool.

Data Preparation

- the Task Grammar,

- the Language Model,

- the Dictionary,

- Recording the Data, Creating the Transcription Files, Coding the Data.

Creating Monophone HMMs

- Creating Flat Start Monophones,

- Fixing the Silence Models,

- Realigning the Training Data.

Creating Triphone HMMs

- Making Triphones from Monophones,

- Making Tied-State Triphones,

- Splitting States.

Recogniser Evaluation.

General Principles of Automatic Speech Generation.

Speech Prosody Analysis.

Annotation

This course covers speech recognition and generation tasks and the extraction of features that characterise voices and utterances. Of particular interest are topics related to Hidden Markov Models as applied to speech (FFT, n-dimensional clustering, Gaussian mixtures, parameter value extraction from data, phonetic representation, prosodic analysis, etc.) and to hybrid DNN-HMM models.

Students prepare and train their own speech recognition and generation models.