Comparing Prosody Formalisms for Machine Learning

Publication at Faculty of Mathematics and Physics

2007

Abstract

We need to find the most suitable prosody formalism for the task of machine learning. The target application is a prosody generation module for text-to-speech synthesis.

This module will learn prosody marks (parameters or symbols) from large corpora. The formalism we are looking for should be general, perceptually relevant, restorable, automatically obtainable, objective, and learnable.

The main formalisms for pitch description are briefly described and compared, namely the Fujisaki model, ToBI, INTSINT, Tilt, and the “Glissando threshold” adaptation. The most suitable method of pitch description for the task of machine learning is the “Glissando threshold” adaptation with an additional simplification.

Keywords

comparing prosody formalisms machine learning