
An Annotation Scheme for Speech Reconstruction on a Dialog Corpus

Publication at Faculty of Mathematics and Physics


This paper presents the ongoing manual speech reconstruction annotation of the NAP corpus, a corpus of recorded conversations between pairs of people talking over family photographs, and relates it to a more complex annotation scheme of the Prague Dependency Treebank family. The result of this effort will be a resource that contains, in addition to the audio recording of each dialog and its usual transcription, an edited and fully grammatical “reconstructed” version of the dialog.

The reconstructed dialog will be aligned with the original audio and transcription on one side and linked to a deep analysis of the natural language sentences uttered in the dialog on the other. The format of these alignments will allow the resource to serve as training and testing material for machine learning experiments in both intelligent editing and dialog language understanding. The resource will be used in the Companions project, but it will also be publicly available outside the project.
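To make the idea of aligning a reconstructed dialog with its original transcription more concrete, here is a minimal sketch in which each reconstructed token links back to zero or more time-stamped tokens of the transcription. Everything in it is a hypothetical illustration: the class and function names, the token-level linking, the audio timestamps, and the example utterance are assumptions, not the actual NAP or Prague Dependency Treebank data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpokenToken:
    """A token of the original transcription with a (hypothetical) audio span in seconds."""
    form: str
    start: float
    end: float


@dataclass
class ReconstructedToken:
    """A token of the edited, grammatical text; `sources` holds indices into the spoken tokens."""
    form: str
    sources: list[int] = field(default_factory=list)


def deleted_tokens(spoken: list[SpokenToken], reconstructed: list[ReconstructedToken]) -> list[int]:
    """Indices of spoken tokens not linked to any reconstructed token (e.g. fillers, false starts)."""
    linked = {i for tok in reconstructed for i in tok.sources}
    return [i for i in range(len(spoken)) if i not in linked]


def audio_span(tok: ReconstructedToken, spoken: list[SpokenToken]) -> tuple[float, float] | None:
    """Audio interval covered by a reconstructed token; None if the annotator inserted it."""
    if not tok.sources:
        return None
    return (min(spoken[i].start for i in tok.sources),
            max(spoken[i].end for i in tok.sources))


# Hypothetical utterance: "this is uh this is my my grandmother"
spoken = [SpokenToken(f, s, e) for f, s, e in [
    ("this", 0.0, 0.2), ("is", 0.2, 0.35), ("uh", 0.35, 0.6), ("this", 0.6, 0.8),
    ("is", 0.8, 0.95), ("my", 0.95, 1.1), ("my", 1.1, 1.25), ("grandmother", 1.25, 1.8),
]]

# Reconstruction "This is my grandmother": the false start and filler are dropped,
# and the repeated "my my" is merged into a single token linked to both originals.
reconstructed = [
    ReconstructedToken("This", [3]),
    ReconstructedToken("is", [4]),
    ReconstructedToken("my", [5, 6]),
    ReconstructedToken("grandmother", [7]),
]

print(deleted_tokens(spoken, reconstructed))     # [0, 1, 2]
print(audio_span(reconstructed[2], spoken))      # (0.95, 1.25)
```

Keeping links as index lists lets one reconstructed token cover several spoken tokens (merged repetitions) or none (words added by the annotator), which is the kind of information an intelligent-editing model would need to learn from.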