
New tools for working with the ORAL series corpora of spoken Czech: AchSynku and MluvKonk

Publication at Faculty of Arts


This paper introduces two simple web-based tools designed to make it easier to work with the ORAL series corpora of spontaneous spoken Czech in the Czech National Corpus. Both aim to work around limitations, either in the data themselves or in their visualization, that linguists currently face when using these corpora for research.

AchSynku is a variant search tool which aims to compensate for the lack of lemmatization in spoken corpora by suggesting, based on a word form input by the user, a list of variant and related forms occurring in the target corpora. MluvKonk is a visualization environment which turns single-line concordances into a multi-tier layout with one speaker per tier.

The multi-tier layout makes it easier to follow the structure of a multi-party conversation, including turn-switching and overlaps. Though ultimately destined to be superseded by more systemic solutions, both applications are under active development and feedback is welcome, because those later solutions will build directly on lessons learned in developing, and especially in using, AchSynku and MluvKonk.
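To make the idea of a multi-tier layout concrete, the following is a minimal sketch in Python of how a single-line concordance could be spread over one tier per speaker. It is not MluvKonk's actual implementation: the function name `to_tiers`, the `(speaker, text, overlaps_previous)` segment format, and the sample utterances are all hypothetical and chosen only for illustration.

```python
def to_tiers(segments):
    """Lay out speech segments on one text tier per speaker.

    `segments` is a hypothetical input format: a list of
    (speaker, text, overlaps_previous) tuples in concordance order.
    A segment flagged as overlapping starts in the same column as the
    segment before it; otherwise it starts after everything placed so far.
    """
    # Keep speakers in order of first appearance so tiers are stable.
    speakers = []
    for speaker, _, _ in segments:
        if speaker not in speakers:
            speakers.append(speaker)

    tiers = {speaker: "" for speaker in speakers}
    start = 0  # column where the previous segment began
    end = 0    # first free column after all segments placed so far

    for speaker, text, overlaps in segments:
        col = start if overlaps else end
        # Never write over what this speaker has already said.
        col = max(col, len(tiers[speaker]))
        tiers[speaker] = tiers[speaker].ljust(col) + text
        start = col
        end = max(end, col + len(text) + 1)

    # Assumes speaker labels of equal length so that columns line up.
    return [f"{speaker}: {tiers[speaker]}" for speaker in speakers]


# Invented sample data: speaker A's second segment overlaps B's "jo".
example = [
    ("A", "no to je", False),
    ("B", "jo", False),
    ("A", "dobrý", True),
]
for line in to_tiers(example):
    print(line)
```

In this sketch an overlap is shown simply by two segments starting in the same column on different tiers. A real tool working from time-aligned transcripts would presumably position segments by their timestamps instead.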