
From Dictionary to Corpus

Publication at Faculty of Arts


In this paper, we introduce "the path of phrasemes" from the dictionary into the Czech corpus, using Slovník české frazeologie a idiomatiky (Dictionary of Czech Phraseology and Idiomatics, hereinafter DCPI) and the synchronic written corpus SYN2010 as an example. This path converts phraseme entries, which appear in a paper dictionary in their basic or most common form, into a form that makes it possible to identify and mark the corresponding collocations in corpus texts.

The software that automatically searches corpus texts for collocations based on the DCPI works on morphologically tagged and disambiguated texts. The idioms are listed in tables, which the software reads.

For each idiom, a linguistic analysis that determines its specific conditions must be entered. The collocations found automatically are marked and can then be searched for with the corpus software Bonito.
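The table-driven lookup described above can be illustrated with a minimal sketch. It is not the authors' software: the table layout, the `Token` fields, the simplified tags, and the function name `find_phrasemes` are all assumptions made for illustration, and the example idiom is chosen freely rather than taken from a DCPI entry. The sketch matches only contiguous lemma sequences, so it leaves out the idiom-specific conditions that the linguistic analysis would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One corpus position after tagging and disambiguation (fields are assumed)."""
    form: str   # word form as it appears in the text
    lemma: str  # disambiguated lemma
    tag: str    # simplified part-of-speech tag, hypothetical tagset


# Hypothetical phraseme table: identifier -> sequence of lemmas.
PHRASEME_TABLE = {
    "mit_hlad_jako_vlk": ("mít", "hlad", "jako", "vlk"),
}


def find_phrasemes(tokens, table):
    """Return (start, end, phraseme_id) for every contiguous lemma match."""
    lemmas = [t.lemma for t in tokens]
    hits = []
    for phraseme_id, pattern in table.items():
        n = len(pattern)
        for i in range(len(lemmas) - n + 1):
            if tuple(lemmas[i:i + n]) == pattern:
                hits.append((i, i + n, phraseme_id))
    return hits


sentence = [
    Token("Petr", "Petr", "N"),
    Token("měl", "mít", "V"),
    Token("hlad", "hlad", "N"),
    Token("jako", "jako", "J"),
    Token("vlk", "vlk", "N"),
    Token(".", ".", "Z"),
]

hits = find_phrasemes(sentence, PHRASEME_TABLE)
for start, end, phraseme_id in hits:
    marked = " ".join(t.form for t in sentence[start:end])
    print(f"{phraseme_id}: tokens {start}-{end - 1}: {marked}")
```

Matching on lemmas rather than word forms is what lets the dictionary form "mít hlad jako vlk" find the inflected "měl hlad jako vlk" in the text. A fuller version would also need to handle word-order variation and inserted words.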