Next Step in Online Querying and Visualization of Word-Formation Networks

Publication at Faculty of Mathematics and Physics | 2020

Abstract

In this paper, we introduce a new and improved version of DeriSearch, a search engine and visualizer for word-formation networks. Word-formation networks are datasets that express derivational, compounding and other word-formation relations between words.

They are usually expressed as directed graphs, in which nodes correspond to words and edges to the relations between them. Some networks also add other linguistic information, such as morphological segmentation of the words or identification of the processes expressed by the relations.

Networks for morphologically rich languages with productive derivation or compounding have large connected components, which are difficult to visualize. For example, in DeriNet 2.0, the network for Czech, connected components with more than 500 words contain 1/8 of the vocabulary, including its most common parts.
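To make the component-size statistic concrete, the following minimal sketch treats a word-formation network as a list of directed (base, derived) edges and measures what share of the vocabulary falls into components above a size threshold. It is an illustration only, not DeriSearch's implementation: the function names `component_sizes` and `share_in_large_components` and the toy English words are assumptions that do not come from DeriNet or any other real network.

```python
from collections import defaultdict


def component_sizes(edges, words):
    """Return sizes of weakly connected components, largest first.

    `edges` holds (base, derived) pairs. Edge direction is ignored here,
    because two words belong to the same component regardless of which
    one is derived from the other.
    """
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for base, derived in edges:
        parent[find(base)] = find(derived)

    sizes = defaultdict(int)
    for w in words:
        sizes[find(w)] += 1
    return sorted(sizes.values(), reverse=True)


def share_in_large_components(sizes, threshold):
    """Fraction of all words that sit in components larger than `threshold`."""
    return sum(s for s in sizes if s > threshold) / sum(sizes)


# Hypothetical toy network (not real DeriNet data).
words = ["teach", "teacher", "teaching", "reteach", "dog", "doggy", "sun"]
edges = [
    ("teach", "teacher"),
    ("teach", "teaching"),
    ("teach", "reteach"),
    ("dog", "doggy"),
]

sizes = component_sizes(edges, words)
print(sizes)                                # [4, 2, 1]
print(share_in_large_components(sizes, 3))  # 4 of 7 words, about 0.571
```

On a real network the same computation with a threshold of 500 would yield the kind of figure quoted above; the toy data only shows the mechanics.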

In the network for Latin, Word Formation Latin, over 10 000 words (1/3 of the vocabulary) are in a single connected component. With the recent release of the Univ