mmi_clustering

Publication

Abstract

MMI_clustering is a set of command-line tools implementing Mercer's maximum mutual information-based clustering technique. The main clustering program comes with subsidiary tools for class-based text transformations and result visualization.
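As a rough illustration of the idea behind mutual information-based clustering, the following Python sketch merges word classes bottom-up, each time choosing the pair whose merge keeps the average mutual information between adjacent classes as high as possible. It is not the package's code: the function names `average_mutual_information` and `greedy_mmi_merges` are hypothetical, the bigram-based objective is an assumption about the technique, and the brute-force search is only practical for toy inputs.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(bigrams, assignment):
    """Average mutual information between the classes of adjacent words.

    bigrams: Counter mapping (left_word, right_word) -> count
    assignment: dict mapping word -> class id
    """
    total = sum(bigrams.values())
    pair, left, right = Counter(), Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        c1, c2 = assignment[w1], assignment[w2]
        pair[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
    ami = 0.0
    for (c1, c2), n in pair.items():
        # p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) )
        ami += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return ami


def greedy_mmi_merges(tokens):
    """Grow a merge tree from single-word classes (leaves) up to one root.

    Returns (words, merges); leaf i corresponds to words[i], and each merge
    is a tuple (child_a, child_b, parent_id) in the order it was made.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    words = sorted(set(tokens))
    assignment = {w: i for i, w in enumerate(words)}
    next_id = len(words)
    merges = []
    while len(set(assignment.values())) > 1:
        active = sorted(set(assignment.values()))
        best = None
        for a, b in combinations(active, 2):
            # Tentatively fold class b into class a and score the result.
            trial = {w: (a if c == b else c) for w, c in assignment.items()}
            score = average_mutual_information(bigrams, trial)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        assignment = {w: (next_id if c in (a, b) else c)
                      for w, c in assignment.items()}
        merges.append((a, b, next_id))
        next_id += 1
    return words, merges


tokens = "the cat runs the dog runs the cat sleeps the dog sleeps".split()
words, merges = greedy_mmi_merges(tokens)
for a, b, parent in merges:
    print(f"merge {a} + {b} -> {parent}")
```

Each pass rescoring every candidate pair from scratch keeps the sketch short; a practical implementation would update the counts incrementally instead.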

Together these form a useful toolkit for language modeling, the study of semantic classes, or even the analysis of author-specific associations. The package contains three programs: one that computes the classification tree, growing it in a greedy way from the leaves (that is, the words) to the root; one that cuts this tree at a given level to obtain a predetermined number of word classes; and a visualizer/transformer that draws trees as ASCII art and uses the classes to transform input text, for example into a text in which each word is annotated with its class (the output format is quite configurable).
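The cutting and annotation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's actual interface: the merge-list representation of the tree, the function names `cut_tree` and `annotate`, the `word/class` output format, and the `UNK` label for unseen words are all hypothetical.

```python
def cut_tree(words, merges, num_classes):
    """Replay bottom-up merges until only `num_classes` clusters remain.

    words: leaf words; leaf i has node id i
    merges: list of (child_a, child_b, parent_id) in the order they were made
    """
    members = {i: [w] for i, w in enumerate(words)}
    for a, b, parent in merges:
        if len(members) <= num_classes:
            break
        members[parent] = members.pop(a) + members.pop(b)
    # Sort for a stable, readable class numbering.
    return sorted(sorted(group) for group in members.values())


def annotate(tokens, classes, fmt="{word}/{cls}", unknown="UNK"):
    """Tag each token with the index of its class (format is an assumption)."""
    word_to_class = {w: i for i, group in enumerate(classes) for w in group}
    return " ".join(
        fmt.format(word=t, cls=word_to_class.get(t, unknown)) for t in tokens
    )


words = ["cat", "dog", "runs", "sleeps"]
merges = [(0, 1, 4), (2, 3, 5), (4, 5, 6)]  # hand-written toy tree
classes = cut_tree(words, merges, 2)
print(classes)                                      # [['cat', 'dog'], ['runs', 'sleeps']]
print(annotate("the cat sleeps".split(), classes))  # the/UNK cat/0 sleeps/1
```

Because the tree is stored as an ordered list of merges, cutting it at a given number of classes amounts to stopping the replay early.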

Keywords