A method for comparison of general sequences via type-token ratio

Publication at the Faculty of Arts (Filozofická fakulta) | 2021

Abstract

This article proposes a new method for analyzing and comparing general linear sequences that requires minimal prior knowledge of the sequences. Sequence analysis is a broad problem studied in fields ranging from sociology and computer security to linguistics and biology.

The presented method applies the simplest tools of quantitative linguistics so that the method stays transparent and its results are easy to interpret. The results form a vector describing each sequence, which allows the sequences to be clustered, used in machine learning, and visualized simply with line charts or with multidimensional methods such as MDS or t-SNE.
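To make the idea of a TTR-based descriptive vector concrete, here is a minimal Python sketch. The abstract does not say how the vector is constructed, so the sliding-window averaging below, the function names `type_token_ratio` and `ttr_vector`, and the `window_sizes` parameter are all illustrative assumptions rather than the article's actual procedure.

```python
from typing import Hashable, List, Sequence


def type_token_ratio(seq: Sequence[Hashable]) -> float:
    """Number of distinct items (types) divided by the number of items (tokens)."""
    if len(seq) == 0:
        raise ValueError("TTR is undefined for an empty sequence")
    return len(set(seq)) / len(seq)


def ttr_vector(seq: Sequence[Hashable], window_sizes: Sequence[int]) -> List[float]:
    """Mean TTR over all sliding windows of each requested size.

    Assumption: this windowing scheme is a hypothetical way to turn TTR
    into a vector; it is not taken from the article.
    """
    vector = []
    for w in window_sizes:
        if w < 1 or w > len(seq):
            raise ValueError(f"window size {w} does not fit a sequence of length {len(seq)}")
        # Every contiguous window of length w, shifted by one position.
        windows = [seq[i:i + w] for i in range(len(seq) - w + 1)]
        vector.append(sum(type_token_ratio(win) for win in windows) / len(windows))
    return vector


# Any sequence of hashable symbols works: characters, words, event codes.
print(ttr_vector("abab", [2, 4]))
print(ttr_vector(["login", "read", "read", "logout"], [1, 2, 4]))
```

Vectors computed with the same `window_sizes` have equal length, so they could be passed directly to a clustering or projection routine.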

For completeness, artifacts and several formal models are derived to describe the method's behavior in both common and extreme cases.
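As an illustration of what extreme cases look like for the plain type-token ratio (a standard bound, not one of the article's own formal models), take a sequence of length $n \ge 1$ containing $V$ distinct items. The symbols $n$ and $V$ are introduced here only for this example.

```latex
\[
  \mathrm{TTR} = \frac{V}{n},
  \qquad
  \frac{1}{n} \;\le\; \mathrm{TTR} \;\le\; 1 .
\]
```

The lower bound is reached by a constant sequence ($V = 1$) and the upper bound by a sequence in which no item repeats ($V = n$).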