Charles Explorer logo
🇨🇿

Similarity vs. Relevance: From Simple Searches to Complex Discovery

Publikace na Přírodovědecká fakulta, Matematicko-fyzikální fakulta |
2021

Tento text není v aktuálním jazyce dostupný. Zobrazuje se verze "en".Abstrakt

Similarity queries play the crucial role in content-based retrieval. The similarity function itself is regarded as the function of relevance between a query object and objects from database; the most similar objects are understood as the most relevant.

However, such an automatic adoption of similarity as relevance leads to limited applicability of similarity search in domains like entity discovery, where relevant objects are not supposed to be similar in the traditional meaning. In this paper, we propose the meta-model of data-transitive similarity operating on top of a particular similarity model and a database.

This meta-model enables to treat directly non-similar objects x , y as similar if there exists a chain of objects x , i1 , ..., in , y having the neighboring members similar enough. Hence, this approach places the similarity in the role of relevance, where objects do not need to be directly similar but still remain relevant to each other (transitively similar).

The data-transitive similarity concept allows to use standard similarity-search methods (queries, joins, rankings, analytics) in more complex tasks, like the entity discovery, where relevant results are often complementary or orthogonal to the query, rather than directly similar. Moreover, we show the data-transitive similarity is inherently self-explainable and non-metric.

We discuss the approach in the domain of open dataset discovery.