Charles Explorer logo
🇬🇧

On Index-free Similarity Search in Metric Spaces

Publication at Faculty of Mathematics and Physics |
2009

Abstract

Metric access methods (MAMs) serve as a tool for speeding similarity queries. However, all MAMs developed so far are index-based; they need to build an index on a given database.

The indexing itself is either static (the whole database is indexed at once) or dynamic (insertions/deletions are supported), but there is always a preprocessing step needed. In this paper, we propose D-file, the first MAM that requires no indexing at all.

This feature is especially beneficial in domains like data mining, streaming databases, etc., where the production of data is much more intensive than querying. Thus, in such environments the indexing is the bottleneck of the entire production/querying scheme.

The idea of D-file is an extension of the trivial sequential file (an abstraction over the original database, actually) by so-called D-cache. The D-cache is a main-memory structure that keeps track of distance computations spent by processing all similarity queries so far (within a runtime session).