Clustered Pivot Tables for I/O-optimized Similarity Search

Publication at Faculty of Mathematics and Physics |

2011

Abstract

The pivot tables are a popular metric access method, primarily designed as a main-memory index structure. It has been many times proven that pivot tables are very efficient in terms of distance computations, hence, when assuming a computationally expensive distance function.

However, for cheaper distance functions and/or huge datasets exceeding the capacity of the main memory, the classic pivot tables become inefficient. In this paper, we propose a persistent variant of pivot tables, the clustered pivot tables, focusing on minimizing I/O cost when accessing small data blocks (a few kilobytes).

The clustered pivot tables employs a preprocessing method utilizing the M-tree in the role of clustering technique and an original heuristic for I/O-optimized kNN query processing. In the experiments we empirically show that our proposed method significantly reduces the number of necessary I/O operations during query processing.

Keywords

Clustering Pivot Tables I/O-optimized Similarity Search