The increasing amount of available unstructured content has introduced a new concept of searching for information: content-based retrieval. Its principle is that objects are compared based on their content, which is far more complex than simple text-based or metadata-based searching.
Many indexing techniques have arisen to provide efficient and effective similarity searching. However, these methods are restricted to a specific domain, such as the metric space model.
If this prerequisite is not fulfilled, indexing cannot be used, and each similarity search query degrades to a sequential scan, which is unacceptable for large datasets. Inspired by previous successful results, we decided to apply the principles of genetic programming to the area of database indexing.
We developed GP-SIMDEX, a universal framework capable of finding precise and efficient indexing methods for similarity searching for any given similarity data. For this purpose, we introduce the inequality symbolic regression principle and show how it helps the GP-SIMDEX framework find appropriate results that, in most cases, outperform the best-known indexing methods.
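
To illustrate the metric-space prerequisite mentioned above, the following minimal sketch shows classic pivot-based filtering, a standard metric indexing idea and not the GP-SIMDEX method itself. The function names, the single-pivot setup, and the Euclidean distance are assumptions chosen for illustration. The pruning rule relies on the triangle inequality; for a distance function that violates it, the rule may discard true results, which leaves only the sequential scan.

```python
import math


def euclidean(a, b):
    # Euclidean distance is a metric, so the triangle inequality holds.
    return math.dist(a, b)


def build_pivot_table(data, pivot, dist):
    # Precompute the distance from every object to the pivot (the "index").
    return [dist(pivot, obj) for obj in data]


def range_query(data, pivot, table, query, radius, dist):
    # Return all objects within `radius` of `query`, plus the number of
    # query-to-object distance computations that were actually needed.
    d_qp = dist(query, pivot)
    results = []
    computed = 0
    for obj, d_op in zip(data, table):
        # Triangle inequality gives the lower bound |d(q,p) - d(o,p)| <= d(q,o).
        # If the bound already exceeds the radius, the object cannot qualify.
        if abs(d_qp - d_op) > radius:
            continue
        computed += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, computed


def sequential_scan(data, query, radius, dist):
    # Fallback when no valid lower bound is known: compare against everything.
    return [obj for obj in data if dist(query, obj) <= radius]


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (5, 5), (10, 10), (0.5, 0.5)]
    pivot = (0, 0)
    table = build_pivot_table(data, pivot, euclidean)
    found, computed = range_query(data, pivot, table, (1, 1), 1.0, euclidean)
    print(found, "distance computations:", computed, "of", len(data))
```

In this toy setting the filtered query returns the same objects as the sequential scan while computing fewer distances; the saving depends entirely on the lower bound being valid for the given similarity function.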