Metric indexing of protein databases and promising approaches

Publication at Faculty of Mathematics and Physics | 2007

Abstract

The most widely used biological databases today are nucleotide and protein databases. They are crucial for determining the biological functions of living organisms with respect to their DNA structure.

The biological function of a protein can be inferred from its similarity to another protein of known function stored in a database, so the chance of finding the biological function of a given protein or DNA sequence grows with the size of the database. Because of this, the databases grow exponentially, which in turn calls for sublinear methods of searching them.

The optimal solution is to align the query sequence with every sequence in the queried database. Since aligning two sequences is computationally expensive, fast heuristic methods (e.g. BLAST - Altschul et al., 1997) are used, although they only approximate the optimal solution and place no bound on the resulting error. In this paper we try to use metric access methods (MAMs) for exact and approximate searching through