Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites

Publication at the Faculty of Science, Faculty of Mathematics and Physics

2016

Abstract

Protein-protein interactions (PPIs) play a key role in the investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of interface amino acids has been an active field for some time, the quality of in silico methods is still far from perfect.

Therefore, we present INSPiRE, a novel protein-protein interaction site predictor that benefits from a knowledge base containing local structural information about known protein complexes. To build the knowledge base, we downloaded all proteins involved in PPIs from the Protein Data Bank [1] and converted them into labeled graphs with nodes representing amino acids.

Then, the structural neighborhood of each node was encoded into a bit string and stored in the knowledge base. In the prediction phase, we label the amino acids of an unknown protein as interface or non-interface based on how often elements with a similar structural neighborhood appear as interface or non-interface in the knowledge base.
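The prediction phase described above can be sketched as a similarity-based vote. This is only an illustration under stated assumptions: the abstract does not say which similarity measure or decision rule INSPiRE uses, so the Tanimoto coefficient, the `min_similarity` threshold, the majority vote, and the names `Entry` and `predict_interface` are all hypothetical.

```python
from collections import namedtuple

# Hypothetical record: a neighborhood fingerprint stored as a Python int
# (used as a bit string) together with its known label.
Entry = namedtuple("Entry", ["bits", "is_interface"])


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit strings stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def predict_interface(query_bits, knowledge_base, min_similarity=0.7):
    """Label a residue by majority vote among similar knowledge-base entries.

    Returns True (interface), False (non-interface), or None when no
    entry is similar enough to vote. The threshold is an assumption.
    """
    interface_votes = 0
    other_votes = 0
    for entry in knowledge_base:
        if tanimoto(query_bits, entry.bits) >= min_similarity:
            if entry.is_interface:
                interface_votes += 1
            else:
                other_votes += 1
    if interface_votes + other_votes == 0:
        return None
    return interface_votes > other_votes


# Toy knowledge base with made-up fingerprints.
kb = [
    Entry(0b11110000, True),
    Entry(0b11100000, True),
    Entry(0b00001111, False),
]
print(predict_interface(0b11110001, kb))  # True
```

A linear scan like this is only practical for small knowledge bases; a real system would likely need an index over the bit strings.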

Because we need to search efficiently for similar elements that are not exact matches, we encoded structural neighborhoods using a technique inspired by Atom-Pairs fingerprints (AP) [2]. We tested two types of structural neighborhood: the k nearest nodes in 3D space, and all nodes within a distance of i edges from the central node.
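As a rough illustration of the first neighborhood type and of a pair-based encoding, the sketch below selects the k nearest nodes in 3D space and hashes every pair of nodes in that neighborhood into a fixed-length bit string. The details are assumptions, not the published method: the original Atom-Pairs descriptor uses topological distances between atoms, whereas this sketch bins Euclidean distances, and the bit length `n_bits`, the `bin_width`, the CRC32 hashing, and the function names are all hypothetical.

```python
import math
import zlib
from itertools import combinations


def k_nearest(center, coords, k):
    """Indices of the k nodes closest to `center` in 3D (excluding it)."""
    others = [i for i in range(len(coords)) if i != center]
    others.sort(key=lambda i: math.dist(coords[center], coords[i]))
    return others[:k]


def pair_fingerprint(center, coords, labels, k=12, n_bits=1024, bin_width=2.0):
    """Encode the neighborhood of `center` as an int used as a bit string.

    Every pair of nodes in the neighborhood contributes one feature
    (label, label, binned distance), which is hashed to a bit position.
    Hashing and binning are assumptions made for this sketch.
    """
    nodes = [center] + k_nearest(center, coords, k)
    bits = 0
    for i, j in combinations(nodes, 2):
        # Sort the labels so that the feature does not depend on pair order.
        first, second = sorted((labels[i], labels[j]))
        dist_bin = int(math.dist(coords[i], coords[j]) // bin_width)
        feature = f"{first}-{second}-{dist_bin}".encode()
        bits |= 1 << (zlib.crc32(feature) % n_bits)
    return bits


# Toy residue coordinates (arbitrary units) labeled by amino acid type.
coords = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (10, 10, 10)]
labels = ["ALA", "GLY", "SER", "LYS"]
fp = pair_fingerprint(0, coords, labels, k=2)
print(bin(fp).count("1"), "bits set")
```

Because similar neighborhoods share many pair features, their bit strings overlap heavily, which is what makes approximate matching possible.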

We also compared amino acid type and relative solvent-accessible surface area as node labels. The best results according to the Matthews correlation coefficient (MCC) were achieved using the 12-nearest-nodes neighborhood with amino acid type as the label.
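For reference, the Matthews correlation coefficient is computed from the counts of a binary confusion matrix. The helper below is a generic sketch of the standard formula; the function name `mcc` and the example counts are made up and are not results from this work.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts: 30 true positives, 50 true negatives,
# 10 false positives, 10 false negatives.
print(round(mcc(30, 50, 10, 10), 3))  # 0.583
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it suitable for datasets where interface residues are a minority.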

This setting was further tested on DS188 - a well-established dataset compiled by Zhang et al. [3] - where it achieved MCC = 0.481, which is significantly better than MCC = 0.345 achieved by PredUs [4], the best-performing existing method.

References

[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, "The protein data bank," Nucleic Acids Research, vol. 28, no. 1, pp. 235-242, 2000.

[2] R. E. Carhart, D. H. Smith, and R. Venkataraghavan, "Atom pairs as molecular features in structure-activity studies: definition and applications," J Chem Inform Comput Sci, vol. 25, no. 2, pp. 64-73, 1985.

[3] Q. C. Zhang, D. Petrey, R. Norel, and B. H. Honig, "Protein interface conservation across structure space," Proceedings of the National Academy of Sciences, vol. 107, no. 24, pp. 10896-10901, 2010.

[4] Q. C. Zhang, L. Deng, M. Fisher, J. Guan, B. Honig, and D. Petrey, "PredUs: a web server for predicting protein interfaces using structural neighbors," in NAR, 2011.

Keywords

protein interaction prediction, molecular fingerprints, data mining