
Towards universal platform for protein binding site prediction

Publication at Faculty of Science, Faculty of Mathematics and Physics | 2018

Abstract

Most proteins perform their function by binding to other molecules: ligands, nucleic acids, peptides, other proteins, etc. When some type of binding is suspected but only an unannotated structure is known, binding site prediction methods can provide a useful starting point for further analysis.

Existing prediction methods are either template-based, matching against a library of known protein complexes, or template-free (geometric, energetic, conservation-based, and machine-learning-based). Some machine-learning-based methods exist for individual types of binding sites, but little attention has been paid to the fact that any of them could potentially predict any type of binding site simply by training the model on a different dataset.
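To make the retraining idea concrete, here is a minimal sketch in which one unchanged training routine yields two different predictors depending only on the labelled data it is given. Everything in it is an illustrative assumption rather than the method from the publication: the nearest-centroid classifier, the function names `train_centroids` and `predict_binding`, the two-element feature vectors, and the toy datasets.

```python
def train_centroids(samples):
    """Fit a nearest-centroid model (a stand-in learner, chosen for brevity).

    samples: list of (feature_vector, is_binding) pairs; both classes must occur.
    Returns a dict mapping each label to the mean feature vector of that class.
    """
    sums = {}
    counts = {}
    for feats, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(feats)
            counts[label] = 0
        for i, value in enumerate(feats):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_binding(model, feats):
    """Return True if `feats` lies closer to the binding centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sq_dist(feats, model[True]) < sq_dist(feats, model[False])


# Hypothetical toy data: [feature_a, feature_b] per surface point.
ligand_data = [
    ([0.9, 0.2], True), ([0.8, 0.1], True),
    ([0.2, 0.3], False), ([0.1, 0.2], False),
]
peptide_data = [
    ([0.3, 0.9], True), ([0.2, 0.8], True),
    ([0.7, 0.1], False), ([0.8, 0.2], False),
]

# Same code path, different training set, different predictor.
ligand_model = train_centroids(ligand_data)
peptide_model = train_centroids(peptide_data)

query = [0.9, 0.1]
ligand_call = predict_binding(ligand_model, query)
peptide_call = predict_binding(peptide_model, query)
```

With these made-up numbers the same query point is classified differently by the two models, which is the only point of the sketch. A real predictor would use a stronger learner and features derived from actual structures.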

We have created a framework for developing machine-learning-based binding site prediction methods for various types of binding partners. The resulting methods work by predicting a binding score for points lying on the solvent-accessible surface of a protein.

These points represent potential locations of atoms of potential binding partners, and each is described by a feature vector calculated from its local neighbourhood. The system is easily extensible with new protein surface descriptors and integrates a Bayesian optimization procedure for joint optimization of arbitrary parameters of the algorithm (thresholds, distance cut-offs, etc.).
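The point-based scoring described above could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the framework's actual implementation: the descriptor (a distance-weighted sum of per-atom properties), the linear falloff, the 6.0 cut-off, the logistic scoring function, and the names `point_features` and `binding_score` are all hypothetical, and the surface points are given by hand rather than computed from a structure.

```python
import math


def point_features(point, atoms, cutoff=6.0):
    """Describe a surface point by its local atomic neighbourhood.

    atoms: list of (xyz, props), where props is a list of per-atom values.
    Returns the distance-weighted sum of each property over atoms within
    `cutoff`, followed by the number of atoms in range.
    """
    n_props = len(atoms[0][1])
    feats = [0.0] * n_props
    in_range = 0
    for xyz, props in atoms:
        d = math.dist(point, xyz)
        if d <= cutoff:
            weight = 1.0 - d / cutoff  # assumed linear falloff with distance
            for i, value in enumerate(props):
                feats[i] += weight * value
            in_range += 1
    return feats + [float(in_range)]


def binding_score(features, weights, bias):
    """Map a feature vector to a score in (0, 1) with a logistic model."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy atoms: (coordinates, [hydrophobic, charged]); values are made up.
atoms = [
    ((0.0, 0.0, 0.0), [1.0, 0.0]),
    ((1.5, 0.0, 0.0), [1.0, 0.0]),
    ((10.0, 0.0, 0.0), [0.0, 1.0]),
]
# Hand-placed stand-ins for points on the solvent-accessible surface.
surface_points = [(0.75, 2.0, 0.0), (10.0, 2.0, 0.0)]

# Hypothetical model parameters; in practice these would be learned.
weights = [1.2, -0.8, 0.1]
bias = -1.0

scores = [
    binding_score(point_features(p, atoms), weights, bias)
    for p in surface_points
]
```

In this picture, adding a new surface descriptor amounts to appending further entries to the feature vector, and values such as `cutoff` are the kind of parameters a joint optimization procedure would tune.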

These features allow the developed prediction methods to be tailored to specific types of binding partners. So far, we have applied the approach to develop protein-ligand (small-molecule) and protein-peptide binding site prediction methods.

In both cases, we were able to develop predictors that achieve state-of-the-art performance while being faster than most competing methods.