Protein-peptide binding interactions play an important role in cellular regulation and are implicated in many diseases. When no prior knowledge of a binding site's location is available, predicting it may be a necessary starting point for further modeling or docking.
Existing prediction approaches either require the peptide sequence to be known in advance or offer unsatisfactory predictive performance. Here we propose P2Rank-Pept, a new machine-learning-based method for predicting peptide-binding sites from protein structure.
We show that our method significantly outperforms the other evaluated methods, including SPRINT-Str, the most recent structure-based prediction method, published last year (AUC: $0.85$ vs. $0.78$). P2Rank-Pept uses local structural and sequence information, including evolutionary conservation, and builds its prediction model on a Random Forest classifier.
The novelty of our approach lies in using points on the solvent accessible surface as the unit of classification (as opposed to the typical approach of classifying amino acid residues) and in applying Bayesian optimization, a robust technique, to systematically optimize arbitrary parameters of the algorithm. Our results indicate that the P2Rank software package is a viable framework for developing top-performing binding-site prediction methods for different types of binding partners.
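To make "points on the solvent accessible surface as a unit of classification" concrete, the sketch below generates such points for a set of atoms. It is a minimal illustration assuming Shrake-Rupley-style sampling (a 1.4 Å probe, golden-spiral points on each atom's expanded sphere), not P2Rank's actual implementation; the helper names `fibonacci_sphere` and `sas_points` and the default parameters are hypothetical. In a P2Rank-like pipeline, each surviving point would then receive a feature vector computed from its neighborhood and be scored by the classifier.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly uniformly spread unit vectors (golden-spiral sampling)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n  # z strictly inside (-1, 1)
        r = math.sqrt(1.0 - z * z)      # radius of the horizontal circle at height z
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sas_points(atoms, probe_radius=1.4, points_per_atom=50):
    """Sample the solvent accessible surface (hypothetical helper).

    atoms: list of (x, y, z, vdw_radius) tuples, in angstroms.
    Returns (point, atom_index) pairs for points on each atom's sphere of
    radius vdw_radius + probe_radius that are not buried inside the
    expanded sphere of any other atom.
    """
    unit = fibonacci_sphere(points_per_atom)
    result = []
    for i, (x, y, z, r) in enumerate(atoms):
        big_r = r + probe_radius
        for ux, uy, uz in unit:
            p = (x + big_r * ux, y + big_r * uy, z + big_r * uz)
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                rj_exp = rj + probe_radius
                d2 = (p[0] - xj) ** 2 + (p[1] - yj) ** 2 + (p[2] - zj) ** 2
                if d2 < rj_exp * rj_exp:
                    buried = True
                    break
            if not buried:
                result.append((p, i))
    return result


if __name__ == "__main__":
    # Two overlapping carbon-like atoms: points in the overlap region are discarded.
    atoms = [(0.0, 0.0, 0.0, 1.7), (2.0, 0.0, 0.0, 1.7)]
    pts = sas_points(atoms)
    print(len(pts), "surface points kept out of", 2 * 50)
```

Classifying surface points rather than residues gives many training samples per protein and lets the model describe local geometry around a location in space; how P2Rank-Pept maps point predictions back to residues is not described here.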