Ligand binding site prediction from protein structure plays an important role in many complex rational drug design efforts. Its applications include prediction of drug side effects, docking prioritization in inverse virtual screening, and elucidation of protein function in genome-wide structural studies.
Currently available tools have limitations that rule them out for many possible use cases. In general, they are either fast but relatively inaccurate (e.g. purely geometric methods) or accurate but too slow for large-scale applications (e.g. methods that rely on large template libraries of known protein-ligand complexes).
P2Rank is a recently introduced machine learning-based method that has already demonstrated speeds comparable to the fastest geometric methods while providing much higher identification success rates. Here we present an improved version that is both faster and produces higher-quality predictions.
A leap in predictive performance was achieved thanks to Bayesian optimization, which allowed simultaneous optimization of numerous arbitrary parameters of the algorithm. We evaluated our method with respect to various performance and prediction quality criteria and compared it to other state-of-the-art methods, as well as to its previous version, with encouraging results.