Cryptic binding site prediction with protein language models

Publication at Faculty of Science, Faculty of Mathematics and Physics | 2023

Abstract

Structure-based identification of protein-ligand binding sites plays a crucial role in the initial stages of rational drug discovery pipelines. As machine learning methods are increasingly integrated into this process, a significant challenge arises in training them, because labeled data are typically derived from ligand-bound structures.

Consequently, these methods struggle to detect binding sites in proteins where the site is concealed in the absence of a bound ligand. Here, we explore the possibility of harnessing protein language models to address this issue and compare their performance against state-of-the-art methods, both those specialized in cryptic binding site (CBS) detection and those that are not.

We show that applying pre-trained protein language models in a relatively straightforward manner enables us to surpass the state of the art in CBS prediction.
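
The abstract does not describe the architecture used, so the following is only a minimal sketch of one common way such a "straightforward" application might look: a frozen protein language model yields one embedding vector per residue, and a small classifier scores each residue as binding or non-binding. The embeddings, weights, bias, threshold, and all function names here are hypothetical toy values chosen for illustration, not the authors' method or results.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def residue_scores(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Score each residue with a linear layer followed by a sigmoid.

    `embeddings` stands in for per-residue vectors from a pre-trained
    protein language model (assumption: one vector per residue).
    """
    scores = []
    for vec in embeddings:
        if len(vec) != len(weights):
            raise ValueError("embedding and weight dimensions differ")
        logit = sum(v * w for v, w in zip(vec, weights)) + bias
        scores.append(sigmoid(logit))
    return scores


def predicted_site(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of residues whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy 3-dimensional "embeddings" for a 4-residue protein (made-up numbers).
toy_embeddings = [
    [0.1, -0.2, 0.0],
    [1.5, 0.8, -0.3],
    [-0.7, 0.1, 0.2],
    [0.9, 1.1, 0.4],
]
toy_weights = [1.0, 1.0, 0.0]  # hypothetical classifier weights
toy_bias = -1.0                # hypothetical classifier bias

toy_scores = residue_scores(toy_embeddings, toy_weights, toy_bias)
toy_site = predicted_site(toy_scores)
print(toy_site)  # residues 1 and 3 have positive logits, so this prints [1, 3]
```

In practice the weights would be learned from labeled residues, and the embeddings would come from an actual pre-trained model rather than hand-written numbers.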