The semiempirical quantum mechanical (SQM) methods used in drug design are commonly parametrized and tested on data sets whose systems may not be representative of drug-biomolecule interactions in either size or chemical composition. This is addressed here with a new benchmark data set, PLF547, derived from protein-ligand complexes. It consists of complexes of ligands with protein fragments (such as amino-acid side chains), with benchmark interaction energies based on MP2-F12 and DLPNO-CCSD(T) calculations.
From these fragment data, composite benchmark interaction energies are also built for complexes of each ligand with the complete active site of the protein (PLA15 data set). Both data sets are used to test multiple SQM methods with corrections for noncovalent interactions; the role of the solvation model in the calculations is examined as well.
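
As an illustration of what testing a method against such benchmark interaction energies typically involves, the following minimal Python sketch computes common error statistics (mean signed error, mean unsigned error, root-mean-square error) between reference and predicted interaction energies. The function name `error_stats` and the numerical values are hypothetical placeholders, not data from PLF547 or PLA15, and the choice of statistics is an assumption rather than the protocol used in this work.

```python
import math


def error_stats(reference, predicted):
    """Return MSE (signed), MUE and RMSE of predicted vs. reference energies.

    Both inputs are sequences of interaction energies in the same unit
    (e.g. kcal/mol) and in the same order of complexes.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed error of the tested method for each complex.
    errors = [p - r for r, p in zip(reference, predicted)]
    n = len(errors)

    return {
        "MSE": sum(errors) / n,                              # mean signed error
        "MUE": sum(abs(e) for e in errors) / n,              # mean unsigned error
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),   # root-mean-square error
    }


# Placeholder numbers for illustration only (not taken from any data set).
reference = [-10.0, -5.0, -20.0]
predicted = [-9.0, -6.0, -18.0]

stats = error_stats(reference, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The signed error indicates systematic over- or underbinding, while the unsigned and root-mean-square errors summarize the overall accuracy; relative errors could be added in the same way if the interaction energies span a wide range.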