Abstract
Protein-ligand docking is one of the most widely used methods for structure-based virtual screening in the early stages of drug discovery. However, docking takes approximately 1 min per compound, which makes exhaustive evaluation of ultralarge libraries containing billions of molecules computationally impractical. In this study, we propose COFFEE-PRESC (COmpound Filtering by Fragment pair-based Efficient Evaluation for PRESCreening), a fast, fragment-based prescreening method. COFFEE-PRESC first docks each fragment in a preconstructed fragment set to the target protein and enumerates multiple favorable protein-fragment docking poses. It then pairs these poses to capture the relative positions of two fragments. The fragment set consists of a small number of representative fragments, each highly similar to many other fragments, so that a large and diverse chemical space is covered. Compounds that contain substructures similar to these fragment pairs are then retrieved by similarity search. This retrieval scheme guarantees that the two matched fragments, placed in their paired docking poses, do not spatially collide. Finally, the retrieved compounds are scored using the docking scores of the representative fragments and the similarity values between each representative fragment and the compound fragment matched to it during retrieval. COFFEE-PRESC was 32-fold faster than Spresso, an existing prescreening tool, while achieving higher accuracy, highlighting its potential for screening ultralarge compound libraries. The code is available under an MIT license at https://github.com/akiyamalab/coffee-presc.
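The pairing and scoring steps summarized above can be illustrated with a minimal Python sketch. It is not the COFFEE-PRESC implementation: the names `FragmentPose`, `enumerate_pairs`, and `score_compound`, the 2.0 Å clash cutoff, the 0.7 similarity threshold, the precomputed similarity table, and the similarity-weighted sum used as the compound score are all hypothetical choices made for illustration. Docking scores are assumed to follow the common convention that lower (more negative) is better.

```python
# Hypothetical sketch of fragment-pair prescreening; NOT the COFFEE-PRESC code.
# Assumptions: docking scores are "lower is better", a pose pair is accepted when
# no two atoms are closer than CLASH_CUTOFF, and a compound's score is a
# similarity-weighted sum of the two representative fragments' docking scores.
import math
from dataclasses import dataclass
from itertools import combinations, permutations
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]

CLASH_CUTOFF = 2.0   # hypothetical minimum interatomic distance (angstrom)
SIM_THRESHOLD = 0.7  # hypothetical minimum fragment similarity for a match


@dataclass(frozen=True)
class FragmentPose:
    fragment_id: str           # representative fragment identifier
    score: float               # docking score of this pose (lower is better)
    coords: Tuple[Coord, ...]  # atom coordinates of the docked pose


def min_distance(a: FragmentPose, b: FragmentPose) -> float:
    """Smallest interatomic distance between two docked poses."""
    return min(math.dist(p, q) for p in a.coords for q in b.coords)


def enumerate_pairs(poses: Sequence[FragmentPose],
                    cutoff: float = CLASH_CUTOFF) -> List[Tuple[FragmentPose, FragmentPose]]:
    """Keep only pose pairs whose atoms never come closer than `cutoff`."""
    return [(a, b) for a, b in combinations(poses, 2) if min_distance(a, b) >= cutoff]


def score_compound(compound_frags: Sequence[str],
                   pairs: Sequence[Tuple[FragmentPose, FragmentPose]],
                   similarity: Dict[Tuple[str, str], float],
                   threshold: float = SIM_THRESHOLD) -> Optional[float]:
    """Best (lowest) similarity-weighted pair score, or None if no pair matches.

    `similarity[(rep_id, frag_id)]` is a precomputed similarity between a
    representative fragment and a fragment of the compound; missing keys count as 0.
    """
    best: Optional[float] = None
    for a, b in pairs:
        # Map the two representatives onto two distinct fragments of the compound.
        for i, j in permutations(range(len(compound_frags)), 2):
            s_a = similarity.get((a.fragment_id, compound_frags[i]), 0.0)
            s_b = similarity.get((b.fragment_id, compound_frags[j]), 0.0)
            if s_a >= threshold and s_b >= threshold:
                value = s_a * a.score + s_b * b.score
                if best is None or value < best:
                    best = value
    return best


if __name__ == "__main__":
    poses = [
        FragmentPose("benzene", -5.0, ((0.0, 0.0, 0.0),)),
        FragmentPose("pyridine", -4.0, ((5.0, 0.0, 0.0),)),
        FragmentPose("phenol", -6.0, ((1.0, 0.0, 0.0),)),  # clashes with benzene
    ]
    pairs = enumerate_pairs(poses)
    similarity = {("benzene", "f1"): 0.9, ("pyridine", "f2"): 0.8, ("phenol", "f1"): 0.5}
    print(len(pairs), score_compound(("f1", "f2"), pairs, similarity))
```

In this toy run, the benzene-phenol pose pair is discarded because its atoms lie 1.0 Å apart, and only the benzene-pyridine pair matches the compound's fragments above the threshold. Because the clash check is applied once to docked fragment poses, any compound later matched to a surviving pair inherits a non-colliding arrangement without being docked itself, which is the property the retrieval step relies on.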