Abstract
Structure-based virtual screening (SBVS) is a fundamental approach in drug discovery, yet its predictive accuracy is highly dependent on methodological choices, scoring functions, and data processing strategies. This study systematically evaluates five protocol variants integrating molecular docking, induced-fit docking (IFD), quantum-polarized ligand docking (QPLD), ensemble docking (ED), and molecular mechanics/generalized Born surface area (MM-GBSA), applied to Helicobacter pylori urease using four distinct crystallographic structures obtained from the Protein Data Bank (PDB). We assess their predictive performance using statistical correlation metrics (Spearman and Pearson) and error-based measures (mean absolute error, root-mean-squared error, and inlier ratio metric). Additionally, we investigate the influence of data fusion techniques (minimum, median, arithmetic, geometric, harmonic, and Euclidean means) and of varying numbers of docking poses (ranging from 1 to 100) on ligand ranking accuracy. Results indicate that MM-GBSA and ED consistently outperform the other methods in compound ranking, although MM-GBSA exhibits higher errors in absolute binding energy predictions. While increasing the number of poses generally reduces predictive accuracy, the minimum fusion approach remains robust across all conditions. Comparisons between IC50 and pIC50 as experimental reference values reveal that pIC50 yields higher Pearson correlations, reinforcing its suitability for affinity prediction, while both reference values perform similarly in Spearman rankings. These findings refine SBVS workflows by optimizing scoring and pose aggregation strategies, highlighting the importance of method selection and data fusion techniques. The proposed framework enhances ligand prioritization in virtual screening campaigns and can be adapted to other therapeutic targets.
Future research should explore adaptive scoring frameworks and machine-learning approaches to further improve the predictive reliability of SBVS.
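
To make the pose-fusion step concrete, the following is a minimal Python sketch of how the scores of several docking poses for one ligand might be collapsed into a single value. It is not the implementation used in the study. The function names `fuse_scores` and `pic50_from_ic50_um` are hypothetical, and three assumptions are made that the abstract does not state: all scores are negative (more negative meaning stronger predicted binding), the means are computed on score magnitudes with the sign restored afterwards, and "Euclidean mean" is read as the quadratic (root-mean-square) mean. The pIC50 conversion assumes IC50 values given in micromolar.

```python
import math
import statistics


def fuse_scores(scores, method="minimum"):
    """Collapse the docking scores of one ligand's poses into one value.

    Assumption: every score is strictly negative (more negative = better),
    so the means are taken over magnitudes and the sign is restored.
    """
    if method == "minimum":
        return min(scores)  # most negative, i.e. best-scoring pose
    if method == "median":
        return statistics.median(scores)

    magnitudes = [-s for s in scores]
    if any(m <= 0 for m in magnitudes):
        raise ValueError("this sketch expects strictly negative scores")

    if method == "arithmetic":
        fused = statistics.fmean(magnitudes)
    elif method == "geometric":
        fused = statistics.geometric_mean(magnitudes)
    elif method == "harmonic":
        fused = statistics.harmonic_mean(magnitudes)
    elif method == "euclidean":
        # Assumption: "Euclidean mean" interpreted as the quadratic mean.
        fused = math.sqrt(statistics.fmean([m * m for m in magnitudes]))
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return -fused


def pic50_from_ic50_um(ic50_um):
    """pIC50 = -log10(IC50 in mol/L); the input is in micromolar."""
    return 6.0 - math.log10(ic50_um)
```

For example, `fuse_scores([-8.0, -2.0], "minimum")` returns the best pose score, while the mean-based methods pull the fused value toward weaker poses as more poses are included, which is one plausible reading of why the minimum remains robust as the pose count grows.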