Benchmarking Single-Pose Docking, Consensus Rescoring, and Supervised ML on the LIT-PCBA Library: A Critical Evaluation of DiffDock, AutoDock-GPU, GNINA, and DiffDock-NMDN
arXiv cs.LG / 5/5/2026
Key Points
- The study benchmarks multiple virtual screening workflows on the experimentally derived LIT-PCBA library, using 15 targets and 578,295 ligand–target pairs with confirmed actives/inactives.
- For pose generation, the authors compare AutoDock-GPU and DiffDock, then rescore the resulting poses with GNINA and NMDN. AutoDock-GNINA (GNINA rescoring of AutoDock-GPU poses) is the strongest single method, with a median EF1% (enrichment factor in the top 1% of the ranked list) of 2.14.
- DiffDock-based pipelines generally underperform the best AutoDock-GNINA approach on several targets, with particularly challenging cases such as OPRK1.
- Consensus rescoring/ranking strategies improve robustness but still do not beat the top single-scoring workflow, while supervised ML re-ranking provides the largest benefit, reaching a median EF1% of 4.49 (+110% over AutoDock-GNINA).
- Overall, the work concludes that no single docking method is universally dominant and that validated, cost-effective classical+ML hybrid pipelines with supervised re-ranking currently deliver the most practical early enrichment for virtual screening.
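
The EF1% figures quoted above can be made concrete with a minimal sketch of how an enrichment factor is commonly computed. This is not the authors' evaluation code: the function names, the "higher score is better" convention, and the rounding of the top-1% cutoff are assumptions made for illustration.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """Enrichment factor in the top `fraction` of a ranked list.

    Assumptions (not taken from the paper): higher score means a better
    rank, labels are 1 for active and 0 for inactive, and the cutoff is
    floor(n * fraction) with a minimum of one compound.
    """
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    n_top = max(1, int(n_total * fraction))
    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Active rate in the top slice divided by the active rate overall.
    return (hits_top / n_top) / (n_actives / n_total)


def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy data: 200 compounds, 4 actives, two of them ranked first.
toy_scores = [float(200 - i) for i in range(200)]
toy_labels = [0] * 200
for idx in (0, 1, 50, 120):
    toy_labels[idx] = 1

toy_ef = enrichment_factor(toy_scores, toy_labels)
gain = relative_gain_percent(4.49, 2.14)
```

An EF1% of 1.0 corresponds to random selection, so the reported medians of 2.14 and 4.49 mean roughly two and four-and-a-half times more actives in the top 1% than chance. The second helper reproduces the quoted relative improvement: (4.49 - 2.14) / 2.14 is about 1.10, i.e. roughly +110%.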