Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
arXiv cs.AI / 3/16/2026
Key Points
- We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric with 20 theorems machine-checked in Lean 4, which jointly penalizes false discoveries (a λ-weighted FDR term) and excessive abstention (a γ-weighted coverage-gap term) at each budget level.
- Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that cannot be inflated by cherry-picking an optimal budget.
- In a drug-discovery candidate-selection case study, BSDS/DQS evaluated 39 proposers, including LLM configurations; a simple random-forest-based Greedy-ML proposer achieved the best DQS, and no LLM configuration surpassed it in either zero-shot or few-shot settings.
- The framework applies to any budget-constrained candidate-selection task with asymmetric error costs, and it generalizes across five MoleculeNet benchmarks, indicating broad applicability.
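The key points describe BSDS as a per-budget score combining a λ-weighted false-discovery penalty with a γ-weighted coverage-gap penalty, and DQS as its average over budget levels. The sketch below illustrates that structure in Python; the exact functional form, the `bsds`/`dqs` names, the defaults, and the coverage-gap definition are all assumptions for illustration, not the paper's verified definitions.

```python
# Illustrative sketch (NOT the paper's verified definition) of a
# budget-sensitive discovery score and its budget-averaged summary.

def bsds(n_selected, n_true_pos, budget, lam=1.0, gamma=1.0):
    """Score one proposer at one budget level (higher is better).

    Penalizes false discoveries (lambda-weighted FDR) and abstention
    below the budget (gamma-weighted coverage gap), per the article's
    description; the exact form here is an assumption.
    """
    fdr = 0.0 if n_selected == 0 else 1.0 - n_true_pos / n_selected
    coverage_gap = max(0.0, (budget - n_selected) / budget)
    return 1.0 - lam * fdr - gamma * coverage_gap

def dqs(results, lam=1.0, gamma=1.0):
    """Average BSDS over all budget levels, so no single
    favorable budget can be cherry-picked to inflate the summary."""
    scores = [bsds(s, tp, b, lam, gamma) for (s, tp, b) in results]
    return sum(scores) / len(scores)

# Example: (n_selected, n_true_pos, budget) at three budget levels.
# The third level abstains heavily, which drags down the average.
results = [(10, 8, 10), (20, 14, 20), (5, 5, 50)]
print(round(dqs(results), 3))  # → 0.533
```

The budget-averaged form makes the trade-off visible: the third budget level has a perfect FDR but a large coverage gap, so abstaining cannot be used to game the summary score.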