Budget-Sensitive Discovery Scoring: A Formally Verified Framework for Evaluating AI-Guided Scientific Selection
arXiv cs.AI / 3/16/2026
Key Points
- We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric backed by 20 theorems machine-checked in Lean 4. At each budget level, BSDS jointly penalizes false discoveries (a λ-weighted false discovery rate, FDR) and excessive abstention (a γ-weighted coverage gap).
- Its budget-averaged form, the Discovery Quality Score (DQS), provides a single summary statistic that cannot be inflated by cherry-picking an optimal budget.
- In a case study on drug-discovery candidate selection, BSDS/DQS was used to evaluate 39 proposers, including several LLM configurations. A simple random-forest (RF)-based Greedy-ML proposer achieved the best DQS, and no LLM configuration surpassed it in either zero-shot or few-shot settings.
- The framework applies to any budget-constrained candidate-selection problem with asymmetric error costs, and its results generalized across five MoleculeNet benchmarks, indicating broad applicability.
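The summary above does not state the exact BSDS formula. As a rough illustration of how a per-budget penalty and a budget average fit together, the Python sketch below assumes a simple linear form, BSDS(B) = 1 − λ·FDR(B) − γ·gap(B), where gap(B) is the unused share of the budget. This form, the default λ and γ values, and the helper names `bsds`, `dqs`, and `greedy_top_b` are hypothetical and not taken from the paper. The greedy proposer simply takes the top-B candidates by a precomputed score, standing in for the RF-based Greedy-ML baseline.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Set


def bsds(selected: List[str], actives: Set[str], budget: int,
         lam: float = 1.0, gamma: float = 0.5) -> float:
    """Hypothetical per-budget score: 1 - lam * FDR - gamma * coverage_gap.

    Assumed form for illustration only; not the paper's exact definition.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if len(selected) > budget:
        raise ValueError("proposer exceeded its budget")
    n_sel = len(selected)
    false_hits = sum(1 for c in selected if c not in actives)
    # FDR over the selected set; selecting nothing yields no false discoveries.
    fdr = false_hits / n_sel if n_sel else 0.0
    # Coverage gap: share of the budget left unused (i.e., abstention).
    coverage_gap = (budget - n_sel) / budget
    return 1.0 - lam * fdr - gamma * coverage_gap


def dqs(propose: Callable[[int], List[str]], actives: Set[str],
        budgets: Iterable[int], lam: float = 1.0, gamma: float = 0.5) -> float:
    """Average BSDS over a fixed, pre-declared budget grid (mean, never max)."""
    return mean(bsds(propose(b), actives, b, lam, gamma) for b in budgets)


def greedy_top_b(scores: Dict[str, float]) -> Callable[[int], List[str]]:
    """Hypothetical greedy proposer: select the B highest-scoring candidates.

    `scores` stands in for model predictions (e.g., from a random forest).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return lambda b: ranked[:b]


if __name__ == "__main__":
    scores = {"c1": 0.9, "c2": 0.8, "c3": 0.4, "c4": 0.1}
    actives = {"c1", "c3"}
    propose = greedy_top_b(scores)
    budgets = [1, 2, 3]
    per_budget = [bsds(propose(b), actives, b) for b in budgets]
    print("per-budget BSDS:", [round(s, 3) for s in per_budget])
    print("best single budget:", max(per_budget))
    print("DQS:", round(dqs(propose, actives, budgets), 3))
```

With the toy data in the `__main__` block, the per-budget scores are 1.0, 0.5, and about 0.667, so reporting only the best budget would give 1.0, whereas the budget-averaged DQS is about 0.722. Averaging over a budget grid declared in advance is what removes the incentive to report the single most favorable budget.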