ASMR-Bench: Auditing for Sabotage in ML Research
arXiv cs.AI / April 20, 2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- The paper introduces ASMR-Bench, a new benchmark designed to test how well auditors can detect sabotage embedded in ML research codebases.
- ASMR-Bench includes nine ML research codebases with sabotaged variants that change implementation details (e.g., hyperparameters, training data, evaluation code) while keeping the high-level methods the same.
- Experiments show that both frontier LLMs and LLM-assisted human auditors struggle to reliably detect sabotage: the best auditors reach an AUROC of 0.77 and a top-1 fix rate of 42% (see the sketch after this list).
- When used as red teamers, LLMs generate sabotages that are generally weaker than those produced by humans, but these can still evade auditors of similar capability.
- The authors release ASMR-Bench to advance research on monitoring and auditing techniques for autonomous, AI-conducted scientific work.
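To make the two headline metrics concrete, here is a minimal sketch (not the paper's actual evaluation code) of how they could be computed, assuming each audit of a codebase variant yields a suspicion score and a ranked list of candidate fixes. The `audits` records and field names are hypothetical, purely for illustration.

```python
# Hypothetical audit records: suspicion score in [0, 1], whether the variant
# was actually sabotaged, and whether the auditor's top-ranked fix matched
# the injected sabotage.
from sklearn.metrics import roc_auc_score

audits = [
    {"score": 0.91, "sabotaged": True,  "top1_fix_correct": True},
    {"score": 0.34, "sabotaged": False, "top1_fix_correct": False},
    {"score": 0.72, "sabotaged": True,  "top1_fix_correct": False},
    {"score": 0.15, "sabotaged": False, "top1_fix_correct": False},
]

# AUROC: how well suspicion scores separate sabotaged from clean variants.
labels = [a["sabotaged"] for a in audits]
scores = [a["score"] for a in audits]
auroc = roc_auc_score(labels, scores)

# Top-1 fix rate: among truly sabotaged variants, how often the auditor's
# first proposed fix addresses the actual sabotage.
sabotaged = [a for a in audits if a["sabotaged"]]
top1_fix_rate = sum(a["top1_fix_correct"] for a in sabotaged) / len(sabotaged)

print(f"AUROC: {auroc:.2f}, top-1 fix rate: {top1_fix_rate:.0%}")
```

Under this reading, an AUROC of 0.77 means suspicion scores rank a sabotaged variant above a clean one only about 77% of the time, and a 42% top-1 fix rate means that even when sabotage is present, the auditor's first proposed fix addresses it in fewer than half of cases.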
Related Articles

To what extent could AI replace us in our jobs? Sometimes I think people exaggerate a bit.
Reddit r/artificial

Why I Built byCode: A 100% Local, Privacy-First AI IDE
Dev.to

Magnificent irony as Meta staff unhappy about running surveillance software on work PCs
The Register

ETHENEA (ETHENEA Americas LLC) Analyst View: Asset Allocation Resilience in the 2026 Global Macro Cycle
Dev.to

Blaze Balance Engine SaaS
Dev.to