ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway
arXiv cs.AI / 4/10/2026
Key Points
- The paper introduces ToxReason, a new benchmark for evaluating mechanistic chemical toxicity reasoning grounded in the Adverse Outcome Pathway (AOP), rather than relying only on chemical-structure correlations.
- It tests whether models can infer organ-level toxic outcomes and trace their underlying mechanisms from the Molecular Initiating Event (MIE) to the Adverse Outcome (AO), using drug–target interaction evidence and toxicity labels.
- The authors show that strong toxicity prediction accuracy can still coincide with biologically unfaithful or unreliable explanations, highlighting a gap in current benchmark evaluation.
- Experiments across multiple LLMs indicate that reasoning-aware training improves both mechanistic reasoning quality and toxicity prediction performance.
- Overall, the work argues that trustworthy toxicity modeling requires incorporating reasoning into both evaluation and training, not just measuring predictive scores.
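The key points separate two things a toxicity model can be scored on: whether the organ-level label is right, and whether the stated MIE-to-AO mechanism holds up. The summary does not describe ToxReason's data format or metrics, so what follows is only a minimal sketch under assumed names (`AOPChain`, `Prediction`, `mechanism_overlap`, and `evaluate` are all hypothetical) with placeholder event labels rather than real biology. It scores label accuracy alongside a crude, order-aware overlap (longest common subsequence) between a predicted and a reference pathway, illustrating how a correct label can coexist with a mostly wrong mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AOPChain:
    """Hypothetical AOP representation: MIE -> key events -> Adverse Outcome (AO)."""
    mie: str
    key_events: tuple[str, ...]
    adverse_outcome: str

    def steps(self) -> list[str]:
        # Flatten the pathway into an ordered list of events.
        return [self.mie, *self.key_events, self.adverse_outcome]


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output: a toxicity label plus a stated mechanism."""
    toxic: bool
    chain: AOPChain


def mechanism_overlap(pred: AOPChain, ref: AOPChain) -> float:
    """Fraction of reference steps reproduced in the same order (LCS length / reference length).

    Assumption: events match only by exact string equality, a deliberate simplification.
    """
    a, b = pred.steps(), ref.steps()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)] / len(b)


def evaluate(preds: list[Prediction], labels: list[bool], refs: list[AOPChain]) -> tuple[float, float]:
    """Return (label accuracy, mean mechanism overlap) over a set of compounds."""
    n = len(preds)
    accuracy = sum(p.toxic == y for p, y in zip(preds, labels)) / n
    faithfulness = sum(mechanism_overlap(p.chain, r) for p, r in zip(preds, refs)) / n
    return accuracy, faithfulness


# Placeholder example: the label is correct, but only the final outcome matches the reference pathway.
reference = AOPChain("target_X_inhibition", ("event_A", "event_B"), "organ_injury")
predicted = Prediction(True, AOPChain("target_Y_binding", ("event_C",), "organ_injury"))
example_accuracy, example_faithfulness = evaluate([predicted], [True], [reference])
print(example_accuracy, example_faithfulness)
```

Here the prediction is fully "accurate" while reproducing only one of four reference steps. Exact string matching is used purely for illustration; a real mechanistic benchmark would plausibly need ontology-aware matching or expert judgment instead.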