CompleteRXN: Toward Completing Open Chemical Reaction Databases
arXiv cs.LG / 5/4/2026
Key Points
- Existing chemical reaction datasets like USPTO are significantly incomplete, often missing byproducts, co-reactants, and stoichiometric information, which undermines downstream reliability.
- The paper introduces CompleteRXN, a large-scale supervised benchmark for reaction completion under realistic missing-data conditions, constructed by mapping USPTO records to curated mechanistic reactions and enforcing atom-balanced, aligned pairs.
- Evaluations compare multiple baselines, including a constrained encoder-decoder reaction completion model, the Constrained Reaction Balancer (CRB), and SynRBL, showing that performance worsens as incompleteness increases.
- CRB achieves the strongest benchmark results, reaching 99.20% equivalence accuracy on a random split and 91.12% on an extreme out-of-distribution split.
- When tested on reactions outside the benchmark (full uncurated USPTO), accuracy drops substantially across methods, underscoring a gap between benchmark scores and practical robustness and motivating future improvements.
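
To make the notion of an "atom-balanced" pair concrete, here is a minimal sketch of an atom-balance check. It is not the CompleteRXN pipeline or any of the evaluated models; it assumes reactions are given as flat molecular formulas (no parentheses, charges, or isotopes) with integer stoichiometric coefficients, and the function names `atom_counts`, `side_counts`, `is_atom_balanced`, and `missing_atoms` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2", "H", "Cl3".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C2H4O2' (assumption: no parentheses or charges)."""
    counts = Counter()
    for element, digits in _TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def side_counts(side) -> Counter:
    """Sum atom counts over one reaction side given as (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coeff * n
    return total


def is_atom_balanced(reactants, products) -> bool:
    """True when every element appears the same number of times on both sides."""
    return side_counts(reactants) == side_counts(products)


def missing_atoms(reactants, products) -> dict:
    """Report which atoms each side lacks; Counter subtraction keeps only positive counts."""
    left, right = side_counts(reactants), side_counts(products)
    return {
        "missing_from_products": dict(left - right),
        "missing_from_reactants": dict(right - left),
    }


# Esterification recorded without its water byproduct (an illustrative incomplete record).
reactants = [(1, "C2H4O2"), (1, "C2H6O")]   # acetic acid + ethanol
incomplete = [(1, "C4H8O2")]                # ethyl acetate only
complete = incomplete + [(1, "H2O")]        # byproduct restored

print(is_atom_balanced(reactants, incomplete))
print(missing_atoms(reactants, incomplete))
print(is_atom_balanced(reactants, complete))
```

In this toy case the atoms missing from the product side correspond to one water molecule, the kind of omitted byproduct the summary describes. A check like this only detects an imbalance; proposing the chemically correct missing species is the harder task the benchmark targets.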