Autonomous Adaptive Solver Selection for Chemistry Integration via Reinforcement Learning

arXiv cs.LG / 4/2/2026


Key Points

  • The paper proposes a constrained reinforcement learning framework that treats chemical solver choice as a Markov decision process, automatically selecting between CVODE (implicit BDF) and a QSS solver during chemistry integration.
  • Instead of making myopic decisions from local state, the RL agent learns trajectory-aware policies that account for how current solver choices affect downstream error accumulation, while enforcing a user-specified accuracy tolerance via a Lagrangian reward with online multiplier adaptation.
  • In 0D homogeneous reactor benchmarks, the RL-adaptive policy achieves about a 3× mean speedup (with a wide range up to ~10.6×) while preserving ignition delay and species profiles for a 106-species n-dodecane mechanism, at the cost of roughly 1% inference overhead.
  • The authors report zero-retraining transfer to 1D counterflow diffusion flames across strain rates 10–2000 s⁻¹, achieving consistent ~2.2× speedup versus CVODE and selecting CVODE for only ~12–15% of space-time points while maintaining near-reference temperature accuracy.
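The constrained-reward idea in the points above can be sketched concretely. The snippet below is a minimal illustration (not the paper's exact formulation) of a Lagrangian reward with online multiplier adaptation: per-step reward trades off compute cost against accuracy-constraint violation, and the multiplier is updated by dual ascent. The function names, the cost/error units, and the update rule's learning rate are all illustrative assumptions.

```python
# Hedged sketch of a Lagrangian reward with online multiplier adaptation
# for constrained solver selection. Names and the dual-ascent rule are
# illustrative assumptions, not the paper's exact formulation.

def lagrangian_reward(step_cost, step_error, tol, lam):
    """Reward = -cost, penalized when the error exceeds the tolerance."""
    violation = max(0.0, step_error - tol)
    return -step_cost - lam * violation

def update_multiplier(lam, step_error, tol, lr=0.1):
    """Dual ascent: raise lambda when the accuracy constraint is violated,
    relax it otherwise; lambda stays non-negative."""
    return max(0.0, lam + lr * (step_error - tol))

# Toy comparison: an accurate-but-costly step vs. a cheap-but-sloppy step.
lam = 1.0
r_accurate = lagrangian_reward(step_cost=1.0, step_error=0.5, tol=1.0, lam=lam)
r_sloppy = lagrangian_reward(step_cost=0.2, step_error=3.0, tol=1.0, lam=lam)
print(r_accurate, r_sloppy)  # -1.0 -2.2
```

With a large multiplier the sloppy step becomes unattractive despite its lower cost, which is how the online adaptation steers the policy back toward the user-prescribed tolerance.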

Abstract

The computational cost of stiff chemical kinetics remains a dominant bottleneck in reacting-flow simulation, yet hybrid integration strategies are typically driven by hand-tuned heuristics or supervised predictors that make myopic decisions from instantaneous local state. We introduce a constrained reinforcement learning (RL) framework that autonomously selects between an implicit BDF integrator (CVODE) and a quasi-steady-state (QSS) solver during chemistry integration. Solver selection is cast as a Markov decision process. The agent learns trajectory-aware policies that account for how present solver choices influence downstream error accumulation, while minimizing computational cost under a user-prescribed accuracy tolerance enforced through a Lagrangian reward with online multiplier adaptation. Across sampled 0D homogeneous reactor conditions, the RL-adaptive policy achieves a mean speedup of approximately 3×, with speedups ranging from 1.11× to 10.58×, while maintaining accurate ignition delays and species profiles for a 106-species *n*-dodecane mechanism and adding approximately 1% inference overhead. Without retraining, the 0D-trained policy transfers to 1D counterflow diffusion flames over strain rates 10–2000 s⁻¹, delivering a consistent ≈2.2× speedup relative to CVODE while preserving near-reference temperature accuracy and selecting CVODE at only 12–15% of space-time points. Overall, the results demonstrate the potential of the proposed reinforcement learning framework to learn problem-specific integration strategies while respecting accuracy constraints, thereby opening a pathway toward adaptive, self-optimizing workflows for multiphysics systems with spatially heterogeneous stiffness.
