RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach
arXiv cs.LG / 3/20/2026
Key Points
- The paper argues that DRL-based bus holding control suffers from Q-value instability because it conflates aleatoric uncertainty (noise) with epistemic uncertainty (insufficient data), leading to value underestimation and potential policy collapse in noisy environments.
- RE-SAC addresses the two risks separately. To hedge aleatoric risk, it applies integral probability metric (IPM)-based weight regularization to the critic, which yields a smooth lower bound for the robust Bellman operator without costly inner-loop perturbations. To limit epistemic risk, it uses a diversified Q-ensemble that curbs overconfident estimates in sparse-data regions.
- In simulations of a realistic bidirectional bus corridor, RE-SAC achieves a higher cumulative reward than vanilla SAC (-0.4e6 vs. -0.55e6) and reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (mean absolute error of 1647 vs. 4343).
- Taken together, the results indicate improved robustness to high traffic variability in the simulated transit-control scenarios.
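
The two mechanisms in the key points can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not give the exact regularizer or the ensemble diversity term, so the code below assumes linear critics, an ensemble target of the form mean minus `beta` times the standard deviation (a common pessimistic estimate for epistemic uncertainty), and an L2 weight-norm penalty scaled by a hypothetical `radius` as a stand-in for the IPM-based regularizer. All function names and parameter values are illustrative, and the SAC entropy term is omitted.

```python
import numpy as np


def pessimistic_target(q_next, reward, gamma=0.99, beta=1.0):
    """Bellman target from an ensemble of next-state Q-values.

    q_next has shape (n_critics, batch). Disagreement between critics
    (the standard deviation) is treated as a proxy for epistemic
    uncertainty and subtracted from the ensemble mean.
    """
    q_mean = q_next.mean(axis=0)
    q_std = q_next.std(axis=0)
    return reward + gamma * (q_mean - beta * q_std)


def weight_norm_penalty(weights, radius=0.1):
    """Assumed stand-in for the critic weight regularizer.

    Returns radius times the sum of L2 norms of the critic weights;
    in training this would be added to the critic loss.
    """
    return radius * sum(np.linalg.norm(w) for w in weights)


# Toy data: 5 linear critics over 8 features, batch of 4 transitions.
rng = np.random.default_rng(0)
n_critics, batch, dim = 5, 4, 8
features = rng.normal(size=(batch, dim))
weights = [rng.normal(size=dim) for _ in range(n_critics)]

q_next = np.stack([features @ w for w in weights])  # (n_critics, batch)
reward = -np.ones(batch)

target = pessimistic_target(q_next, reward)
naive = reward + 0.99 * q_next.mean(axis=0)  # mean-only target for comparison
penalty = weight_norm_penalty(weights)
```

In this sketch `target` is never larger than `naive`, and the gap grows where the critics disagree, which is the intended effect in sparsely sampled states. When all critics agree, the two targets coincide.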