
RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach

arXiv cs.LG / 3/20/2026

📰 News · Models & Research

Key Points

  • The paper identifies that DRL-based bus holding control suffers from Q-value instability caused by conflating aleatoric (irreducible noise) and epistemic (data insufficiency) uncertainty, which leads to value underestimation and potential policy collapse in noisy environments.
  • RE-SAC introduces IPM-based weight regularization on the critic to hedge against aleatoric risk, yielding a smooth lower bound on the robust Bellman operator without costly inner-loop perturbations, and adds a diversified Q-ensemble to curb overconfident estimates in sparsely covered data regions.
  • In simulations of a realistic bidirectional bus corridor, RE-SAC achieves a higher cumulative reward than vanilla SAC (approx. -0.4e6 vs. -0.55e6) and reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE 1647 vs. 4343).
  • The results demonstrate improved robustness to high traffic variability and better performance in realistic transit-control scenarios.
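The two mechanisms above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the linear Q-heads, the `kappa` pessimism coefficient, and the `lam` penalty weight are all hypothetical stand-ins. The ensemble lower bound (mean minus a multiple of the std) targets epistemic disagreement, while a weight-norm penalty is a common smooth surrogate for IPM/Lipschitz-style robustness to target perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of K linear Q-heads over a state-action feature vector.
K, d = 5, 8
W = rng.normal(size=(K, d))  # per-head critic weights (illustrative, untrained)

def q_values(phi):
    """Q estimate from each of the K ensemble heads for features phi."""
    return W @ phi  # shape (K,)

def pessimistic_q(phi, kappa=1.0):
    """Ensemble lower bound: mean minus kappa * std. Head disagreement
    (epistemic uncertainty) lowers the value in sparsely covered regions."""
    qs = q_values(phi)
    return qs.mean() - kappa * qs.std()

def ipm_weight_penalty(lam=1e-3):
    """Weight-norm surrogate for IPM-style regularization: bounding critic
    weight norms constrains its Lipschitz constant, a smooth stand-in for
    worst-case perturbations of the Bellman target."""
    return lam * float(np.sum(W ** 2))

phi = rng.normal(size=d)
print(pessimistic_q(phi) <= q_values(phi).mean())  # lower bound holds
```

In practice such a penalty would be added to the critic loss and the pessimistic value used as the actor's training target; here both are shown in isolation.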

Abstract

Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability is the conflation of two distinct uncertainties: aleatoric uncertainty (irreducible noise) and epistemic uncertainty (data insufficiency). Treating these as a single risk leads to value underestimation in noisy states, causing catastrophic policy collapse. We propose a robust ensemble soft actor-critic (RE-SAC) framework to explicitly disentangle these uncertainties. RE-SAC applies Integral Probability Metric (IPM)-based weight regularization to the critic network to hedge against aleatoric risk, providing a smooth analytical lower bound for the robust Bellman operator without expensive inner-loop perturbations. To address epistemic risk, a diversified Q-ensemble penalizes overconfident value estimates in sparsely covered regions. This dual mechanism prevents the ensemble variance from misidentifying noise as a data gap, a failure mode identified in our ablation study. Experiments in a realistic bidirectional bus corridor simulation demonstrate that RE-SAC achieves the highest cumulative reward (approx. -0.4e6) compared to vanilla SAC (-0.55e6). Mahalanobis rareness analysis confirms that RE-SAC reduces Oracle Q-value estimation error by up to 62% in rare out-of-distribution states (MAE of 1647 vs. 4343), demonstrating superior robustness under high traffic variability.
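The Mahalanobis rareness analysis mentioned above scores how far a state lies from the training distribution; large distances flag rare out-of-distribution states where Q-estimation error is measured. A minimal sketch, assuming states are represented as fixed-length feature vectors (the 4-dimensional synthetic data here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the features of states visited during training.
states = rng.normal(size=(500, 4))
mu = states.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(states, rowvar=False))

def mahalanobis(x):
    """Distance of state x from the empirical training distribution;
    large values indicate rare / out-of-distribution states."""
    diff = x - mu
    return float(np.sqrt(diff @ cov_inv @ diff))

in_dist = mahalanobis(mu)         # zero at the distribution mean
rare = mahalanobis(mu + 10.0)     # far outside the observed data
print(in_dist < rare)
```

Binning evaluation states by this distance and comparing each method's Q estimates against an Oracle within the rarest bins is one straightforward way to produce the kind of OOD error comparison the abstract reports.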