Biased Dreams: Limitations to Epistemic Uncertainty Quantification in Latent Space Models

arXiv cs.LG / 4/29/2026


Key Points

  • The paper studies epistemic uncertainty quantification in latent dynamics models used in model-based reinforcement learning, focusing on Dreamer-style recurrent state space models.
  • The authors show that latent transitions become biased toward well-represented regions of latent space, producing an attractor effect that may not reflect the true environment dynamics.
  • Because discrepancies in the environment dynamics may not manifest in latent space, epistemic uncertainty estimates can become unreliable, weakening their usefulness for guiding exploration and for preventing the policy from exploiting model errors.
  • The work finds that these attractor states often lie in high-reward regions, so the rewards predicted along latent rollouts are systematically overestimated.
  • Overall, the results point to key limitations of epistemic uncertainty estimation in latent dynamics models and argue for more critical evaluation of this approach.

Abstract

Model-Based Reinforcement Learning distinguishes between physical dynamics models operating on proprioceptive inputs and latent dynamics models operating on high-dimensional image observations. A prominent latent approach is the Recurrent State Space Model used in the Dreamer family. While epistemic uncertainty quantification to inform exploration and mitigate model exploitation is well established for physical dynamics models, its transfer to latent dynamics models has received limited scrutiny. We empirically demonstrate that latent transitions are biased toward well-represented regions of latent space, exhibiting an attractor behavior that can deviate from true environment dynamics. As a result, discrepancies in environment dynamics may not manifest in latent space, undermining the reliability of epistemic uncertainty estimates. Because these attractors often lie in high-reward regions, latent rollouts systematically overestimate predicted rewards. Our findings highlight key limitations of epistemic uncertainty estimation in latent dynamics models and motivate more critical evaluation of this method.
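The attractor effect and its two consequences (uninformative uncertainty and inflated predicted reward) can be illustrated with a deliberately simplified toy. This is not the paper's setup and not an RSSM. It is a sketch under stated assumptions: a 1-D stand-in for latent space whose well-represented region sits at z = 1, an ensemble of three hypothetical learned transitions that each contract toward their own estimate of that region, ensemble disagreement (population standard deviation) as the epistemic-uncertainty proxy, and a hypothetical reward that peaks in that region. The summary does not say how the authors estimate uncertainty; the ensemble and all names and constants here are illustrative assumptions.

```python
import math
import statistics

# Illustrative toy only: not the paper's RSSM or its uncertainty estimator.
# Hypothetical 1-D "latent space"; training data is assumed concentrated near z = 1.
CENTERS = [0.9, 1.0, 1.1]  # each ensemble member's (hypothetical) estimate of that region


def true_step(z):
    """Hypothetical environment dynamics: steady drift away from the training region."""
    return z + 0.5


def member_step(z, center, rate=0.5):
    """Hypothetical learned transition that contracts toward a well-represented point."""
    return z + rate * (center - z)


def reward(z):
    """Hypothetical reward peaking inside the well-represented region (z = 1)."""
    return math.exp(-((z - 1.0) ** 2))


def imagine(z0, horizon):
    """Roll out the true system and every ensemble member from the same start state."""
    true_traj = [z0]
    member_trajs = [[z0] for _ in CENTERS]
    for _ in range(horizon):
        true_traj.append(true_step(true_traj[-1]))
        for traj, c in zip(member_trajs, CENTERS):
            traj.append(member_step(traj[-1], c))
    return true_traj, member_trajs


def summarize(z0=3.0, horizon=15):
    """Return (max disagreement, final model error, predicted return, true return)."""
    true_traj, member_trajs = imagine(z0, horizon)
    # Epistemic-uncertainty proxy: spread of member predictions at each step.
    disagreement = [statistics.pstdev(states) for states in zip(*member_trajs)]
    mean_traj = [statistics.fmean(states) for states in zip(*member_trajs)]
    # Actual model error: distance between the ensemble mean and the true state.
    error = [abs(m - t) for m, t in zip(mean_traj, true_traj)]
    # Sum rewards over the imagined and the true trajectories (excluding the start state).
    predicted_return = sum(reward(z) for z in mean_traj[1:])
    true_return = sum(reward(z) for z in true_traj[1:])
    return max(disagreement), error[-1], predicted_return, true_return


if __name__ == "__main__":
    print(summarize())
```

With the default arguments, the members agree closely (disagreement never exceeds about 0.082, the spread of their centers) while the ensemble mean ends up more than 9 units from the true state, and the imagined return is far larger than the true one. The contraction rate and reward shape were chosen only to make the effect visible; in this toy, disagreement is bounded by how much the members differ about the attractor, not by how far the true dynamics have moved away from it.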