Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training
arXiv cs.AI / 4/22/2026
💬 Opinion · Models & Research
Key Points
- The paper argues that common curiosity rewards based only on local prediction error miss how the world model’s cumulative prediction error evolves over all visited transitions.
- It introduces “Curiosity-Critic,” an intrinsic reward tied to improvement of a cumulative prediction objective that can be computed in a tractable per-step form using the difference from an asymptotic (baseline) error.
- The method estimates the asymptotic error baseline online with a learned critic co-trained with the world model, allowing exploration to focus on learnable transitions without requiring access to an oracle noise floor.
- Experiments in a stochastic grid-world environment show Curiosity-Critic converges faster and yields better final world-model accuracy than prediction-error and visitation-count curiosity baselines.
- The approach provides an online separation of epistemic (reducible) from aleatoric (irreducible) prediction error, and recovers prior curiosity formulations as special cases under different approximations of this baseline.
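The per-step mechanism described above can be sketched as follows. This is a hypothetical minimal illustration, not the paper's actual implementation: the class name, tabular baseline, and update rule are assumptions. The idea is that a critic tracks an online estimate of each transition's asymptotic (irreducible) error, and the intrinsic reward is only the portion of the current prediction error above that floor.

```python
class CuriosityCritic:
    """Sketch (assumed, not the paper's code): intrinsic reward as the gap
    between the world model's current prediction error and a learned online
    estimate of the asymptotic (irreducible) error for that transition."""

    def __init__(self, lr=0.5):
        self.lr = lr          # learning rate for the critic's baseline update
        self.baseline = {}    # per-state asymptotic-error estimate (tabular critic)

    def intrinsic_reward(self, state, pred_error):
        b = self.baseline.get(state, 0.0)
        # Reward only the epistemic part: error above the estimated noise floor.
        r = max(pred_error - b, 0.0)
        # Critic update: move the baseline toward the observed error online,
        # so it converges to the asymptotic error without an oracle noise floor.
        self.baseline[state] = b + self.lr * (pred_error - b)
        return r
```

Under this sketch, a learnable transition yields a shrinking reward as the world model improves, while a purely stochastic transition's constant error is absorbed into the baseline, so its reward also decays toward zero; this is the epistemic/aleatoric separation the key points describe.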