K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning

arXiv cs.LG / 4/28/2026


Key Points

  • The paper introduces K-Score, a policy-gradient reinforcement learning method that replaces heuristic reward normalization with an online 1D Kalman filter for estimating reward statistics.
  • By recursively estimating the latent reward mean, K-Score smooths high-variance returns and can adapt to non-stationary environments during training.
  • The approach is designed to add minimal computational overhead and does not require any changes to existing policy network architectures.
  • Experiments on LunarLander and CartPole show that Kalman-filtered rewards accelerate convergence and reduce training variance compared with standard normalization methods.
  • The authors provide implementation code via https://github.com/Sumxiaa/Kalman_Normalization.
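The idea in the points above is to treat the latent reward mean as a slowly drifting scalar state and update it with a 1D Kalman filter as rewards arrive, then center each reward by the filtered estimate. A minimal sketch follows; the paper's actual noise settings and whether it also rescales by variance are not stated in this summary, so the class name and the `q`/`r_var` values here are illustrative assumptions, not the authors' implementation.

```python
class KalmanRewardFilter:
    """Online 1D Kalman filter that tracks a latent reward mean.

    Sketch only: process/measurement noise values are assumptions,
    not taken from the paper.
    """

    def __init__(self, q=1e-4, r_var=1.0):
        self.q = q          # process noise: how fast the latent mean may drift
        self.r_var = r_var  # measurement noise: per-step reward variance
        self.mean = 0.0     # current estimate of the latent reward mean
        self.p = 1.0        # uncertainty of that estimate

    def step(self, reward):
        # Predict: model the latent mean as a random walk, so uncertainty grows.
        self.p += self.q
        # Update: blend the prediction with the observed reward via the Kalman gain.
        k = self.p / (self.p + self.r_var)
        self.mean += k * (reward - self.mean)
        self.p *= 1.0 - k
        # Return the centered reward for use in the policy-gradient update.
        return reward - self.mean
```

Because the filter is recursive, it runs in O(1) time and memory per reward, consistent with the claimed minimal overhead, and a nonzero `q` keeps the gain from vanishing, which is what lets the estimate track non-stationary reward scales.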

Abstract

We propose a simple yet effective alternative to reward normalization in policy gradient reinforcement learning by integrating a 1D Kalman filter for online reward estimation. Instead of relying on fixed heuristics, our method recursively estimates the latent reward mean, smoothing high-variance returns and adapting to non-stationary environments. This approach incurs minimal overhead and requires no modification to existing policy architectures. Experiments on LunarLander and CartPole demonstrate that Kalman-filtered rewards significantly accelerate convergence and reduce training variance compared to standard normalization techniques. Code is available at https://github.com/Sumxiaa/Kalman_Normalization.