Online Learning of Kalman Filtering: From Output to State Estimation

arXiv cs.LG / March 31, 2026


Key Points

  • The paper studies online learning of Kalman filtering when the underlying system model is unknown within partially observed linear dynamical systems.
  • It proposes a unified online-optimization framework that works for both output estimation and state estimation cases.
  • For output estimation, the authors use properties of the estimation-error cost (including conditional strong convexity) to prove a \log T regret bound over the horizon T.
  • For state estimation, they establish an impossibility result: in the standard setting, no algorithm can achieve regret sublinear in T. They then show that \sqrt{T} regret becomes achievable under a random query scheme that grants the algorithm limited access to more informative measurements of the state.
  • Numerical experiments validate the algorithms and illustrate the trade-off between the number of state queries and the achievable regret under limited observations.
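To make the output-estimation setting concrete, here is a minimal sketch of online learning of an output predictor on a toy scalar system. The system parameters, the fixed-lag linear predictor, and the 1/t-step online gradient descent update are all illustrative assumptions; this is a generic stand-in for the paper's framework, not its actual algorithm or cost functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system (not from the paper): a scalar partially
# observed LDS, x_{t+1} = a x_t + w_t, y_t = c x_t + v_t.
a, c = 0.9, 1.0
q_std, r_std = 0.1, 0.1  # process / measurement noise std devs

# Online learner: predict y_t as a linear function of the last p outputs,
# updated by online gradient descent on the squared prediction error.
p = 4
theta = np.zeros(p)

T = 2000
x = 0.0
ys = []
losses = []
for t in range(T):
    x = a * x + q_std * rng.standard_normal()
    y = c * x + r_std * rng.standard_normal()
    if len(ys) >= p:
        feats = np.array(ys[-p:])
        err = theta @ feats - y          # prediction error on y_t
        losses.append(err ** 2)
        eta = 1.0 / len(losses)          # 1/t step, as for strongly convex losses
        theta -= eta * 2 * err * feats   # gradient of the squared error
    ys.append(y)

# Average prediction loss should shrink as the learner adapts.
print(np.mean(losses[: len(losses) // 2]), np.mean(losses[len(losses) // 2:]))
```

The 1/t step size is the standard choice when losses are (conditionally) strongly convex, which is the structural property behind \log T-type regret bounds; the sketch only shows the average prediction loss decreasing as the predictor adapts.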

Abstract

In this paper, we study the problem of learning Kalman filtering with an unknown system model in partially observed linear dynamical systems. We propose a unified algorithmic framework based on online optimization that can solve both the output estimation and the state estimation scenarios. By exploiting properties of the estimation-error cost functions, such as conditional strong convexity, we show that our algorithm achieves \log T regret in the horizon length T for the output estimation scenario. More importantly, we tackle the more challenging scenario of learning Kalman filtering for state estimation, which is an open problem in the literature. We first characterize a fundamental limitation of the problem, demonstrating that no algorithm can achieve regret sublinear in T. By further introducing a random query scheme into our algorithm, we show that \sqrt{T} regret is achievable when the algorithm is granted limited query access to more informative measurements of the system state. Our algorithm and regret bounds readily capture the trade-off between the number of queries and the achieved regret, and shed light on online learning problems with limited observations. We validate the performance of our algorithms using numerical examples.
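The random query idea can be illustrated with a similarly toy sketch. Assume, purely for illustration, that each query reveals a noiseless state-transition pair (x_t, x_{t+1}) of a scalar system, and that queries are issued with probability t^{-1/2}; the expected number of queries then grows like 2\sqrt{T}, while the queried pairs suffice to estimate the unknown dynamics by least squares. None of the probabilities or system parameters below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scalar system x_{t+1} = a x_t + w_t (illustrative parameters).
a_true = 0.8
q_std = 0.1

T = 5000
x = 1.0
sxx, sxy = 1e-8, 0.0  # least-squares statistics for estimating a
n_queries = 0

for t in range(1, T + 1):
    x_next = a_true * x + q_std * rng.standard_normal()
    # Random query scheme: with probability t^{-1/2}, query an informative
    # measurement -- here assumed to reveal the pair (x_t, x_{t+1}).
    if rng.random() < t ** -0.5:
        n_queries += 1
        sxx += x * x
        sxy += x * x_next
    x = x_next

a_hat = sxy / sxx
# Expected query count is roughly sum_t t^{-1/2} ~ 2*sqrt(T).
print(n_queries, round(2 * T ** 0.5), round(a_hat, 3))
```

The query probability t^{-1/2} is one simple schedule that keeps the total number of queries of order \sqrt{T}; making it decay faster saves queries but slows identification of the dynamics, which is the trade-off the abstract alludes to.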