Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version

arXiv cs.LG / 3/31/2026


Key Points

  • The paper analyzes the stability of relative temporal-difference (TD) learning with linear function approximation, focusing on how the method behaves when the discount factor is close to one (a minimal sketch of the update follows this list).
  • It derives stability conditions and shows that the choice of baseline distribution is the decisive factor in guaranteeing stability.
  • When the baseline is set to the empirical distribution of the state-action process, the method is stable for any non-negative baseline weight and any discount factor.
  • The authors perform a sensitivity analysis of parameter estimates, quantifying asymptotic bias and covariance.
  • The analysis indicates that both asymptotic bias and asymptotic covariance remain uniformly bounded even as the discount factor approaches one, addressing a common concern in TD methods.
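
The page does not reproduce the update rule, so the sketch below is a plausible reconstruction from the bullets above, not the paper's algorithm. The interfaces `env_step` and `phi`, the baseline weight `kappa`, and the placement of the baseline term inside the TD error are all assumptions; the running feature average stands in for the empirical-distribution baseline mentioned in the third bullet.

```python
import numpy as np

def relative_td0(env_step, phi, theta0, gamma, kappa, n_steps, s0,
                 alpha=lambda n: 1.0 / n):
    """Relative TD(0) with linear function approximation -- a hypothetical sketch.

    Assumed interfaces (not from the paper):
      env_step(s) -> (r, s_next)  one transition of the Markov reward process
      phi(s)      -> np.ndarray   feature vector of dimension d
      kappa       non-negative baseline weight; kappa = 0 recovers plain TD(0)

    The baseline distribution mu is taken here to be the empirical
    distribution of the observed process, so E_mu[phi] is tracked as a
    running average of the feature vectors seen so far.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    phi_bar = np.zeros_like(theta)        # running estimate of E_mu[phi]
    s = s0
    for n in range(1, n_steps + 1):
        r, s_next = env_step(s)
        f, f_next = phi(s), phi(s_next)
        phi_bar += (f - phi_bar) / n      # empirical-distribution baseline
        # Relative TD error: the usual TD error minus kappa times the
        # baseline-averaged value estimate mu(V_theta) = phi_bar @ theta.
        delta = (r + gamma * (f_next @ theta)
                 - kappa * (phi_bar @ theta) - f @ theta)
        theta = theta + alpha(n) * delta * f
        s = s_next
    return theta
```

Setting kappa = 0 recovers standard TD(0), which makes the role of the baseline weight easy to isolate when comparing the two methods.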

Abstract

Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods as the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied in the tabular setting, stability guarantees under function approximation remain poorly understood. This paper analyzes relative TD learning with linear function approximation. We establish stability conditions for the algorithm and show that the choice of baseline distribution plays a central role. In particular, when the baseline is chosen as the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor. We also provide a sensitivity analysis of the resulting parameter estimates, characterizing both the asymptotic bias and the asymptotic covariance and showing that both remain uniformly bounded as the discount factor approaches one.
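
In equations, one natural reading of the abstract is the recursion and mean ODE below. This is a reconstruction under assumed notation (features \(\varphi\), baseline distribution \(\mu\), baseline weight \(\kappa \ge 0\), step sizes \(\alpha_n\)), not the paper's stated formulas.

```latex
% Hypothetical relative TD(0) recursion, reconstructed from the abstract
\theta_{n+1} = \theta_n + \alpha_n\,\varphi(s_n)\Bigl(r_n
    + \gamma\,\varphi(s_{n+1})^{\top}\theta_n
    - \kappa\,\bar{\varphi}_{\mu}^{\top}\theta_n
    - \varphi(s_n)^{\top}\theta_n\Bigr),
\qquad
\bar{\varphi}_{\mu} := \mathbb{E}_{\mu}\bigl[\varphi(s)\bigr].

% The mean ODE is linear, so stability amounts to A_kappa being Hurwitz:
\dot{\theta} = A_{\kappa}\theta + b,
\qquad
A_{\kappa} = \mathbb{E}\bigl[\varphi(s_n)\bigl(\gamma\,\varphi(s_{n+1})
    - \varphi(s_n)\bigr)^{\top}\bigr]
    - \kappa\,\mathbb{E}\bigl[\varphi(s_n)\bigr]\,\bar{\varphi}_{\mu}^{\top},
\qquad
b = \mathbb{E}\bigl[r_n\,\varphi(s_n)\bigr].
```

Under this reading, the paper's claim about the empirical baseline would say that \(A_{\kappa}\) remains Hurwitz for every \(\kappa \ge 0\) and every discount factor \(\gamma \in [0,1)\) when \(\mu\) is the empirical distribution of the state-action process.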