RG-TTA: Regime-Guided Meta-Control for Test-Time Adaptation in Streaming Time Series

arXiv cs.LG, March 31, 2026


Key Points

  • The paper introduces RG-TTA, a model-agnostic meta-controller for test-time adaptation in streaming time series that adjusts adaptation intensity according to how similar the incoming data batch is to previously seen regimes.
  • RG-TTA computes a regime similarity score using an ensemble of distribution- and feature-based metrics (e.g., Kolmogorov–Smirnov, Wasserstein-1, feature distance, variance ratio) to (a) scale the learning rate and (b) decide when to stop gradient updates via loss-driven early stopping.
  • It further improves efficiency by gating checkpoint reuse from a regime memory, loading specialist models only when they show clear loss improvements (≥30%) over the current model.
  • Across 672 streaming experiments covering multiple update policies, 4 architectures (GRU, iTransformer, PatchTST, DLinear), 14 datasets (real and synthetic regime shifts), and 4 forecast horizons, regime-guided methods achieve the lowest MSE in 69.6% of seed-averaged experiments, with RG-TTA delivering a 5.7% MSE reduction over standard TTA while also running about 5.5% faster.
  • The authors demonstrate composability by integrating RG-TTA with existing gradient-based TTA approaches (e.g., RG-EWC and RG-DynaTTA), showing that regime-guided control can improve both accuracy and compute efficiency depending on the base strategy.
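The core control loop described in the key points — score the incoming batch's similarity to known regimes, then scale the learning rate accordingly — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper names the four metrics (Kolmogorov–Smirnov, Wasserstein-1, feature distance, variance ratio), but the normalization, equal weighting, and the `1/(1+d)` similarity mapping used here are assumptions.

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic (max CDF gap), in [0, 1]."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / len(x)
    cdf_y = np.searchsorted(y, grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))

def wasserstein1(x, y, n_quantiles=256):
    """Approximate 1-D Wasserstein-1 distance via matched quantiles."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    return float(np.mean(np.abs(np.quantile(x, qs) - np.quantile(y, qs))))

def regime_similarity(batch, reference, eps=1e-8):
    """Ensemble similarity between an incoming batch and a stored regime.

    Each metric is mapped to a divergence >= 0 (0 = identical), converted
    to a similarity in (0, 1] via 1/(1+d), and averaged. The exact mapping
    and weights are assumptions for illustration.
    """
    ref_std = np.std(reference) + eps
    divs = np.array([
        ks_statistic(batch, reference),                       # distribution shape
        wasserstein1(batch, reference) / ref_std,             # scale-free transport cost
        abs(batch.mean() - reference.mean()) / ref_std,       # feature (mean) distance
        abs(np.log(np.var(batch) / (np.var(reference) + eps) + eps)),  # variance ratio
    ])
    return float(np.mean(1.0 / (1.0 + divs)))

def scaled_lr(similarity, base_lr=1e-3, max_boost=10.0):
    """Aggressive LR for novel batches, conservative for familiar ones."""
    return base_lr * (1.0 + (max_boost - 1.0) * (1.0 - similarity))
```

A batch drawn from the training regime yields a similarity near 1 and roughly the base learning rate, while a shifted batch drives the similarity down and the learning rate up toward `max_boost * base_lr`.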

Abstract

Test-time adaptation (TTA) enables neural forecasters to adapt to distribution shifts in streaming time series, but existing methods apply the same adaptation intensity regardless of the nature of the shift. We propose Regime-Guided Test-Time Adaptation (RG-TTA), a meta-controller that continuously modulates adaptation intensity based on distributional similarity to previously seen regimes. Using an ensemble of Kolmogorov–Smirnov, Wasserstein-1, feature-distance, and variance-ratio metrics, RG-TTA computes a similarity score for each incoming batch and uses it to (i) smoothly scale the learning rate -- more aggressive for novel distributions, conservative for familiar ones -- and (ii) control gradient effort via loss-driven early stopping rather than fixed budgets, allowing the system to allocate exactly the effort each batch requires. As a supplementary mechanism, RG-TTA gates checkpoint reuse from a regime memory, loading stored specialist models only when they demonstrably outperform the current model (loss improvement ≥ 30%). RG-TTA is model-agnostic and strategy-composable: it wraps any forecaster exposing train/predict/save/load interfaces and enhances any gradient-based TTA method. We demonstrate three compositions -- RG-TTA, RG-EWC, and RG-DynaTTA -- and evaluate 6 update policies (3 baselines + 3 regime-guided variants) across 4 compact architectures (GRU, iTransformer, PatchTST, DLinear), 14 datasets (6 real-world multivariate benchmarks + 8 synthetic regime scenarios), and 4 forecast horizons (96, 192, 336, 720) under a streaming evaluation protocol with 3 random seeds (672 experiments total). Regime-guided policies achieve the lowest MSE in 156 of 224 seed-averaged experiments (69.6%), with RG-EWC winning 30.4% and RG-TTA winning 29.0%. Overall, RG-TTA reduces MSE by 5.7% vs TTA while running 5.5% faster; RG-EWC reduces MSE by 14.1% vs standalone EWC.
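The two efficiency mechanisms in the abstract — gated checkpoint reuse (load a stored specialist only on a ≥ 30% loss improvement) and loss-driven early stopping in place of a fixed update budget — can be sketched as below. Only the 30% threshold comes from the paper; the probe-loss comparison, the `patience` and `min_rel_improve` parameters, and the callback interface are illustrative assumptions.

```python
def should_load_checkpoint(current_loss, specialist_loss, threshold=0.30):
    """Gate checkpoint reuse from regime memory.

    Load a stored specialist only when its probe loss on the incoming batch
    improves on the current model's loss by at least `threshold` (the paper's
    30% criterion). Relative improvement avoids scale sensitivity.
    """
    if current_loss <= 0.0:
        return False
    improvement = (current_loss - specialist_loss) / current_loss
    return improvement >= threshold

def adapt_with_early_stopping(step_fn, eval_fn, max_steps=50,
                              patience=3, min_rel_improve=1e-3):
    """Loss-driven early stopping (illustrative sketch).

    Take gradient steps (`step_fn`) while the batch loss (`eval_fn`) keeps
    improving; stop after `patience` consecutive steps with less than
    `min_rel_improve` relative improvement, rather than spending a fixed
    per-batch update budget. Returns the best loss reached.
    """
    best = eval_fn()
    stale = 0
    for _ in range(max_steps):
        step_fn()
        loss = eval_fn()
        if best - loss > min_rel_improve * best:
            best, stale = loss, 0       # meaningful improvement: reset patience
        else:
            stale += 1                  # plateau step
            if stale >= patience:
                break                   # stop spending gradient effort
    return best
```

In this sketch, familiar batches whose loss plateaus immediately consume only `patience` updates, while novel batches keep improving and use more of the `max_steps` budget — matching the paper's goal of allocating exactly the effort each batch requires.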