Bi-level Heterogeneous Learning for Time Series Foundation Models: A Federated Learning Approach

arXiv cs.LG / 4/9/2026


Key Points

  • The paper argues that time series data heterogeneity (both across domains and within a domain) is more severe than in vision or language and can harm foundation-model training when heterogeneous datasets are naively mixed in batches.
  • It proposes a bi-level learning framework that distills domain-invariant, semantically consistent knowledge while reducing cross-domain gradient and representation interference.
  • The method uses a federated learning approach with local regularization to mitigate intra-domain conflicts and domain-aware aggregation to improve inter-domain collaboration.
  • Experiments on multiple benchmarks show the resulting time series foundation models outperform centralized and other federated baselines for both point and probabilistic forecasting, with competitive zero-shot performance at scale.
  • Overall, the work provides a practical pathway to train time series foundation models “from scratch” in heterogeneous, multi-domain environments by controlling both intra- and inter-domain discrepancies.
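
To make the intra-domain side of this concrete, the sketch below shows one plausible form of the "local regularization" idea: each client's forecasting loss is augmented with a consistency penalty that pulls its local representations toward a shared anchor. All names and the specific penalty are illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

def local_objective(pred, target, z_local, z_anchor, lam=0.1):
    """Hypothetical per-client training loss (illustrative, not the
    paper's exact formulation): a standard forecasting MSE plus a
    consistency regularizer that keeps local representations z_local
    close to a shared anchor z_anchor, standing in for the paper's
    domain-invariant, semantically consistent constraint."""
    task = np.mean((pred - target) ** 2)        # point-forecast error
    reg = np.mean((z_local - z_anchor) ** 2)    # representation drift penalty
    return task + lam * reg
```

The weight `lam` trades off fitting the local domain against staying semantically aligned with the shared model; setting it to zero recovers plain local training.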

Abstract

Heterogeneity in time series data is more pronounced than in vision or language, as temporal dynamics vary substantially across domains and tasks. Existing efforts to train time series foundation models (TSFMs) from scratch often rely on mixed-batch strategies that merge large-scale datasets, which can cause gradient conflicts and degrade representation quality. To address this, we propose a fine-grained learning method that distills invariant knowledge from heterogeneous series while reducing cross-domain interference. We characterize heterogeneity at two levels: inter-domain and intra-domain. To tackle this bi-level heterogeneity, we design a federated learning method that mitigates intra-domain conflicts by enforcing domain-invariant and semantically consistent representations through local regularization, and addresses inter-domain discrepancies by enhancing cross-domain collaboration via domain-aware aggregation. Experiments across diverse benchmarks show that TSFMs trained with our method consistently outperform both centralized and federated TSFM baselines in point and probabilistic forecasting, while also achieving competitive zero-shot performance at scale, offering a flexible pathway for training TSFMs from scratch in heterogeneous environments.
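The inter-domain side, "domain-aware aggregation," can be pictured as a weighted alternative to FedAvg: instead of averaging client models uniformly, the server weights each client by how similar its domain is to the others. The sketch below is a minimal assumed version using cosine similarity between hypothetical domain embeddings; the paper's actual similarity measure and weighting scheme may differ.

```python
import numpy as np

def domain_aware_aggregate(client_params, domain_embeddings):
    """Illustrative domain-aware aggregation (not the paper's exact
    rule): weight each client's parameters by a softmax over the
    cosine similarity of its domain embedding to the embedding
    centroid, so clients from closely related domains contribute
    more to the global model than outlying ones."""
    embs = np.stack(domain_embeddings)
    center = embs.mean(axis=0)
    # cosine similarity of each client's domain embedding to the centroid
    sims = embs @ center / (np.linalg.norm(embs, axis=1) * np.linalg.norm(center) + 1e-8)
    alphas = np.exp(sims) / np.exp(sims).sum()  # softmax aggregation weights
    return sum(a * p for a, p in zip(alphas, client_params))
```

With identical domain embeddings the weights become uniform and the rule reduces to plain federated averaging, which makes the uniform mixed-batch baseline a special case of this scheme.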