TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting

arXiv cs.LG · 15 Apr 2026


Key Points

  • The paper argues that many LLM-based time-series forecasting methods rely on deep synchronous fusion, which repeatedly entangles high-level LLM semantics with fine-grained numerical dynamics across all network layers.
  • It introduces a new framework, TimeSAF, designed to reduce “semantic perceptual dissonance” by decoupling unimodal learning from cross-modal interaction.
  • TimeSAF adopts hierarchical asynchronous fusion: an independent semantic fusion trunk aggregates global semantics via learnable queries, and a stage-wise decoder then injects those signals back into the temporal backbone asynchronously.
  • Experiments on standard long-term forecasting benchmarks reportedly show substantial improvements over state-of-the-art baselines, along with strong few-shot and zero-shot generalization.
  • Overall, TimeSAF presents an architectural alternative to synchronous fusion that aims to provide stable semantic guidance without degrading low-level temporal dynamics learning.

Abstract

Despite the recent success of large language models (LLMs) in time-series forecasting, most existing methods still adopt a Deep Synchronous Fusion strategy, where dense interactions between textual and temporal features are enforced at every layer of the network. This design overlooks the inherent granularity mismatch between modalities and leads to what we term semantic perceptual dissonance: high-level abstract semantics provided by the LLM become inappropriately entangled with the low-level, fine-grained numerical dynamics of time series, making it difficult for semantic priors to effectively guide forecasting. To address this issue, we propose TimeSAF, a new framework based on hierarchical asynchronous fusion. Unlike synchronous approaches, TimeSAF explicitly decouples unimodal feature learning from cross-modal interaction. It introduces an independent cross-modal semantic fusion trunk, which uses learnable queries to aggregate global semantics from the temporal and prompt backbones in a bottom-up manner, and a stage-wise semantic refinement decoder that asynchronously injects these high-level signals back into the temporal backbone. This mechanism provides stable and efficient semantic guidance while avoiding interference with low-level temporal dynamics. Extensive experiments on standard long-term forecasting benchmarks show that TimeSAF significantly outperforms state-of-the-art baselines, and further exhibits strong generalization in both few-shot and zero-shot transfer settings.
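The trunk-and-decoder mechanism described in the abstract can be sketched at a high level as follows. This is an illustrative toy in numpy, not the paper's implementation: all shapes, the query count, and the gated-residual form of the injection step are assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Scaled dot-product attention: queries pool information
    # from the key/value sequence.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (Q, N)
    return softmax(scores) @ keys_values            # (Q, d)

d = 16          # shared feature width (illustrative)
n_queries = 4   # number of learnable semantic queries (assumed)

# Hypothetical unimodal features from the two backbones:
temporal_feats = rng.standard_normal((32, d))  # time-series tokens
prompt_feats = rng.standard_normal((8, d))     # LLM prompt tokens

# Semantic fusion trunk: learnable queries aggregate global
# semantics bottom-up from both backbones. (In training these
# queries would be parameters; here they are random stand-ins.)
queries = rng.standard_normal((n_queries, d))
semantics = cross_attend(queries, np.vstack([prompt_feats, temporal_feats]))

# Stage-wise refinement decoder: asynchronously inject the pooled
# semantics back into a temporal stage. A small gated residual keeps
# the guidance from overwhelming low-level temporal dynamics.
def inject(stage_feats, semantics, gate=0.1):
    guidance = cross_attend(stage_feats, semantics)  # per-token guidance
    return stage_feats + gate * guidance

refined = inject(temporal_feats, semantics)
print(refined.shape)  # (32, 16)
```

The key structural point the sketch captures is the decoupling: the temporal tokens are never fused with prompt tokens layer-by-layer; instead, a small set of query vectors condenses both modalities once, and only that condensed summary is injected back, stage by stage.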