CastFlow: Learning Role-Specialized Agentic Workflows for Time Series Forecasting

arXiv cs.LG / 5/1/2026

Key Points

  • CastFlow is a new agentic time-series forecasting framework designed to move beyond the static, one-shot generative approach used by many LLM-based methods.
  • It structures forecasting as a planning→action→forecasting→reflection workflow, enabling multi-view temporal pattern extraction, multi-round context gathering, and iterative refinement, including ensemble-based forecasting.
  • The method uses a memory module to retrieve prior experience and a multi-view toolkit to create diagnostic evidence and produce a reliable ensemble forecast baseline.
  • CastFlow employs role-specialized components: a frozen general-purpose LLM for reasoning and a fine-tuned domain-specific LLM that performs evidence-guided numerical forecasting using the ensemble baseline rather than forecasting from scratch.
  • The domain-specific forecaster is trained with a two-stage, workflow-oriented procedure (supervised fine-tuning, SFT, followed by reinforcement learning with verifiable rewards, RLVR), and the paper reports superior overall results against strong baselines across diverse datasets.
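
The planning→action→forecasting→reflection loop described above can be pictured with a small, self-contained Python sketch. It is only an illustration under loud assumptions: every function name, the three toy forecasting tools, the median ensemble, the trend-damping rule, and the range check are hypothetical stand-ins, not CastFlow's actual components. In the paper the planning and reflection roles belong to a frozen LLM and the forecasting role to a fine-tuned LLM; here they are replaced by simple deterministic rules so the control flow is runnable.

```python
from statistics import mean, median

# Hypothetical "multi-view toolkit": each tool yields one simple forecast view.
def naive_last(history, horizon):
    return [history[-1]] * horizon

def moving_average(history, horizon, window=3):
    return [mean(history[-window:])] * horizon

def linear_drift(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

TOOLKIT = {"naive": naive_last, "ma": moving_average, "drift": linear_drift}

def plan(memory):
    """Planning: stand-in for the frozen reasoning LLM choosing tools."""
    return memory.get("preferred_tools", list(TOOLKIT))

def act(history, horizon, tools):
    """Action: run the tools, build evidence and a median ensemble baseline."""
    views = [TOOLKIT[name](history, horizon) for name in tools]
    baseline = [median(step) for step in zip(*views)]
    diffs = [b - a for a, b in zip(history, history[1:])]
    return {"baseline": baseline, "trend": mean(diffs)}

def forecast(evidence, damping=0.5):
    """Forecasting: stand-in for the fine-tuned LLM; it adjusts the
    ensemble baseline with evidence instead of predicting from scratch."""
    return [b + damping * evidence["trend"] * (h + 1)
            for h, b in enumerate(evidence["baseline"])]

def reflect(prediction, history):
    """Reflection: accept only forecasts inside a plausible value range."""
    span = max(history) - min(history)
    low, high = min(history) - span, max(history) + span
    return all(low <= p <= high for p in prediction)

def run_workflow(history, horizon, memory=None, max_rounds=3):
    memory = memory or {}
    tools = plan(memory)
    prediction, rounds = [], 0
    for rounds in range(1, max_rounds + 1):
        evidence = act(history, horizon, tools)
        prediction = forecast(evidence)
        if reflect(prediction, history):
            break
        tools = list(TOOLKIT)  # rejected: widen to all views and retry
    return prediction, rounds
```

Calling `run_workflow([1, 2, 3, 4, 5], horizon=2)` runs one round: the three views are combined into a median baseline, the forecaster nudges it by the damped trend, and the reflection check accepts the result. The design point the sketch tries to capture is the separation of roles, with the forecaster starting from the ensemble baseline rather than from raw history.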

Abstract

Recently, large language models (LLMs) have shown great promise in time series forecasting. However, most existing LLM-based forecasting methods still follow a static generative paradigm that directly maps historical observations to future values in a single pass. Under this paradigm, forecasting is constrained by limited temporal pattern extraction, single-round acquisition of contextual features, one-shot forecast generation, and a lack of support from ensemble forecasts. To address these limitations, we propose CastFlow, a dynamic agentic forecasting framework that enables multi-view temporal pattern extraction, multi-round contextual feature acquisition, iterative forecast refinement, and forecasting supported by ensemble forecasts. First, CastFlow organizes the forecasting process into planning, action, forecasting, and reflection, establishing an agentic workflow. Second, this workflow is supported by a memory module that retrieves prior experience and a multi-view toolkit that constructs diagnostic evidence and provides a reliable ensemble forecast baseline. Third, CastFlow adopts a role-specialized design that combines general-purpose reasoning with specialized numerical forecasting. Under this design, a frozen LLM preserves general-purpose reasoning, while a fine-tuned domain-specific LLM performs evidence-guided numerical forecasting based on the ensemble forecast baseline rather than from scratch. To optimize the fine-tuned domain-specific LLM, we further develop a two-stage workflow-oriented training procedure that combines supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). To evaluate the effectiveness of CastFlow, we conduct extensive experiments on diverse datasets and show that it achieves superior overall results against strong baselines. We hope that this work can serve as a step toward more adaptive and accurate time series forecasting.
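
The abstract does not say how the RLVR reward is computed. As a purely illustrative sketch of what "verifiable" can mean for forecasting, the Python below scores a forecast against held-out ground truth by its relative MAE improvement over the ensemble baseline. The function names, the choice of MAE, the clipping to [-1, 1], and the length check are all assumptions made for this example, not the paper's reward.

```python
def mae(pred, actual):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def verifiable_reward(prediction, baseline, actual):
    """Hypothetical reward: relative MAE gain over the ensemble baseline.

    Positive when the refined forecast beats the baseline, negative when
    it is worse, clipped to [-1, 1]. It can be checked automatically
    because it depends only on the realized values in `actual`.
    """
    # Assumption: a malformed output (wrong horizon) gets the minimum reward.
    if len(prediction) != len(actual):
        return -1.0
    base_err = mae(baseline, actual)
    pred_err = mae(prediction, actual)
    if base_err == 0:
        # The baseline is already perfect; any deviation is penalized.
        return 0.0 if pred_err == 0 else -1.0
    gain = (base_err - pred_err) / base_err
    return max(-1.0, min(1.0, gain))
```

For example, with realized values `[6, 7]`, a baseline of `[5, 5]` (MAE 1.5), and a refined forecast of `[5.5, 6.0]` (MAE 0.75), the reward is 0.5. Measuring the reward relative to the baseline mirrors the abstract's emphasis on refining an ensemble forecast rather than forecasting from scratch, though other verifiable signals would fit the same description.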