FETS Benchmark: Foundation Models Outperform Dataset-specific Machine Learning in Energy Time Series Forecasting

arXiv cs.AI / 4/27/2026


Key Points

  • The FETS benchmark paper argues that energy time-series forecasting has traditionally been dataset-specific and costly, but foundation models can generalize better via large-scale pretraining.
  • It introduces the Foundation Models in Energy Time Series Forecasting (FETS) benchmark, which includes a structured overview of energy forecasting use cases (organized by stakeholders, attributes, and data categories) and 54 datasets across nine data categories.
  • Across all evaluated settings and data categories, foundation models outperform classical dataset-optimized machine learning approaches, even though the classical models had access to the full historical target data during training.
  • The study finds that covariate-informed foundation models perform best. Predictive performance correlates strongly with spectral entropy, saturates beyond a certain context length, and improves at higher aggregation levels such as national load, district heating, and power grid data.
  • The authors conclude that foundation models offer scalable, generalizable forecasting for the energy sector, especially in data-constrained and privacy-sensitive environments.
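Spectral entropy, which the study reports as strongly correlated with predictive performance, measures how evenly a series' power is spread across frequencies. The summary does not give the exact definition the authors used; the sketch below assumes a common variant, the Shannon entropy of the mean-centred periodogram normalized to [0, 1]. It uses a naive O(n²) DFT only to stay dependency-free (a real pipeline would more likely use `numpy.fft` or a Welch estimate), and the function names are hypothetical.

```python
import cmath
import math
import random


def periodogram(x):
    """Power at each positive frequency bin via a naive O(n^2) DFT.

    The series is mean-centred first so the zero-frequency (DC) bin
    carries no power and is skipped.
    """
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    powers = []
    # Bins 1 .. n//2 cover the positive frequencies up to Nyquist.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def spectral_entropy(x):
    """Normalized Shannon entropy of the power spectrum, in [0, 1].

    0 means all power sits in a single frequency (a highly regular series);
    values close to 1 mean power is spread evenly (noise-like).
    """
    powers = periodogram(x)
    if len(powers) < 2:
        raise ValueError("need at least 4 observations")
    total = sum(powers)
    if total == 0:
        return 0.0  # constant series: no spectral content at all
    probs = [p / total for p in powers if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by the maximum possible entropy (uniform spectrum).
    return entropy / math.log(len(powers))


if __name__ == "__main__":
    # Hypothetical hourly series: 4 days of a clean daily cycle vs. noise.
    daily_cycle = [math.sin(2 * math.pi * t / 24) for t in range(96)]
    rng = random.Random(42)
    noise = [rng.gauss(0.0, 1.0) for _ in range(96)]
    print(spectral_entropy(daily_cycle), spectral_entropy(noise))
```

A strongly periodic series, such as the clean daily cycle above, scores near 0, while white noise scores close to 1.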

Abstract

Driven by the transition towards a climate-neutral energy system, accurate energy time series forecasting is critical for planning and operation. Yet, it remains largely a dataset-specific task, requiring comprehensive training data, limiting scalability, and resulting in high model development and maintenance effort. Recently, foundation models that aim to learn generalizable patterns via extensive pretraining have shown superior performance in multiple prediction tasks. Despite their success and strong potential to address challenges in energy forecasting, their application in this domain remains largely unexplored. We address this gap by presenting the Foundation Models in Energy Time Series Forecasting (FETS) benchmark. We (1) provide a structured overview of energy forecasting use cases along three main dimensions: stakeholders, attributes, and data categories; (2) collect and analyze 54 datasets across 9 data categories, guided by typical stakeholder interests; (3) benchmark foundation models against classical machine learning approaches across different forecasting settings. Foundation models consistently outperform dataset-specific optimized machine learning approaches across all settings and data categories, despite the latter having seen the full historical target data during training. In particular, covariate-informed foundation models achieve the strongest performance. Further analysis reveals a strong correlation between predictive performance and spectral entropy, performance saturation beyond a certain context length, and improved performance at higher aggregation levels such as national load, district heating, and power grid data. Overall, our findings highlight the strong potential of foundation models as scalable and generalizable forecasting solutions for the energy domain, particularly in data-constrained and privacy-sensitive settings.
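The abstract does not name the error metric behind the comparison. As a hypothetical illustration of how results could be pooled across heterogeneous datasets and data categories, the sketch below assumes the mean absolute scaled error (MASE), which divides forecast error by the in-sample error of a seasonal-naive forecast so that datasets on very different scales become comparable. The `results` record layout and the function names are assumptions, not the benchmark's actual format.

```python
from collections import defaultdict
from statistics import mean


def mase(actual, forecast, history, season=1):
    """Mean absolute scaled error (assumed metric, for illustration).

    Forecast MAE divided by the in-sample MAE of a seasonal-naive
    forecast (y[t] predicted by y[t - season]) on the training history.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(history) <= season:
        raise ValueError("history must be longer than one season")
    scale = mean(
        abs(history[t] - history[t - season]) for t in range(season, len(history))
    )
    if scale == 0:
        raise ValueError("seasonal-naive error is zero; MASE is undefined")
    mae = mean(abs(a - f) for a, f in zip(actual, forecast))
    return mae / scale


def summarize(results):
    """Average MASE per (data category, model family).

    `results` is a list of dicts with the hypothetical keys
    'category', 'model', and 'mase'.
    """
    buckets = defaultdict(list)
    for row in results:
        buckets[(row["category"], row["model"])].append(row["mase"])
    return {key: mean(values) for key, values in buckets.items()}


# Usage with made-up numbers (not results from the paper):
score = mase(actual=[18, 20], forecast=[17, 20], history=[10, 12, 14, 16])
table = summarize([
    {"category": "load", "model": "foundation", "mase": 0.5},
    {"category": "load", "model": "foundation", "mase": 1.0},
    {"category": "load", "model": "classical", "mase": 1.1},
])
```

Lower MASE is better; a value below 1 means the forecast's MAE is lower than the in-sample MAE of the seasonal-naive baseline.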