Assessing the Performance-Efficiency Trade-off of Foundation Models in Probabilistic Electricity Price Forecasting

arXiv cs.LG / 4/17/2026


Key Points

  • The study addresses how to choose between task-specific machine learning models and Time Series Foundation Models (TSFMs) for day-ahead probabilistic electricity price forecasting (PEPF) in volatile European power markets.
  • Across multiple evaluation metrics (CRPS, Energy Score, and predictive interval calibration), TSFMs generally outperform models trained from scratch, indicating stronger uncertainty-aware forecasting under changing market conditions.
  • However, well-configured task-specific models, especially an NHITS backbone with Quantile-Regression Averaging (NHITS+QRA), come very close to TSFM performance and in some scenarios surpass it.
  • The scenarios in which task-specific models pull ahead are those where they receive additional informative feature groups or are adapted via few-shot learning from other European markets, which points to a real trade-off between computational cost and incremental accuracy gains.
  • The overall conclusion is that TSFMs provide expressive modeling capacity, but conventional approaches remain highly competitive, so model selection should explicitly consider compute vs. marginal performance improvements for PEPF.
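The two scoring rules named above can be illustrated with their standard sample-based estimators. The following is a minimal sketch in plain Python, not the paper's evaluation code: the function names and the toy numbers are assumptions made for illustration, and a forecast is assumed to be available as a set of sampled prices (CRPS, one hour) or sampled 24-hour price paths (Energy Score).

```python
import math
from itertools import product


def crps_from_samples(samples, observed):
    """Sample-based CRPS estimate for one scalar target (lower is better).

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, with both
    expectations replaced by averages over the forecast samples.
    """
    n = len(samples)
    term_obs = sum(abs(x - observed) for x in samples) / n
    term_spread = sum(abs(a - b) for a, b in product(samples, repeat=2)) / (n * n)
    return term_obs - 0.5 * term_spread


def energy_score(sample_paths, observed_path):
    """Sample-based Energy Score for a multivariate target (lower is better).

    Same structure as the CRPS estimate, with absolute differences
    replaced by Euclidean distances between whole paths.
    """
    n = len(sample_paths)
    term_obs = sum(math.dist(p, observed_path) for p in sample_paths) / n
    term_spread = sum(
        math.dist(a, b) for a, b in product(sample_paths, repeat=2)
    ) / (n * n)
    return term_obs - 0.5 * term_spread


# Toy usage with made-up prices in EUR/MWh (illustrative values only).
hourly_samples = [40.0, 45.0, 50.0, 55.0]
print(crps_from_samples(hourly_samples, observed=47.0))

# Two-hour "paths" keep the example short; day-ahead paths would have 24 entries.
paths = [[40.0, 42.0], [50.0, 48.0], [45.0, 60.0]]
print(energy_score(paths, observed_path=[46.0, 50.0]))
```

With a single sample, the CRPS estimate reduces to the absolute error of a point forecast, and for one-dimensional paths the Energy Score coincides with the CRPS. The pairwise term costs O(n²) in the number of samples, which is acceptable for a sketch but would usually be vectorized in practice.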

Abstract

Large-scale renewable energy deployment introduces pronounced volatility into the electricity system, turning grid operation into a complex stochastic optimization problem. Accurate electricity price forecasting (EPF) is essential not only to support operational decisions, such as optimal bidding strategies and balancing power preparation, but also to reduce economic risk and improve market efficiency. Probabilistic forecasts are particularly valuable because they quantify uncertainty stemming from renewable intermittency, market coupling, and regulatory changes, enabling market participants to make informed decisions that minimize losses and optimize expected revenues. However, it remains an open question which models to employ to produce accurate forecasts. Should these be task-specific machine learning (ML) models or Time Series Foundation Models (TSFMs)? In this work, we compare four models for day-ahead probabilistic EPF (PEPF) in European bidding zones: two task-specific models, namely a deterministic NHITS backbone with Quantile-Regression Averaging (NHITS+QRA) and a conditional Normalizing-Flow forecaster (NF), and two TSFMs, namely Moirai and ChronosX. On the one hand, we find that TSFMs outperform task-specific deep learning models trained from scratch in terms of CRPS, Energy Score, and predictive interval calibration across market conditions. On the other hand, we find that well-configured task-specific models, particularly NHITS combined with QRA, achieve performance very close to that of TSFMs, and in some scenarios, such as when supplied with additional informative feature groups or adapted via few-shot learning from other European markets, they can even surpass TSFMs. Overall, our findings show that while TSFMs offer expressive modeling capabilities, conventional models remain highly competitive, emphasizing the need to weigh computational expense against marginal performance improvements in PEPF.
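To make "predictive interval calibration" and the quantile-regression step behind QRA more concrete, the sketch below shows the pinball (quantile) loss, which is the objective that quantile regression minimizes, together with the empirical coverage of a prediction interval. It is an illustrative assumption about how such checks are commonly written, not the authors' implementation; the function names and the numbers are hypothetical.

```python
def pinball_loss(observed, predicted_quantile, q):
    """Pinball loss of a single quantile forecast at level q in (0, 1).

    Under-prediction is penalized with weight q and over-prediction with
    weight (1 - q), so minimizing the average loss targets the q-quantile.
    """
    diff = observed - predicted_quantile
    return max(q * diff, (q - 1.0) * diff)


def empirical_coverage(observed, lower, upper):
    """Fraction of observations that fall inside [lower, upper].

    For a calibrated 90% interval this fraction should be close to 0.90
    over a sufficiently long test period.
    """
    hits = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return hits / len(observed)


# Toy usage with made-up prices in EUR/MWh (illustrative values only).
prices = [42.0, 55.0, 61.0, 38.0]
q05 = [35.0, 40.0, 45.0, 40.0]  # hypothetical 5% quantile forecasts
q95 = [60.0, 70.0, 80.0, 65.0]  # hypothetical 95% quantile forecasts

avg_loss_q95 = sum(pinball_loss(y, p, 0.95) for y, p in zip(prices, q95)) / len(prices)
print(avg_loss_q95)
print(empirical_coverage(prices, q05, q95))
```

In a QRA setup, the quantile forecasts would come from a quantile regression fitted on the point forecasts of the backbone model; that fitting step is omitted here to keep the example self-contained.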