Multimodal Forecasting for Commodity Prices Using Spectrogram-Based and Time Series Representations

arXiv cs.LG / 3/31/2026


Key Points

  • The paper targets multivariate commodity-price forecasting and aims to model both cross-variable dependencies and heterogeneous external influences more effectively than standard time-series approaches.
  • It introduces SEMF (Spectrogram-Enhanced Multimodal Fusion), which converts the target series into Morlet wavelet spectrograms and uses a Vision Transformer encoder to extract localized, frequency-aware features.
  • Exogenous variables (e.g., financial indicators and macroeconomic signals) are processed through a separate Transformer to capture their temporal structure and multivariate dynamics.
  • A bidirectional cross-attention module fuses the spectrogram-based and time-series modalities while preserving each modality’s distinct characteristics and learning cross-modal correlations.
  • Experiments on multiple commodity forecasting tasks show that SEMF delivers consistent gains over seven competitive baselines across forecasting horizons and evaluation metrics, suggesting that it captures multi-scale patterns better.
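
The spectrogram step in the key points (target series to Morlet wavelet spectrogram to ViT tokens) can be illustrated with a minimal NumPy sketch. It assumes a complex Morlet wavelet with center frequency w0 = 6, 1/sqrt(s) energy normalization, an integer scale grid, and non-overlapping square patches; the paper's actual scale grid, normalization, and patch size are not given in this summary, and the helper names `morlet`, `morlet_spectrogram` and `patchify` are hypothetical.

```python
import numpy as np

def morlet(t, scale, w0=6.0):
    # Complex Morlet wavelet, dilated by `scale` and normalised by 1/sqrt(scale).
    # w0 = 6 is a common default (assumption; not taken from the paper).
    x = t / scale
    return np.pi ** -0.25 * np.exp(1j * w0 * x - 0.5 * x ** 2) / np.sqrt(scale)

def morlet_spectrogram(signal, scales, w0=6.0):
    """Wavelet power |W(s, t)|^2 with shape (len(scales), len(signal))."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the mean so the lowest frequencies do not dominate
    n = x.size
    power = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet to +/- 4 scale widths and never exceed the signal
        # length, so np.convolve(..., mode="same") returns exactly n samples.
        half = min(int(np.ceil(4 * s)), (n - 1) // 2)
        t = np.arange(-half, half + 1)
        # For the Morlet wavelet conj(psi(-t)) == psi(t), so plain convolution
        # computes the CWT correlation sum_t x(t) * conj(psi((t - b) / s)).
        coeffs = np.convolve(x, morlet(t, s, w0), mode="same")
        power[i] = np.abs(coeffs) ** 2
    return power

def patchify(image, patch):
    """Split a 2-D array into non-overlapping patch x patch tiles, each
    flattened row-major into one token (the input format a ViT embeds)."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # drop ragged edges
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

# Usage on a synthetic series with a 16-step cycle (illustrative data only).
t = np.arange(256)
series = np.sin(2 * np.pi * t / 16)
scales = np.arange(1, 33)
spec = morlet_spectrogram(series, scales)
tokens = patchify(spec, 8)
print(spec.shape, tokens.shape)  # (32, 256) (128, 64)
```

Each of the 128 tokens would then be linearly projected and given a positional embedding inside the ViT encoder. Log-scaling or standardizing the power before patching is a common practical choice, but the source does not say whether SEMF does this.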

Abstract

Forecasting multivariate time series remains challenging due to complex cross-variable dependencies and the presence of heterogeneous external influences. This paper presents Spectrogram-Enhanced Multimodal Fusion (SEMF), which combines spectral and temporal representations for more accurate and robust forecasting. The target time series is transformed into Morlet wavelet spectrograms, from which a Vision Transformer encoder extracts localized, frequency-aware features. In parallel, exogenous variables, such as financial indicators and macroeconomic signals, are encoded via a Transformer to capture temporal dependencies and multivariate dynamics. A bidirectional cross-attention module integrates these modalities into a unified representation that preserves distinct signal characteristics while modeling cross-modal correlations. Applied to multiple commodity price forecasting tasks, SEMF achieves consistent improvements over seven competitive baselines across multiple forecasting horizons and evaluation metrics. These results demonstrate the effectiveness of multimodal fusion and spectrogram-based encoding in capturing multi-scale patterns within complex financial time series.
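
The bidirectional cross-attention fusion described in the abstract can be sketched in NumPy as follows. This is a minimal single-head illustration under stated assumptions: both encoders already emit tokens of a shared width d, each direction has its own query/key/value projections, residual connections keep each modality's own features, and the mean-pooled outputs are concatenated into the fused representation. The head count, normalization layers, and pooling that SEMF actually uses are not stated in the source, and the names `cross_attention`, `bidirectional_fusion`, `spec_to_exo` and `exo_to_spec` are hypothetical. Random arrays stand in for the ViT and exogenous-Transformer outputs.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: tokens of one modality
    (queries) attend over tokens of the other (keys and values)."""
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

def bidirectional_fusion(spec_tokens, exo_tokens, params):
    """Hypothetical fusion: spectrogram tokens attend to exogenous tokens and
    vice versa; residuals preserve each stream, pooled outputs are concatenated."""
    spec_ctx, _ = cross_attention(spec_tokens, exo_tokens, *params["spec_to_exo"])
    exo_ctx, _ = cross_attention(exo_tokens, spec_tokens, *params["exo_to_spec"])
    spec_out = spec_tokens + spec_ctx  # residual keeps spectral features
    exo_out = exo_tokens + exo_ctx     # residual keeps temporal features
    return np.concatenate([spec_out.mean(axis=0), exo_out.mean(axis=0)])

# Usage with random stand-ins for the two encoders' outputs.
rng = np.random.default_rng(0)
d = 16
spec_tokens = rng.standard_normal((128, d))  # e.g. ViT patch embeddings
exo_tokens = rng.standard_normal((30, d))    # e.g. 30 time steps of exogenous features
params = {
    name: tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    for name in ("spec_to_exo", "exo_to_spec")
}
fused = bidirectional_fusion(spec_tokens, exo_tokens, params)
print(fused.shape)  # (32,)
```

Running attention in both directions, rather than letting only one modality query the other, is what makes the fusion bidirectional: each stream is enriched with context from the other while its residual path preserves its own signal characteristics. A forecasting head (not shown) would map the fused vector to predictions for each horizon.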