Deep Clustering for Climate: Analyzing Teleconnections through Learned Categorical States

arXiv cs.LG / 4/28/2026

Key Points

  • The paper addresses the challenge of extracting meaningful climate regimes from noisy, nonlinear climate variables by learning a discretized representation of time series.
  • It proposes using Masked Siamese Networks to map daily minimum and maximum temperature sequences into semantically meaningful, categorical clusters.
  • The learned clusters (under the authors’ assumptions) provide a simplified representation that can be used for downstream analysis and for sampling specific climate scenarios.
  • The resulting categorical states show statistical associations with El Niño events, suggesting scientific relevance beyond purely data-driven segmentation.
  • The work highlights self-supervised discretization as a promising technique for climate data analysis and motivates extending the approach with additional climate indicators.

Abstract

Understanding and representing complex climate variability is essential for both scientific analysis and predictive modeling. However, identifying meaningful climate regimes from raw variables is challenging, as they exhibit high noise and nonlinear dependencies. In this work, we explore the use of Masked Siamese Networks to discretize climate time series into semantically rich clusters. Focusing on daily minimum and maximum temperature, we show that the resulting representations: (i) yield clusters that reflect meaningful climate states under our modeling assumptions, offering a simplified representation for downstream use; (ii) enable sampling and analysis of specific climate scenarios; and (iii) exhibit statistical associations with El Niño events, underscoring their scientific relevance. Our findings highlight the potential of self-supervised discretization as a tool for climate data analysis and open avenues for incorporating richer climate indicators in future work.
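
As an illustration of point (iii), one common way to check whether categorical states are statistically associated with a binary event indicator is a contingency table with a Pearson chi-squared statistic and Cramér's V. The abstract does not say which test the authors use, so the sketch below is an assumption rather than their method. The helper names (`contingency`, `chi_square`) and the toy labels are hypothetical and made up for the example; they are not data from the paper.

```python
from collections import Counter
import math


def contingency(states, flags):
    """Count co-occurrences of each categorical state with an event flag."""
    counts = Counter(zip(states, flags))
    rows = sorted(set(states))
    cols = sorted(set(flags))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_square(table):
    """Pearson chi-squared statistic and Cramér's V for an r x c count table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if state and event were independent.
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    cramers_v = math.sqrt(stat / (n * k)) if k > 0 else 0.0
    return stat, cramers_v


# Toy, made-up labels: one cluster id per time window, and a flag marking
# whether that window falls in an El Niño period (hypothetical values).
states = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2, 1, 0]
el_nino = [False, False, True, True, False, False,
           False, True, False, False, True, False]

rows, cols, table = contingency(states, el_nino)
stat, v = chi_square(table)
print(table, stat, v)
```

Cramér's V lies between 0 (no association) and 1 (the state fully determines the flag). In practice the statistic would be compared against a chi-squared distribution with (r - 1)(c - 1) degrees of freedom, and the temporal autocorrelation of daily climate data would need to be taken into account before reading a p-value at face value.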