What on Earth is AlphaEarth? Hierarchical structure and functional interpretability for global land cover

arXiv cs.LG / 3/19/2026

Key Points

  • The paper introduces a functional interpretability framework to reverse-engineer the role of embedding dimensions in Google AlphaEarth Foundations (GAEF) for land cover classification.
  • It demonstrates that embedding dimensions exhibit a hierarchical functional spectrum, including specialist, low-/mid-generalist, and high-generalist factors that encode different levels of geospatial information.
  • It shows that 98% of baseline land cover classification performance can be retained with as few as 2 to 12 of the 64 dimensions, depending on the class, indicating substantial redundancy and potential reductions in computational cost.
  • It combines large-scale experiments with a structural analysis of embedding-class relationships, based on feature importance patterns and progressive ablation, to map dimension roles and guide dimension selection for operational tasks.
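
To make the dimension-selection idea concrete, here is a minimal Python sketch of how one might search for the smallest set of dimensions that retains 98% of baseline accuracy for a single class. It is not the paper's procedure: the summary does not specify the classifier, the importance measure, or the ablation order. The sketch assumes synthetic data, a Fisher-style separation score as a stand-in for feature importance, a nearest-centroid classifier, and forward selection along the importance ranking (the mirror image of ablation). All function names are hypothetical.

```python
import random
from statistics import fmean, pvariance

N_DIMS = 64  # number of GAEF embedding dimensions, as stated in the source


def make_synthetic(n_per_class, informative, seed):
    """Hypothetical stand-in data: only `informative` dims separate class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(N_DIMS)]
            if label == 1:
                for d in informative:
                    row[d] += 2.5
            X.append(row)
            y.append(label)
    return X, y


def importance_scores(X, y):
    """Fisher-style score per dimension (an assumed importance measure)."""
    scores = []
    for d in range(N_DIMS):
        pos = [row[d] for row, t in zip(X, y) if t == 1]
        neg = [row[d] for row, t in zip(X, y) if t == 0]
        gap = fmean(pos) - fmean(neg)
        scores.append(gap * gap / (pvariance(pos) + pvariance(neg) + 1e-12))
    return scores


def centroid_accuracy(train, test, dims):
    """Accuracy of a nearest-centroid classifier restricted to `dims`."""
    X_tr, y_tr = train
    X_te, y_te = test
    centroids = {}
    for label in (0, 1):
        rows = [row for row, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [fmean(row[d] for row in rows) for d in dims]
    correct = 0
    for row, t in zip(X_te, y_te):
        dist = {c: sum((row[d] - m) ** 2 for d, m in zip(dims, cen))
                for c, cen in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_te)


def minimal_dims(train, test, retain=0.98):
    """Add dims in importance order until `retain` of baseline accuracy is met."""
    scores = importance_scores(*train)
    ranked = sorted(range(N_DIMS), key=scores.__getitem__, reverse=True)
    baseline = centroid_accuracy(train, test, list(range(N_DIMS)))
    for k in range(1, N_DIMS + 1):
        acc = centroid_accuracy(train, test, ranked[:k])
        if acc >= retain * baseline:
            return ranked[:k], acc, baseline
    return ranked, baseline, baseline  # defensive fallback: keep every dim


train = make_synthetic(200, (3, 17, 42), seed=0)
test = make_synthetic(200, (3, 17, 42), seed=1)
dims, acc, baseline = minimal_dims(train, test)
print(f"{len(dims)} of {N_DIMS} dimensions retain {acc / baseline:.1%} of baseline")
```

Running the search once per land cover class, each as a one-vs-rest problem, would give a per-class dimension count comparable in spirit to the 2 to 12 range reported above.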

Abstract

Geospatial foundation models generate high-dimensional embeddings that achieve strong predictive performance, yet their internal organization remains obscure, limiting their scientific use. Recent interpretability studies relate Google AlphaEarth Foundations (GAEF) embeddings to continuous environmental variables, but it is still unclear whether the embedding space exhibits a functional or hierarchical organization, in which some dimensions act as specialized representations while others encode shared or broader geospatial structure. In this work, we propose a functional interpretability framework that reverse-engineers the role of embedding dimensions by characterizing their contribution to land cover structure from observed classification behavior. The approach combines large-scale experimentation with a structural analysis of embedding-class relationships based on feature importance patterns and progressive ablation. Our results show that embedding dimensions exhibit consistent and non-uniform functional behavior, allowing them to be categorized along a hierarchical functional spectrum: specialist dimensions associated with specific land cover classes, low- and mid-generalist dimensions capturing shared characteristics between classes, and high-generalist dimensions reflecting broader environmental gradients. Critically, we find that accurate land cover classification (98% of baseline performance) can be achieved using as few as 2 to 12 of the 64 available dimensions, depending on the class. This demonstrates substantial redundancy in the embedding space and offers a pathway toward significant reductions in computational cost. Together, these findings reveal that AlphaEarth embeddings are not only physically informative, but also functionally organized into a hierarchical structure, providing practical guidance for dimension selection in operational classification tasks.
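
The specialist-to-generalist spectrum described in the abstract can be illustrated with a small Python sketch that labels each dimension by how many classes it matters for. The abstract does not give the paper's actual criteria, so everything here is an assumption made for illustration: the importance threshold, the one-third and two-thirds cut-offs, the extra "unused" label, and the function name `functional_roles`.

```python
def functional_roles(importance, threshold=0.1):
    """Label each dimension by the number of classes it is important for.

    importance[c][d] is an importance score of dimension d for class c
    (for example from one-vs-rest models). The threshold and the cut-offs
    below are hypothetical, not taken from the paper.
    """
    n_classes = len(importance)
    n_dims = len(importance[0])
    roles = {}
    for d in range(n_dims):
        count = sum(1 for c in range(n_classes)
                    if importance[c][d] >= threshold)
        if count == 0:
            roles[d] = "unused"
        elif count == 1:
            roles[d] = "specialist"          # tied to one land cover class
        elif 3 * count <= n_classes:
            roles[d] = "low-generalist"      # shared by up to 1/3 of classes
        elif 3 * count <= 2 * n_classes:
            roles[d] = "mid-generalist"      # shared by up to 2/3 of classes
        else:
            roles[d] = "high-generalist"     # relevant to most classes
    return roles


# Toy importance matrix: 6 classes (rows) x 5 dimensions (columns).
toy = [
    [0.90, 0.50, 0.30, 0.20, 0.00],
    [0.00, 0.60, 0.40, 0.20, 0.00],
    [0.00, 0.00, 0.30, 0.30, 0.01],
    [0.05, 0.00, 0.20, 0.20, 0.00],
    [0.00, 0.00, 0.00, 0.25, 0.00],
    [0.00, 0.02, 0.00, 0.20, 0.00],
]
roles = functional_roles(toy)
for dim, role in roles.items():
    print(dim, role)
```

Integer comparisons (`3 * count <= n_classes`) are used instead of floating-point ratios so that boundary cases such as 2 of 6 classes fall on a predictable side of each cut-off.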