How To Embed Matters: Evaluation of EO Embedding Design Choices

arXiv cs.CV / 3/12/2026

Key Points

  • The paper provides a systematic analysis of embedding design in GeoFM-based EO workflows, showing how decisions on representation extraction, aggregation, and combination affect downstream performance and pipeline scalability.
  • Using NeuCo-Bench, the study examines factors including backbone architecture, pretraining strategy, representation depth, spatial aggregation, and representation combination to assess their impact on EO tasks.
  • The authors demonstrate that compact embeddings can be aggregated into fixed-size representations more than 500x smaller than the raw data, enabling scalable deployment.
  • Across models, the study finds consistent trends: transformer backbones with mean pooling provide strong default embeddings, intermediate ResNet layers can outperform final ones, self-supervised objectives show task-specific strengths, and combining embeddings from different objectives improves robustness.
  • These results inform practical design choices for embedding-based EO workflows and emphasize trade-offs between accuracy and scalability when selecting embedding strategies.
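The aggregation and combination strategies above can be sketched in a few lines. This is an illustrative toy example, not the paper's pipeline: the tile size, band count, and embedding dimensions are assumed for the sake of the arithmetic, and the backbones are stand-ins (random vectors) for real GeoFM outputs.

```python
import numpy as np

# Hypothetical shapes: a ViT-style backbone yields 196 patch tokens of
# dimension 768 for one multispectral tile. Values are random stand-ins
# for real GeoFM features; sizes are illustrative, not from the paper.
rng = np.random.default_rng(0)
patch_tokens = rng.standard_normal((196, 768))  # (num_patches, embed_dim)

# Spatial aggregation via mean pooling: one fixed-size vector per tile.
mean_embedding = patch_tokens.mean(axis=0)  # shape (768,)

# Combining embeddings from two pretraining objectives (e.g. contrastive
# and masked-image modeling) by concatenation, a simple fusion that can
# improve robustness across tasks.
mim_embedding = rng.standard_normal(768)  # stand-in for a second backbone
combined = np.concatenate([mean_embedding, mim_embedding])  # shape (1536,)

# Compression vs. raw data for an assumed 12-band 224x224 tile:
raw_values = 12 * 224 * 224
print(raw_values / mean_embedding.size)  # 784.0 -> more than 500x smaller
```

The exact compression factor depends on tile size, band count, and embedding dimension; the point is that a single pooled vector replaces hundreds of thousands of raw pixel values while remaining reusable across downstream tasks.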

Abstract

Earth observation (EO) missions produce petabytes of multispectral imagery, increasingly analyzed using large Geospatial Foundation Models (GeoFMs). Alongside end-to-end adaptation, workflows make growing use of intermediate representations as task-agnostic embeddings, enabling models to compute representations once and reuse them across downstream tasks. Consequently, when GeoFMs act as feature extractors, decisions about how representations are obtained, aggregated, and combined affect downstream performance and pipeline scalability. Understanding these trade-offs is essential for scalable embedding-based EO workflows, where compact embeddings can replace raw data while remaining broadly useful. We present a systematic analysis of embedding design in GeoFM-based EO workflows. Leveraging NeuCo-Bench, we study how backbone architecture, pretraining strategy, representation depth, spatial aggregation, and representation combination influence EO task performance. We demonstrate the usability of GeoFM embeddings by aggregating them into fixed-size representations more than 500x smaller than the raw input data. Across models, we find consistent trends: transformer backbones with mean pooling provide strong default embeddings, intermediate ResNet layers can outperform final layers, self-supervised objectives exhibit task-specific strengths, and combining embeddings from different objectives often improves robustness.