A Proxy Consistency Loss for Grounded Fusion of Earth Observation and Location Encoders

arXiv cs.AI / 4/22/2026


Key Points

  • Supervised Earth observation (EO) learning is often constrained by sparse high-quality labels; the paper addresses this by leveraging abundant geographic proxy variables that are correlated with, but not identical to, the target.
  • It proposes a trainable location encoder that absorbs proxy data through a newly defined proxy consistency loss (PCL); because proxies can be sampled independently of label availability, proxy information can be exploited even at locations without training labels.
  • The approach emphasizes that the location encoder must be properly regularized to remain robust and performant under limited labeled data.
  • Experiments on air quality prediction and poverty mapping show that proxy integration via the location encoder with PCL outperforms alternatives, including using proxy and EO inputs directly in an observation encoder or fusing with frozen pretrained location embeddings.
  • Results indicate that PCL improves in-sample accuracy by incorporating richer proxy information, while the learned latent embeddings improve out-of-sample generalization to regions lacking training labels.

Abstract

Supervised learning with Earth observation inputs is often limited by the sparsity of high-quality labeled or in-situ measured data to use as training labels. With the abundance of geographic data products, in many cases there are variables correlated with - but different from - the variable of interest that can be leveraged. We integrate such proxy variables within a geographic prior via a trainable location encoder and introduce a proxy consistency loss (PCL) formulation to imbue proxy data into the location encoder. The first key insight behind our approach is to use the location encoder as an agile and flexible way to learn from abundantly available proxy data which can be sampled independently of training label availability. Our second key insight is that we will need to regularize the location encoder appropriately to achieve performance and robustness with limited labeled data. Our experiments on air quality prediction and poverty mapping show that integrating proxy data implicitly through the location encoder outperforms using both as input to an observation encoder and fusion strategies that use frozen, pretrained location embeddings as a geographic prior. Superior performance for in-sample prediction shows that the PCL can incorporate rich information from the proxies, and superior out-of-sample prediction shows that the learned latent embeddings help generalize to areas without training labels.
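
The paper does not spell out the loss here, but the idea of pairing a sparse supervised term with a proxy term evaluated wherever proxies exist can be sketched minimally. Everything below is an assumption for illustration: the linear readout heads `w_y` (target) and `w_p` (proxy), the MSE form of both terms, and the weighting `lam` are hypothetical, not taken from the paper.

```python
import numpy as np

def proxy_consistency_loss(z_proxy, proxy, w_p):
    """MSE between a linear proxy head's readout of location-encoder
    embeddings and observed proxy values (assumed form, not the paper's)."""
    return float(np.mean((z_proxy @ w_p - proxy) ** 2))

def total_loss(z_label, y, w_y, z_proxy, proxy, w_p, lam=0.5):
    """Supervised MSE on the sparse labels plus a lam-weighted proxy
    consistency term; the two terms can use disjoint sets of locations."""
    supervised = float(np.mean((z_label @ w_y - y) ** 2))
    return supervised + lam * proxy_consistency_loss(z_proxy, proxy, w_p)
```

Note that the proxy term only needs proxy observations and location embeddings, so it can be computed at locations with no training labels at all, which is the sampling independence the abstract highlights.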