Geospatial foundation-model embeddings improve population estimation unevenly across space and scale

arXiv cs.LG / 5/5/2026

Key Points

  • The study addresses the challenge of producing reliable subnational population estimates in regions where census data are sparse, outdated, or too coarse for fine-grained mapping.
  • It benchmarks geospatial foundation-model embeddings from the Population Dynamics Foundation Model (PDFM) against conventional harmonized geospatial covariates (e.g., settlement extent, night-time lights, environmental conditions) for Brazil, Nigeria, and the United States.
  • In geographically structured validation, PDFM embeddings improved predictive performance substantially, including a median 20.1% reduction in unexplained variance and a 23.2% reduction in Kullback-Leibler divergence versus geospatial covariates.
  • The improvements are uneven: PDFM helps most where traditional covariates poorly capture settlement context (notably in larger and less-developed subnational areas).
  • A key limitation is scale mismatch: PDFM embedding performance is more tightly coupled to spatial scale, and the embeddings transfer less flexibly across spatial aggregations than geospatial covariates do.
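
The two headline metrics can be made concrete with a minimal sketch. This summary does not give the paper's exact definitions, so the code assumes that "unexplained variance" means 1 - R² (residual over total sum of squares) and that Kullback-Leibler divergence is computed between observed and predicted population shares across areas. All function names and numbers are illustrative toy values, not results from the study.

```python
import math


def unexplained_variance(observed, predicted):
    """Assumed definition: 1 - R^2 = residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return ss_res / ss_tot


def kl_divergence(observed, predicted, eps=1e-12):
    """Assumed definition: KL(P || Q) over population shares, P observed, Q predicted."""
    total_obs = sum(observed)
    total_pred = sum(predicted)
    kl = 0.0
    for o, p in zip(observed, predicted):
        p_share = o / total_obs
        q_share = max(p / total_pred, eps)  # guard against log of zero
        if p_share > 0:
            kl += p_share * math.log(p_share / q_share)
    return kl


def percent_reduction(baseline, candidate):
    """Relative improvement of candidate over baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Toy populations for four hypothetical areas (not data from the paper).
observed = [100, 200, 300, 400]
covariate_pred = [150, 150, 350, 350]   # hypothetical covariate-based model
embedding_pred = [120, 180, 320, 380]   # hypothetical embedding-based model

uv_reduction = percent_reduction(
    unexplained_variance(observed, covariate_pred),
    unexplained_variance(observed, embedding_pred),
)
kl_reduction = percent_reduction(
    kl_divergence(observed, covariate_pred),
    kl_divergence(observed, embedding_pred),
)
print(f"unexplained variance reduced by {uv_reduction:.1f}%")
print(f"KL divergence reduced by {kl_reduction:.1f}%")
```

Under these assumptions, a figure such as "20.1% reduction in unexplained variance" is a relative change in 1 - R² between the two models, not an absolute change in R².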

Abstract

Reliable subnational population estimates are essential for applications, yet remain difficult to produce where censuses are sparse, outdated or spatially coarse. Existing population-mapping workflows rely on hand-built geospatial covariates, such as settlement extent, night-time lights, and environmental conditions, which must be assembled and harmonised across scales and geographies. Geospatial foundation models offer an alternative by learning reusable representations of place from more multifaceted and heterogeneous data sources. Here, we benchmark Population Dynamics Foundation Model (PDFM) embeddings against harmonised geospatial covariates for subnational population estimation in Brazil, Nigeria and the United States. Under geographically structured validation, PDFM improved predictive fit, with a median 20.1% reduction in unexplained variance (IQR: 10.0-33.2%, across country-model comparisons), and reduced Kullback-Leibler divergence by 23.2% (9.2-26.2%). However, these gains were uneven. PDFM was most advantageous where the geospatial covariates weakly characterised settlement context, such as in larger and less-developed subnational areas. Moreover, PDFM performance was scale-coupled, with embeddings transferring less flexibly across spatial aggregations than geospatial covariates. These findings show that geospatial foundation-model representations of place can improve population estimation in data-poor settings, but their benefits break down predictably under spatial scale mismatch, revealing a fundamental limitation of current geospatial AI.
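
"Geographically structured validation" generally means holding out whole spatial groups, so that test areas are not immediate neighbours of training areas. The abstract does not describe the paper's actual design, so the following is only a sketch of one common variant, leave-one-block-out cross-validation on a regular grid. The block size, coordinates, and function names are hypothetical.

```python
from collections import defaultdict


def assign_blocks(coords, block_size_deg):
    """Map each (lon, lat) centroid to a grid-cell id so nearby units share a block."""
    return [
        (int(lon // block_size_deg), int(lat // block_size_deg))
        for lon, lat in coords
    ]


def spatial_block_splits(blocks):
    """Yield (train_idx, test_idx) pairs, holding out one whole block at a time."""
    by_block = defaultdict(list)
    for i, block in enumerate(blocks):
        by_block[block].append(i)
    for held_out in sorted(by_block):
        test_idx = by_block[held_out]
        train_idx = [
            i for block, idx in by_block.items() if block != held_out for i in idx
        ]
        yield train_idx, test_idx


# Hypothetical area centroids as (longitude, latitude); not data from the paper.
coords = [(-46.6, -23.5), (-46.9, -23.2), (-43.2, -22.9), (3.4, 6.5), (7.5, 9.1)]
blocks = assign_blocks(coords, block_size_deg=5.0)
splits = list(spatial_block_splits(blocks))

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because the first two centroids fall in the same 5-degree cell, they are always held out together. That is the point of the design: a model cannot score well simply by interpolating from an adjacent training area.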