MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling

arXiv cs.CV · April 28, 2026


Key Points

  • The arXiv paper argues that current generative foundation models are limited by bounded spatial scale, which prevents realistic modeling of how geographic environments change across thousands of kilometers.
  • It introduces MetaEarth3D as a generative foundation model designed to achieve spatially consistent, planetary-scale 3D generation, treating spatial scale as a new fundamental scaling axis.
  • Using optical Earth observation simulation as a testbed, the model can produce multi-level, unbounded, and diverse 3D scenes ranging from large terrains to cities and fine-grained street blocks.
  • The approach is trained on 10 million globally distributed real-world Earth observation images and is reported to deliver both visual realism and geospatial statistical realism.
  • The authors position MetaEarth3D as a generative data engine for ultra-wide-area spatial intelligence, potentially supporting next-generation Earth observation applications.
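The article does not describe MetaEarth3D's actual generation algorithm, but the notion of "unbounded, spatially consistent" generation can be illustrated with a toy sketch: sweep a tile grid and condition each new tile on the overlapping strips of its already-generated neighbours, so adjacent tiles agree along their shared borders. Everything below is a hypothetical stand-in (seeded noise plays the role of the generative model); it is not the paper's method.

```python
import numpy as np

TILE = 32      # tile resolution (toy value)
OVERLAP = 8    # shared border width used to keep neighbouring tiles consistent

def generate_tile(x, y, left=None, top=None, seed=0):
    """Toy stand-in for a generative model: seeded noise, blended with the
    overlapping strips of already-generated neighbours so that adjacent
    tiles transition smoothly across their shared borders."""
    rng = np.random.default_rng(seed + 31 * x + 17 * y)
    tile = rng.random((TILE, TILE))
    if left is not None:
        # Linearly blend from the left neighbour's right strip into new content.
        ramp = np.linspace(1.0, 0.0, OVERLAP)
        tile[:, :OVERLAP] = ramp * left[:, -OVERLAP:] + (1 - ramp) * tile[:, :OVERLAP]
    if top is not None:
        ramp = np.linspace(1.0, 0.0, OVERLAP)[:, None]
        tile[:OVERLAP, :] = ramp * top[-OVERLAP:, :] + (1 - ramp) * tile[:OVERLAP, :]
    return tile

def generate_region(nx, ny, seed=0):
    """Sweep an nx-by-ny grid, conditioning each tile on its left/top
    neighbours; the grid can grow without bound in either direction."""
    grid = {}
    for y in range(ny):
        for x in range(nx):
            grid[(x, y)] = generate_tile(
                x, y,
                left=grid.get((x - 1, y)),
                top=grid.get((x, y - 1)),
                seed=seed,
            )
    return grid
```

The same sweep could in principle be nested across levels (terrain tiles conditioning city tiles conditioning street blocks), which is one way to read the paper's "multi-level" claim, though the real model's conditioning scheme is not specified here.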

Abstract

Recent generative AI models have achieved remarkable breakthroughs in language and visual understanding. However, although these models can generate realistic visual content, their spatial scale remains confined to bounded environments, preventing them from capturing how geographic environments evolve across thousands of kilometers or from modeling the spatial structure of the large-scale physical world. This limitation poses a critical challenge for ultra-wide-area spatial intelligence in Earth observation and simulation, revealing a deeper gap in generative AI: progress has relied primarily on scaling model parameters and training data, while overlooking spatial scale as a core dimension of intelligence. Here, motivated by this missing dimension, we investigate spatial scale as a new scaling axis in foundation models and present MetaEarth3D, the first generative foundation model capable of spatially consistent generation at the planetary scale. Taking optical Earth observation simulation as a testbed, MetaEarth3D enables the generation of multi-level, unbounded, and diverse 3D scenes spanning large-scale terrains, medium-scale cities, and fine-grained street blocks. Built upon 10 million globally distributed real-world training images, MetaEarth3D demonstrates both strong visual realism and geospatial statistical realism. Beyond generation, MetaEarth3D serves as a generative data engine for diverse virtual environments in ultra-wide spatial intelligence. We argue that this study may help empower next-generation spatial intelligence for Earth observation.