CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

arXiv cs.RO / 3/31/2026

Key Points

  • The paper introduces CARLA-Air, an open-source simulation infrastructure that unifies realistic urban driving (CARLA) and physics-accurate multirotor flight (AirSim) inside a single Unreal Engine process.
  • CARLA-Air preserves native Python APIs from both CARLA and AirSim as well as ROS 2 interfaces, aiming for zero-modification code reuse across air-ground robotics stacks.
  • It runs all vehicles and sensors on a shared physics tick and rendering pipeline, combining photorealistic environments, rule-compliant traffic, socially aware pedestrians, and aerodynamically consistent UAV dynamics.
  • The system synchronously captures up to 18 sensor modalities per tick across all platforms and targets air-ground embodied intelligence workloads: cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning policy training.
  • CARLA-Air is released with prebuilt binaries and full source, along with an extensible asset pipeline to integrate custom robot platforms into the unified world.
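
To make the "native APIs, one process" point concrete, here is a minimal sketch of a single script that talks to both clients. It uses only calls from the stock CARLA and AirSim Python packages. Several things are assumptions, not something the summary confirms: that CARLA-Air listens on the upstream default endpoints (CARLA on localhost:2000, AirSim on its default RPC port), and that advancing the CARLA world with world.tick() also advances the drone, which is inferred from the "shared physics tick" claim. The helper names configure_sync, step_both, and main are hypothetical.

```python
"""Sketch: drive a CARLA world and an AirSim multirotor from one script."""


def configure_sync(world, fixed_delta_seconds=0.05):
    """Put the CARLA world in synchronous mode with a fixed time step."""
    settings = world.get_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = fixed_delta_seconds
    world.apply_settings(settings)
    return settings


def step_both(world, drone, n_ticks):
    """Tick the world n_ticks times and read the drone pose after each tick."""
    poses = []
    for _ in range(n_ticks):
        frame = world.tick()              # CARLA returns the new frame id
        pose = drone.simGetVehiclePose()  # AirSim pose, reported in NED metres
        poses.append((frame, pose))
    return poses


def main():
    # Imported lazily so the helpers above can be used without a simulator.
    import carla
    import airsim

    # Assumption: upstream default host and port for the CARLA server.
    carla_client = carla.Client("localhost", 2000)
    carla_client.set_timeout(10.0)
    world = carla_client.get_world()
    configure_sync(world)

    # Assumption: AirSim RPC server reachable on its default endpoint.
    drone = airsim.MultirotorClient()
    drone.confirmConnection()
    drone.enableApiControl(True)
    drone.armDisarm(True)

    for frame, pose in step_both(world, drone, n_ticks=10):
        p = pose.position
        print(frame, p.x_val, p.y_val, p.z_val)
```

With the simulator running, calling main() would print one pose per tick. The helpers take the world and drone as arguments so they can be exercised with stand-in objects when no simulator is available.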

Abstract

The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
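
As an illustration of what per-tick synchronous capture means for dataset construction, the sketch below groups sensor readings by frame id and releases a record only once every expected sensor has reported for that frame. It is plain Python and not part of CARLA-Air; the class name TickBundler and the sensor names are hypothetical, and it assumes each reading arrives tagged with the frame id of the tick that produced it.

```python
from collections import defaultdict


class TickBundler:
    """Group sensor readings by frame id; release a frame once it is complete."""

    def __init__(self, sensor_names):
        self.expected = frozenset(sensor_names)
        self._pending = defaultdict(dict)  # frame id -> {sensor name: data}

    def add(self, frame, sensor_name, data):
        """Store one reading; return the full bundle if the frame is complete, else None."""
        if sensor_name not in self.expected:
            raise KeyError(f"unexpected sensor: {sensor_name}")
        bundle = self._pending[frame]
        bundle[sensor_name] = data
        if set(bundle) == self.expected:
            # Every expected sensor has reported for this frame: hand it over.
            return self._pending.pop(frame)
        return None


# Hypothetical sensors: one ground camera and one aerial depth camera.
bundler = TickBundler(["car_rgb", "drone_depth"])
first = bundler.add(7, "car_rgb", "img")         # frame 7 still incomplete
second = bundler.add(7, "drone_depth", "depth")  # frame 7 now complete
```

Here first is None and second is the dictionary holding both readings for frame 7. Keying on the frame id rather than arrival order means readings from different platforms cannot be mixed across ticks, which is the consistency property the abstract attributes to the shared tick.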