VULCAN: Vision-Language-Model Enhanced Multi-Agent Cooperative Navigation for Indoor Fire-Disaster Response

arXiv cs.RO · April 15, 2026


Key Points

  • The paper introduces VULCAN, a multi-agent cooperative navigation framework designed specifically for indoor fire disaster response by combining multi-modal perception with vision-language models (VLMs).
  • It argues that existing multi-agent navigation systems—typically vision-only and built for benign environments—suffer major performance drops under fire-specific dynamics like smoke, heat, and changing layouts.
  • The authors extend the Habitat-Matterport3D benchmark with physically realistic fire simulations, including smoke diffusion, thermal hazards, and sensor degradation, to enable more credible evaluations.
  • Experiments compare multiple baseline cooperative navigation approaches in both normal and fire-driven settings, identifying critical failure modes and highlighting the need for robust, hazard-aware perception and planning.
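The paper's fire-extended benchmark couples smoke diffusion with sensor degradation. The details of the authors' simulator are not given here, but the general idea can be sketched with a simple grid diffusion step and a visibility-based attenuation of depth readings; the diffusion coefficient, extinction coefficient, and noise scale below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def diffuse_smoke(density, source, diff=0.15, steps=1):
    """Explicit finite-difference diffusion on a 2-D smoke-density grid.

    density: 2-D array of smoke in [0, 1]; source: (row, col) of the fire,
    which keeps emitting smoke each step. diff is an assumed coefficient.
    """
    d = density.copy()
    for _ in range(steps):
        lap = (np.roll(d, 1, 0) + np.roll(d, -1, 0)
               + np.roll(d, 1, 1) + np.roll(d, -1, 1) - 4.0 * d)
        d = d + diff * lap
        d[source] += 1.0              # fire cell re-emits smoke
        d = np.clip(d, 0.0, 1.0)
    return d

def degrade_depth(depth, smoke, max_range=10.0):
    """Attenuate depth readings by local smoke density (Beer-Lambert-style).

    The extinction coefficient (3.0) and noise scale are hypothetical.
    """
    visibility = np.exp(-3.0 * smoke)
    noisy = depth * visibility + np.random.normal(0.0, 0.05 * (1.0 - visibility))
    return np.clip(noisy, 0.0, max_range)
```

In a setup like this, cells near the fire quickly saturate with smoke, and depth sensing there collapses toward zero range, which is the kind of degradation the benchmark is designed to expose.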

Abstract

Indoor fire disasters pose severe challenges to autonomous search and rescue due to dense smoke, high temperatures, and dynamically evolving indoor environments. In such time-critical scenarios, multi-agent cooperative navigation is particularly useful, as it enables faster and broader exploration than single-agent approaches. However, existing multi-agent navigation systems are primarily vision-based and designed for benign indoor settings, leading to significant performance degradation under fire-driven dynamic conditions. In this paper, we present VULCAN, a multi-agent cooperative navigation framework based on multi-modal perception and vision-language models (VLMs), tailored for indoor fire disaster response. We extend the Habitat-Matterport3D benchmark by simulating physically realistic fire scenarios, including smoke diffusion, thermal hazards, and sensor degradation. We evaluate representative multi-agent cooperative navigation baselines under both normal and fire-driven environments. Our results reveal critical failure modes of existing methods in fire scenarios and underscore the necessity of robust perception and hazard-aware planning for reliable multi-agent search and rescue.
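The abstract's call for "hazard-aware planning" can be made concrete with a generic example (not the paper's method): augment a shortest-path search so that each step pays its base traversal cost plus a weighted penalty for local smoke or heat. A minimal Dijkstra sketch, with a hypothetical hazard weight `w`:

```python
import heapq

def hazard_aware_path(base_cost, hazard, start, goal, w=5.0):
    """Dijkstra on a 4-connected grid; each step pays base cost plus a
    weighted hazard penalty (hazard values assumed to lie in [0, 1])."""
    rows, cols = len(base_cost), len(base_cost[0])
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, cell = heapq.heappop(pq)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue                      # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + base_cost[nr][nc] + w * hazard[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, cur = [], goal                  # walk back from goal to start
    while cur != start:
        path.append(cur)
        cur = prev[cur]
    path.append(start)
    return path[::-1]
```

With a large enough hazard weight, the planner detours around burning cells even when the geometric shortcut runs through them, which is the qualitative behavior the authors argue vision-only baselines fail to produce under fire-driven conditions.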
