EnvSocial-Diff: A Diffusion-Based Crowd Simulation Model with Environmental Conditioning and Individual-Group Interaction

arXiv cs.CV / 3/26/2026


Key Points

  • EnvSocial-Diff is a diffusion-based crowd simulation model that combines social-physics-inspired learning with explicit environmental conditioning to better reproduce realistic pedestrian trajectories.
  • The model includes a structured environmental conditioning module that encodes obstacles, objects of interest, and lighting levels as interpretable scene constraints and attractors.
  • It also introduces an individual–group interaction component using a graph-based design to capture both fine-grained interpersonal relations and broader group conformity effects.
  • Experiments on multiple benchmark datasets indicate that EnvSocial-Diff outperforms recent state-of-the-art crowd simulation methods, which the authors attribute to explicit environmental conditioning and multi-level social modeling.
  • The authors have released their code on GitHub (https://github.com/zqyq/EnvSocial-Diff), so others can reproduce and build on the approach.
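
To make the environmental conditioning idea in the key points more concrete, here is a minimal sketch of how obstacles, objects of interest, and a lighting level might be packed into one feature vector per pedestrian. It is not taken from the paper or the released code: the function name `env_features`, the point-obstacle representation, the exponential decay, and the seven-element layout are all assumptions made for illustration.

```python
import math

def env_features(pos, obstacles, attractors, lighting):
    """Build a hypothetical environmental conditioning vector for one pedestrian.

    pos:        (x, y) position of the pedestrian
    obstacles:  list of (x, y) obstacle points (assumed point obstacles)
    attractors: list of (x, y) objects of interest
    lighting:   scalar lighting level at pos, assumed to be given in [0, 1]
    """
    def nearest(points):
        # Return (distance, unit direction) to the closest point, or zeros if none.
        if not points:
            return 0.0, (0.0, 0.0)
        px, py = min(points, key=lambda p: math.dist(pos, p))
        d = math.dist(pos, (px, py))
        if d == 0.0:
            return 0.0, (0.0, 0.0)
        return d, ((px - pos[0]) / d, (py - pos[1]) / d)

    obs_d, obs_dir = nearest(obstacles)
    att_d, att_dir = nearest(attractors)
    # Assumed constraint signal: decays with distance to the nearest obstacle.
    obs_strength = math.exp(-obs_d) if obstacles else 0.0
    # Layout (assumed): obstacle strength, direction away from the obstacle,
    # attractor distance, direction toward the attractor, lighting level.
    return [obs_strength, -obs_dir[0], -obs_dir[1],
            att_d, att_dir[0], att_dir[1], lighting]

features = env_features((0.0, 0.0), [(1.0, 0.0)], [(0.0, 2.0)], 0.5)
```

A vector like this could be passed to a trajectory model as a conditioning input. Each entry has a readable meaning, which matches the "interpretable" framing in the summary, though the actual encoding in EnvSocial-Diff may differ.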

Abstract

Modeling realistic pedestrian trajectories requires accounting for both social interactions and environmental context, yet most existing approaches largely emphasize social dynamics. We propose EnvSocial-Diff: a diffusion-based crowd simulation model informed by social physics and augmented with environmental conditioning and individual–group interaction. Our structured environmental conditioning module explicitly encodes obstacles, objects of interest, and lighting levels, providing interpretable signals that capture scene constraints and attractors. In parallel, the individual–group interaction module goes beyond individual-level modeling by capturing both fine-grained interpersonal relations and group-level conformity through a graph-based design. Experiments on multiple benchmark datasets demonstrate that EnvSocial-Diff outperforms the latest state-of-the-art methods, underscoring the importance of explicit environmental conditioning and multi-level social interaction for realistic crowd simulation. Code is here: https://github.com/zqyq/EnvSocial-Diff.
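
The two-level social structure described in the abstract can be pictured with a small sketch: pairwise edges for interpersonal relations, plus one aggregate per group as a stand-in for a conformity signal. This is an illustrative assumption, not the authors' graph design. The function name `build_interaction_graph`, the distance cutoff `radius`, the known group labels, and the centroid aggregate are all hypothetical.

```python
import math

def build_interaction_graph(positions, group_ids, radius=2.0):
    """Return (edges, centroids) for a hypothetical two-level social graph.

    positions: list of (x, y) pedestrian positions
    group_ids: list of group labels, one per pedestrian (assumed known)
    radius:    assumed neighbour cutoff for interpersonal edges
    """
    n = len(positions)
    # Individual level: connect each pair of pedestrians closer than `radius`.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(positions[i], positions[j]) < radius]
    # Group level: collect members per group, then average their positions.
    members = {}
    for idx, g in enumerate(group_ids):
        members.setdefault(g, []).append(positions[idx])
    centroids = {g: (sum(p[0] for p in pts) / len(pts),
                     sum(p[1] for p in pts) / len(pts))
                 for g, pts in members.items()}
    return edges, centroids

edges, centroids = build_interaction_graph(
    [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], ["a", "a", "b"])
```

In a learned model, the edges would typically carry message passing between individuals, while the per-group aggregate would be fed back to each member. How EnvSocial-Diff implements either level is specified in the paper and repository, not here.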