E-SocialNav: Efficient Socially Compliant Navigation with Language Models

arXiv cs.RO / 3/24/2026

Key Points

  • The paper argues that existing robotic navigation benchmarks often focus on task success while under-evaluating “social compliance” in robot behavior.
  • It evaluates the social compliance of GPT-4o and Claude in robotic navigation, and notes that large-scale LMs can be impractical for real-time use on resource-constrained robots because their computational overhead increases response time and energy consumption.
  • The authors propose E-SocialNav, a more efficient language-model approach trained on a comparatively small dataset to generate socially compliant behaviors.
  • E-SocialNav uses a two-stage training pipeline (supervised fine-tuning followed by direct preference optimization) and outperforms zero-shot baselines while improving both semantic similarity to human annotations and action accuracy.
  • The source code is published on GitHub, enabling further experimentation and benchmarking of the proposed socially compliant navigation method.
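
The "action accuracy" mentioned above can be illustrated with a minimal sketch, assuming it means the fraction of samples where the model's predicted action matches the human-annotated action. The function name, the label strings, and the case-insensitive matching are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
def action_accuracy(predicted, reference):
    """Fraction of predicted actions that match the annotated actions.

    Assumption: actions are short text labels, compared after trimming
    whitespace and lowercasing. The paper may define matching differently.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one sample is required")
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predicted, reference)
    )
    return matches / len(reference)


# Hypothetical labels: three of the four predictions agree with the annotations.
predicted = ["Stop", "turn left", "move forward", "stop"]
reference = ["stop", "turn right", "move forward", "stop"]
score = action_accuracy(predicted, reference)
```

Exact matching is the simplest option; a real evaluation might instead map free-form model output onto a fixed action vocabulary before comparing.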

Abstract

Language models (LMs) are increasingly applied to robotic navigation; however, existing benchmarks primarily emphasize navigation success rates while paying limited attention to social compliance. Moreover, relying on large-scale LMs can raise efficiency concerns, as their heavy computational overhead leads to slower response times and higher energy consumption, making them impractical for real-time deployment on resource-constrained robotic platforms. In this work, we evaluate the social compliance of GPT-4o and Claude in robotic navigation and propose E-SocialNav, an efficient LM designed for socially compliant navigation. Despite being trained on a relatively small dataset, E-SocialNav consistently outperforms zero-shot baselines in generating socially compliant behaviors. By employing a two-stage training pipeline consisting of supervised fine-tuning followed by direct preference optimization, E-SocialNav achieves strong performance in both text-level semantic similarity to human annotations and action accuracy. The source code is available at https://github.com/Dr-LingXiao/ESocialNav.
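
The second training stage named in the abstract, direct preference optimization (DPO), can be sketched for a single preference pair as follows. This is the standard DPO objective, not the authors' implementation: the function name, argument names, and the default `beta=0.1` are assumptions, and the inputs are taken to be summed log-probabilities of a preferred and a dispreferred response under the model being trained and under a frozen reference model (typically the checkpoint from the supervised fine-tuning stage).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy favors the chosen response over the rejected one, relative to
    the reference model. beta=0.1 is an assumed value, not from the paper.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = -beta * margin
    # -log(sigmoid(-z)) equals softplus(z); this form avoids overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# When the policy raises the chosen response and lowers the rejected one,
# the loss falls below the baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In practice the log-probabilities would come from scoring each response with the language model, and the loss would be averaged over a batch of preference pairs.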