
Social-R1: Towards Human-like Social Reasoning in LLMs

arXiv cs.AI / March 11, 2026

Ideas & Deep Analysis / Models & Research

Key Points

  • The paper addresses the challenge of enabling human-like social intelligence in large language models, focusing on perception of social cues, mental state inference, and appropriate response generation.
  • It introduces ToMBench-Hard, an adversarial benchmark designed to provide difficult training examples that prevent LLMs from using superficial shortcuts for social reasoning.
  • Social-R1, a novel reinforcement learning framework, is proposed to align model reasoning with human cognition by supervising the entire reasoning process with multi-dimensional rewards covering structural alignment, logical integrity, and information density (see the sketch after this list).
  • Experimental results show that a 4B parameter model trained with Social-R1 can outperform significantly larger models and generalize effectively across eight diverse social reasoning benchmarks.
  • These findings highlight that training with challenging cases and trajectory-level alignment of reasoning processes offers a promising path toward efficient and reliable social intelligence in LLMs.
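To make the multi-dimensional reward idea concrete, here is a minimal Python sketch of how per-trajectory component scores might be combined. The paper names the reward dimensions, but the abstract does not give their implementation, so the scoring heuristics, the weights, and the perceive/infer/respond schema below are illustrative assumptions rather than the authors' method.

```python
# Illustrative sketch only: the paper specifies the reward *dimensions*
# (structure, logic, density) but not these scorers, weights, or the
# perceive -> infer -> respond schema, which are assumptions for clarity.
from dataclasses import dataclass


@dataclass
class RewardWeights:
    structure: float = 0.3  # structural alignment with a human-like schema
    logic: float = 0.4      # logical integrity across reasoning steps
    density: float = 0.3    # information density (penalizes filler)


def structural_alignment(steps: list[str]) -> float:
    """Fraction of an assumed perceive -> infer -> respond schema covered."""
    schema = ("perceive", "infer", "respond")
    text = " ".join(steps).lower()
    return sum(stage in text for stage in schema) / len(schema)


def logical_integrity(steps: list[str]) -> float:
    """Toy coherence proxy: Jaccard overlap between consecutive steps."""
    if len(steps) < 2:
        return 0.0
    overlaps = []
    for prev, curr in zip(steps, steps[1:]):
        a, b = set(prev.lower().split()), set(curr.lower().split())
        overlaps.append(len(a & b) / max(len(a | b), 1))
    return sum(overlaps) / len(overlaps)


def information_density(steps: list[str]) -> float:
    """Toy density proxy: unique-token ratio, so padded traces score lower."""
    tokens = " ".join(steps).lower().split()
    return len(set(tokens)) / max(len(tokens), 1)


def trajectory_reward(steps: list[str], w: RewardWeights | None = None) -> float:
    """Score the whole reasoning trace, not just the final answer."""
    w = w or RewardWeights()
    return (w.structure * structural_alignment(steps)
            + w.logic * logical_integrity(steps)
            + w.density * information_density(steps))


trace = [
    "Perceive: Alice goes quiet and avoids eye contact after Bob's joke.",
    "Infer: Alice likely found the joke hurtful, though Bob seems unaware.",
    "Respond: Gently change the topic and check in with Alice privately.",
]
print(f"trajectory reward: {trajectory_reward(trace):.3f}")
```

The design point this is meant to illustrate: the reward scores the whole reasoning trace rather than only the final answer, so a shortcut chain that happens to land on an acceptable response earns less than a well-structured one.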

Computer Science > Artificial Intelligence

arXiv:2603.09249 (cs)
[Submitted on 10 Mar 2026]

Title: Social-R1: Towards Human-like Social Reasoning in LLMs

Authors: Jincenzi Wu and 7 other authors
Abstract: While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs. Current models often rely on superficial patterns rather than genuine social reasoning. We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. To this end, we introduce ToMBench-Hard, an adversarial benchmark designed to provide hard training examples for social reasoning. Building on this, we propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards. Unlike outcome-based RL, Social-R1 supervises the entire reasoning process, enforcing structural alignment, logical integrity, and information density. Results show that our approach enables a 4B parameter model to surpass much larger counterparts and generalize robustly across eight diverse benchmarks. These findings demonstrate that challenging training cases with trajectory-level alignment offer a path toward efficient and reliable social intelligence.
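The abstract's contrast with outcome-based RL can be made concrete with a small, self-contained sketch of the two credit-assignment schemes. The abstract does not specify Social-R1's policy-optimization algorithm, so the discounted-return formulation and the discount factor below are assumptions for illustration only.

```python
# Illustrative contrast only: the abstract says Social-R1 supervises the
# whole reasoning process rather than just the outcome, but does not give
# the exact algorithm; the discounting scheme below is an assumption.

def outcome_returns(final_reward: float, n_steps: int) -> list[float]:
    """Outcome-based RL: every step inherits only the final-answer reward,
    so a flawed chain that luckily ends correctly is fully reinforced."""
    return [final_reward] * n_steps

def trajectory_returns(step_rewards: list[float], gamma: float = 0.95) -> list[float]:
    """Trajectory-level RL: each step earns its own discounted return,
    computed from its immediate reward plus discounted future rewards."""
    returns, running = [], 0.0
    for r in reversed(step_rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

# Three reasoning steps: a sound perception, a shaky inference, a good reply.
per_step = [0.9, 0.2, 0.8]
print(outcome_returns(final_reward=1.0, n_steps=3))        # [1.0, 1.0, 1.0]
print([round(r, 3) for r in trajectory_returns(per_step)])  # [1.812, 0.96, 0.8]
```

Relative to a uniformly rewarded chain, the low immediate reward at the shaky middle step drags down the returns at and before that step, so the training signal localizes the flaw instead of rewarding the whole chain for a lucky ending.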
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.09249 [cs.AI]
  (or arXiv:2603.09249v1 [cs.AI] for this version)
  https://doi.org/10.48550/arXiv.2603.09249

Submission history

From: Jincenzi Wu
[v1] Tue, 10 Mar 2026 06:26:24 UTC (1,896 KB)