Auditing Support Strategies in LLMs through Grounded Multi-Turn Social Simulation

arXiv cs.CL · April 21, 2026


Key Points

  • The paper argues that current evaluations of “social support” LLMs often use single-turn prompts, even though real users reveal their situation gradually over multiple turns.
  • It proposes a multi-turn social simulation framework that reveals ordered fragments of Reddit users’ support-seeking narratives turn by turn, coding each response using the Social Support Behavior Code (SSBC) instead of a single quality score.
  • Using linear probes on hidden representations (without changing the generation context), the study tests whether the model's support choices track its own internal estimate of user distress.
  • Experiments on Llama-3.1-8B and OLMo-3-7B over 6,200+ turns show systematic behavior shifts with estimated distress: teaching strategies decrease as distress increases, while affective/esteem-oriented strategies show suggestive but model-specific increases.
  • The authors also find that community context independently shapes support behavior, tracking topic and discourse norms rather than demographic categories, which motivates multi-turn auditing for socially sensitive LLM applications.
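The simulation loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `generate` and `code_ssbc` are hypothetical stand-ins for the language model and the SSBC annotator.

```python
# Hedged sketch of the multi-turn simulation: a support-seeking narrative
# is split into ordered fragments revealed one turn at a time, and each
# model response receives multiple SSBC labels rather than one score.
def simulate_dialogue(fragments, generate, code_ssbc):
    history, codes = [], []
    for fragment in fragments:           # reveal the narrative turn by turn
        history.append(("seeker", fragment))
        reply = generate(history)        # model sees only what is revealed so far
        history.append(("model", reply))
        codes.append(code_ssbc(reply))   # multi-label SSBC codes for this turn
    return history, codes
```

Because the codes are collected per turn, the trajectory of support composition (e.g., teaching vs. emotional support) can be analyzed as disclosure accumulates, which single-turn evaluation cannot capture.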

Abstract

When users seek social support from chatbots, they disclose their situation gradually, yet most evaluations of supportive LLMs rely on single-turn, fully specified prompts. We introduce a multi-turn simulation framework that closes this gap. Support-seeking narratives from five Reddit communities are decomposed into ordered fragments and revealed turn by turn to a language model. Each response is coded with the Social Support Behavior Code (SSBC), an established multi-label taxonomy that captures the composition of support, rather than a single quality score. To ask whether support choices track the model's own construal of user distress, we use linear probes on hidden representations to estimate this internal signal without altering the generation context. Across two mid-scale models (Llama-3.1-8B, OLMo-3-7B) and more than 6,200 turns, support composition shifts systematically with estimated distress: teaching declines as estimated distress rises, a finding that replicates across architectures, while increases in affective and esteem-oriented strategies (such as validation) are suggestive but model-specific and rest on noisier annotations. Community context independently shapes behavior, tracking topic and discourse norms rather than demographic categories. These trajectory-level dynamics, invisible to single-turn evaluation, motivate multi-turn auditing frameworks for socially sensitive applications.
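A linear probe of the kind the abstract describes can be illustrated with a closed-form ridge fit. This is a toy sketch on synthetic data, not the paper's setup: the hidden states, dimensionality, and distress labels below are fabricated stand-ins, and the real probe would be trained on frozen LLM representations with human-derived distress annotations.

```python
import numpy as np

# Hedged sketch: a linear probe mapping frozen hidden states to a scalar
# distress estimate. The probe is read-only, so the generation context is
# untouched. All data here is synthetic and illustrative.
rng = np.random.default_rng(0)
d_model = 64            # hidden size (illustrative; real models use e.g. 4096)
n_turns = 200

H = rng.normal(size=(n_turns, d_model))           # one hidden state per turn
w_true = rng.normal(size=d_model)                 # unknown "distress direction"
y = H @ w_true + 0.1 * rng.normal(size=n_turns)   # noisy scalar distress labels

# Fit the probe by ridge regression (closed form).
lam = 1e-2
w_hat = np.linalg.solve(H.T @ H + lam * np.eye(d_model), H.T @ y)

distress_est = H @ w_hat    # model-internal distress estimate per turn
```

Support behavior (SSBC code frequencies) can then be regressed against `distress_est` across turns, which is how trends such as "teaching declines as estimated distress rises" would be read off.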