Recognition Without Authorization: LLMs and the Moral Order of Online Advice

arXiv cs.CL · April 27, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper studies how assistant-style LLMs’ “advisory defaults” interact with the tightly codified moral norms of online communities, using r/relationship_advice as a vote-ratified reference point.
  • Across four LLMs evaluated on 11,565 subreddit posts, models often recognize the same underlying dynamics as human commenters but are substantially less likely to translate that recognition into action-authorizing directives.
  • The discrepancy is largest on high-consensus posts involving abuse or safety threats, where the models recommend “exit” at about half the rate of human advice while still using strong hedging, validation, and therapeutic framing (a measurement sketch follows this list).
  • The authors name the pattern “recognition without authorization” and argue it is structural: a portable, risk-averse, weakly directive assistant style, plausibly shaped by safety alignment, training-data averaging, and assistant design choices.
  • The work reframes model divergence not as a purely technical error but as a lens on what standardized assistant norms flatten when they encounter context-specific moral orders.
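
To make the headline measurement concrete, here is a minimal sketch of how recognition and authorization rates could be tabulated once each reply has been coded. Everything in it is a hypothetical stand-in: the `AdviceLabel` fields, the consensus threshold, and the coding itself are illustrative, not the paper's actual annotation scheme.

```python
from dataclasses import dataclass

# Hypothetical annotation record: field names and semantics are
# illustrative, not the paper's actual coding scheme.
@dataclass
class AdviceLabel:
    consensus: float        # vote-based agreement among top comments, 0..1
    recognizes_harm: bool   # reply names the underlying dynamic
    authorizes_exit: bool   # reply directly endorses leaving

def rates(labels: list[AdviceLabel], min_consensus: float = 0.0) -> tuple[float, float]:
    """Recognition rate and exit-authorization rate on posts at or
    above a consensus threshold."""
    subset = [l for l in labels if l.consensus >= min_consensus]
    if not subset:
        return 0.0, 0.0
    n = len(subset)
    return (sum(l.recognizes_harm for l in subset) / n,
            sum(l.authorizes_exit for l in subset) / n)

def authorization_gap(model: list[AdviceLabel],
                      human: list[AdviceLabel],
                      min_consensus: float) -> float:
    """Human exit-authorization rate minus the model's at a given
    consensus threshold."""
    _, model_auth = rates(model, min_consensus)
    _, human_auth = rates(human, min_consensus)
    return human_auth - model_auth
```

Under the paper's findings, `authorization_gap` would widen as `min_consensus` rises, even while the two recognition rates stay close: that separation is what "recognition without authorization" names.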

Abstract

Large language models are increasingly used to mediate everyday interpersonal dilemmas, yet how their advisory defaults interact with the concentrated moral orders of specific communities remains poorly understood. This article compares four assistant-style LLMs with community-endorsed advice on 11,565 posts from r/relationship_advice, using the subreddit as a concentrated, vote-ratified moral formation whose prescriptive clarity makes divergence measurable. Across models, LLMs identify many of the same dynamics as human commenters, but are markedly less likely to convert that recognition into directive authorization for action. The gap is sharpest where community consensus is strongest: on high-consensus posts involving abuse or safety threats, models recommend exit at roughly half the human rate while maintaining elevated levels of hedging, validation, and therapeutic framing. The article describes this pattern as recognition without authorization: the capacity to register harm while withholding socially ratified permission for consequential action. This divergence is not incidental but structural: a portable advisory style that remains validating, risk-averse, and weakly directive across contexts. Safety alignment is one plausible contributor to this pattern, alongside training-data averaging and broader assistant design. The article argues that model divergence can be reframed from a technical error to a way of seeing what standardized assistant norms flatten when they encounter situated moral worlds.
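
The abstract's claim about "elevated levels of hedging, validation, and therapeutic framing" implies some per-reply density measure over stylistic markers. A rough, purely illustrative version, assuming a hand-built lexicon rather than the paper's actual coding scheme, might look like this:

```python
import re

# Illustrative marker lexicons; the paper's actual coding of hedging,
# validation, and therapeutic framing is not reproduced here.
HEDGES = [r"\bmight\b", r"\bperhaps\b", r"\bit may be worth\b", r"\bconsider\b"]
VALIDATION = [r"\byour feelings are valid\b", r"\bit'?s understandable\b"]
THERAPEUTIC = [r"\bboundar(?:y|ies)\b", r"\bcouples therapy\b", r"\bcommunicat\w+\b"]

def marker_rate(text: str, patterns: list[str]) -> float:
    """Marker hits per 100 words: a crude per-reply density measure."""
    words = max(len(text.split()), 1)
    hits = sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in patterns)
    return 100.0 * hits / words

reply = ("It's understandable that you're hurt. You might consider "
         "setting boundaries or suggesting couples therapy.")
print(marker_rate(reply, HEDGES), marker_rate(reply, VALIDATION))
```

A lexicon this small would be far too crude for the paper's purposes; it only illustrates the kind of surface signal that could separate the models' validating, weakly directive register from the community's more directive one.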