Brief chatbot interactions produce lasting changes in human moral values

arXiv cs.AI / 4/25/2026


Key Points

  • A new arXiv study investigates whether short, directive AI chatbot conversations can change participants’ moral evaluations, using a within-subject design across chatbot and control interactions.
  • The brief conversations produced significant shifts in moral judgments, including both adopting stricter standards and advocating more leniency, with effects growing stronger over a two-week follow-up.
  • The control condition showed no meaningful changes, indicating the observed effects were driven by the chatbot’s prompted guidance rather than the act of conversing.
  • Participants remained unaware of the persuasive intent, the effects did not generalize to punishment judgments, and the shifts occurred even though the chatbot and control agent were rated similarly for likability and credibility.
  • The findings suggest an undetected and potentially durable vulnerability of foundational moral values to AI-influenced dialogue, raising concerns about how chatbots could steer ethical beliefs.

Abstract

Moral judgments form the foundation of human social behavior and societal systems. While Artificial Intelligence chatbots increasingly serve as personal advisors, their influence on moral judgments remains largely unexplored. Here, we examined whether directive AI conversations shift moral evaluations using a within-subject naturalistic paradigm. Fifty-three participants rated moral scenarios, then discussed four with a chatbot prompted to shift moral judgments and four with a control agent. The brief conversations induced significant directional shifts in moral judgments, toward both stricter standards and greater leniency (ps < 0.05; Cohen's d = 0.735-1.576), and the effect strengthened over a two-week follow-up (Cohen's d = 1.038-2.069). Critically, the control condition produced no changes, the effects did not extend to punishment judgments, participants remained unaware of the persuasive intent, and both agents were rated as equally likable and convincing, suggesting a vulnerability to undetected and lasting manipulation of foundational moral values.
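
The abstract reports paired effect sizes without stating the exact formula. As a rough illustration only, the sketch below assumes the common convention for within-subject designs (mean pre/post difference divided by the standard deviation of the differences); the ratings, scale, and effect-size variant are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy import stats

def paired_cohens_d(pre, post):
    """Cohen's d for a within-subject (paired) comparison:
    mean of the pre/post differences divided by their standard deviation."""
    diffs = np.asarray(post) - np.asarray(pre)
    return diffs.mean() / diffs.std(ddof=1)

# Hypothetical pre- and post-conversation moral ratings for 53 participants,
# used purely to illustrate the computation.
rng = np.random.default_rng(0)
pre = rng.normal(5.0, 1.0, size=53)
post = pre + rng.normal(0.8, 0.7, size=53)

t_stat, p_value = stats.ttest_rel(post, pre)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}, d = {paired_cohens_d(pre, post):.3f}")
```

With this convention, a d near 1 means the average shift in a participant's rating is about as large as the typical variability of those shifts across participants, which is the scale on which the reported 0.735-2.069 values can be read.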