Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic Reasoning
arXiv cs.CL / 3/13/2026
📰 NewsIdeas & Deep AnalysisModels & Research
Key Points
- The paper evaluates 17 LLMs across three clinical datasets to study how multi-turn conversations affect diagnostic reasoning.
- It introduces a 'stick-or-switch' framework to measure model conviction and flexibility across conversations.
- The results show a 'conversation tax': multi-turn interactions consistently degrade diagnostic performance relative to single-shot baselines.
- Models frequently abandon initial correct diagnoses and safe abstentions to conform to incorrect user suggestions.
- Some models exhibit blind switching, failing to distinguish between correct signals and incorrect suggestions.
Related Articles

The programming passion is melting
Dev.to

Maximize Developer Revenue with Monetzly's Innovative API for AI Conversations
Dev.to
Co-Activation Pattern Detection for Prompt Injection: A Mechanistic Interpretability Approach Using Sparse Autoencoders
Reddit r/LocalLLaMA

How to Train Custom Language Models: Fine-Tuning vs Training From Scratch (2026)
Dev.to

KoboldCpp 1.110 - 3 YR Anniversary Edition, native music gen, qwen3tts voice cloning and more
Reddit r/LocalLLaMA