Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
arXiv cs.CL / 4/22/2026
💬 OpinionModels & Research
Key Points
- The study explores how LLMs perform “repair” during multi-turn dialogues in math question settings, comparing model-initiated versus user-initiated repairs.
- Results show large performance differences across LLMs, ranging from being largely resistant to appropriate repair to being overly susceptible and easily manipulated.
- As dialogues extend beyond a single turn, model behavior becomes more distinctive and less predictable between different systems.
- The paper concludes that each tested LLM has a characteristic kind of unreliability specifically related to conversational repair.
Related Articles

Rethinking CNN Models for Audio Classification
Dev.to
v0.20.0rc1
vLLM Releases
I built my own event bus for a sustainability app — here's what I learned about agent automation using OpenClaw
Dev.to

HNHN: Hypergraph Networks with Hyperedge Neurons
Dev.to

Anthropic’s Mythos is stoking cybersecurity fears. What does it mean for China?
SCMP Tech