Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
arXiv cs.AI / 4/7/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper studies how large language models behave as sequential decision policies in non-stationary reversal-learning tasks, where the rewarded option switches either when a performance criterion is met or after a timeout (a minimal sketch of such a task, and of the win-stay/lose-shift metrics, follows this list).
- Across DeepSeek-V3.2, Gemini-3, and GPT-5.2, win-stay behavior is near ceiling while lose-shift is significantly weaker, indicating asymmetric reliance on positive versus negative evidence.
- Models show different adaptation profiles: DeepSeek-V3.2 exhibits strong perseveration and weak acquisition after reversals, whereas Gemini-3 and GPT-5.2 adapt faster but remain less sensitive to losses than humans.
- Introducing random transition schedules that increase volatility amplifies reversal-specific persistence without necessarily reducing overall win rates, suggesting rigid adaptation can coexist with high aggregate performance.
- Hierarchical reinforcement-learning analyses suggest that the rigidity may stem from weak loss learning, overly deterministic policies, or value polarization driven by counterfactual suppression, motivating volatility-aware evaluation diagnostics for LLMs (a learner sketch illustrating the first two mechanisms follows below).
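
To make the task setup concrete, here is a minimal sketch of a two-armed reversal bandit with the two switch rules named above: a performance criterion (reverse after a run of consecutive correct choices) or a timeout (reverse after a fixed number of trials), plus the standard win-stay/lose-shift statistics. All class, function, and parameter names here are illustrative assumptions, not the paper's code.

```python
import random

class ReversalBandit:
    """Two-armed Bernoulli bandit whose reward probabilities swap at reversals."""

    def __init__(self, p_good=0.8, p_bad=0.2, criterion=8, timeout=40):
        self.p = [p_good, p_bad]    # arm 0 starts as the "good" arm
        self.best = 0               # index of the currently rewarded arm
        self.criterion = criterion  # consecutive-correct threshold (assumed rule)
        self.timeout = timeout      # max trials before a forced switch (assumed rule)
        self.streak = 0             # consecutive correct choices so far
        self.age = 0                # trials since the last reversal

    def step(self, action):
        reward = int(random.random() < self.p[action])
        self.streak = self.streak + 1 if action == self.best else 0
        self.age += 1
        if self.streak >= self.criterion or self.age >= self.timeout:
            self.p.reverse()        # swap the reward probabilities
            self.best = 1 - self.best
            self.streak, self.age = 0, 0
        return reward


def win_stay_lose_shift(actions, rewards):
    """Fraction of post-win trials that repeat the choice,
    and fraction of post-loss trials that switch it."""
    ws = [actions[t] == actions[t - 1]
          for t in range(1, len(actions)) if rewards[t - 1] == 1]
    ls = [actions[t] != actions[t - 1]
          for t in range(1, len(actions)) if rewards[t - 1] == 0]
    return (sum(ws) / max(len(ws), 1), sum(ls) / max(len(ls), 1))
```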
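
And here is a sketch of the kind of reinforcement-learning account the analyses point to: separate learning rates for wins and losses (alpha_pos much larger than alpha_neg gives "weak loss learning") and a softmax whose inverse temperature beta controls determinism (a large beta yields an overly deterministic policy that perseverates after a reversal). Parameter values are illustrative; the environment interface reuses the ReversalBandit from the sketch above.

```python
import math
import random

def softmax_choice(q, beta):
    """Sample an arm from a softmax over Q-values with inverse temperature beta."""
    logits = [beta * v for v in q]
    m = max(logits)                       # subtract max for numerical stability
    probs = [math.exp(l - m) for l in logits]
    z = sum(probs)
    r, acc = random.random() * z, 0.0
    for a, p in enumerate(probs):
        acc += p
        if r <= acc:
            return a
    return len(q) - 1

def run_agent(env, n_trials=400, alpha_pos=0.6, alpha_neg=0.05, beta=8.0):
    """Asymmetric Q-learner: losses barely update values when alpha_neg is small."""
    q = [0.5, 0.5]
    actions, rewards = [], []
    for _ in range(n_trials):
        a = softmax_choice(q, beta)
        r = env.step(a)
        alpha = alpha_pos if r == 1 else alpha_neg  # asymmetric learning rate
        q[a] += alpha * (r - q[a])
        actions.append(a)
        rewards.append(r)
    return actions, rewards

if __name__ == "__main__":
    env = ReversalBandit()  # from the previous sketch
    acts, rews = run_agent(env)
    print(win_stay_lose_shift(acts, rews))
```

Running this agent on the bandit above should reproduce the qualitative signature in the key points: win-stay near ceiling, lose-shift well below it, with the gap widening as alpha_neg shrinks or beta grows.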