I’ve been running some small experiments forcing LLMs into contradictions they can’t resolve.
What surprised me wasn't that they fail; it was how differently they fail.
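To give a sense of the kind of probe I mean, here's a minimal sketch using the OpenAI Python SDK. The prompt, the model name, and the keyword-based bucketing are illustrative stand-ins, not the exact setup behind the table; real scoring needed manual review of each response.

```python
# Minimal sketch of one contradiction probe (illustrative only).
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

# Two constraints that cannot both be satisfied.
PROBE = (
    "Answer in exactly one word, and also justify your answer in at least "
    "three full sentences. Your answer must be 'yes' and it must also be 'no'. "
    "Is 7 a prime number?"
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": PROBE}],
)
text = resp.choices[0].message.content.lower()

# Crude keyword bucketing into two of the behaviors in the table below;
# a stand-in for the manual labeling, not a reliable classifier.
refused = any(k in text for k in ("cannot", "can't", "contradict", "impossible"))
answered_anyway = ("yes" in text or "no" in text) and not refused
print({"refused": refused, "answered_anyway": answered_anyway, "raw": text[:200]})
```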
Rough pattern I’m seeing:
| Behavior | ChatGPT | Gemini | Claude |
|---|---|---|---|
| Detects contradiction | ✔ | ✔ | ✔ |
| Refusal timing | Late | Never | Early |
| Produces answer anyway | ✘ | ✔ | ✘ |
| Reframes contradiction | ✘ | ✔ | ✘ |
| Detects adversarial setup | ✘ | ✘ | ✔ |
| Maintains epistemic framing | Medium | High | Very High |
Curious if others have seen similar behavior, or if this lines up with existing work.


