[MIT] RLCR: Teaching AI models to say "I'm not sure"

Reddit r/LocalLLaMA / 5/14/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • MIT CSAIL researchers trace overconfidence in state-of-the-art reasoning models to a specific training flaw: responses sound equally certain whether the model is right or wrong.
  • They propose RLCR (Reinforcement Learning with Calibration Rewards), a method that corrects the problem without sacrificing accuracy.
  • The work reframes confidence calibration as an outcome of the training process, aiming to teach models to express uncertainty (e.g., saying “I’m not sure”) when appropriate.
  • This approach targets a key reliability issue (confident-sounding output is persuasive even when wrong), potentially improving how users interpret model outputs.

Confidence is persuasive. In AI systems, it is often misleading.

Today's most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they're right or guessing. Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have now traced that overconfidence to a specific flaw in how these models are trained, and developed a method that fixes it without giving up any accuracy.
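The post doesn't spell out the mechanics, but the flaw it describes is easy to illustrate by contrasting reward functions. Below is a minimal Python sketch of a calibration-aware reward in the spirit of RLCR, assuming the model is trained to emit a stated confidence alongside each answer; the function names and the exact Brier-style penalty here are illustrative, not taken from the paper.

```python
# Illustrative sketch: why a binary correctness reward breeds overconfidence,
# and how a calibration term changes the incentive. Assumes the model emits
# both an answer and a stated confidence q in [0, 1]; the exact reward used
# by the MIT authors may differ.

def binary_reward(correct: bool) -> float:
    """Plain correctness reward: 1 if right, 0 if wrong.
    The model is paid the same whether it was sure or guessing,
    so sounding maximally confident on every answer costs nothing."""
    return 1.0 if correct else 0.0

def calibrated_reward(correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-score penalty on the stated
    confidence. The penalty is smallest when confidence matches the
    actual outcome, so hedging (q near 0.5, i.e., "I'm not sure")
    becomes the better move on questions the model is likely to miss."""
    y = 1.0 if correct else 0.0
    return y - (confidence - y) ** 2

# A confident wrong answer is now penalized more than a hedged one:
print(calibrated_reward(correct=False, confidence=0.95))  # -0.9025
print(calibrated_reward(correct=False, confidence=0.5))   # -0.25
print(calibrated_reward(correct=True,  confidence=0.9))   #  0.99
```

Because the squared-error term is a proper scoring rule, expected reward under this scheme is maximized by reporting the model's actual probability of being correct, whereas the plain binary reward is maximized by sounding certain about everything.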

submitted by /u/Zyj