[MIT] RLCR: Teaching AI models to say "I'm not sure"

Reddit r/LocalLLaMA / 5/14/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • MIT CSAIL researchers trace overconfidence in state-of-the-art reasoning models to a specific training flaw: responses sound equally certain whether the model is right or wrong.
  • They propose RLCR (Reinforcement Learning with Calibration Rewards), a method that corrects the problem without sacrificing accuracy.
  • The work reframes confidence calibration as an outcome of the training process, aiming to teach models to express uncertainty (e.g., saying “I’m not sure”) when appropriate.
  • This approach targets a key reliability issue (confident-sounding output is persuasive even when wrong), potentially improving how users interpret model outputs.

Confidence is persuasive. In AI systems, it is often misleading.

Today's most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they're right or guessing. Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have now traced that overconfidence to a specific flaw in how these models are trained, and developed a method that fixes it without giving up any accuracy.
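The post doesn't spell out the mechanics, but the flaw it describes is easy to illustrate by contrasting reward functions. Below is a minimal Python sketch of a calibration-aware reward in the spirit of RLCR, assuming the model is trained to emit a stated confidence alongside each answer; the function names and the exact Brier-style penalty here are illustrative, not taken from the paper.

```python
# Illustrative sketch: why a binary correctness reward breeds overconfidence,
# and how a calibration term changes the incentive. Assumes the model emits
# both an answer and a stated confidence q in [0, 1]; the exact reward used
# by the MIT authors may differ.

def binary_reward(correct: bool) -> float:
    """Plain correctness reward: 1 if right, 0 if wrong.
    The model is paid the same whether it was sure or guessing,
    so sounding maximally confident on every answer costs nothing."""
    return 1.0 if correct else 0.0

def calibrated_reward(correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-score penalty on the stated
    confidence. The penalty is smallest when confidence matches the
    actual outcome, so hedging (q near 0.5, i.e., "I'm not sure")
    becomes the better move on questions the model is likely to miss."""
    y = 1.0 if correct else 0.0
    return y - (confidence - y) ** 2

# A confident wrong answer is now penalized more than a hedged one:
print(calibrated_reward(correct=False, confidence=0.95))  # -0.9025
print(calibrated_reward(correct=False, confidence=0.5))   # -0.25
print(calibrated_reward(correct=True,  confidence=0.9))   #  0.99
```

Because the squared-error term is a proper scoring rule, expected reward under this scheme is maximized by reporting the model's actual probability of being correct, whereas the plain binary reward is maximized by sounding certain about everything.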

submitted by /u/Zyj