Online Self-Calibration Against Hallucination in Vision-Language Models
arXiv cs.CV / 5/4/2026
Key Points
- The paper addresses hallucinations in large vision-language models (LVLMs), where the model can invent visual details not present in the input image.
- It argues that existing offline preference-alignment approaches can suffer from a “supervision–perception mismatch,” causing student models to learn to guess details they cannot truly perceive.
- The authors identify a “generative–discriminative gap” in LVLMs: these models are more reliable at discriminative verification (e.g., answering yes/no questions about a claim) than at open-ended generation, and they use this gap to obtain more reliable self-supervision (see the first sketch after this list).
- They propose OSCAR, an online self-calibration framework that uses Monte Carlo Tree Search plus a dual-granularity reward to build preference data, then iteratively refines the model via Direct Preference Optimization (see the second sketch after this list).
- Experiments show OSCAR delivers state-of-the-art results on hallucination benchmarks and also boosts broader multimodal capabilities.
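
The generative–discriminative gap suggests a simple scoring recipe: let the model check its own output with yes/no questions. Below is a minimal sketch of that idea under stated assumptions; the claim extraction step and the `verify_yes_prob` callable are hypothetical stand-ins, not an API described in the paper.

```python
from typing import Callable

def self_verification_reward(
    claims: list[str],
    verify_yes_prob: Callable[[str], float],
    threshold: float = 0.5,
) -> float:
    """Score a generated response by the fraction of its atomic claims
    that the model itself verifies.

    `claims` are statements extracted from the model's own response
    (extraction left abstract here). `verify_yes_prob` is a hypothetical
    hook that rephrases a claim as a yes/no question about the image and
    returns P("yes") -- the discriminative check that LVLMs tend to
    handle more reliably than open-ended generation.
    """
    if not claims:
        return 0.0
    verified = sum(1 for c in claims if verify_yes_prob(c) >= threshold)
    return verified / len(claims)


# Toy usage with a stubbed verifier; a real verifier would query the
# LVLM with the image plus the claim phrased as a yes/no question.
reward = self_verification_reward(
    ["There is a dog on the sofa.", "The dog is wearing a hat."],
    verify_yes_prob=lambda claim: 0.9 if "sofa" in claim else 0.2,
)
```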
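The summary does not spell out OSCAR's reward weighting or search details, so the second sketch is schematic only: candidate responses (obtained in the paper via Monte Carlo Tree Search; any sampler could stand in) are scored at both sentence and whole-response granularity, the best- and worst-scoring candidates form a preference pair, and that pair feeds the standard DPO objective. The `alpha` blend and the helper names are assumptions.

```python
import torch
import torch.nn.functional as F

def dual_granularity_reward(sentence_rewards: list[float],
                            response_reward: float,
                            alpha: float = 0.5) -> float:
    """Blend fine-grained (per-sentence) and coarse (whole-response)
    scores. The equal 0.5 weighting is an assumption, not the paper's."""
    fine = sum(sentence_rewards) / max(len(sentence_rewards), 1)
    return alpha * fine + (1.0 - alpha) * response_reward

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective on summed per-sequence log-probs:
    -log sigmoid(beta * ((pi_w - pi_l) - (ref_w - ref_l)))."""
    logits = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * logits).mean()

# Online loop, schematically: score each sampled candidate with
# dual_granularity_reward, take the argmax/argmin as the chosen/rejected
# pair, compute their log-probs under the current policy and a frozen
# reference model, and step on dpo_loss. Repeat as the policy improves.
```

Because the preference data come from the model's current policy, each DPO round trains on responses the model can actually produce, which is the "online" part of the framework and what distinguishes it from the offline supervision–perception mismatch described above.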