Delineating Knowledge Boundaries for Honest Large Vision-Language Models

arXiv cs.AI / April 30, 2026


Key Points

  • The paper observes that large vision-language models (VLMs) hallucinate facts and often fail to refuse questions that fall outside their parametric knowledge, especially in long-tail or specialized domains.
  • It introduces a model-specific “Visual-Idk” dataset, built via multi-sample consistency probing, to separate known information from unknown/unanswerable queries.
  • The authors propose aligning VLM behavior using supervised fine-tuning followed by preference-aware optimization methods such as DPO or ORPO to better define and enforce knowledge boundaries.
  • Experiments on the Visual-Idk dataset show improved Truthful Rate from 57.9% to 67.3%, and additional internal probing suggests the model understands its limits rather than merely learning refusal templates.
  • The approach generalizes to out-of-distribution medical and perceptual settings, aiming to make visual assistants more trustworthy and cautious.
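The consistency-probing step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual implementation: `sample_fn` stands in for a stochastic VLM decoder, and the `k` and `threshold` values are assumptions. The idea is that if repeated samples agree on one answer, the fact is treated as "known"; otherwise it becomes a candidate for the Visual-Idk ("I don't know") set.

```python
import itertools
from collections import Counter

def probe_consistency(sample_fn, question, k=10, threshold=0.8):
    """Sketch of multi-sample consistency probing: sample the model k
    times on the same question; if a single answer dominates, label the
    fact 'known', otherwise 'unknown' (a Visual-Idk candidate)."""
    answers = [sample_fn(question) for _ in range(k)]
    top_answer, count = Counter(answers).most_common(1)[0]
    label = "known" if count / k >= threshold else "unknown"
    return label, top_answer

# Toy stand-ins for a VLM sampler: one answers consistently,
# the other cycles through conflicting guesses.
_cycler = itertools.cycle(["Paris", "Lyon", "Nice", "Paris", "Toulouse"])

def consistent_model(question):
    return "Paris"

def inconsistent_model(question):
    return next(_cycler)

print(probe_consistency(consistent_model, "Which city is shown?"))    # ('known', 'Paris')
print(probe_consistency(inconsistent_model, "Which city is shown?"))  # label is 'unknown'
```

In practice the "known" side would additionally be checked against ground truth, so that a consistently *wrong* answer is not mistaken for knowledge.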

Abstract

Large Vision-Language Models (VLMs) have achieved remarkable multimodal performance yet remain prone to factual hallucinations, particularly in long-tail or specialized domains. Moreover, current models exhibit a weak capacity to refuse queries that exceed their parametric knowledge. In this paper, we propose a systematic framework to enhance the refusal capability of VLMs when facing such unknown questions. We first curate a model-specific "Visual-Idk" (Visual-I don't know) dataset, leveraging multi-sample consistency probing to distinguish between known and unknown facts. We then align the model using supervised fine-tuning followed by preference-aware optimization (e.g., DPO, ORPO) to effectively delineate its knowledge boundaries. Results on the Visual-Idk dataset show our method improves the Truthful Rate from 57.9% to 67.3%. Internal probing further demonstrates that the model genuinely recognizes its boundaries instead of merely memorizing refusal patterns. Our framework also generalizes to out-of-distribution medical and perceptual domains, providing a robust path toward more trustworthy and prudent visual assistants.
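The alignment stage pairs each probed question with a preferred and a dispreferred response, the format consumed by DPO/ORPO-style trainers. The sketch below is an assumption about how such pairs could be built from Visual-Idk-style labels (the field names `question`, `label`, `answer` and the refusal string are illustrative, not from the paper): for "unknown" facts the refusal is preferred over the model's guess, while for "known" facts the factual answer is preferred over a needless refusal, which discourages over-refusal.

```python
REFUSAL = "I don't know."

def build_preference_pairs(examples):
    """Sketch: convert labeled probing results into preference pairs
    (prompt, chosen, rejected) for DPO/ORPO-style optimization."""
    pairs = []
    for ex in examples:  # ex: {"question": str, "label": str, "answer": str}
        if ex["label"] == "unknown":
            # 'answer' here is the model's (likely hallucinated) guess.
            chosen, rejected = REFUSAL, ex["answer"]
        else:
            chosen, rejected = ex["answer"], REFUSAL
        pairs.append({"prompt": ex["question"],
                      "chosen": chosen,
                      "rejected": rejected})
    return pairs

pairs = build_preference_pairs([
    {"question": "What landmark is this?", "label": "known", "answer": "The Eiffel Tower."},
    {"question": "Who painted this obscure mural?", "label": "unknown", "answer": "Picasso."},
])
print(pairs[0]["chosen"])   # The Eiffel Tower.
print(pairs[1]["chosen"])   # I don't know.
```

Training on both directions of preference is what lets the model delineate a boundary rather than learn a blanket refusal template.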