R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs

arXiv cs.CV · April 23, 2026

📰 News · Models & Research

Key Points

  • The paper introduces R-CoV (Region-aware Chain-of-Verification), a post-hoc method to reduce object hallucinations in large vision-language models (LVLMs) by encouraging region-level reasoning.
  • R-CoV prompts LVLMs to extract entities, generate coordinates, describe image regions, and then run an internal verification step to check whether claimed objects are supported.
  • The approach is training-free and can be integrated across multiple LVLMs without relying on external object detection models.
  • Experiments on several common hallucination benchmarks show that R-CoV significantly alleviates object hallucinations across different LVLMs.
  • The method follows a six-step pipeline (initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation) to improve the reliability of visual claims.
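The six-step pipeline above can be sketched as a sequence of prompts to a single model. This is a minimal illustration only: the `lvlm` callable, the `r_cov` function name, and the prompt wording are assumptions for exposition, not the paper's actual prompts or implementation.

```python
# Hypothetical sketch of the six R-CoV steps. The `lvlm(prompt, image)`
# callable and all prompt strings are illustrative assumptions.

def r_cov(lvlm, image, question):
    """Run the six R-CoV steps with one LVLM callable that maps
    (prompt, image) -> text response."""
    # Step 1: initial response generation
    initial = lvlm(f"Answer the question about the image: {question}", image)

    # Step 2: entity extraction -- list the objects the answer claims
    entities = lvlm(f"List the objects mentioned in: {initial}", image)

    # Step 3: coordinate generation -- a bounding box per claimed object
    coords = lvlm(
        f"For each object in [{entities}], give a bounding box "
        "as (x1, y1, x2, y2).", image)

    # Step 4: region description -- describe the content of each region
    regions = lvlm(
        f"Describe the image content inside each box: {coords}", image)

    # Step 5: verification execution -- check claims against the regions
    verdicts = lvlm(
        f"Given the region descriptions {regions}, state whether each "
        f"object in [{entities}] is actually present.", image)

    # Step 6: final response generation -- revise using the verdicts
    final = lvlm(
        f"Revise this answer: '{initial}' using the verification "
        f"results: {verdicts}. Remove unsupported objects.", image)
    return final


# Toy stand-in for an LVLM, just to show the call pattern.
def toy_lvlm(prompt, image):
    return f"[model output for: {prompt[:30]}...]"
```

Note that every step reuses the same model, which matches the paper's claim of being training-free and requiring no external detector: the LVLM's own region-level descriptions serve as the verification signal.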

Abstract

Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., claiming objects that do not exist in the visual input. To address this challenge, we propose Region-aware Chain-of-Verification (R-CoV), a visual chain-of-verification method that alleviates object hallucinations in LVLMs in a post-hoc manner. Motivated by how humans comprehend intricate visual information, often by focusing on specific image regions or details within a given sample, we elicit such region-level processing from LVLMs themselves and use it as a chaining cue to detect and alleviate their own object hallucinations. Specifically, R-CoV consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation. As a simple yet effective method, R-CoV can be seamlessly integrated into various LVLMs in a training-free manner and without relying on external detection models. Extensive experiments on several widely used hallucination benchmarks across multiple LVLMs demonstrate that R-CoV significantly alleviates object hallucinations. Project page: https://github.com/Jiahao000/R-CoV.