Rethinking VLMs for Image Forgery Detection and Localization
arXiv cs.CV / 3/16/2026
Key Points
- The paper proposes IFDL-VLM, a new pipeline for image forgery detection and localization (IFDL) that uses vision-language models (VLMs).
- It shows that priors from VLMs often hurt IFDL performance because they are biased toward semantic plausibility rather than authenticity.
- It reveals that location masks encode forgery concepts and can serve as extra priors to facilitate training and improve interpretability of results.
- It reports experiments on 9 benchmarks covering both in-domain and cross-dataset generalization, with new state-of-the-art results in detection, localization, and interpretability. Code is available.




