DistortBench: Benchmarking Vision Language Models on Image Distortion Identification

arXiv cs.CV / 4/23/2026


Key Points

  • The paper introduces DistortBench, a diagnostic no-reference benchmark to test how well vision-language models (VLMs) identify image distortion type and severity.
  • DistortBench includes 13,500 four-choice questions spanning 27 distortion types grouped into six perceptual categories, each applied at five severity levels; 25 distortions follow KADID-10k calibrations, and two added rotation distortions use angle-based levels (see the sketch after this list).
  • The authors evaluate 18 VLMs (17 open-weight models from five families and one proprietary model), finding that even the top model achieves only 61.9% accuracy versus a human majority-vote baseline of 65.7%.
  • Analysis shows limited and non-monotonic scaling with model size, performance degradation in most “base–thinking” pairs (a base model compared with its reasoning-oriented “thinking” variant), and different severity-response behaviors across model families.
  • Despite strong performance on high-level multimodal tasks, VLMs struggle with low-level distortion perception; the authors position DistortBench as a tool to measure and improve this capability, which they identify as a key weakness of current models.
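
To make the evaluation protocol concrete, here is a minimal sketch of what a DistortBench-style item and its accuracy scoring could look like. The field names, structure, and example values are illustrative assumptions, not the benchmark's actual schema.

```python
# Illustrative sketch only: field names and structure are assumptions,
# not the benchmark's actual data format.
from dataclasses import dataclass

@dataclass
class DistortionQuestion:
    image_path: str     # distorted image shown to the model
    question: str       # e.g. "Which distortion affects this image?"
    choices: list[str]  # four candidate distortion types
    answer: str         # ground-truth distortion type
    category: str       # one of six perceptual categories
    severity: int       # severity level, 1 (mild) to 5 (severe)

def accuracy(predictions: list[str], questions: list[DistortionQuestion]) -> float:
    """Fraction of four-choice questions answered correctly."""
    correct = sum(pred == q.answer for pred, q in zip(predictions, questions))
    return correct / len(questions)
```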

Abstract

Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradations matters, including content moderation, image restoration, and quality monitoring. Yet their ability to recognize distortion type and severity remains poorly understood. We present DistortBench, a diagnostic benchmark for no-reference distortion perception in VLMs. DistortBench contains 13,500 four-choice questions covering 27 distortion types, six perceptual categories, and five severity levels: 25 distortions inherit KADID-10k calibrations, while two added rotation distortions use monotonic angle-based levels. We evaluate 18 VLMs, including 17 open-weight models from five families and one proprietary model. Despite strong performance on high-level vision-language tasks, the best model reaches only 61.9% accuracy, just below the human majority-vote baseline of 65.7% (average individual: 60.2%), indicating that low-level perceptual understanding remains a major weakness of current VLMs. Our analysis further reveals weak and non-monotonic scaling with model size, performance drops in most base–thinking pairs, and distinct severity-response patterns across model families. We hope DistortBench will serve as a useful benchmark for measuring and improving low-level visual perception in VLMs.
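
For intuition on the severity construction, the sketch below shows how a rotation distortion with five monotonically increasing, angle-based levels might be generated. The specific angles and the use of Pillow are assumptions for illustration, not the paper's calibration.

```python
# Illustrative sketch: the angle values are assumed, not the paper's
# calibration; Pillow is assumed as the imaging library.
from PIL import Image

ROTATION_ANGLES = {1: 2, 2: 5, 3: 10, 4: 20, 5: 45}  # degrees per severity level (assumed)

def rotate_distortion(image: Image.Image, severity: int) -> Image.Image:
    """Rotate the image by an angle that grows monotonically with severity (1-5)."""
    return image.rotate(ROTATION_ANGLES[severity], resample=Image.Resampling.BICUBIC)
```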