Q-Tacit: Image Quality Assessment via Latent Visual Reasoning

arXiv cs.CV / 2026/3/25


Key Points

  • The paper proposes Q-Tacit, a new paradigm for Vision-Language Model (VLM)-based image quality assessment that moves reasoning out of natural language and into a latent quality space.
  • It argues that language can be an inadequate medium for quality perception, because visual quality cues are difficult to abstract into discrete text tokens.
  • Q-Tacit uses a two-stage method: it injects structural visual quality priors into the latent space, then calibrates latent reasoning trajectories to improve assessment quality.
  • Experiments show that Q-Tacit achieves strong overall image quality reasoning performance with significantly fewer tokens than previous chain-of-thought-style reasoning methods.
  • The authors state that the source code will be released to enable further research on latent visual reasoning approaches to IQA.
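The contrast between text-based CoT and latent reasoning can be made concrete with a toy sketch. The following is a minimal illustration under stated assumptions, not the paper's actual architecture: all names (`quality_prior`, `reasoning_block`, the step count, and dimensions) are hypothetical stand-ins. The point is only the control flow: the model refines a continuous latent state over several steps without ever decoding to discrete text tokens, then maps the final state to a scalar quality score.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16       # latent dimension (assumed for illustration)
N_STEPS = 4  # number of latent reasoning steps (assumed)

# Stand-in for visual encoder output for one image.
visual_features = rng.normal(size=D)

# Stage (i) analogue: inject a structural quality prior by projecting
# visual features into the latent quality space (random matrix as a
# stand-in for a learned projection).
quality_prior = rng.normal(size=(D, D)) * 0.1
latent = np.tanh(quality_prior @ visual_features)

# Stage (ii) analogue: iterate latent reasoning steps. Each step is a
# residual update of the continuous state -- no text tokens are emitted,
# which is where the token savings over CoT would come from.
reasoning_block = rng.normal(size=(D, D)) * 0.1
for _ in range(N_STEPS):
    latent = np.tanh(latent + reasoning_block @ latent)

# A lightweight head maps the final latent state to a quality score in (0, 1).
head = rng.normal(size=D) * 0.1
score = float(1.0 / (1.0 + np.exp(-(head @ latent))))
print(f"predicted quality score: {score:.3f}")
```

In a text-based CoT pipeline, each of those `N_STEPS` iterations would instead decode and re-encode a natural-language rationale; keeping the loop in the latent space is what the paper's "reasoning beyond natural language" refers to.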

Abstract

Vision-Language Model (VLM)-based image quality assessment (IQA) has been significantly advanced by incorporating Chain-of-Thought (CoT) reasoning. Recent work has refined image quality reasoning by applying reinforcement learning (RL) and leveraging active visual tools. However, such strategies are typically language-centric, with visual information being treated as static preconditions. Quality-related visual cues often cannot be abstracted into text in extenso due to the gap between discrete textual tokens and quality perception space, which in turn restricts the reasoning effectiveness for visually intensive IQA tasks. In this paper, we revisit this by asking the question, "Is natural language the ideal space for quality reasoning?" and, as a consequence, we propose Q-Tacit, a new paradigm that elicits VLMs to reason beyond natural language in the latent quality space. Our approach follows a synergistic two-stage process: (i) injecting structural visual quality priors into the latent space, and (ii) calibrating latent reasoning trajectories to improve quality assessment ability. Extensive experiments demonstrate that Q-Tacit can effectively perform quality reasoning with significantly fewer tokens than previous reasoning-based methods, while achieving strong overall performance. This paper validates the proposition that language is not the only compact representation suitable for visual quality, opening possibilities for further exploration of effective latent reasoning paradigms for IQA. Source code will be released to support future research.