One Model, Two Minds: Task-Conditioned Reasoning for Unified Image Quality and Aesthetic Assessment

arXiv cs.CV · March 23, 2026


Key Points

  • The authors identify a fundamental mismatch in unifying IQA and IAA with a single reasoning strategy: IQA relies on concise, low-level perceptual cues, while IAA requires deliberative semantic judgment.
  • They propose TATAR (Task-Aware Thinking with Asymmetric Rewards), a unified framework that shares a visual-language backbone but conditions post-training on each task’s nature.
  • TATAR combines fast–slow task-specific reasoning construction, two-stage SFT+GRPO learning (supervised fine-tuning followed by Group Relative Policy Optimization) to establish task-aware priors before reward refinement, and asymmetric rewards that apply Gaussian score shaping for IQA and Thurstone-style completion ranking for IAA.
  • Across in-domain and cross-domain settings, TATAR consistently outperforms prior unified baselines on both tasks, remains competitive with task-specific models, and yields more stable training dynamics for aesthetic assessment.
  • The code for TATAR is publicly available at https://github.com/yinwen2019/TATAR.

Abstract

Unifying Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) in a single multimodal large language model is appealing, yet existing methods adopt a task-agnostic recipe that applies the same reasoning strategy and reward to both tasks. We show this is fundamentally misaligned: IQA relies on low-level, objective perceptual cues and benefits from concise distortion-focused reasoning, whereas IAA requires deliberative semantic judgment and is poorly served by point-wise score regression. We identify these as a reasoning mismatch and an optimization mismatch, and provide empirical evidence for both through controlled probes. Motivated by these findings, we propose TATAR (Task-Aware Thinking with Asymmetric Rewards), a unified framework that shares the visual-language backbone while conditioning post-training on each task's nature. TATAR combines three components: fast–slow task-specific reasoning construction that pairs IQA with concise perceptual rationales and IAA with deliberative aesthetic narratives; two-stage SFT+GRPO learning that establishes task-aware behavioral priors before reward-driven refinement; and asymmetric rewards that apply Gaussian score shaping for IQA and Thurstone-style completion ranking for IAA. Extensive experiments across eight benchmarks demonstrate that TATAR consistently outperforms prior unified baselines on both tasks under in-domain and cross-domain settings, remains competitive with task-specific specialized models, and yields more stable training dynamics for aesthetic assessment. Our results establish task-conditioned post-training as a principled paradigm for unified perceptual scoring. Our code is publicly available at https://github.com/yinwen2019/TATAR.
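To make the asymmetric reward design concrete, here is a minimal illustrative sketch of what the two reward shapes could look like. This is not the paper's implementation: the bandwidth `sigma`, the pairwise averaging, and the Thurstone Case V form of the ranking probability are all assumptions chosen for clarity. The IQA side rewards a predicted score by how close it lands to the ground truth under a Gaussian kernel; the IAA side rewards a group of completions by how often their predicted scores rank pairs of images in agreement with the ground-truth ordering.

```python
import math


def iqa_reward(pred: float, gt: float, sigma: float = 0.5) -> float:
    """Gaussian score shaping: reward decays smoothly with score error.

    `sigma` is a hypothetical bandwidth, not a value from the paper.
    """
    return math.exp(-((pred - gt) ** 2) / (2.0 * sigma**2))


def iaa_ranking_reward(pred_scores, gt_scores, sigma: float = 1.0) -> float:
    """Thurstone-style ranking reward over a group of completions.

    Under the Thurstone Case V model, the probability that item i is
    preferred to item j is Phi((s_i - s_j) / (sqrt(2) * sigma)), where
    Phi is the standard normal CDF. We average, over all ordered pairs,
    that probability when the ground truth agrees with the ordering and
    its complement when it does not.
    """
    # Standard normal CDF via the error function (exact identity).
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    total, n = 0.0, 0
    for i in range(len(pred_scores)):
        for j in range(len(pred_scores)):
            if i == j:
                continue
            # Model's probability of ranking image i above image j.
            p = phi((pred_scores[i] - pred_scores[j]) / (math.sqrt(2.0) * sigma))
            # Reward agreement with the ground-truth ordering.
            total += p if gt_scores[i] > gt_scores[j] else 1.0 - p
            n += 1
    return total / n if n else 0.0
```

The asymmetry is the point: the IQA reward is dense and point-wise (a smooth function of one score's error), while the IAA reward is purely relational, depending only on how completions order against each other, which sidesteps the point-wise regression mismatch the authors identify for aesthetics.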