VeraRetouch: A Lightweight Fully Differentiable Framework for Multi-Task Reasoning Photo Retouching

arXiv cs.CV / 5/1/2026


Key Points

  • The paper proposes VeraRetouch, a lightweight, fully differentiable multi-task reasoning framework for photo retouching that can jointly analyze defects, produce reasoning plans, and apply precise edits.
  • It uses a compact 0.5B vision-language model to generate retouching plans from instructions and scene semantics, and replaces external non-differentiable tools with a fully differentiable Retouch Renderer for end-to-end pixel-level training.
  • The Retouch Renderer is trained with decoupled control latents for lighting, global color, and targeted color adjustments, reducing optimization barriers and parameter redundancy while improving generalization.
  • To address limited data, the authors introduce AetherRetouch-1M+, a million-scale dataset for professional retouching created via a new inverse degradation workflow, and they add DAPO-AE, a reinforcement learning post-training method for better autonomous aesthetic cognition.
  • Experiments reportedly show state-of-the-art results on multiple benchmarks with a much smaller model footprint, supporting mobile deployment, and the code/models are released on GitHub.
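The "decoupled control latents" idea can be illustrated with a minimal sketch: a renderer built entirely from smooth operations (scales, exponents, sigmoid masks), so gradients can flow from pixel losses back into separate latent groups for lighting, global color, and targeted color. The specific operations and latent dimensions below are illustrative assumptions, not the paper's actual Retouch Renderer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def retouch_render(img, light_z, color_z, target_z):
    """Apply decoupled, smooth (hence differentiable) adjustments.

    img:      float array in [0, 1], shape (H, W, 3)
    light_z:  2 latents -> exposure gain and gamma (lighting)
    color_z:  3 latents -> per-channel gains (global color)
    target_z: 3 latents -> extra gain applied only to red-dominant pixels
              (a stand-in for "targeted color adjustments")
    All parameterizations here are illustrative assumptions.
    """
    # Lighting: exposure as a positive scale, gamma as a positive exponent
    # (exp keeps both strictly positive without hard constraints).
    exposure = np.exp(light_z[0])
    gamma = np.exp(light_z[1])
    out = np.clip(img * exposure, 1e-6, 1.0) ** gamma

    # Global color: per-channel multiplicative gains.
    gains = np.exp(color_z).reshape(1, 1, 3)
    out = out * gains

    # Targeted color: a soft sigmoid mask selects red-dominant pixels,
    # so even the spatial selection stays differentiable.
    mask = sigmoid(10.0 * (out[..., 0] - out[..., 1]))[..., None]
    out = out * (1.0 + mask * target_z.reshape(1, 1, 3))

    return np.clip(out, 0.0, 1.0)
```

With all latents at zero the renderer is the identity, which is a convenient initialization for end-to-end training; in practice an autodiff framework would supply the gradients that NumPy does not.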

Abstract

Reasoning photo retouching has gained significant traction: models must analyze image defects, articulate their reasoning, and execute precise retouching enhancements. However, existing approaches often rely on non-differentiable external software, which creates optimization barriers while suffering from high parameter redundancy and limited generalization. To address these challenges, we propose VeraRetouch, a lightweight, fully differentiable framework for multi-task photo retouching. We employ a 0.5B-parameter Vision-Language Model (VLM) as the central intelligence that formulates retouching plans from instructions and scene semantics. We further develop a fully differentiable Retouch Renderer that replaces external tools, enabling direct end-to-end pixel-level training through decoupled control latents for lighting, global color, and targeted color adjustments. To overcome data scarcity, we introduce AetherRetouch-1M+, the first million-scale dataset for professional retouching, constructed via a new inverse degradation workflow. Finally, we propose DAPO-AE, a reinforcement-learning post-training strategy that enhances autonomous aesthetic cognition. Extensive experiments demonstrate that VeraRetouch achieves state-of-the-art performance across multiple benchmarks while maintaining a significantly smaller footprint, enabling mobile deployment. Our code and models are publicly available at https://github.com/OpenVeraTeam/VeraRetouch.
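The "inverse degradation workflow" for dataset construction can be sketched as follows: starting from a professionally finished photo, apply plausible inverse edits (underexposure, flattened contrast, a color cast) to synthesize the "before" image, yielding a (degraded input, retouched target) training pair. The degradation types and parameter ranges here are assumptions for illustration; the paper's actual workflow is not described in this summary.

```python
import numpy as np

def inverse_degrade(retouched, rng):
    """Synthesize an 'unretouched' input from a finished photo by applying
    inverse degradations. Ranges below are illustrative assumptions."""
    img = retouched.astype(np.float64)
    # Underexpose: random gain below 1.
    img = img * rng.uniform(0.6, 0.95)
    # Flatten contrast toward the image mean.
    mean = img.mean()
    img = mean + (img - mean) * rng.uniform(0.7, 1.0)
    # Mild color cast: per-channel gains near 1.
    img = img * rng.uniform(0.9, 1.1, size=3)
    return np.clip(img, 0.0, 1.0)

def make_pair(retouched, rng):
    """Return a (degraded input, retouched target) training pair."""
    return inverse_degrade(retouched, rng), retouched
```

Because the targets are already professional-quality, the workflow scales to millions of pairs without manual retouching labor, which is presumably what makes a million-scale dataset like AetherRetouch-1M+ feasible.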