AceTone: Bridging Words and Colors for Conditional Image Grading

arXiv cs.CV / 4/2/2026

Key Points

AceTone is presented as a new multimodal, unified framework for conditional image color grading that can be driven by both text prompts and reference images.
The method reformulates color grading as a generative transformation task that outputs 3D-LUTs, using a VQ-VAE tokenizer to compress LUTs into 64 discrete tokens while maintaining ΔE<2 fidelity.
The authors introduce the AceTone-800K large-scale dataset and train a vision-language model to predict LUT tokens, then apply reinforcement learning to better match perceptual fidelity and aesthetic preferences.
Experiments reportedly show state-of-the-art performance on text-guided and reference-guided grading, including up to a 50% improvement in LPIPS versus prior methods.
Human evaluations indicate the generated color styles are visually pleasing and stylistically coherent, positioning AceTone as a step toward language-driven, aesthetics-aligned color grading.
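
To make the 3D-LUT output format mentioned above concrete, here is a minimal NumPy sketch of how a 32-point 3D-LUT could be applied to an image with trilinear interpolation. It is an illustration under stated assumptions, not AceTone's implementation: the function names `identity_lut` and `apply_lut`, the `[r, g, b]` index order, and the [0, 1] value range are all hypothetical choices made for this example.

```python
import numpy as np


def identity_lut(size: int = 32) -> np.ndarray:
    """Return an identity LUT of shape (size, size, size, 3), indexed [r, g, b].

    The layout is an assumption for this sketch; real LUT files may order axes differently.
    """
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([r, g, b], axis=-1)


def apply_lut(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Apply a 3D LUT to an (H, W, 3) RGB image in [0, 1] via trilinear interpolation."""
    size = lut.shape[0]
    # Continuous grid position of every pixel along each colour axis.
    pos = np.clip(image, 0.0, 1.0) * (size - 1)
    # Lower corner of the enclosing grid cell, kept in range so lo + 1 is valid.
    lo = np.minimum(np.floor(pos).astype(int), size - 2)
    frac = pos - lo
    out = np.zeros(image.shape, dtype=np.float64)
    # Blend the 8 corners of the cell, weighted by distance along each axis.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                wr = frac[..., 0] if dr else 1.0 - frac[..., 0]
                wg = frac[..., 1] if dg else 1.0 - frac[..., 1]
                wb = frac[..., 2] if db else 1.0 - frac[..., 2]
                weight = (wr * wg * wb)[..., None]
                corner = lut[lo[..., 0] + dr, lo[..., 1] + dg, lo[..., 2] + db]
                out += weight * corner
    return out


# Usage: an identity LUT leaves the image unchanged; an inverted LUT gives a negative.
rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))
graded = apply_lut(image, 1.0 - identity_lut(32))
```

Because the LUT is a fixed lookup table, one predicted LUT can be applied to an image of any resolution at low cost, which is one plausible reason to have a model emit LUTs rather than edited pixels.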

Abstract

Color affects how we interpret image style and emotion. Previous color grading methods rely on patch-wise recoloring or fixed filter banks, struggling to generalize across creative intents or align with human aesthetic preferences. In this study, we propose AceTone, the first approach that supports multimodal conditioned color grading within a unified framework. AceTone formulates grading as a generative color transformation task, where a model directly produces 3D-LUTs conditioned on text prompts or reference images. We develop a VQ-VAE based tokenizer which compresses a $3\times32^3$ LUT vector to 64 discrete tokens with $\Delta E<2$ fidelity. We further build a large-scale dataset, AceTone-800K, and train a vision-language model to predict LUT tokens, followed by reinforcement learning to align outputs with perceptual fidelity and aesthetics. Experiments show that AceTone achieves state-of-the-art performance on both text-guided and reference-guided grading tasks, improving LPIPS by up to 50% over existing methods. Human evaluations confirm that AceTone's results are visually pleasing and stylistically coherent, demonstrating a new pathway toward language-driven, aesthetic-aligned color grading.
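
The tokenization step in the abstract can be illustrated with a small sketch of the vector-quantization lookup at the core of any VQ-VAE: each latent vector is replaced by the index of its nearest codebook entry. Only the LUT size and the token count of 64 come from the source. The codebook size (1024), the latent dimension (16), and the names `quantize` and `dequantize` are hypothetical, and the learned encoder and decoder are replaced by random stand-ins.

```python
import numpy as np

LUT_VALUES = 3 * 32**3  # 98,304 numbers in one LUT (from the abstract)
NUM_TOKENS = 64         # discrete tokens per LUT (from the abstract)


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each latent vector, the index of its nearest codebook entry."""
    # Squared Euclidean distances, shape (num_latents, codebook_size).
    diff = latents[:, None, :] - codebook[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    return dist.argmin(axis=1)


def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look the token indices back up as codebook vectors (the decoder's input)."""
    return codebook[tokens]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 16))       # hypothetical codebook: 1024 entries, 16-dim
latents = rng.normal(size=(NUM_TOKENS, 16))  # stand-in for the encoder's output

tokens = quantize(latents, codebook)         # 64 integer token ids
recovered = dequantize(tokens, codebook)     # quantized latents, shape (64, 16)

# Each token stands in for 98,304 / 64 = 1,536 LUT values on average.
values_per_token = LUT_VALUES // NUM_TOKENS
```

A sequence this short is what lets a vision-language model emit a full LUT as ordinary next-token predictions. The decoder then has to reconstruct the complete LUT from those 64 ids, which is where the reported $\Delta E<2$ fidelity bound applies.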
