Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset

arXiv cs.CV / 4/20/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper addresses limitations in tone style transfer for photo retouching, where a lack of high-quality large-scale triplet datasets has forced models to rely on weaker self-supervised or proxy objectives.
It introduces TST100K, a new large-scale dataset of 100,000 content–reference–stylized triplets, built via a data construction pipeline that uses a trained tone style scorer to enforce strict stylistic consistency.
To improve visual and semantic quality, the authors propose ICTone, a diffusion-based in-context tone transfer framework that jointly conditions on both images to better preserve semantics and avoid issues like inappropriate color transfer.
The approach further uses reward feedback learning driven by the tone style scorer to enhance stylistic fidelity and overall visual aesthetics, achieving state-of-the-art results in both quantitative metrics and human evaluations.

Abstract

Tone style transfer for photo retouching aims to adapt the stylistic tone of the reference image to a given content image. However, the lack of high-quality large-scale triplet datasets with stylized ground truth forces existing methods to rely on self-supervised or proxy objectives, which limits model capability. To mitigate this gap, we design a data construction pipeline to build TST100K, a large-scale dataset of 100,000 content-reference-stylized triplets. At the core of this pipeline, we train a tone style scorer to ensure strict stylistic consistency for each triplet. In addition, existing methods typically extract content and reference features independently and then fuse them in a decoder, which may cause semantic loss and lead to inappropriate color transfer and degraded visual aesthetics. Instead, we propose ICTone, a diffusion-based framework that performs tone transfer in an in-context manner by jointly conditioning on both images, leveraging the semantic priors of generative models for semantic-aware transfer. Reward feedback learning using the tone style scorer is further incorporated to improve stylistic fidelity and visual quality. Experiments demonstrate the effectiveness of TST100K, and ICTone achieves state-of-the-art performance on both quantitative metrics and human evaluations.

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

Dev.to

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Dev.to

Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else

Dev.to

Local LLM Beginner’s Guide (Mac - Apple Silicon)

Reddit r/artificial

Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals

Dev.to

Towards In-Context Tone Style Transfer with A Large-Scale Triplet Dataset

Key Points

Abstract

Related Articles

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else

Local LLM Beginner’s Guide (Mac - Apple Silicon)

Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer