HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models

arXiv cs.CV / 3/26/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper proposes HAM (Heterogeneous Attention Modulation), a training-free diffusion-model method for image/text-guided style reference transfer that aims to resolve the style–content identity trade-off.
It introduces a style noise initialization strategy to set latent noise for the diffusion process, followed by HAM that modulates different attention mechanisms to better preserve user content identity.
HAM includes two components—Global Attention Regulation (GAR) and Local Attention Transplantation (LAT)—intended to balance global style adherence with local detail retention.
Experiments (qualitative and quantitative) reportedly achieve state-of-the-art results across multiple metrics on style transfer tasks.
Overall, the work suggests that carefully designed attention modulation during inference can improve identity preservation without additional model training.

Abstract

Diffusion models have demonstrated remarkable performance in image generation, particularly within the domain of style transfer. Prevailing style transfer approaches typically leverage pre-trained diffusion models' robust feature extraction capabilities alongside external modular control pathways to explicitly impose style guidance signals. However, these methods often fail to capture complex style reference or retain the identity of user-provided content images, thus falling into the trap of style-content balance. Thus, we propose a training-free style transfer approach via

\textbf{h}

eterogeneous

\textbf{a}

ttention

\textbf{m}

odulation (

\textbf{HAM}

) to protect identity information during image/text-guided style reference transfer, thereby addressing the style-content trade-off challenge. Specifically, we first introduces style noise initialization to initialize latent noise for diffusion. Then, during the diffusion process, it innovatively employs HAM for different attention mechanisms, including Global Attention Regulation (GAR) and Local Attention Transplantation (LAT), which better preserving the details of the content image while capturing complex style references. Our approach is validated through a series of qualitative and quantitative experiments, achieving state-of-the-art performance on multiple quantitative metrics.

Regulating Prompt Markets: Securities Law, Intellectual Property, and the Trading of Prompt Assets

Dev.to

Mercor competitor Deccan AI raises $25M, sources experts from India

Dev.to

How We Got Local MCP Servers Working in Claude Cowork (The Missing Guide)

Dev.to

How Should Students Document AI Usage in Academic Work?

Dev.to

I asked my AI agent to design a product launch image. Here's what came back.

Dev.to

HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models

Key Points

Abstract

Related Articles

Regulating Prompt Markets: Securities Law, Intellectual Property, and the Trading of Prompt Assets

Mercor competitor Deccan AI raises $25M, sources experts from India

How We Got Local MCP Servers Working in Claude Cowork (The Missing Guide)

How Should Students Document AI Usage in Academic Work?

I asked my AI agent to design a product launch image. Here's what came back.

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer