NumColor: Precise Numeric Color Control in Text-to-Image Generation

arXiv cs.CV / 3/17/2026


Key Points

The paper traces the failure of text-to-image diffusion models on precise numeric colors (hex codes, RGB values) to subword tokenization, which fragments color codes into semantically meaningless tokens.
NumColor introduces two components. A Color Token Aggregator detects color specifications regardless of tokenization. A ColorBook of 6,707 learnable embeddings maps colors, represented in the perceptually uniform CIE Lab space, into the text encoder's embedding space to enable accurate color control.
It uses two auxiliary losses, directional alignment and interpolation consistency, to enforce a geometric mapping between Lab space and the embedding space, enabling smooth color interpolation.
A synthetic dataset, NumColor-Data, with 500,000 images provides unambiguous color-to-pixel correspondence to train the ColorBook, avoiding annotation ambiguity from photographs.
Although trained only on FLUX, NumColor transfers zero-shot to other diffusion models (SD3, SD3.5, PixArt-α, PixArt-Σ) without model-specific adaptation. Across five models it improves numerical color accuracy by 4-9x and color harmony scores by 10-30x on GenColorBench.
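The sketch below illustrates the front end these points describe. It is a minimal Python sketch, not the paper's Color Token Aggregator or ColorBook. It does three things:

1. Finds hex and rgb() color specifications directly in the raw prompt string, so detection does not depend on how a subword tokenizer splits the characters.
2. Converts each color to CIE Lab.
3. Looks up the nearest entry in a codebook.

Everything here is a hypothetical stand-in: the regexes, the function names (`find_color_specs`, `srgb_to_lab`, `nearest_entry`), the Euclidean (CIE76) nearest-neighbor lookup, and the five-entry `TOY_BOOK_LAB` substituting for the 6,707-entry ColorBook.

```python
import math
import re

# Hypothetical detector patterns: match color specs in the raw prompt string,
# before any subword tokenizer has a chance to split them apart.
HEX_RE = re.compile(r"#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")
RGB_RE = re.compile(r"rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)", re.IGNORECASE)


def find_color_specs(prompt):
    """Return sorted (start, end, (r, g, b)) tuples for each color spec in the prompt."""
    specs = []
    for m in HEX_RE.finditer(prompt):
        h = m.group(1)
        if len(h) == 3:
            h = "".join(ch * 2 for ch in h)  # expand shorthand #abc -> #aabbcc
        specs.append((m.start(), m.end(), tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))))
    for m in RGB_RE.finditer(prompt):
        rgb = tuple(int(v) for v in m.groups())
        if all(v <= 255 for v in rgb):  # skip out-of-range values such as rgb(300, 0, 0)
            specs.append((m.start(), m.end(), rgb))
    return sorted(specs)


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* under a D65 white point."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    r, g, b = (linearize(v) for v in rgb)
    # Linear sRGB -> CIE XYZ, normalized by the D65 reference white.
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) / 1.00000
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883
    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def nearest_entry(lab, book):
    """Index of the book entry closest to `lab` by Euclidean (CIE76 delta E) distance."""
    return min(range(len(book)), key=lambda i: math.dist(lab, book[i]))


# Hypothetical five-color stand-in for a real codebook: red, green, blue, white, black.
TOY_BOOK_LAB = [srgb_to_lab(c) for c in [(255, 0, 0), (0, 255, 0), (0, 0, 255),
                                         (255, 255, 255), (0, 0, 0)]]

prompt = "a ceramic vase in #FF5733 on a rgb(20, 40, 200) tablecloth"
for start, end, rgb in find_color_specs(prompt):
    idx = nearest_entry(srgb_to_lab(rgb), TOY_BOOK_LAB)
    print(prompt[start:end], rgb, "-> toy entry", idx)
```

With this toy book, #FF5733 resolves to the red entry (index 0) and rgb(20, 40, 200) to the blue entry (index 2). Working on the raw string rather than on tokens is the design choice that matters here. A hex code such as #FF5733 stays one unit no matter how the text encoder's tokenizer later fragments it.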

Abstract

Text-to-image diffusion models excel at generating images from natural language descriptions, yet fail to interpret numerical colors such as hex codes (#FF5733) and RGB values (rgb(255,87,51)). This limitation stems from subword tokenization, which fragments color codes into semantically meaningless tokens that text encoders cannot map to coherent color representations. We present NumColor, a method that enables precise numerical color control across multiple diffusion architectures. NumColor comprises two components: a Color Token Aggregator that detects color specifications regardless of how they are tokenized, and a ColorBook of 6,707 learnable embeddings that maps colors, represented in the perceptually uniform CIE Lab space, into the text encoder's embedding space. We introduce two auxiliary losses, directional alignment and interpolation consistency, that enforce geometric correspondence between the Lab and embedding spaces, enabling smooth color interpolation. To train the ColorBook, we construct NumColor-Data, a synthetic dataset of 500K rendered images with unambiguous color-to-pixel correspondence, which eliminates the annotation ambiguity inherent in photographic datasets. Although trained solely on FLUX, NumColor transfers zero-shot to SD3, SD3.5, PixArt-α, and PixArt-Σ without model-specific adaptation. NumColor improves numerical color accuracy by 4-9x across five models while also improving color harmony scores by 10-30x on the GenColorBench benchmark.
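The abstract names the two auxiliary losses but does not define them. The sketch below shows one plausible reading of each, purely as an illustration:

- **Directional alignment:** the angle between two embedding offsets should match the angle between the corresponding Lab offsets.
- **Interpolation consistency:** the embedding of a color lying between two others in Lab should equal the same blend of their embeddings.

Both formulations, the function names, and the toy vectors are hypothetical. Since the abstract does not define these losses, this is not presented as the paper's method. A real implementation would compute such terms inside an autodiff framework so that gradients reach the ColorBook embeddings.

```python
import math


def sub(u, v):
    return [x - y for x, y in zip(u, v)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0


def directional_alignment_loss(emb, lab, triples):
    """Hypothetical: for each anchor triple (i, j, k), the angle between the embedding
    offsets e_j - e_i and e_k - e_i should match the angle between the Lab offsets."""
    errs = [(cosine(sub(emb[j], emb[i]), sub(emb[k], emb[i]))
             - cosine(sub(lab[j], lab[i]), sub(lab[k], lab[i]))) ** 2
            for i, j, k in triples]
    return sum(errs) / len(errs)


def interpolation_consistency_loss(emb, quads):
    """Hypothetical: for each (a, b, c, t) where color c lies at (1 - t) * lab_a + t * lab_b,
    penalize the squared distance between e_c and (1 - t) * e_a + t * e_b."""
    errs = []
    for a, b, c, t in quads:
        blend = [(1 - t) * x + t * y for x, y in zip(emb[a], emb[b])]
        d = sub(emb[c], blend)
        errs.append(dot(d, d))
    return sum(errs) / len(errs)


# Toy data: four Lab colors; color 3 is the Lab midpoint of colors 0 and 1.
lab = [(50.0, 0.0, 0.0), (60.0, 10.0, 0.0), (50.0, 0.0, 10.0), (55.0, 5.0, 0.0)]
triples = [(0, 1, 2), (0, 1, 3)]
quads = [(0, 1, 3, 0.5)]

# An isometric embedding (Lab padded with a zero) satisfies both constraints exactly...
emb_good = [list(p) + [0.0] for p in lab]
# ...while moving color 3 off the line between colors 0 and 1 violates both.
emb_bad = emb_good[:3] + [[40.0, -3.0, 2.0, 7.0]]

print(directional_alignment_loss(emb_good, lab, triples),
      interpolation_consistency_loss(emb_good, quads))
print(directional_alignment_loss(emb_bad, lab, triples),
      interpolation_consistency_loss(emb_bad, quads))
```

In this reading, directional alignment compares angles rather than raw offsets. It therefore does not require the embedding space to share Lab's dimensionality or scale. Interpolation consistency is what would make blending two ColorBook entries produce an in-between color.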


