RawGen: Learning Camera Raw Image Generation

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces RawGen, a diffusion-based framework for generating camera raw (linear, scene-referred) images from text and for inverting sRGB back to camera-specific raw representations.
  • RawGen is motivated by the scarcity of large-scale raw data: existing raw datasets are small and often tied to specific camera hardware and fixed image signal processor (ISP) pipelines.
  • To produce physically meaningful linear outputs rather than photo-finished sRGB, the method uses specialized processing across latent and pixel spaces and trains on a many-to-one inverse-ISP dataset that anchors multiple ISP-varied sRGB renditions to a common scene target.
  • The authors fine-tune a conditional denoiser and a specialized decoder to better handle unknown and diverse ISP pipelines, improving camera-centric linear reconstructions compared with traditional inverse-ISP approaches.
  • They also report that RawGen can generate scalable, text-driven synthetic raw data that helps downstream low-level vision tasks beyond raw reconstruction itself.
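The forward and inverse ISP mapping the key points refer to can be illustrated with a toy pipeline. The sketch below is a minimal, hypothetical ISP (white balance, a 3x3 color matrix, and gamma); it is not the paper's model, and real ISPs add demosaicing, denoising, tone curves, and other photo-finishing stages:

```python
import numpy as np

def forward_isp(raw, wb_gains, ccm, gamma=2.2):
    """Render a linear, scene-referred raw image (H x W x 3, values in [0, 1])
    into a display-referred sRGB-like image. Illustrative only."""
    x = raw * wb_gains                 # per-channel white balance
    x = np.clip(x @ ccm.T, 0.0, 1.0)   # 3x3 color correction matrix
    return x ** (1.0 / gamma)          # simple gamma as photo-finishing

def inverse_isp(srgb, wb_gains, ccm, gamma=2.2):
    """Invert the toy pipeline above. This works only when the ISP parameters
    are known and no values were clipped; the hard case RawGen targets is
    when the ISP is unknown and varies per camera."""
    x = srgb ** gamma                  # undo gamma
    x = x @ np.linalg.inv(ccm).T       # undo color matrix
    return x / wb_gains                # undo white balance
```

A round trip (`inverse_isp(forward_isp(raw, ...), ...)`) recovers the raw image only for in-gamut, unclipped values, which is why fixed-ISP inversion breaks down on photos rendered by unknown pipelines.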

Abstract

Cameras capture scene-referred linear raw images, which are processed by onboard image signal processors (ISPs) into display-referred 8-bit sRGB outputs. Although raw data is more faithful for low-level vision tasks, collecting large-scale raw datasets remains a major bottleneck, as existing datasets are limited and tied to specific camera hardware. Generative models offer a promising way to address this scarcity -- however, existing diffusion frameworks are designed to synthesize photo-finished sRGB images rather than physically consistent linear representations. This paper presents RawGen, to our knowledge the first diffusion-based framework enabling text-to-raw generation for arbitrary target cameras, alongside sRGB-to-raw inversion. RawGen leverages the generative priors of large-scale sRGB diffusion models to synthesize physically meaningful linear outputs, such as CIE XYZ or camera-specific raw representations, via specialized processing in latent and pixel spaces. To handle unknown and diverse ISP pipelines and photo-finishing effects in diffusion-model training data, we build a many-to-one inverse-ISP dataset where multiple sRGB renditions of the same scene generated using diverse ISP parameters are anchored to a common scene-referred target. Fine-tuning a conditional denoiser and specialized decoder on this dataset allows RawGen to obtain camera-centric linear reconstructions that effectively invert the rendering pipeline. We demonstrate RawGen's superior performance over traditional inverse-ISP methods that assume a fixed ISP. Furthermore, we show that augmenting training pipelines with RawGen's scalable, text-driven synthetic data can benefit downstream low-level vision tasks.
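The many-to-one dataset construction described in the abstract, where multiple ISP-varied sRGB renditions of a scene are anchored to one shared linear target, can be sketched as follows. The parameter ranges and the simple render function are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sample_isp_params(rng):
    """Sample a random, hypothetical ISP configuration (white-balance gains,
    color matrix, gamma). Ranges are illustrative assumptions."""
    wb = rng.uniform(0.7, 2.2, size=3)
    ccm = np.eye(3) + rng.normal(0.0, 0.08, size=(3, 3))
    gamma = rng.uniform(1.8, 2.6)
    return wb, ccm, gamma

def render_srgb(linear, wb, ccm, gamma):
    """Render one sRGB-like image from a linear scene with the given params."""
    x = np.clip((linear * wb) @ ccm.T, 0.0, 1.0)
    return x ** (1.0 / gamma)

def many_to_one_pairs(linear_scene, n_renditions, seed=0):
    """Build (sRGB rendition, shared linear target) training pairs: many
    photo-finished views of the same scene, all anchored to one linear image."""
    rng = np.random.default_rng(seed)
    return [(render_srgb(linear_scene, *sample_isp_params(rng)), linear_scene)
            for _ in range(n_renditions)]
```

Training a conditional denoiser on such pairs encourages it to map any of the varied renditions back to the single scene-referred anchor, rather than memorizing one fixed ISP.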