BurstGP: Enhancing Raw Burst Image Super Resolution with Generative Priors

arXiv cs.CV / 4/28/2026

📰 News · Models & Research

Key Points

  • Burst image super-resolution (BISR) reconstructs a single high-resolution image by aggregating multiple low-resolution frames, but existing approaches can struggle with complex textures and produce oversmoothed results.
  • The paper introduces BurstGP, a diffusion-model-based BISR method that incorporates generative priors from recent foundation models to enhance realism while preserving fidelity.
  • BurstGP extends a conventional multiframe-aware BISR pipeline with a degradation-aware conditioning mechanism that tailors the generation of fine details based on the estimated input degradation.
  • It also proposes a robust sRGB-to-lRGB inverter, allowing the use of generative multiframe (video) sRGB priors while still processing raw inputs and producing linear-RGB (lRGB) outputs.
  • Experiments show BurstGP outperforms the prior state of the art in both quantitative perceptual metrics (e.g., MUSIQ, LPIPS) and qualitative results, especially in recovering richer textures and structural details.
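For context on the sRGB-to-lRGB step: the paper's inverter is a learned, robust component whose details are not given here, but the non-learned conversion it relates to is the standard sRGB decoding defined by IEC 61966-2-1. A minimal sketch of that transfer-function inversion for a single channel value (the function name is illustrative, not from the paper):

```python
def srgb_to_linear(c: float) -> float:
    """Invert the sRGB transfer function (IEC 61966-2-1) for one
    channel value c in [0, 1], returning linear-light intensity."""
    if c <= 0.04045:
        # Linear segment near black (avoids an infinite slope at zero).
        return c / 12.92
    # Power-law segment for the rest of the range.
    return ((c + 0.055) / 1.055) ** 2.4

# Mid-gray in sRGB maps to roughly 0.214 in linear light.
print(round(srgb_to_linear(0.5), 3))
```

A learned inverter like the paper's can absorb camera-pipeline deviations (tone mapping, gamut clipping) that this fixed analytic formula cannot.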

Abstract

Burst image super resolution (BISR) aims to construct a single high-resolution (HR) image by aggregating information from multiple low-resolution (LR) frames, relying on temporal redundancy and spatial coherence across the burst. While conventional methods achieve impressive results, they often struggle with complex textures and oversmoothing. Diffusion models, particularly those pretrained on high-quality data, have shown remarkable capability in generating realistic details for image and video super-resolution. However, their potential remains largely under-explored in BISR, where existing approaches typically rely on task-specific diffusion models trained from scratch and operate on single-frame reconstructions. In this work, we propose BurstGP, a novel diffusion-based solution for BISR, which leverages generative priors of recent foundation models to overcome these issues. In particular, we build a multiframe-aware diffusion model on top of a conventional BISR approach, which boosts image quality with minimal loss to fidelity. Further, we introduce (i) a novel degradation-aware conditioning mechanism, which controls synthesis of fine details based on the estimated degradation in the input, and (ii) a robust sRGB-to-lRGB inverter, enabling us to utilize generative multiframe (video) sRGB priors, while operating with raw input and lRGB output images. Empirically, we demonstrate that BurstGP outperforms the existing state of the art, both quantitatively (especially with respect to perceptual metrics, including MUSIQ and LPIPS) and qualitatively. In particular, our proposed method excels at recovering richer textures and finer structural details, highlighting the potential of video priors for BISR over traditional methods.