Lucky High Dynamic Range Smartphone Imaging

arXiv cs.CV / April 23, 2026


Key Points

  • The paper addresses the gap between the human eye’s ~20 stops of dynamic range and smartphone sensors’ ~12 stops by proposing a handheld HDR method that can extend dynamic range by about 3–5 stops.
  • It introduces a lightweight, mobile-friendly neural approach that operates directly on linear raw pixels from bracketed exposures, producing each output pixel as a convex combination of exposure-adjusted nearby input pixels to reduce “hallucination” artifacts.
  • The method is validated on both synthetic data and previously unseen real smartphone bracketed images, demonstrating zero-shot generalization across smartphone captures.
  • An iterative inference architecture is presented that can handle a variable number of bracketed photos (e.g., 3–9) and uses only synthetic training while still generalizing to real photos from multiple cameras.
  • The training strategy is also reported to improve other state-of-the-art HDR methods over their original pretrained versions.
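To make the convex-combination idea concrete, here is a minimal NumPy sketch of merging bracketed linear raw frames. The hat-shaped confidence weight and the normalization constant are illustrative assumptions, not the paper's learned weights; the key property shown is that per-pixel weights are non-negative and sum to one, so the output can never stray outside the range of the exposure-normalized inputs (which is what suppresses hallucinated content).

```python
import numpy as np

def merge_hdr(raws, exposures, eps=1e-6):
    """Merge bracketed linear raw frames into one HDR estimate.

    Each output pixel is a convex combination of exposure-normalized
    input pixels: weights are non-negative and sum to 1 per pixel, so
    the result stays within the span of the normalized inputs.
    """
    # Scale each frame to a common exposure (linear raw assumed).
    norm = [r / t for r, t in zip(raws, exposures)]
    # Hypothetical confidence weights: trust mid-tones, down-weight
    # clipped highlights and noisy shadows (a simple hat function).
    w = [np.clip(1.0 - 2.0 * np.abs(r - 0.5), eps, None) for r in raws]
    w_sum = sum(w)
    weights = [wi / w_sum for wi in w]  # convex: sums to 1 per pixel
    return sum(wi * ni for wi, ni in zip(weights, norm))
```

Because the weights are convex, two frames that agree on scene radiance after exposure normalization merge to exactly that radiance, regardless of how the confidence function distributes the weight.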

Abstract

While the human eye can perceive an impressive twenty stops of dynamic range, smartphone camera sensors remain limited to about twelve stops despite decades of research. A variety of high dynamic range (HDR) image capture and processing techniques have been proposed, and, in practice, they can extend the dynamic range by 3–5 stops for handheld photography. This paper proposes an approach that robustly captures dynamic range using a handheld smartphone camera and lightweight networks suitable for running on mobile devices. Our method operates directly on linear raw pixels in bracketed exposures. Every pixel in the final HDR image is a convex combination of input pixels in the neighborhood, adjusted for exposure, and thus avoids hallucination artifacts typical of recent deep image synthesis networks. We validate our system on both synthetic imagery and unseen real bracketed images; we confirm zero-shot generalization of the method to smartphone camera captures. Our iterative inference architecture is capable of processing an arbitrary number of bracketed input photos, and we show examples from capture stacks containing 3–9 images. Our training process relies only on synthetic captures yet generalizes to unseen real photos from several cameras. Moreover, we show that this training scheme improves other state-of-the-art (SOTA) methods over their pretrained counterparts.
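The abstract's iterative architecture accepts a variable-length bracket stack. A simple way to picture this is a running merge that folds in one frame per step, so the same procedure handles 3 or 9 inputs without changing shape. The sketch below, with an assumed hat-function confidence weight standing in for the paper's learned network, shows this streaming structure:

```python
import numpy as np

def iterative_merge(frames, exposures, eps=1e-6):
    """Fold an arbitrary number of bracketed frames (e.g. 3-9) into a
    running HDR estimate, one frame at a time.

    The per-step update here is a hypothetical running weighted
    average, not the paper's network; it only illustrates how an
    iterative design decouples inference from the stack length.
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)    # weighted radiance sum
    w_acc = np.zeros_like(frames[0], dtype=np.float64)  # accumulated weights
    for raw, t in zip(frames, exposures):
        radiance = raw / t  # exposure-normalize the linear raw frame
        # Assumed confidence: favor mid-tones over clipped/noisy pixels.
        w = np.clip(1.0 - 2.0 * np.abs(raw - 0.5), eps, None)
        acc += w * radiance
        w_acc += w
    return acc / w_acc
```

Dividing by the accumulated weights at the end keeps the result a convex combination of the normalized inputs, no matter how many frames were folded in.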