Real Image Denoising with Knowledge Distillation for High-Performance Mobile NPUs
arXiv cs.CV / 5/6/2026
Key Points
- The paper proposes an NPU-aware hardware–algorithm co-design method for real-world image denoising on mobile NPUs, addressing operator incompatibility and memory-access overhead.
- It uses knowledge distillation from a high-capacity teacher to train a lightweight “student” model (LiteDenoiseNet) optimized for tiled-memory SoC architectures.
- By restricting the network to NPU-native primitives (e.g., 3x3 convolutions, ReLU, nearest-neighbor upsampling) and applying progressive context expansion up to 1024x1024 crops, it achieves strong benchmark PSNR/SSIM scores at full resolution.
- Runtime results under a standardized Full HD protocol show 34.0 ms on MediaTek Dimensity 9500 and 46.1 ms on Qualcomm Snapdragon 8 Elite, with an “Inference Inversion” effect: the NPU-compatible design runs up to 3.88× faster on the dedicated NPU than on the integrated mobile GPU.
- The 1.96M-parameter student recovers 99.8% of the teacher’s quality using high-alpha knowledge distillation (alpha=0.9), reaching a 21.2× parameter reduction while reducing the PSNR gap to just 0.05 dB; training stats and the model are released via an NN Dataset repository.
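The high-alpha distillation objective described above can be sketched as a weighted sum of a teacher-imitation term and a ground-truth reconstruction term. The paper only states that distillation with alpha=0.9 is used; the specific loss functions below (MSE toward the teacher, L1 toward the clean target) and the function name are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def distillation_loss(student_out, teacher_out, ground_truth, alpha=0.9):
    """Hypothetical high-alpha knowledge-distillation loss.

    The student is trained mostly to imitate the high-capacity teacher's
    denoised output (weight alpha = 0.9, per the paper) and partly to fit
    the clean ground-truth image (weight 1 - alpha). Loss term choices
    (MSE vs. L1) are assumptions for illustration.
    """
    distill = np.mean((student_out - teacher_out) ** 2)  # mimic the teacher
    task = np.mean(np.abs(student_out - ground_truth))   # fit the clean target
    return alpha * distill + (1.0 - alpha) * task

# Example: a perfect student (matching both teacher and target) has zero loss.
img = np.zeros((4, 4))
print(distillation_loss(img, img, img))  # → 0.0
```

With alpha this close to 1, the gradient signal is dominated by the teacher's outputs, which is consistent with the reported result that the 1.96M-parameter student closes the quality gap to 0.05 dB PSNR.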