Efficient INT8 Single-Image Super-Resolution via Deployment-Aware Quantization and Teacher-Guided Training

arXiv cs.CV · April 23, 2026


Key Points

  • The paper proposes a deployment-aware INT8-quantized single-image super-resolution (SISR) framework for ×3 upscaling that minimizes inference complexity by performing most computation in low-resolution space and using a lightweight re-parameterizable backbone with PixelShuffle reconstruction.
  • It introduces a three-stage training pipeline that progressively improves reconstruction quality using spatial supervision, Charbonnier and DCT-domain losses, and confidence-weighted distillation from a Mamba-based teacher.
  • The method applies quantization-aware training directly on the fused deploy graph, further stabilizing INT8 quantization via weight clipping and BatchNorm recalibration.
  • On the MAI 2026 Quantized 4K Image Super-Resolution Challenge test set, the authors report 29.79 dB PSNR and 0.8634 SSIM, with a final score of 1.8 under the target mobile INT8 deployment setting.
  • Ablation results indicate that teacher-guided supervision materially improves dynamic INT8 TFLite reconstruction performance, and that the fixed-shape deployable INT8 TFLite artifact achieves the highest reported metrics in the study.
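The PixelShuffle reconstruction mentioned above is a depth-to-space rearrangement: the backbone stays in low-resolution space and outputs C·r² channels, which a final shuffle folds into an image upscaled by r (here r = 3 for ×3 SR). A minimal NumPy sketch of that operation, not the paper's code:

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Depth-to-space: rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Mirrors the channel ordering used by PixelShuffle-style layers:
    output[c, h*r + i, w*r + j] == input[c*r*r + i*r + j, h, w].
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)
```

Because all convolutions run on the H×W grid and only this cheap rearrangement produces the 3H×3W output, the inference graph stays compact, which is the point of the extract-refine-upsample design.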

Abstract

Efficient single-image super-resolution (SISR) requires balancing reconstruction fidelity, model compactness, and robustness under low-bit deployment, which is especially challenging for ×3 SR. We present a deployment-oriented quantized SISR framework based on an extract-refine-upsample design. The student performs most computation in the low-resolution space and uses a lightweight re-parameterizable backbone with PixelShuffle reconstruction, yielding a compact inference graph. To improve quality without significantly increasing complexity, we adopt a three-stage training pipeline: Stage 1 learns a basic reconstruction mapping with spatial supervision; Stage 2 refines fidelity using Charbonnier loss, DCT-domain supervision, and confidence-weighted output-level distillation from a Mamba-based teacher; and Stage 3 applies quantization-aware training directly on the fused deploy graph. We further use weight clipping and BatchNorm recalibration to improve quantization stability. On the MAI 2026 Quantized 4K Image Super-Resolution Challenge test set, our final AIO MAI submission achieves 29.79 dB PSNR and 0.8634 SSIM, obtaining a final score of 1.8 under the target mobile INT8 deployment setting. Ablation on Stage 3 optimization shows that teacher-guided supervision improves the dynamic INT8 TFLite reconstruction from 29.91 dB/0.853 to 30.0003 dB/0.856, while the fixed-shape deployable INT8 TFLite artifact attains 30.006 dB/0.857.
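The weight clipping the abstract mentions addresses a common INT8 failure mode: a few outlier weights inflate the quantization scale and waste most of the 8-bit range. A hedged sketch of the idea, combining percentile-based clipping with simulated ("fake") symmetric per-tensor INT8 quantization as typically done during QAT (the percentile threshold and per-tensor granularity are illustrative assumptions, not the paper's exact recipe):

```python
import numpy as np

def clip_and_fake_quant(w: np.ndarray, percentile: float = 99.9):
    """Clip weight outliers, then simulate symmetric INT8 quantization.

    Returns (dequantized_weights, scale). During QAT the dequantized
    values replace the weights in the forward pass, so the network
    learns under the rounding error it will see at deployment.
    """
    # Assumed clipping rule: cap |w| at a high percentile so a handful
    # of outliers cannot blow up the quantization scale.
    t = np.percentile(np.abs(w), percentile)
    w = np.clip(w, -t, t)
    # Symmetric per-tensor INT8: map [-t, t] onto [-127, 127].
    scale = t / 127.0 if t > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale, scale
```

The same dequantize-and-compare pattern is how one would sanity-check the INT8 TFLite artifact against the float model before measuring the PSNR/SSIM gap reported in the ablation.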