LRConv-NeRV: Low Rank Convolution for Efficient Neural Video Compression

arXiv cs.CV / 3/20/2026

Key Points

  • LRConv-NeRV replaces selected dense 3×3 convolutions in NeRV's decoder with structured low-rank separable convolutions and is trained end-to-end to allow controllable quality–efficiency trade-offs.
  • Applying low-rank factorization only to the final decoder stage yields a 68% reduction in decoder GFLOPs (from 201.9 to 64.9) and a 9.3% smaller model, with negligible quality loss and about 9.2% bitrate reduction.
  • INT8 post-training quantization preserves reconstruction quality close to the dense baseline, while aggressive early-stage factorization can degrade quality.
  • The approach preserves temporal coherence and presents LRConv-NeRV as a viable architectural alternative for efficient neural video decoding in low-resource settings.
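The FLOP and parameter savings in the key points follow directly from the factorization arithmetic. The sketch below compares a dense 3×3 convolution against a generic low-rank factorization (a 3×3 convolution into `r` intermediate channels followed by a 1×1 expansion). The layer shape and rank are illustrative assumptions, not values taken from the paper.

```python
# Sketch: parameter and multiply-accumulate (MAC) counts for a dense 3x3
# convolution vs. a low-rank two-stage factorization. Shapes and the rank r
# are illustrative assumptions, not the paper's actual configuration.

def conv_params_macs(c_in, c_out, k, h, w):
    """Params and MACs for a k x k conv at stride 1 with 'same' padding."""
    params = c_out * c_in * k * k
    macs = params * h * w  # one MAC per weight per output position
    return params, macs

def lowrank_params_macs(c_in, c_out, k, r, h, w):
    """Factorization: c_in -> r via a k x k conv, then r -> c_out via 1x1."""
    p1, m1 = conv_params_macs(c_in, r, k, h, w)
    p2, m2 = conv_params_macs(r, c_out, 1, h, w)
    return p1 + p2, m1 + m2

if __name__ == "__main__":
    c_in, c_out, k, h, w = 256, 256, 3, 120, 240  # illustrative stage shape
    r = 32                                        # illustrative rank
    dp, dm = conv_params_macs(c_in, c_out, k, h, w)
    lp, lm = lowrank_params_macs(c_in, c_out, k, r, h, w)
    print(f"dense:    {dp} params, {dm / 1e9:.2f} GMACs")
    print(f"low-rank: {lp} params, {lm / 1e9:.2f} GMACs")
    print(f"savings:  {1 - lm / dm:.1%} fewer MACs")
```

As a sanity check on the paper's own numbers, 1 − 64.9/201.9 ≈ 67.9%, consistent with the quoted 68% decoder GFLOPs reduction.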

Abstract

Neural Representations for Videos (NeRV) encode entire video sequences within neural network parameters, offering an alternative paradigm to conventional video codecs. However, the convolutional decoder of NeRV remains computationally expensive and memory-intensive, limiting its deployment in resource-constrained environments. This paper proposes LRConv-NeRV, an efficient NeRV variant that replaces selected dense 3×3 convolutional layers with structured low-rank separable convolutions, trained end-to-end within the decoder architecture. By progressively applying low-rank factorization from the largest to earlier decoder stages, LRConv-NeRV enables controllable trade-offs between reconstruction quality and efficiency. Extensive experiments demonstrate that applying LRConv only to the final decoder stage reduces decoder complexity by 68%, from 201.9 to 64.9 GFLOPs, and model size by 9.3%, while incurring negligible quality loss and achieving approximately 9.2% bitrate reduction. Under INT8 post-training quantization, LRConv-NeRV preserves reconstruction quality close to the dense NeRV baseline, whereas more aggressive factorization of early decoder stages leads to disproportionate quality degradation. Compared to existing work under layer-aligned settings, LRConv-NeRV achieves a more favorable efficiency–quality trade-off, offering substantial GFLOPs and parameter reductions while maintaining higher PSNR/MS-SSIM and improved temporal stability. Temporal flicker analysis using LPIPS further shows that the proposed solution preserves temporal coherence close to the NeRV baseline. These results establish LRConv-NeRV as a potential architectural alternative for efficient neural video decoding under low-precision and resource-constrained settings.
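The INT8 post-training quantization result can be pictured with the standard symmetric per-tensor scheme: each weight is mapped to an 8-bit code via a single scale factor, so reconstruction error is bounded by half the quantization step. This is the textbook quantizer, sketched below as an assumption; the paper's exact quantization recipe may differ.

```python
# Sketch: symmetric per-tensor INT8 post-training quantization (the standard
# scheme; whether the paper uses exactly this recipe is an assumption).

def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] with one shared scale."""
    peak = max(abs(x) for x in weights)
    scale = peak / 127.0 if peak > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale for c in codes]

if __name__ == "__main__":
    weights = [0.8, -1.2, 0.05, 0.33, -0.47]  # toy decoder weights (illustrative)
    codes, scale = quantize_int8(weights)
    recon = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, recon))
    # Rounding error per weight is bounded by half the quantization step.
    print(f"scale={scale:.5f}, worst error={worst:.5f} (bound {scale / 2:.5f})")
```

The half-step error bound is why well-conditioned layers survive INT8 with little quality loss, while layers whose outputs are amplified through many subsequent stages (e.g. early decoder stages) are more sensitive to the same perturbation.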