AI Navigate

Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

arXiv cs.AI / 3/12/2026


Key Points

  • The paper proposes a 'soft sparsity' paradigm using a hardware-efficient Most Significant Bit proxy to skip negligible non-zero multiplications in CNNs.
  • It is implemented as a custom RISC-V instruction and evaluated on LeNet-5 (MNIST), achieving 88.42% reduction in ReLU MACs and 74.87% reduction in Tanh MACs with zero accuracy loss, outperforming zero-skipping by about 5x.
  • Clock-gating of inactive multipliers yields estimated power savings of 35.2% for ReLU and 29.96% for Tanh, though memory access makes the overall power reduction sub-linear relative to operation savings.
  • The results indicate significant potential for more efficient edge inference and could influence future CNN accelerator and hardware design for resource-constrained deployments.

Abstract

Modern CNNs' high computational demands hinder edge deployment, as traditional "hard" sparsity (skipping mathematical zeros) loses effectiveness in deep layers or with smooth activations like Tanh. We propose a "soft sparsity" paradigm that uses a hardware-efficient Most Significant Bit (MSB) proxy to skip negligible non-zero multiplications. Integrated as a custom RISC-V instruction and evaluated on LeNet-5 (MNIST), this method reduces ReLU MACs by 88.42% and Tanh MACs by 74.87% with zero accuracy loss, outperforming zero-skipping by about 5x. By clock-gating inactive multipliers, we estimate power savings of 35.2% for ReLU and 29.96% for Tanh. While memory access makes the power reduction sub-linear relative to operation savings, this approach significantly optimizes resource-constrained inference.
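To make the "soft sparsity" idea concrete, here is a minimal software sketch of MSB-proxy skipping in a MAC loop. The paper implements this check in hardware as a custom RISC-V instruction; the function name, the `msb_threshold` parameter, and the exact skip condition (`|a| < 2**msb_threshold`, i.e. no bits set at or above the threshold position) are illustrative assumptions, not the authors' precise design.

```python
def msb_proxy_mac(activations, weights, msb_threshold=4):
    """Toy MAC loop with MSB-proxy skipping (hypothetical parameters).

    An activation is treated as negligible when it has no set bits at or
    above bit position `msb_threshold`, i.e. |a| < 2**msb_threshold.
    This cheap bit test stands in for the paper's hardware MSB check;
    a skipped product corresponds to a clock-gated multiplier.
    """
    acc = 0
    skipped = 0
    for a, w in zip(activations, weights):
        # MSB proxy: shifting away the low bits leaves zero iff |a| is small.
        if abs(int(a)) >> msb_threshold == 0:
            skipped += 1  # multiplication skipped (multiplier clock-gated)
            continue
        acc += int(a) * int(w)
    return acc, skipped
```

For example, with `msb_threshold=4` every activation of magnitude below 16 is dropped from the dot product, so `msb_proxy_mac([0, 3, 130, 70, 15], [2, 2, 2, 2, 2])` skips three of the five multiplications and accumulates only `130*2 + 70*2`. In hardware the threshold would be tuned per layer to stay within the error tolerance, which is how the method achieves zero accuracy loss at the reported skip rates.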