Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs
arXiv cs.AI / 3/12/2026
Key Points
- The paper proposes a 'soft sparsity' paradigm: a hardware-efficient most-significant-bit (MSB) proxy identifies and skips negligible non-zero multiplications in CNNs.
- It is implemented as a custom RISC-V instruction and evaluated on LeNet-5 (MNIST), achieving 88.42% reduction in ReLU MACs and 74.87% reduction in Tanh MACs with zero accuracy loss, outperforming zero-skipping by about 5x.
- Clock-gating of inactive multipliers yields estimated power savings of 35.2% for ReLU and 29.96% for Tanh, though memory access makes the overall power reduction sub-linear relative to operation savings.
- The results indicate significant potential for more efficient edge inference and could influence future CNN accelerator and hardware design for resource-constrained deployments.
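The core idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the MSB proxy compares an activation's leading-bit position against a tunable threshold, so that setting the threshold to zero degenerates to plain zero-skipping. All names (`msb_index`, `approx_dot`, `msb_thresh`) are illustrative, not from the paper.

```python
def msb_index(x):
    # Position of the most significant set bit of |x|; -1 for zero.
    # In hardware this is a cheap leading-one detector, not a multiply.
    return abs(int(x)).bit_length() - 1

def approx_dot(acts, wts, msb_thresh):
    """Dot product under 'soft sparsity': any product whose activation's
    MSB position falls below msb_thresh is treated as negligible and
    skipped (the multiplier would be clock-gated in hardware).
    msb_thresh = 0 reduces to conventional zero-skipping."""
    total, macs = 0, 0
    for a, w in zip(acts, wts):
        if msb_index(a) < msb_thresh:  # covers exact zeros too (MSB = -1)
            continue                   # skipped MAC
        total += a * w
        macs += 1
    return total, macs

# Zero-skipping vs. a tolerant threshold on the same inputs:
acts, wts = [0, 1, 8, 100], [2, 2, 2, 2]
exact, macs_zero = approx_dot(acts, wts, 0)   # skips only the zero
approx, macs_soft = approx_dot(acts, wts, 3)  # also skips the '1'
```

Raising `msb_thresh` trades a bounded approximation error for fewer MACs, which is the tunable error tolerance the title refers to; the paper's evaluation picks thresholds that leave LeNet-5 accuracy unchanged.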