ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing

arXiv cs.CV / 4/24/2026


Key Points

  • ImageHD is a new FPGA-based system for on-device continual learning of visual representations using hyperdimensional computing (HDC), aiming to handle non-stationary data streams with low compute and memory overhead.
  • The approach avoids backpropagation and reduces exemplar/memory complexity by using a unified, bounded exemplar memory and a hardware-efficient cluster merging strategy.
  • ImageHD integrates a quantized CNN feature extractor with HDC encoding, similarity search, and bounded cluster management implemented as a streaming dataflow on an AMD Zynq ZCU104 FPGA.
  • The system uses word-packed binary hypervectors to enable highly parallel bitwise computation within tight on-chip resource budgets.
  • On the CORe50 benchmark, ImageHD reports up to 40.4× speedup and up to 383× energy efficiency over an optimized CPU baseline (4.84× and 105.1× over a GPU baseline), highlighting real-time edge deployability.
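The word-packing idea in the key points above can be illustrated in a few lines: binary hypervectors stored as 64-bit words let similarity (Hamming distance) be computed with XOR plus popcount over whole words at a time, which is exactly the kind of operation that maps well to parallel bitwise hardware. The sketch below is illustrative only, not the authors' FPGA implementation; the dimensionality `D` and all function names are assumptions.

```python
import numpy as np

D = 4096  # hypervector dimensionality (assumed; must be a multiple of 64)

def pack(bits: np.ndarray) -> np.ndarray:
    """Pack a 0/1 bit vector of length D into 64-bit words."""
    return np.packbits(bits.astype(np.uint8)).view(np.uint64)

def hamming(a_packed: np.ndarray, b_packed: np.ndarray) -> int:
    """Hamming distance between two word-packed binary hypervectors."""
    xor = np.bitwise_xor(a_packed, b_packed)  # one XOR per 64-bit word
    # popcount each word by unpacking its byte view and summing set bits
    return int(np.unpackbits(xor.view(np.uint8)).sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 2, D)
b = a.copy()
b[:100] ^= 1  # flip exactly 100 bits
print(hamming(pack(a), pack(b)))  # -> 100
```

In hardware, the XOR and popcount steps run in parallel across all words, which is why packing matters: a D = 4096 hypervector becomes just 64 machine words.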

Abstract

On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative through fast, non-iterative online updates. Combined with a compact convolutional neural network (CNN) feature extractor, HDC enables efficient on-device adaptation with strong visual representations. However, prior HDC-based CL systems often depend on multi-tier memory hierarchies and complex cluster management, limiting deployability on resource-constrained hardware. We present ImageHD, an FPGA accelerator for on-device continual learning of visual data based on HDC. ImageHD targets streaming CL under strict latency and on-chip memory constraints, avoiding costly iterative optimization. At the algorithmic level, we introduce a hardware-aware CL method that bounds class exemplars through a unified exemplar memory and a hardware-efficient cluster merging strategy, while incorporating a quantized CNN front-end to reduce deployment overhead without sacrificing accuracy. At the system level, ImageHD is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA, integrating HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation within tight on-chip resource budgets. On CORe50, ImageHD achieves up to 40.4x (4.84x) speedup and 383x (105.1x) energy efficiency over optimized CPU (GPU) baselines, demonstrating the practicality of HDC-enabled continual learning for real-time edge AI.
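The abstract's "unified, bounded exemplar memory with cluster merging" can be sketched as follows: each class keeps at most a fixed number of binary cluster centroids, and when adding an exemplar would exceed that bound, the two closest clusters (by Hamming distance between their majority-vote centroids) are merged. This is a hedged toy sketch of the general idea; `MAX_CLUSTERS`, the merge rule's details, and all names are assumptions, not the paper's exact algorithm.

```python
import numpy as np

MAX_CLUSTERS = 4   # per-class cluster bound (assumed)
D = 256            # hypervector dimensionality (assumed)

class BoundedClassMemory:
    """Bounded per-class memory: at most MAX_CLUSTERS binary centroids."""

    def __init__(self):
        self.counts = []  # per-cluster bit-count accumulators (int arrays)
        self.sizes = []   # number of exemplars folded into each cluster

    def add(self, hv: np.ndarray) -> None:
        # start a new singleton cluster, then enforce the bound
        self.counts.append(hv.astype(np.int64).copy())
        self.sizes.append(1)
        if len(self.counts) > MAX_CLUSTERS:
            self._merge_closest()

    def _merge_closest(self) -> None:
        # merge the pair of centroids with the smallest Hamming distance
        cents = [self.centroid(i) for i in range(len(self.counts))]
        i, j = min(
            ((i, j) for i in range(len(cents)) for j in range(i + 1, len(cents))),
            key=lambda ij: int((cents[ij[0]] != cents[ij[1]]).sum()),
        )
        self.counts[i] += self.counts[j]
        self.sizes[i] += self.sizes[j]
        del self.counts[j], self.sizes[j]

    def centroid(self, i: int) -> np.ndarray:
        # majority vote over the bits accumulated in cluster i
        return (self.counts[i] * 2 >= self.sizes[i]).astype(np.uint8)

rng = np.random.default_rng(1)
mem = BoundedClassMemory()
for _ in range(10):
    mem.add(rng.integers(0, 2, D))
print(len(mem.counts))  # -> 4 (memory never exceeds the bound)
```

Bounding memory this way keeps on-chip storage constant regardless of how long the stream runs, which is the property the paper's streaming dataflow design relies on.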