GraphLeap: Decoupling Graph Construction and Convolution for Vision GNN Acceleration on FPGA

arXiv cs.CV / 4/24/2026


Key Points

  • Vision Graph Neural Networks (ViGs) build a feature-dependent graph at every layer using kNN over patch tokens; this per-layer graph construction is identified as the dominant bottleneck (50–95% of graph convolution time on CPUs and GPUs) and scales as O(N^2) in the number of patches N.
  • GraphLeap eliminates the sequential dependency by decoupling graph construction from feature updates: each layer runs message passing over a graph built from the previous layer’s features, while concurrently using the current layer’s features to prepare the next layer’s graph (one-layer lookahead).
  • The accuracy impact of using prior-layer features is reported to be minor, and lightweight fine-tuning for a few epochs largely recovers the original accuracy.
  • Building on GraphLeap, the authors present the first end-to-end FPGA accelerator for Vision GNNs with a streaming, layer-pipelined architecture that overlaps kNN graph construction and feature update, achieving up to 95.7× speedup vs CPU and 8.5× vs GPU on an Alveo U280.
  • The results suggest real-time Vision GNN inference is feasible, aided by an efficient on-chip dataflow that avoids explicit edge-feature materialization.
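The one-layer-lookahead idea can be sketched in a few lines. This is an illustrative NumPy sketch under stated assumptions, not the paper's implementation: `knn_graph`, `message_pass`, `vig_forward`, and `graphleap_forward` are hypothetical names, and the max-relative aggregation is a simplified stand-in for ViG's graph convolution. On the FPGA the lookahead kNN and the feature update run concurrently; sequential Python can only express that overlap as reordered calls.

```python
import numpy as np

def knn_graph(feats, k):
    """Dense O(N^2) kNN over patch tokens (illustrative, not an optimized kernel)."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-matches
    return np.argsort(d2, axis=1)[:, :k]    # (N, k) neighbor indices

def message_pass(feats, nbrs):
    """Simplified max-relative aggregation with a residual update."""
    agg = feats[nbrs].max(axis=1) - feats   # (N, C)
    return feats + agg

def vig_forward(feats, num_layers, k):
    """Baseline ViG: each layer must finish kNN before its update can start."""
    for _ in range(num_layers):
        nbrs = knn_graph(feats, k)          # sequential dependency
        feats = message_pass(feats, nbrs)
    return feats

def graphleap_forward(feats, num_layers, k):
    """GraphLeap: update layer l with the graph from the previous layer's
    features, so the kNN for layer l+1 can run concurrently in hardware."""
    nbrs = knn_graph(feats, k)              # bootstrap graph for the first layer
    for _ in range(num_layers):
        next_nbrs = knn_graph(feats, k)     # lookahead (overlapped on the FPGA)
        feats = message_pass(feats, nbrs)   # uses the stale, one-layer-old graph
        nbrs = next_nbrs
    return feats
```

For a single layer the two variants coincide (both use the graph of the input features); from the second layer on, GraphLeap's update sees a graph that is one layer stale, which is the source of the minor accuracy gap the fine-tuning recovers.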

Abstract

Vision Graph Neural Networks (ViGs) represent an image as a graph of patch tokens, enabling adaptive, feature-driven neighborhoods. Unlike CNNs with fixed grid biases or Vision Transformers with global token interactions, ViGs rely on dynamic graph convolution: at each layer, a feature-dependent graph is built via k-nearest-neighbor (kNN) search on current patch features, followed by message passing. This per-layer graph construction is the main bottleneck, consuming 50–95% of graph convolution time on CPUs and GPUs, scaling as O(N^2) with the number of patches N, and creating a sequential dependency between graph construction and feature updates. We introduce GraphLeap, a simple reformulation that removes this dependency by decoupling graph construction from feature update across layers. GraphLeap performs the feature update at layer ℓ using a graph built from the previous layer's features, while simultaneously using the current layer's features to construct the graph for layer ℓ+1. This one-layer-lookahead graph construction enables concurrent graph construction and message passing. Although using prior-layer features can introduce minor accuracy degradation, lightweight fine-tuning for a few epochs is sufficient to recover the original accuracy. Building on GraphLeap, we present the first end-to-end FPGA accelerator for Vision GNNs. Our streaming, layer-pipelined design overlaps a kNN graph construction engine with a feature update engine, exploits node- and channel-level parallelism, and enables efficient on-chip dataflow without explicit edge-feature materialization. Evaluated on isotropic and pyramidal ViG models on an Alveo U280 FPGA, GraphLeap achieves up to 95.7× speedup over CPU and 8.5× speedup over GPU baselines, demonstrating the feasibility of real-time Vision GNN inference.
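A back-of-envelope cost model makes the bottleneck concrete. The constants below (roughly 3C ops per pairwise squared distance, 2C ops per aggregated edge) are assumptions for illustration, not measurements from the paper; the point is only that kNN construction grows as O(N²C) while aggregation grows as O(NkC), so construction dominates more and more as the token count N grows.

```python
# Hypothetical operation counts (illustrative assumptions, not the paper's data).
def knn_cost(n, c):
    # each of the N^2 token pairs needs ~3C ops for a squared L2 distance
    return 3 * n * n * c

def aggregate_cost(n, c, k):
    # message passing touches N*k edges at ~2C ops each
    return 2 * n * k * c

for n in (196, 784, 3136):          # token counts at 14x14, 28x28, 56x56 patches
    c, k = 192, 9                   # representative ViG width and neighbor count
    ratio = knn_cost(n, c) / aggregate_cost(n, c, k)
    print(f"N={n}: kNN construction ~{ratio:.0f}x the aggregation cost")
```

Under this model the ratio is 3N/2k, so quadrupling N quadruples kNN's relative cost, which is consistent with the paper's observation that construction, not message passing, limits throughput at higher resolutions.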