When Spike Sparsity Does Not Translate to Deployed Cost: VS-WNO on Jetson Orin Nano

arXiv cs.LG / April 21, 2026


Key Points

  • The paper evaluates whether spike sparsity in spiking neural operator models actually lowers latency and energy when deployed on a Jetson Orin Nano using standard edge-GPU software stacks.
  • In the “reference-aligned” path, VS-WNO shows clear algorithmic sparsity, with mean spike rates dropping from 54.26% in the first spiking layer to 18.15% in the fourth.
  • In a more “deployment-style” request path, however, that sparsity does not translate into savings: VS-WNO achieves 59.6 ms latency and 228.0 mJ dynamic energy per inference versus dense WNO’s 53.2 ms and 180.7 mJ.
  • Profiling with Nsight Systems suggests the runtime remains launch-dominated and does not effectively suppress dense computation as spike activity decreases, explaining why cost does not improve.
  • The authors conclude that spike sparsity is observable but insufficient for reducing deployed cost on this Jetson-class GPU stack because sparse execution is not realized by the software runtime.
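The latency and energy figures above also pin down the implied mean dynamic power draw during inference, since dynamic energy per inference divided by latency is power. The numbers below come from the paper; the unit arithmetic itself is ours, not part of the study:

```python
# Implied mean dynamic power = dynamic energy per inference / latency.
# mJ / ms == W, so the ratio needs no further unit conversion.
def dynamic_power_w(energy_mj: float, latency_ms: float) -> float:
    """Mean dynamic power in watts from per-inference energy and latency."""
    return energy_mj / latency_ms

vs_wno = dynamic_power_w(228.0, 59.6)  # ~3.83 W
wno = dynamic_power_w(180.7, 53.2)     # ~3.40 W
print(f"VS-WNO: {vs_wno:.2f} W, dense WNO: {wno:.2f} W")
```

Note that the spiking model draws slightly *higher* implied dynamic power as well as higher energy, which is consistent with the profiling finding that its execution stays dense.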

Abstract

Spiking neural operators are appealing for neuromorphic edge computing because event-driven substrates can, in principle, translate sparse activity into lower latency and energy. Whether that advantage survives deployment on commodity edge-GPU software stacks, however, remains unclear. We study this question on a Jetson Orin Nano 8 GB using five pretrained variable-spiking wavelet neural operator (VS-WNO) checkpoints and five matched dense wavelet neural operator (WNO) checkpoints on the Darcy rectangular benchmark. On a reference-aligned path, VS-WNO exhibits substantial algorithmic sparsity, with mean spike rates decreasing from 54.26% at the first spiking layer to 18.15% at the fourth. On a deployment-style request path, however, this sparsity does not reduce deployed cost: VS-WNO reaches 59.6 ms latency and 228.0 mJ dynamic energy per inference, whereas dense WNO reaches 53.2 ms and 180.7 mJ, while also achieving slightly lower reference-path error (1.77% versus VS-WNO's 1.81%). Nsight Systems indicates that the request path remains launch-dominated and dense rather than sparsity-aware: for VS-WNO, cudaLaunchKernel accounts for 81.6% of CUDA API time within the latency window, and dense convolution kernels account for 53.8% of GPU kernel time; dense WNO shows the same pattern. On this Jetson-class GPU stack, spike sparsity is measurable but does not reduce deployed cost because the runtime does not suppress dense work as spike activity decreases.
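The mean spike rate quoted in the abstract is, in the usual definition, the fraction of active (nonzero) entries in a layer's binary spike tensor, averaged over inputs. A minimal sketch of that metric, assuming binary spike maps; names and the toy tensor are ours, not the authors' code:

```python
import numpy as np

def mean_spike_rate(spikes: np.ndarray) -> float:
    """Fraction of nonzero entries in a binary spike tensor, in [0, 1]."""
    return float(np.count_nonzero(spikes) / spikes.size)

# Toy example: a 4x4 spike map with 3 active neurons -> 3/16 = 18.75%,
# roughly the regime the paper reports for VS-WNO's fourth spiking layer.
toy = np.zeros((4, 4))
toy[0, 1] = toy[2, 3] = toy[3, 0] = 1.0
print(f"{mean_spike_rate(toy):.2%}")  # 18.75%
```

A rate like this measures *algorithmic* sparsity only; as the paper's profiling shows, it says nothing about whether the runtime actually skips the corresponding dense work.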