When Spike Sparsity Does Not Translate to Deployed Cost: VS-WNO on Jetson Orin Nano

arXiv cs.LG / April 21, 2026


Key Points

  • The paper evaluates whether spike sparsity in spiking neural operator models actually lowers latency and energy when deployed on a Jetson Orin Nano using standard edge-GPU software stacks.
  • In the “reference-aligned” path, VS-WNO shows clear algorithmic sparsity, with mean spike rates dropping from 54.26% in the first spiking layer to 18.15% in the fourth.
  • In a more “deployment-style” request path, however, that sparsity does not translate into savings: VS-WNO achieves 59.6 ms latency and 228.0 mJ dynamic energy per inference versus dense WNO’s 53.2 ms and 180.7 mJ.
  • Profiling with Nsight Systems suggests the runtime remains launch-dominated and does not effectively suppress dense computation as spike activity decreases, explaining why cost does not improve.
  • The authors conclude that spike sparsity is observable but insufficient for reducing deployed cost on this Jetson-class GPU stack because sparse execution is not realized by the software runtime.
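The latency and energy figures above also pin down the implied mean dynamic power draw during inference, since dynamic energy per inference divided by latency is power. The numbers below come from the paper; the unit arithmetic itself is ours, not part of the study:

```python
# Implied mean dynamic power = dynamic energy per inference / latency.
# mJ / ms == W, so the ratio needs no further unit conversion.
def dynamic_power_w(energy_mj: float, latency_ms: float) -> float:
    """Mean dynamic power in watts from per-inference energy and latency."""
    return energy_mj / latency_ms

vs_wno = dynamic_power_w(228.0, 59.6)  # ~3.83 W
wno = dynamic_power_w(180.7, 53.2)     # ~3.40 W
print(f"VS-WNO: {vs_wno:.2f} W, dense WNO: {wno:.2f} W")
```

Note that the spiking model draws slightly *higher* implied dynamic power as well as higher energy, which is consistent with the profiling finding that its execution stays dense.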

Abstract

Spiking neural operators are appealing for neuromorphic edge computing because event-driven substrates can, in principle, translate sparse activity into lower latency and energy. Whether that advantage survives deployment on commodity edge-GPU software stacks, however, remains unclear. We study this question on a Jetson Orin Nano 8 GB using five pretrained variable-spiking wavelet neural operator (VS-WNO) checkpoints and five matched dense wavelet neural operator (WNO) checkpoints on the Darcy rectangular benchmark. On a reference-aligned path, VS-WNO exhibits substantial algorithmic sparsity, with mean spike rates decreasing from 54.26% at the first spiking layer to 18.15% at the fourth. On a deployment-style request path, however, this sparsity does not reduce deployed cost: VS-WNO reaches 59.6 ms latency and 228.0 mJ dynamic energy per inference, whereas dense WNO reaches 53.2 ms and 180.7 mJ, while also achieving slightly lower reference-path error (1.77% versus VS-WNO's 1.81%). Nsight Systems indicates that the request path remains launch-dominated and dense rather than sparsity-aware: for VS-WNO, cudaLaunchKernel accounts for 81.6% of CUDA API time within the latency window, and dense convolution kernels account for 53.8% of GPU kernel time; dense WNO shows the same pattern. On this Jetson-class GPU stack, spike sparsity is measurable but does not reduce deployed cost because the runtime does not suppress dense work as spike activity decreases.
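The mean spike rate quoted in the abstract is, in the usual definition, the fraction of active (nonzero) entries in a layer's binary spike tensor, averaged over inputs. A minimal sketch of that metric, assuming binary spike maps; names and the toy tensor are ours, not the authors' code:

```python
import numpy as np

def mean_spike_rate(spikes: np.ndarray) -> float:
    """Fraction of nonzero entries in a binary spike tensor, in [0, 1]."""
    return float(np.count_nonzero(spikes) / spikes.size)

# Toy example: a 4x4 spike map with 3 active neurons -> 3/16 = 18.75%,
# roughly the regime the paper reports for VS-WNO's fourth spiking layer.
toy = np.zeros((4, 4))
toy[0, 1] = toy[2, 3] = toy[3, 0] = 1.0
print(f"{mean_spike_rate(toy):.2%}")  # 18.75%
```

A rate like this measures *algorithmic* sparsity only; as the paper's profiling shows, it says nothing about whether the runtime actually skips the corresponding dense work.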