vLLM on Jetson Orin — pre-built wheel with Marlin GPTQ support (3.8x prefill speedup)

Reddit r/LocalLLaMA / 3/15/2026

Key Points

  • A new pre-built vLLM wheel for Jetson Orin (AGX, NX, Nano) includes Marlin kernels for SM 8.7 to enable tensor cores during GPTQ inference.
  • The user built vLLM 0.17.0 with SM 8.7 support and packaged it as a wheel for JetPack 6.x / CUDA 12.6 / Python 3.10.
  • Benchmarks show significant speedups: prefill ~3.8x (523 tok/s to 2,001 tok/s), decode improvement from ~22.5 to ~31 tok/s, and end-to-end at 20K context from 47s to 17s (2.8x faster).
  • Installation is a one-line pip install from a HuggingFace wheel, with full benchmarks and setup notes available in the GitHub repo.

Hey all,

If you're running GPTQ models on a Jetson Orin (AGX, NX, or Nano), you've probably noticed that stock vLLM doesn't ship Marlin kernels for SM 8.7. It covers 8.0, 8.6, 8.9, 9.0 — but not the Orin family. Which means your tensor cores just sit there doing nothing during GPTQ inference.
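The gap is easy to see in code. This is just an illustrative sketch: the compute-capability list comes from the post's claim about stock wheels, and the commented-out `torch` call is an assumption about how you would query your own device on a Jetson.

```python
# Marlin kernel targets shipped in stock vLLM wheels, per the post.
STOCK_MARLIN_SMS = {(8, 0), (8, 6), (8, 9), (9, 0)}

ORIN_SM = (8, 7)  # Jetson Orin family: AGX, NX, Nano

# On the device itself you could query this instead of hard-coding it, e.g.:
#   import torch; ORIN_SM = torch.cuda.get_device_capability()

print(ORIN_SM in STOCK_MARLIN_SMS)  # False: no Marlin path for Orin
```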

I ran into this while trying to serve Qwen3.5-35B-A3B-GPTQ-Int4 on an AGX Orin 64GB. The performance without Marlin was underwhelming, so I compiled vLLM 0.17.0 with the SM 8.7 target included and packaged it as a wheel.

The difference was significant:

- Prefill went from 523 tok/s (llama.cpp) to 2,001 tok/s — about 3.8x

- Decode improved from ~22.5 to ~31 tok/s at short context (both within vLLM)

- End-to-end at 20K context: 17s vs 47s with llama.cpp (2.8x faster)

The wheel is on HuggingFace so you can install it with one line:

 pip install https://huggingface.co/thehighnotes/vllm-jetson-orin/resolve/main/vllm-0.17.0+cu126-cp310-cp310-linux_aarch64.whl 

Built for JetPack 6.x / CUDA 12.6 / Python 3.10 (the standard Jetson stack).
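Before installing, it's worth confirming that the wheel's filename tags actually match your interpreter and architecture, since pip will refuse (or worse, you'll grab the wrong build). A minimal sketch; the `wheel_tags_match` helper is my own illustration, not part of the repo:

```python
import platform
import sys

def wheel_tags_match(py_tag, plat_tag, version=None, machine=None):
    """Check whether a wheel's cpXY / platform tags fit this environment."""
    major, minor = version or sys.version_info[:2]
    machine = machine or platform.machine()
    return py_tag == f"cp{major}{minor}" and plat_tag.endswith(machine)

# Tags from the wheel filename: vllm-0.17.0+cu126-cp310-cp310-linux_aarch64.whl
ok = wheel_tags_match("cp310", "linux_aarch64")
print("tags match, safe to pip install" if ok else "tags don't match this system")
```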

Full benchmarks and setup notes in the repo: https://github.com/thehighnotes/vllm-jetson-orin

Hope this helps; happy to answer questions if anyone's working with a similar setup.

~Mark

submitted by /u/thehighnotes