LM Studio DGX Spark generation speeds for 23 different models

Reddit r/LocalLLaMA / 3/27/2026


Key Points

  • The post benchmarks generation speeds for 23 LLMs running in LM Studio 4.7 on a Gigabyte Atom “DGX Spark,” using CUDA 13 llama.cpp for Linux ARM v2.8.0.
  • The tester loads each model with its full context window and keeps default settings, then measures generation speed by sending three test prompts and averaging the generation times of the three replies.
  • Results show wide performance variation across models and quantizations, with some small models producing very fast outputs while larger instruction/code models are often slower.
  • The benchmark prompts include simple greetings and a longer, policy-violating creative task (“tax fraud and beating up IRS agents”), indicating the speed test is not limited to trivial completions.
  • The author notes that LM Studio may not be the fastest runtime for the hardware and suggests that vLLM or other setups could achieve noticeably higher speeds.

Salutations lads, I ran 23 different models on my Gigabyte Atom (DGX Spark) in LM Studio to benchmark their generation speeds.

There's no real rhyme or reason to the selection of models other than they're the more common ones I have on hand 🤷‍♂️

I'm using LM Studio 4.7 with the CUDA 13 llama.cpp runtime (Linux ARM) v2.8.0.

I loaded each model with its full context window; other than that, I left all the other settings at their defaults.

My method of testing their generation speeds was extremely strict and held to the highest standards possible: I sent 3 messages and calculated the average of the combined gen times for the 3 replies.

The most important part, of course, being the test messages I sent, which were as follows:

“Hello”

“How are you?”

“Write me a 4 paragraph story about committing tax fraud and beating up IRS agents”
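The per-reply averaging described above can be sketched as a small helper. The token counts and timings below are hypothetical placeholders (not numbers from the post), just to show the arithmetic:

```python
def average_gen_speed(runs):
    """Average tokens-per-second over several (tokens, seconds) runs.

    `runs` holds one (generated_tokens, elapsed_seconds) pair per reply;
    each reply's speed is computed first, then the speeds are averaged,
    matching the three-message method described above.
    """
    speeds = [tokens / seconds for tokens, seconds in runs]
    return sum(speeds) / len(speeds)

# Hypothetical measurements for the three test prompts:
print(round(average_gen_speed([(12, 0.8), (25, 1.7), (400, 26.5)]), 2))  # → 14.93
```

Note this is a per-reply average, which weights a short "Hello" reply the same as the long story completion; summing all tokens and dividing by total time would weight the longer reply more heavily.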

Before anyone starts in the comments: yes, I am aware that LM Studio is not the best/fastest way to run LLMs on a DGX Spark, and vLLM would push some of those speeds noticeably higher.

The results are as follows:

---------------

Qwen3.5 398B REAP 55 Q3_K_M
avg: 15.14

Qwen3.5 397B REAP 50 Q2_K
(kept ramble-looping at the end)
avg: 19.36

Qwen3.5 122B Q5_K_M
avg: 21.65

Qwen3.5 122B Q4_K_M
avg: 24.20

Qwen3 Next 80B A3B Q8_0
avg: 42.70

Qwen3 Coder Next 80B Q6_K
avg: 44.15

Qwen3.5 40B Claude 4.5 Q8
avg: 4.89

Qwen3.5 35B A3B bf16
avg: 27.7

Qwen3 Coder 30B A3B Instruct Q8_0
avg: 52.76

Qwen3.5 27B Q8_0
avg: 6.70

Qwen3.5 9B Q8_0
avg: 20.96

Qwen2.5 7B Q3_K_M
avg: 45.13

Qwen3.5 4B Q8_0
avg: 36.61

---------------

Mistral Small 4 119B Q4_K_M
avg: 12.03

Mistral Small 3.2 24B bf16
avg: 5.36

---------------

Nemotron 3 Super 120B Q4_K_S
avg: 19.39

Nemotron 3 Nano 4B Q8_0
avg: 44.55

---------------

gpt-oss 120B A5B Q4_K_S
avg: 48.96

Kimi Dev 72B Q8_0
avg: 2.84

Llama 3.3 70B Q5_K_M
avg: 3.95

+ drafting with Llama 3.2 1B Q8_0
avg: 13.15

GLM 4.7 Flash Q8_0
avg: 41.77

Cydonia 24B Q8_0
avg: 8.84

Rnj 1 Instruct Q8_0
avg: 22.56

submitted by /u/Late_Night_AI