The Low-End Theory: Battle of the < $250 Inference GPUs

Reddit r/LocalLLaMA / 3/30/2026


Key Points

  • The article compares multiple low-cost GPUs (all under about $250 each) using llama.cpp with the same settings (llama-bench with -ngl 99) to measure inference throughput in tokens/sec.
  • For Qwen3-VL-4B-Instruct-Q4_K_M, the RTX 3060 (12GB) and the CMP100-210 (16GB) lead overall performance among the listed cards, while the Tesla P4 (8GB) is substantially slower.
  • For Mistral-7B-Instruct-v0.3-Q4_K_M, the CMP100-210 (16GB) shows the highest tokens/sec, and the RTX 3060 also performs strongly relative to the Tesla P4 and Tesla P40.
  • For gemma-3-12B-it and Qwen2.5-Coder-14B, a single Tesla P4 cannot load the model at the tested configuration (two paired P4s can, but slowly), while the highest observed throughput comes from the CMP100-210 (16GB), showing that memory and compatibility limits bite even at the low end.
  • Overall, the results suggest that within a sub-$250 inference budget, GPU choice is dominated by effective VRAM capacity and the ability to successfully load quantized models rather than only raw cost.


Card Lineup and Cost

Three Tesla P4 cards were purchased for roughly $250 combined ($81 each), and compared against one of each of the other card types.

Cost Table

Card               eBay Price (USD)   $/GB
Tesla P4 (8GB)     81                 10.13
CMP170HX (10GB)    195                19.50
RTX 3060 (12GB)    160                13.33
CMP100‑210 (16GB)  125                7.81
Tesla P40 (24GB)   225                9.38
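The $/GB column is just price divided by VRAM. As a sanity check, a small hypothetical awk one-liner (card names, prices, and capacities taken from the table above) recomputes it and ranks the cards by cheapest VRAM:

```shell
# Recompute $/GB from eBay price and VRAM size, cheapest VRAM first.
# Columns: card, price_usd, vram_gb (values from the cost table).
printf '%s\n' \
  'Tesla_P4 81 8' \
  'CMP170HX 195 10' \
  'RTX_3060 160 12' \
  'CMP100-210 125 16' \
  'Tesla_P40 225 24' |
awk '{ printf "%-12s $%.2f/GB\n", $1, $2 / $3 }' |
sort -t'$' -k2 -n
```

By this metric the CMP100-210 is the cheapest VRAM of the lineup at $7.81/GB, and the CMP170HX the most expensive at $19.50/GB.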

Inference Tests (llama.cpp)

All tests were run with:
llama-bench -m <MODEL> -ngl 99
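A plausible way to drive the same flags across every tested model is a simple loop (model filenames taken from the tables below; the loop itself is an assumption, not shown in the post):

```shell
# Run the same llama-bench configuration over each model file.
# -ngl 99 offloads all layers to the GPU; a model that does not fit
# in VRAM fails to load rather than spilling to system RAM.
for MODEL in \
  Qwen3-VL-4B-Instruct-Q4_K_M.gguf \
  Mistral-7B-Instruct-v0.3-Q4_K_M.gguf \
  gemma-3-12B-it-Q4_K_M.gguf \
  Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf
do
  llama-bench -m "$MODEL" -ngl 99
done
```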


Qwen3‑VL‑4B‑Instruct‑Q4_K_M.gguf (2.3GB)

Card Tokens/sec
Tesla P4 (8GB) 35.32
CMP170HX (10GB) 51.66
RTX 3060 (12GB) 76.12
CMP100‑210 (16GB) 81.35
Tesla P40 (24GB) 53.39

Mistral‑7B‑Instruct‑v0.3‑Q4_K_M.gguf (4.1GB)

Card Tokens/sec
Tesla P4 (8GB) 25.73
CMP170HX (10GB) 33.62
RTX 3060 (12GB) 65.29
CMP100‑210 (16GB) 91.44
Tesla P40 (24GB) 42.46

gemma‑3‑12B‑it‑Q4_K_M.gguf (6.8GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) 13.95
CMP170HX (10GB) 18.96
RTX 3060 (12GB) 32.97
CMP100‑210 (16GB) 43.84
Tesla P40 (24GB) 21.90

Qwen2.5‑Coder‑14B‑Instruct‑Q4_K_M.gguf (8.4GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) 12.65
CMP170HX (10GB) 17.31
RTX 3060 (12GB) 31.90
CMP100‑210 (16GB) 45.44
Tesla P40 (24GB) 20.33

openai_gpt‑oss‑20b‑MXFP4.gguf (11.3GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) 34.82
CMP170HX (10GB) Can’t Load
RTX 3060 (12GB) 77.18
CMP100‑210 (16GB) 77.09
Tesla P40 (24GB) 50.41

Codestral‑22B‑v0.1‑Q5_K_M.gguf (14.6GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) Can’t Load
3× Tesla P4 (24GB) 7.58
CMP170HX (10GB) Can’t Load
RTX 3060 (12GB) Can’t Load
CMP100‑210 (16GB) Can’t Load
Tesla P40 (24GB) 12.09
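Combining the price table with one of the benchmark tables gives a rough value metric. The sketch below (an illustration, not part of the original post) computes tokens/sec per dollar for the Mistral-7B-Instruct-v0.3 results:

```shell
# Value metric: tokens/sec per dollar spent, using the Mistral-7B numbers.
# Columns: card, price_usd, tokens_per_sec (values from the tables above).
printf '%s\n' \
  'Tesla_P4 81 25.73' \
  'CMP170HX 195 33.62' \
  'RTX_3060 160 65.29' \
  'CMP100-210 125 91.44' \
  'Tesla_P40 225 42.46' |
awk '{ printf "%-12s %.3f tok/s per $\n", $1, $3 / $2 }'
```

On this model the CMP100-210 comes out well ahead (91.44 tok/s for $125), which matches the post's overall conclusion that usable VRAM and the ability to load the model matter more than sticker price alone.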
submitted by /u/m94301