The Low-End Theory: Battle of the < $250 Inference GPUs

Reddit r/LocalLLaMA / 3/30/2026


Key Points

  • The article compares multiple low-cost GPUs (all under about $250 each) using llama.cpp with the same settings (llama-bench with -ngl 99) to measure inference throughput in tokens/sec.
  • For Qwen3-VL-4B-Instruct-Q4_K_M, the RTX 3060 (12GB) and the CMP100-210 (16GB) lead overall performance among the listed cards, while the Tesla P4 (8GB) is substantially slower.
  • For Mistral-7B-Instruct-v0.3-Q4_K_M, the CMP100-210 (16GB) shows the highest tokens/sec, and the RTX 3060 also performs strongly relative to the Tesla P4 and Tesla P40.
  • For gemma-3-12B-it and Qwen2.5-Coder-14B, a single Tesla P4 cannot load the model at the tested configuration (two paired P4s can, but slowly), while the highest observed throughput comes from the CMP100-210 (16GB), showing that memory and compatibility limits bite even at the low end.
  • Overall, the results suggest that within a sub-$250 inference budget, GPU choice is dominated by effective VRAM capacity and the ability to successfully load quantized models rather than only raw cost.


Card Lineup and Cost

Three Tesla P4 cards were purchased for roughly $250 combined ($81 each), and compared against one of each of the other card types.

Cost Table

Card               eBay Price (USD)   $/GB
Tesla P4 (8GB)     81                 10.13
CMP170HX (10GB)    195                19.50
RTX 3060 (12GB)    160                13.33
CMP100‑210 (16GB)  125                7.81
Tesla P40 (24GB)   225                9.38
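The $/GB column is just price divided by VRAM. As a sanity check, a small hypothetical awk one-liner (card names, prices, and capacities taken from the table above) recomputes it and ranks the cards by cheapest VRAM:

```shell
# Recompute $/GB from eBay price and VRAM size, cheapest VRAM first.
# Columns: card, price_usd, vram_gb (values from the cost table).
printf '%s\n' \
  'Tesla_P4 81 8' \
  'CMP170HX 195 10' \
  'RTX_3060 160 12' \
  'CMP100-210 125 16' \
  'Tesla_P40 225 24' |
awk '{ printf "%-12s $%.2f/GB\n", $1, $2 / $3 }' |
sort -t'$' -k2 -n
```

By this metric the CMP100-210 is the cheapest VRAM of the lineup at $7.81/GB, and the CMP170HX the most expensive at $19.50/GB.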

Inference Tests (llama.cpp)

All tests were run with:
llama-bench -m <MODEL> -ngl 99
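A plausible way to drive the same flags across every tested model is a simple loop (model filenames taken from the tables below; the loop itself is an assumption, not shown in the post):

```shell
# Run the same llama-bench configuration over each model file.
# -ngl 99 offloads all layers to the GPU; a model that does not fit
# in VRAM fails to load rather than spilling to system RAM.
for MODEL in \
  Qwen3-VL-4B-Instruct-Q4_K_M.gguf \
  Mistral-7B-Instruct-v0.3-Q4_K_M.gguf \
  gemma-3-12B-it-Q4_K_M.gguf \
  Qwen2.5-Coder-14B-Instruct-Q4_K_M.gguf
do
  llama-bench -m "$MODEL" -ngl 99
done
```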


Qwen3‑VL‑4B‑Instruct‑Q4_K_M.gguf (2.3GB)

Card Tokens/sec
Tesla P4 (8GB) 35.32
CMP170HX (10GB) 51.66
RTX 3060 (12GB) 76.12
CMP100‑210 (16GB) 81.35
Tesla P40 (24GB) 53.39

Mistral‑7B‑Instruct‑v0.3‑Q4_K_M.gguf (4.1GB)

Card Tokens/sec
Tesla P4 (8GB) 25.73
CMP170HX (10GB) 33.62
RTX 3060 (12GB) 65.29
CMP100‑210 (16GB) 91.44
Tesla P40 (24GB) 42.46

gemma‑3‑12B‑it‑Q4_K_M.gguf (6.8GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) 13.95
CMP170HX (10GB) 18.96
RTX 3060 (12GB) 32.97
CMP100‑210 (16GB) 43.84
Tesla P40 (24GB) 21.90

Qwen2.5‑Coder‑14B‑Instruct‑Q4_K_M.gguf (8.4GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) 12.65
CMP170HX (10GB) 17.31
RTX 3060 (12GB) 31.90
CMP100‑210 (16GB) 45.44
Tesla P40 (24GB) 20.33

openai_gpt‑oss‑20b‑MXFP4.gguf (11.3GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) 34.82
CMP170HX (10GB) Can’t Load
RTX 3060 (12GB) 77.18
CMP100‑210 (16GB) 77.09
Tesla P40 (24GB) 50.41

Codestral‑22B‑v0.1‑Q5_K_M.gguf (14.6GB)

Card Tokens/sec
Tesla P4 (8GB) Can’t Load
2× Tesla P4 (16GB) Can’t Load
3× Tesla P4 (24GB) 7.58
CMP170HX (10GB) Can’t Load
RTX 3060 (12GB) Can’t Load
CMP100‑210 (16GB) Can’t Load
Tesla P40 (24GB) 12.09
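Combining the price table with one of the benchmark tables gives a rough value metric. The sketch below (an illustration, not part of the original post) computes tokens/sec per dollar for the Mistral-7B-Instruct-v0.3 results:

```shell
# Value metric: tokens/sec per dollar spent, using the Mistral-7B numbers.
# Columns: card, price_usd, tokens_per_sec (values from the tables above).
printf '%s\n' \
  'Tesla_P4 81 25.73' \
  'CMP170HX 195 33.62' \
  'RTX_3060 160 65.29' \
  'CMP100-210 125 91.44' \
  'Tesla_P40 225 42.46' |
awk '{ printf "%-12s %.3f tok/s per $\n", $1, $3 / $2 }'
```

On this model the CMP100-210 comes out well ahead (91.44 tok/s for $125), which matches the post's overall conclusion that usable VRAM and the ability to load the model matter more than sticker price alone.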
submitted by /u/m94301