# Low‑End Theory: Battle of the <$250 Inference GPUs
## Card Lineup and Cost

Three Tesla P4 cards were purchased for roughly $250 combined, and compared against a single unit of each of the other card types.

### Cost Table
| Card | eBay Price (USD) | $/GB |
| --- | --- | --- |
| Tesla P4 (8GB) | 81 | 10.13 |
| CMP170HX (10GB) | 195 | 19.50 |
| RTX 3060 (12GB) | 160 | 13.33 |
| CMP100‑210 (16GB) | 125 | 7.81 |
| Tesla P40 (24GB) | 225 | 9.38 |
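The $/GB column is just the eBay price divided by VRAM capacity. A minimal sketch reproducing it from the table above (the last digit may differ slightly depending on rounding convention):

```python
# Reproduce the $/GB column: eBay price divided by VRAM capacity.
cards = {
    "Tesla P4":   (81, 8),
    "CMP170HX":   (195, 10),
    "RTX 3060":   (160, 12),
    "CMP100-210": (125, 16),
    "Tesla P40":  (225, 24),
}

for name, (price_usd, vram_gb) in cards.items():
    print(f"{name}: ${price_usd / vram_gb:.2f}/GB")
```

By this metric the CMP100‑210 is the cheapest VRAM in the lineup and the CMP170HX the most expensive.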
## Inference Tests (llama.cpp)

All tests were run with:

```shell
llama-bench -m <MODEL> -ngl 99
```
### Qwen3‑VL‑4B‑Instruct‑Q4_K_M.gguf (2.3GB)

| Card | Tokens/sec |
| --- | --- |
| Tesla P4 (8GB) | 35.32 |
| CMP170HX (10GB) | 51.66 |
| RTX 3060 (12GB) | 76.12 |
| CMP100‑210 (16GB) | 81.35 |
| Tesla P40 (24GB) | 53.39 |
### Mistral‑7B‑Instruct‑v0.3‑Q4_K_M.gguf (4.1GB)

| Card | Tokens/sec |
| --- | --- |
| Tesla P4 (8GB) | 25.73 |
| CMP170HX (10GB) | 33.62 |
| RTX 3060 (12GB) | 65.29 |
| CMP100‑210 (16GB) | 91.44 |
| Tesla P40 (24GB) | 42.46 |
### gemma‑3‑12B‑it‑Q4_K_M.gguf (6.8GB)

| Card | Tokens/sec |
| --- | --- |
| Tesla P4 (8GB) | Can’t Load |
| 2× Tesla P4 (16GB) | 13.95 |
| CMP170HX (10GB) | 18.96 |
| RTX 3060 (12GB) | 32.97 |
| CMP100‑210 (16GB) | 43.84 |
| Tesla P40 (24GB) | 21.90 |
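The "Can’t Load" entries track VRAM: roughly, the model file size plus KV-cache and compute-buffer overhead must fit in the card's memory. A rough feasibility sketch; the ~1.5 GB overhead figure is an assumption (not measured here), and real fit depends on context length and model architecture, so it is only a first-order guide:

```python
# Rough VRAM-fit check: model file size plus an assumed ~1.5 GB of
# KV-cache/compute-buffer overhead must fit in the card's VRAM.
OVERHEAD_GB = 1.5  # assumption; actual overhead varies with context size

def fits(model_gb: float, vram_gb: float, overhead_gb: float = OVERHEAD_GB) -> bool:
    """True if the model plus assumed overhead fits in the given VRAM."""
    return model_gb + overhead_gb <= vram_gb

# gemma-3-12B Q4_K_M is 6.8 GB on disk:
print(fits(6.8, 8))   # single Tesla P4 (8 GB)  -> False, matches "Can't Load"
print(fits(6.8, 16))  # 2x Tesla P4 (16 GB)     -> True
```

The heuristic is not exact: per the gpt‑oss table further down, the 11.3 GB MXFP4 file still runs on the 12 GB RTX 3060, where this check would predict otherwise.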
### Qwen2.5‑Coder‑14B‑Instruct‑Q4_K_M.gguf (8.4GB)

| Card | Tokens/sec |
| --- | --- |
| Tesla P4 (8GB) | Can’t Load |
| 2× Tesla P4 (16GB) | 12.65 |
| CMP170HX (10GB) | 17.31 |
| RTX 3060 (12GB) | 31.90 |
| CMP100‑210 (16GB) | 45.44 |
| Tesla P40 (24GB) | 20.33 |
### openai_gpt‑oss‑20b‑MXFP4.gguf (11.3GB)

| Card | Tokens/sec |
| --- | --- |
| Tesla P4 (8GB) | Can’t Load |
| 2× Tesla P4 (16GB) | 34.82 |
| CMP170HX (10GB) | Can’t Load |
| RTX 3060 (12GB) | 77.18 |
| CMP100‑210 (16GB) | 77.09 |
| Tesla P40 (24GB) | 50.41 |
### Codestral‑22B‑v0.1‑Q5_K_M.gguf (14.6GB)

| Card | Tokens/sec |
| --- | --- |
| Tesla P4 (8GB) | Can’t Load |
| 2× Tesla P4 (16GB) | Can’t Load |
| 3× Tesla P4 (24GB) | 7.58 |
| CMP170HX (10GB) | Can’t Load |
| RTX 3060 (12GB) | Can’t Load |
| CMP100‑210 (16GB) | Can’t Load |
| Tesla P40 (24GB) | 12.09 |
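Combining the price table with a benchmark gives a rough tokens-per-second-per-dollar figure. A sketch using the Mistral‑7B numbers from above (derived from the tables in this post, not measured separately):

```python
# Tokens/sec per dollar for the Mistral-7B-Instruct-v0.3 Q4_K_M run,
# using the eBay prices and throughput numbers from the tables above.
prices = {"Tesla P4": 81, "CMP170HX": 195, "RTX 3060": 160,
          "CMP100-210": 125, "Tesla P40": 225}
mistral_tps = {"Tesla P4": 25.73, "CMP170HX": 33.62, "RTX 3060": 65.29,
               "CMP100-210": 91.44, "Tesla P40": 42.46}

value = {card: mistral_tps[card] / prices[card] for card in prices}
for card, v in sorted(value.items(), key=lambda kv: -kv[1]):
    print(f"{card}: {v:.3f} tok/s per dollar")
```

On this metric the CMP100‑210 comes out well ahead, combining the second-lowest price with the highest 7B throughput.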