Power-limit vs TG/s for 2x3090
Reddit r/LocalLLaMA / 4/28/2026

> Trying to find the sweet spot to trade off between power and tg/s. 250W seems to be a sweet spot for Qwen3.6-27B. It's interesting that I got higher tg/s at 275W for 1 concurrent request. vLLM server config from tedivm; benchmark cmd [link].
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- A Reddit post discusses finding the best trade-off between GPU power limits and generation throughput (tg/s, tokens generated per second) when running a 2×3090 setup.
- The author reports that 250W appears to be a “sweet spot” for Qwen3.6-27B, based on their observed results.
- They also note that throughput increased at 275W for a single concurrent request, suggesting power/throughput scaling can vary with workload concurrency.
- The post includes the specific vLLM server configuration and the benchmark command used to measure results (vLLM with quantization, chunked prefill, prefix caching, and speculative decoding settings).
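On Linux, per-GPU power limits are typically set with `nvidia-smi -pl <watts>` (root and persistence mode usually required), then throughput is re-benchmarked at each limit. The sweet-spot analysis the post describes can be sketched as a small script; the sweep numbers below are illustrative placeholders, not the author's measurements:

```python
# Hypothetical sweep for a 2x3090 box: power limit per GPU (watts) ->
# measured generation throughput (tokens/s). Illustrative values only;
# collect real numbers by re-running the benchmark after each
# `nvidia-smi -pl <watts>`.
sweep = {
    200: 30.0,
    225: 40.0,
    250: 46.0,
    275: 46.5,
    300: 47.0,
}

def efficiency(power_w: float, tg_s: float) -> float:
    """Throughput per watt (tokens/s per watt of per-GPU limit)."""
    return tg_s / power_w

# The "sweet spot" here is the limit that maximizes tokens per watt,
# i.e. where extra watts stop buying meaningful throughput.
best_power = max(sweep, key=lambda p: efficiency(p, sweep[p]))
print(best_power)  # -> 250 with these illustrative numbers
```

With these placeholder values, throughput barely rises past 250W, matching the shape of curve the post reports; note that the single-request case can still peak at a higher limit, since lightly loaded GPUs clock higher before hitting the cap.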
Related Articles
- Black Hat USA (AI Business): Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.
- How I Automate My Dev Workflow with Claude Code Hooks (Dev.to)
- Claude Haiku for Low-Cost AI Inference: Patterns from a Horse Racing Prediction System (Dev.to)
- How We Built an Ambient AI Clinical Documentation Pipeline (and Saved Doctors 8+ Hours a Week) (Dev.to)