calculated my costs per 1M tokens for Qwen3.5 27B

Reddit r/LocalLLaMA / 3/27/2026


Key Points

  • The author estimates the electricity cost of running Qwen 3.5 27B locally by measuring both prompt-processing throughput and text generation throughput on a dual-GPU setup using vLLM.

I was curious about the real electricity cost of running Qwen3.5 27B on my hardware. To find out, I measured tokens per second (TPS) for prompt processing and for generation, along with power consumption.

I ran it with vLLM on an RTX 3090 + RTX Pro 4000. I measured 53.8 TPS in generation and 1,691 TPS in uncached prompt processing. This was measured through a Python script calling the real API. My electricity costs around 0.30€/kWh.

Nvidia tools showed around 470W of GPU power while sampling; adding the other components in the PC, I calculated with 535W. (I got there from the roughly 100W idle draw I know for my system, minus the GPU idle power that the Nvidia tools show.)
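The power estimate above can be sketched as a small calculation. This is only an illustration of the arithmetic as described: the function name is made up, and the 35W combined GPU idle figure is not stated in the post; it is back-solved here from the 470W, 100W and 535W numbers.

```python
def estimate_system_power(gpu_load_w, system_idle_w, gpu_idle_w):
    """Whole-system draw under load: GPU power under load plus the
    non-GPU share of the idle draw (system idle minus GPU idle)."""
    non_gpu_w = system_idle_w - gpu_idle_w
    return gpu_load_w + non_gpu_w

GPU_LOAD_W = 470     # reported by Nvidia tools while sampling (from the post)
SYSTEM_IDLE_W = 100  # known idle draw of the whole system (from the post)
GPU_IDLE_W = 35      # ASSUMPTION: implied by 535 - 470 = 100 - 35, not stated

total_w = estimate_system_power(GPU_LOAD_W, SYSTEM_IDLE_W, GPU_IDLE_W)
print(f"Estimated system power under load: {total_w} W")
```

This treats the non-GPU components as drawing the same power under load as at idle, which probably understates CPU and PSU losses a bit.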

So after the long bla bla, here are the results:

Input (uncached): 0.026€ / 1M tokens

Output: 0.829€ / 1M tokens
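For anyone who wants to plug in their own numbers, the two results can be reproduced with a short script. This is a minimal sketch, assuming the same 535W draw applies to both prompt processing and generation (the post gives only one power figure); the function name is made up for illustration.

```python
def cost_per_million_tokens(power_watts, tokens_per_second, price_per_kwh):
    """Electricity cost of processing 1M tokens at constant power and speed."""
    seconds = 1_000_000 / tokens_per_second
    energy_kwh = (power_watts / 1000) * (seconds / 3600)
    return energy_kwh * price_per_kwh

POWER_W = 535         # estimated whole-system draw under load (from the post)
PRICE_PER_KWH = 0.30  # euros per kWh (from the post)

# ASSUMPTION: the same power draw is used for both phases.
input_cost = cost_per_million_tokens(POWER_W, 1691, PRICE_PER_KWH)
output_cost = cost_per_million_tokens(POWER_W, 53.8, PRICE_PER_KWH)

print(f"Input (uncached): {input_cost:.3f} € / 1M tokens")
print(f"Output: {output_cost:.3f} € / 1M tokens")
```

At 53.8 TPS, 1M output tokens take about 5.16 hours, which is roughly 2.76 kWh at 535W. That is where the 0.829€ comes from.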

Maybe I will redo the test with llama.cpp, running only on GPU 1 and then only on GPU 2. The RTX Pro 4000, with 145W max power, should be cheaper I think, but it's also slower in this setup.

submitted by /u/moneyspirit25