I was curious about the real electric costs of running qwen 3.5 27B on my hardware. For this I measured TPS for prompt processing and for generation and power consumption.
I was running it with vLLM on a rtx 3090 + rtx pro 4000. I measured 53.8 tps in generation and 1,691 tps in prompt processing uncached. This was through a python script calling the real api. My electric costs are around 0.30€/kWh.
Nvidia tools showed my around 470W while sampling of GPU power, with some other components in the pc I calculated with 535W. (Came to this with around 100W idle as I know for my system, subtracting the GPU idles that nvidia tools shows).
So after long bla bla here are the result:
Input uncached 0.026€ / 1M tokens
Output: 0.829€ / 1M tokens
Maybe I will redo the test with running through llama.cpp only on gpu1 and only on gpu2. The rtx pro 4000 with 145W max power should be more cheap I think, but it's also slower running in this setup.
[link] [comments]