I live in a country with expensive electricity. I really like my 6x RTX 3080 20GB GPU server, but its power consumption is quite intense, especially when it runs 24x7 or 14x7.
I have been thinking for a long time about buying a Strix Halo (yeah, their prices have gone up), or even a DGX Spark or one of its cheaper clones. It's clear to me that I would lose compute power, as the memory bandwidth is indeed lower.
Since I am using more and more agents, which can run around the clock, very fast token generation is not that important to me, but prompt processing (PP) is becoming more and more important, as context grows with more agentic use cases.
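To make that trade-off concrete, here is a rough back-of-the-envelope sketch of how one agent step splits into prompt processing and token generation. The function names and all numbers are made-up placeholders, not benchmarks of any of the machines mentioned here, and the model assumes the whole prompt is processed from scratch (no prompt caching).

```python
def request_seconds(prompt_tokens, output_tokens, pp_tok_per_s, tg_tok_per_s):
    """Rough time for one agent step: prompt processing plus token generation.

    Assumes no prompt caching, so the full prompt is processed every time.
    """
    return prompt_tokens / pp_tok_per_s + output_tokens / tg_tok_per_s


def pp_share(prompt_tokens, output_tokens, pp_tok_per_s, tg_tok_per_s):
    """Fraction of the step's time that is spent on prompt processing."""
    pp_seconds = prompt_tokens / pp_tok_per_s
    total = request_seconds(prompt_tokens, output_tokens, pp_tok_per_s, tg_tok_per_s)
    return pp_seconds / total


if __name__ == "__main__":
    # Hypothetical agentic step: long context in, short answer out.
    prompt_tokens, output_tokens = 60_000, 500
    # Hypothetical rates in tokens per second (placeholders, not measurements).
    pp_rate, tg_rate = 1_000, 20

    total = request_seconds(prompt_tokens, output_tokens, pp_rate, tg_rate)
    share = pp_share(prompt_tokens, output_tokens, pp_rate, tg_rate)
    print(f"total: {total:.0f} s, prompt processing share: {share:.0%}")
```

With long contexts and short outputs, the PP term dominates even when token generation is slow, which is why I care more about PP speed than about raw generation speed.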
My thoughts:
GB10 (Nvidia DGX Spark or Clones)
- May offer good performance with FP4 while still keeping fair quality
- Keeps the CUDA environment
- Expansion is limited by the single, short M.2 SSD, except for buying a second GB10
Strix Halo / Ryzen AI Max+ 395
- Nearly 50% cheaper than GB10 clones
- A second GPU could possibly be added as a hacky solution, since many models offer a PCIe slot (Minisforum, Framework) or a second x4 M.2 slot (Bosgame M5). That could increase capacity and speed when tuning the split modes.
- I am wary of the Vulkan/ROCm ecosystem, especially with multiple GPUs if those are required.
Bonus thoughts: What will Apple release in the summer? The M5 Max in the MacBook Pro (Alex Ziskind's videos) showed that even the non-Ultra Macs offer quite nice prompt processing (PP) values compared to Strix Halo and GB10.
What are your thoughts on this, and what hints and experiences could you share with me?