I'm currently running a 4x RTX 3090 system (96 GB VRAM, DDR4-2133 RAM) and have tested opencode and pi.dev with Qwen3.5-122B-A10B (AWQ) at up to 200k context for web app coding (HTML/JS/Python). I'm now seriously considering picking up two Sparks paired with MiniMax M2.7 for local inference.
Two units are needed to keep prompt processing at acceptable speeds; output tokens/sec stays roughly the same either way (~15 tok/s at ~100k context, based on what I've seen here). The combined 2 × 128 GB = 256 GB of VRAM leaves headroom for future models (the next MiniMax version, Qwen3.6-122B).
Idle power draw: ~50 W per Spark, measured at the wall. My 4x 3090 rig idles at ~130 W (all cards power-limited to 275 W; nvidia-smi shows ~22 W idle per card). Under full load with the 122B model it peaks at ~750 W.
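For anyone weighing the same trade-off, here's a quick back-of-envelope idle-cost comparison using the wall-measured numbers above. The electricity price is an assumption (0.30/kWh); plug in your own tariff:

```python
# Back-of-envelope idle power cost, 24/7, using the wall numbers from the post.
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.30  # assumed tariff; adjust for your region

def yearly_idle_cost(watts: float) -> float:
    """Yearly cost of a machine idling 24/7 at the given wall draw."""
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

spark_pair = yearly_idle_cost(2 * 50)  # two Sparks at ~50 W each
rig_3090 = yearly_idle_cost(130)       # 4x 3090 rig at ~130 W idle

print(f"2x Spark idle: ~{spark_pair:.0f}/yr")
print(f"4x 3090 idle:  ~{rig_3090:.0f}/yr")
```

So at idle the two setups are in the same ballpark; the bigger gap is under sustained load (~750 W peak on the 3090 rig).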
I need context up to ~120k tokens for coding sessions. Based on the numbers above, two Sparks with MiniMax M2.7 should deliver acceptable speeds in that range, which would be enough for me.
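To put "acceptable" in perspective: the ~15 tok/s decode figure is from the reports quoted above, but the post gives no prompt-processing rate, so the prefill speeds below are purely hypothetical placeholders to show how the turn time breaks down at full 120k context:

```python
# Rough end-to-end turn-time estimate for a long-context coding session.
# Decode rate (~15 tok/s) is from reports cited in the post; the
# prompt-processing rates are made-up values to illustrate the math.
def turn_time(prompt_tokens: int, output_tokens: int,
              pp_tok_s: float, decode_tok_s: float = 15.0) -> float:
    """Seconds for one request: prefill the prompt, then decode the reply."""
    return prompt_tokens / pp_tok_s + output_tokens / decode_tok_s

for pp in (500, 1000, 2000):  # hypothetical prefill speeds in tok/s
    t = turn_time(prompt_tokens=120_000, output_tokens=1_000, pp_tok_s=pp)
    print(f"{pp:>5} tok/s prefill -> {t / 60:.1f} min per cold full-context turn")
```

These are worst-case cold-start numbers; with prompt caching only the newly appended tokens need prefill on each turn, so interactive latency should be far better after the first request.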
I can't properly benchmark MiniMax M2.7 on my current setup: 96 GB of VRAM isn't enough to load it comfortably, and the slow DDR4-2133 RAM makes prompt processing a bottleneck anyway.
I'm curious what your experience is. How much better is MiniMax M2.7 than Qwen3.5-122B-A10B (AWQ) for real-world coding tasks (HTML/JS/Python)? Thanks in advance.