I'm currently running a 4x RTX 3090 system (96 GB VRAM, DDR4-2133 RAM) and have tested opencode and pi.dev with Qwen3.5-122B-A10B (AWQ) at up to 200k context for web app coding (HTML/JS/Python). I'm now seriously considering picking up two Sparks paired with MiniMax M2.7 for local inference.
Two units are needed to keep prompt processing at acceptable speeds; output tokens/sec stays roughly the same either way (~15 tok/s at ~100k context, based on what I've seen here). The combined 2 × 128 GB = 256 GB of VRAM leaves headroom for future models (the next MiniMax version, Qwen3.6-122B).
Idle power draw: ~50 W per Spark, measured at the wall. My 4x 3090 rig idles at ~130 W (all cards power-limited to 275 W; nvidia-smi shows ~22 W idle per card). Under full load with the 122B model it peaks at ~750 W.
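For anyone weighing the same trade-off, here's a quick back-of-envelope idle-cost comparison using the wall-measured numbers above. The electricity price is an assumption (0.30/kWh); plug in your own tariff:

```python
# Back-of-envelope idle power cost, 24/7, using the wall numbers from the post.
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.30  # assumed tariff; adjust for your region

def yearly_idle_cost(watts: float) -> float:
    """Yearly cost of a machine idling 24/7 at the given wall draw."""
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

spark_pair = yearly_idle_cost(2 * 50)  # two Sparks at ~50 W each
rig_3090 = yearly_idle_cost(130)       # 4x 3090 rig at ~130 W idle

print(f"2x Spark idle: ~{spark_pair:.0f}/yr")
print(f"4x 3090 idle:  ~{rig_3090:.0f}/yr")
```

So at idle the two setups are in the same ballpark; the bigger gap is under sustained load (~750 W peak on the 3090 rig).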
I need context up to ~120k tokens for coding sessions. Based on the numbers above, two Sparks with MiniMax M2.7 should deliver acceptable speeds in that range, which would be enough for me.
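To put "acceptable" in perspective: the ~15 tok/s decode figure is from the reports quoted above, but the post gives no prompt-processing rate, so the prefill speeds below are purely hypothetical placeholders to show how the turn time breaks down at full 120k context:

```python
# Rough end-to-end turn-time estimate for a long-context coding session.
# Decode rate (~15 tok/s) is from reports cited in the post; the
# prompt-processing rates are made-up values to illustrate the math.
def turn_time(prompt_tokens: int, output_tokens: int,
              pp_tok_s: float, decode_tok_s: float = 15.0) -> float:
    """Seconds for one request: prefill the prompt, then decode the reply."""
    return prompt_tokens / pp_tok_s + output_tokens / decode_tok_s

for pp in (500, 1000, 2000):  # hypothetical prefill speeds in tok/s
    t = turn_time(prompt_tokens=120_000, output_tokens=1_000, pp_tok_s=pp)
    print(f"{pp:>5} tok/s prefill -> {t / 60:.1f} min per cold full-context turn")
```

These are worst-case cold-start numbers; with prompt caching only the newly appended tokens need prefill on each turn, so interactive latency should be far better after the first request.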
I can't properly benchmark MiniMax M2.7 on my current setup: 96 GB of VRAM isn't enough to load it comfortably, and the slow DDR4-2133 RAM makes prompt processing a bottleneck anyway.
I'm curious what your experience is. How much better is MiniMax M2.7 than Qwen3.5-122B-A10B (AWQ) for real-world coding tasks (HTML/JS/Python)? Thanks in advance.