After some SGLang patching and countless experiments, I managed to get the REAP-ed NVFP4 version running stably and FAST on 4x RTX 6000 Pros (power-limited to 350W). Very happy with the performance and quality. Inference software is still under-optimized for these cards. I think we will see their true potential unfold this year or early next year.
Throughput by Context Depth
| Prefilled | PP@4096 | TG@512 |
|---|---|---|
| 0 | 2229.0 | 42.03 |
| 4K | 1943.6 | 41.41 |
| 16K | 1558.9 | 39.72 |
| 32K | 1234.2 | 38.19 |
| 64K | 863.5 | 35.87 |
TG Peak (burst throughput)
| Prefilled | TG Peak |
|---|---|
| 0 | 43.00 |
| 4K | 42.00 |
| 16K | 40.00 |
| 32K | 39.00 |
| 64K | 37.00 |
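To put the throughput table above in perspective, here is a rough sketch that turns the numbers into "how much speed is retained" and "how long would one request take" at each context depth. It assumes the PP and TG columns are in tokens per second and that a request is a 4096-token prompt followed by 512 generated tokens (matching the column names); the function names are made up for illustration and are not part of any benchmark tool.

```python
# Figures copied from the "Throughput by Context Depth" table.
# Assumption: both columns are tokens per second.
ROWS = [
    ("0", 2229.0, 42.03),
    ("4K", 1943.6, 41.41),
    ("16K", 1558.9, 39.72),
    ("32K", 1234.2, 38.19),
    ("64K", 863.5, 35.87),
]

PROMPT_TOKENS = 4096  # assumed from the PP@4096 column name
GEN_TOKENS = 512      # assumed from the TG@512 column name


def request_seconds(pp, tg, prompt_tokens=PROMPT_TOKENS, gen_tokens=GEN_TOKENS):
    """Estimate wall time for one request: prefill time plus generation time."""
    return prompt_tokens / pp + gen_tokens / tg


def summarize(rows=ROWS):
    """Return per-depth speed retained vs. the empty-context row and est. seconds."""
    _, base_pp, base_tg = rows[0]
    summary = []
    for depth, pp, tg in rows:
        summary.append({
            "depth": depth,
            "pp_retained": pp / base_pp,
            "tg_retained": tg / base_tg,
            "seconds": request_seconds(pp, tg),
        })
    return summary


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['depth']:>4}: PP {row['pp_retained']:.0%} of baseline, "
            f"TG {row['tg_retained']:.0%} of baseline, "
            f"~{row['seconds']:.1f}s per request"
        )
```

This is a simple single-stream estimate: it ignores scheduling overhead and anything concurrency would change, so treat the per-request times as ballpark figures only.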
Overall experience with opencode is pretty close to Sonnet + Claude Code. Sessions of 100-200k tokens are stable.
Will play with different concurrency settings this weekend.
Has anyone seen better performance on this hardware?
PS: concurrency = 2 worked great. Generation averages 65 tps.