Special thanks to u/Sea-Speaker1700 for making it possible to run MXFP4 on the R9700 GPU; the first guide, for running 122B models, is linked in the original post. Well, the 397B model works amazingly and is super fast. Use the Dockerfile from the post to build the image: it takes the original image provided by u/Sea-Speaker1700 and builds a patched version.
In the launch script, keep your own device IDs, and replace $1 with the model name and $2 with the outbound port. Loading the model takes 400-600 s the first time; after that I get 30 t/s on token generation (tg) and 3.5-3.7k t/s on prompt processing (pp) with a single request. With 4 concurrent requests you get up to 100 t/s. I limit power to 210 W per GPU; a 300 W per-GPU limit should speed the model up. For coding tasks, I get the best results from this model with the thinking budget set to 0 tokens.
Run Qwen3.5-397B-A13B with vLLM and 8xR9700
Reddit r/LocalLLaMA / 4/12/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The post explains how to run the Qwen3.5-397B-A13B (listed as a 397B MXFP4 variant) on an 8xR9700 setup using vLLM on ROCm via a custom Docker image.
- It provides a Dockerfile that installs an updated Transformers version and applies a Triton patch to adjust a topk-related constant for compatibility/performance.
- It links to an MXFP4 model checkpoint hosted on Hugging Face and gives step-by-step commands for cloning the model with Git LFS.
- It includes a detailed docker run launch command configuring multiple GPU device mappings, HIP/ROCR visibility, shared memory, and vLLM settings such as prefix caching and near-full GPU memory utilization.
- The author claims the 397B model runs “super fast,” positioning the guide as an approach to enable large (over-100B) model inference on specific ROCm hardware.
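
The launch configuration described in the points above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the image tag `vllm-rocm-mxfp4:local`, the `/model` mount point, the `16g` shared-memory size, and the `0.95` memory-utilization value are all assumptions, and it presumes the image lets `vllm serve` be passed as the container command. The original post also sets ROCR visibility and maps individual GPU devices; this sketch only sets `HIP_VISIBLE_DEVICES` and passes `/dev/kfd` and `/dev/dri` as a whole.

```python
import shlex


def build_launch_command(model_path, port, gpu_ids,
                         image="vllm-rocm-mxfp4:local",  # hypothetical image tag
                         gpu_mem_util=0.95):             # assumed "near-full" value
    """Assemble a `docker run` argument list for a vLLM server on ROCm."""
    gpu_ids = list(gpu_ids)
    visible = ",".join(str(i) for i in gpu_ids)
    return [
        "docker", "run", "--rm",
        # ROCm containers need the compute (kfd) and render (dri) devices.
        "--device=/dev/kfd", "--device=/dev/dri",
        # Shared memory for inter-process communication; size is an assumption.
        "--shm-size=16g",
        "-e", f"HIP_VISIBLE_DEVICES={visible}",
        # Mount the cloned checkpoint read-only at an assumed path.
        "-v", f"{model_path}:/model:ro",
        # Host port (the post's $2) mapped to the server port in the container.
        "-p", f"{port}:8000",
        image,
        "vllm", "serve", "/model",
        # One tensor-parallel shard per visible GPU.
        "--tensor-parallel-size", str(len(gpu_ids)),
        "--enable-prefix-caching",
        "--gpu-memory-utilization", str(gpu_mem_util),
        "--port", "8000",
    ]


# Example: print a command for 8 GPUs, serving on host port 8001.
print(shlex.join(build_launch_command("/models/qwen-mxfp4", 8001, range(8))))
```

Building the command as a list and printing it with `shlex.join` makes it easy to inspect the flags before running anything; the same list could be handed to `subprocess.run` once the image name and paths match your setup.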