New Local LLM Rig: Ryzen 9700X + Radeon R9700. Getting ~120 tok/s! What models fit best?

Reddit r/LocalLLaMA / 4/20/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user shares a newly built local LLM inference workstation featuring a Ryzen 7 9700X CPU, Radeon AI PRO R9700 GPU (32GB VRAM), 64GB DDR5, Fedora Workstation, and LM Studio using the Vulkan backend.
  • They report achieving about 120 tokens per second on simple prompts with the qwen3.6-35b-a3b model.
  • The user asks for guidance on the largest model architecture they can run comfortably on this hardware.
  • They specifically wonder whether they should prioritize Q4_K_M quantization settings for better fit and performance.
  • The post is essentially a community-driven request for model and quantization recommendations tailored to their setup.

Hi! I just finished building a workstation specifically for local inference and wanted to get your thoughts on the setup, along with model recommendations.

• GPU: AMD Radeon AI PRO R9700 (32GB GDDR6 VRAM)

• CPU: AMD Ryzen 7 9700X

• RAM: 64GB DDR5

• OS: Fedora Workstation

• Software: LM Studio (Vulkan backend); want to test llama.cpp as well

• Performance: Currently hitting a steady ~120 tok/s on simple prompts (qwen3.6-35b-a3b)

What is the largest model architecture you'd recommend running comfortably on this hardware? Should I be focusing on Q4_K_M quantizations?
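For anyone weighing the same question, a back-of-envelope check is: weights (parameter count × bits per weight ÷ 8) plus KV cache plus some runtime overhead must fit in VRAM. The sketch below is an assumption-laden estimate, not a measurement; the `~4.85` bits-per-weight figure for Q4_K_M, the KV-cache bytes per token, and the 10% overhead factor are all rough ballpark values that vary by model architecture and backend.

```python
# Rough VRAM estimate for a GGUF-quantized model (a sketch; actual usage
# depends on architecture, context length, and backend overhead).

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     ctx: int = 8192, kv_bytes_per_token: float = 0.5e6) -> float:
    """Back-of-envelope: weights + KV cache + ~10% overhead, in GB.

    params_b          -- model size in billions of parameters
    bits_per_weight   -- ~4.85 for Q4_K_M, ~5.7 for Q5_K_M, ~6.6 for Q6_K
    ctx               -- context length in tokens
    kv_bytes_per_token -- assumed KV-cache footprint per token (model-dependent)
    """
    weights_bytes = params_b * 1e9 * bits_per_weight / 8
    kv_bytes = ctx * kv_bytes_per_token
    return (weights_bytes + kv_bytes) * 1.10 / 1e9

# A 30B-class model at Q4_K_M comfortably fits in 32GB:
print(f"30B @ Q4_K_M: ~{estimate_vram_gb(30, 4.85):.0f} GB")
# A dense 70B at Q4_K_M does not, without offloading layers to system RAM:
print(f"70B @ Q4_K_M: ~{estimate_vram_gb(70, 4.85):.0f} GB")
```

By this estimate, fully-GPU-resident models top out around the 30B dense / 30B-A3B MoE class on a 32GB card at Q4_K_M, which matches what the poster is already running.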

submitted by /u/jsorres