New Local LLM Rig: Ryzen 9700X + Radeon R9700. Getting ~120 tok/s! What models fit best?

Reddit r/LocalLLaMA / 4/20/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A user shares a newly built local LLM inference workstation featuring a Ryzen 7 9700X CPU, Radeon AI PRO R9700 GPU (32GB VRAM), 64GB DDR5, Fedora Workstation, and LM Studio using the Vulkan backend.
  • They report achieving about 120 tokens per second on simple prompts with the qwen3.6-35b-a3b model.
  • The user asks for guidance on the largest model architecture they can run comfortably on this hardware.
  • They specifically wonder whether they should prioritize Q4_K_M quantization settings for better fit and performance.
  • The post is essentially a community-driven request for model and quantization recommendations tailored to their setup.

Hi! I just finished building a workstation specifically for local inference and wanted to get your thoughts on the setup, along with model recommendations.

• GPU: AMD Radeon AI PRO R9700 (32GB GDDR6 VRAM)

• CPU: AMD Ryzen 7 9700X

• RAM: 64GB DDR5

• OS: Fedora Workstation

• Software: LM Studio (Vulkan backend); want to test llama.cpp as well

• Performance: Currently hitting a steady ~120 tok/s on simple prompts (qwen3.6-35b-a3b)

What is the largest model architecture you'd recommend running comfortably on this hardware? Should I be focusing on Q4_K_M quantizations?
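For anyone weighing the same question, a back-of-envelope check is: weights (parameter count × bits per weight ÷ 8) plus KV cache plus some runtime overhead must fit in VRAM. The sketch below is an assumption-laden estimate, not a measurement; the `~4.85` bits-per-weight figure for Q4_K_M, the KV-cache bytes per token, and the 10% overhead factor are all rough ballpark values that vary by model architecture and backend.

```python
# Rough VRAM estimate for a GGUF-quantized model (a sketch; actual usage
# depends on architecture, context length, and backend overhead).

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     ctx: int = 8192, kv_bytes_per_token: float = 0.5e6) -> float:
    """Back-of-envelope: weights + KV cache + ~10% overhead, in GB.

    params_b          -- model size in billions of parameters
    bits_per_weight   -- ~4.85 for Q4_K_M, ~5.7 for Q5_K_M, ~6.6 for Q6_K
    ctx               -- context length in tokens
    kv_bytes_per_token -- assumed KV-cache footprint per token (model-dependent)
    """
    weights_bytes = params_b * 1e9 * bits_per_weight / 8
    kv_bytes = ctx * kv_bytes_per_token
    return (weights_bytes + kv_bytes) * 1.10 / 1e9

# A 30B-class model at Q4_K_M comfortably fits in 32GB:
print(f"30B @ Q4_K_M: ~{estimate_vram_gb(30, 4.85):.0f} GB")
# A dense 70B at Q4_K_M does not, without offloading layers to system RAM:
print(f"70B @ Q4_K_M: ~{estimate_vram_gb(70, 4.85):.0f} GB")
```

By this estimate, fully-GPU-resident models top out around the 30B dense / 30B-A3B MoE class on a 32GB card at Q4_K_M, which matches what the poster is already running.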

submitted by /u/jsorres