Hi! I just finished building a workstation specifically for local inference and wanted to get your thoughts on my setup and model recommendations.
• GPU: AMD Radeon AI PRO R9700 (32GB GDDR6 VRAM)
• CPU: AMD Ryzen 7 9700X
• RAM: 64GB DDR5
• OS: Fedora Workstation
• Software: LM Studio (Vulkan backend); I also want to try llama.cpp directly
• Performance: currently hitting a steady ~120 tok/s on simple prompts with Qwen3-30B-A3B (measured with the rough script below)
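
For what it's worth, this is roughly how I've been measuring tok/s: a quick script against LM Studio's OpenAI-compatible local server. It assumes the server is running on the default port 1234 and that the response includes a `usage` block with `completion_tokens`; the model identifier is just whatever LM Studio shows for the loaded model, so adjust to taste.

```python
# Rough tok/s benchmark against LM Studio's local OpenAI-compatible server.
# Assumes the default endpoint (localhost:1234) and a non-streaming response
# that reports usage.completion_tokens -- adjust if your setup differs.
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"  # default LM Studio endpoint
payload = {
    "model": "qwen3-30b-a3b",  # model identifier as shown in LM Studio
    "messages": [{"role": "user", "content": "Explain the KV cache in two sentences."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=300).json()
elapsed = time.time() - start

completion_tokens = resp["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s -> {completion_tokens / elapsed:.1f} tok/s")
```

(This lumps prompt processing in with generation, so it understates pure decode speed a bit, but it's good enough for a ballpark.)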
What is the largest model you'd recommend running comfortably on this? Should I be focusing on Q4_K_M quantizations?
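
For context, here's the back-of-the-envelope sizing math I've been using, treating Q4_K_M as roughly 4.8 bits per weight on average and leaving a few GB of headroom for KV cache and runtime buffers (both of those numbers are my own guesses, not anything official):

```python
# Back-of-the-envelope check: does a Q4_K_M model fit in 32 GB of VRAM?
# Assumptions (rough, not authoritative): Q4_K_M averages ~4.8 bits/weight,
# and a few GB of headroom are reserved for KV cache, context, and buffers.
def q4_k_m_size_gb(params_billion: float, bits_per_weight: float = 4.8) -> float:
    """Approximate in-VRAM size of a Q4_K_M quantized model, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

VRAM_GB = 32
HEADROOM_GB = 4  # guess for KV cache + buffers at moderate context length

for params in (14, 32, 70):
    size = q4_k_m_size_gb(params)
    fits = size + HEADROOM_GB <= VRAM_GB
    print(f"{params}B @ Q4_K_M ~= {size:.1f} GB -> {'fits' if fits else 'too big'} in {VRAM_GB} GB")
```

By that math a ~32B dense model lands around 19-20 GB and fits with room for context, while 70B-class models at Q4_K_M overshoot 32 GB unless I offload layers to system RAM. Does that match what people are seeing in practice?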



