Mistral-Medium-3.5-128B-Q3_K_M on 3x3090 (72GB VRAM)

Reddit r/LocalLLaMA / 5/4/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The post demonstrates local inference speed for Mistral Medium 3.5 128B using a Q3_K_M quantization on a multi-GPU setup of three NVIDIA RTX 3090 cards with 72GB of combined VRAM.
  • It includes performance screenshots and output shared in multiple formats, suggesting the author benchmarked the setup and verified end-to-end responsiveness.
  • The 3x3090 configuration reflects a practical recipe for running larger LLMs locally: aggressive quantization (Q3_K_M) combined with multi-GPU weight distribution (a minimal loading sketch follows this list).
  • Overall, the content focuses on real-world throughput/latency behavior rather than describing a new model release or vendor announcement.
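For context, here is a minimal sketch of how a setup like this is typically loaded with llama-cpp-python. The GGUF filename, context size, and even split ratios are illustrative assumptions, not details taken from the post:

```python
from llama_cpp import Llama

# Hypothetical filename. Q3_K_M averages roughly 3.9 bits/weight, so a
# 128B model's weights come to roughly 60 GB, which fits in 72 GB of
# VRAM with headroom left for the KV cache.
llm = Llama(
    model_path="mistral-medium-3.5-128b-Q3_K_M.gguf",
    n_gpu_layers=-1,               # offload every layer to GPU
    tensor_split=[1.0, 1.0, 1.0],  # assumed even split across the three 3090s
    n_ctx=8192,                    # illustrative context window
)

out = llm("Explain KV-cache paging in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```

Note that `tensor_split` takes relative proportions, so `[1.0, 1.0, 1.0]` distributes layers roughly evenly across the three cards; in practice people often skew the split slightly away from the GPU that also holds the context buffers.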