Here is the actual speed of Mistral Medium Q3 running locally on 3x3090: first some Python, then SVG, then HTML.
Mistral-Medium-3.5-128B-Q3_K_M on 3x3090 (72GB VRAM)
Reddit r/LocalLLaMA / 5/4/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The post demonstrates local inference speed for Mistral Medium 3.5 128B using a Q3_K_M quantization on three NVIDIA RTX 3090s with a combined 72GB of VRAM.
- It includes performance screenshots and output rendered in multiple formats (Python, SVG, HTML), suggesting the author ran benchmarks and verified end-to-end responsiveness.
- The 3x3090 setup illustrates a practical recipe for running larger LLMs locally: aggressive quantization (Q3) combined with multi-GPU tensor splitting (see the sketch after this list).
- Overall, the content focuses on real-world throughput/latency behavior rather than describing a new model release or vendor announcement.
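
For readers who want to approximate this kind of setup, here is a minimal sketch using llama-cpp-python. This is an assumption: the post does not say which runtime the author used (llama.cpp's own CLI is equally plausible). The GGUF filename, the even tensor_split ratio, and the context size are hypothetical placeholders, and the tok/s figure is simply completion tokens divided by wall-clock time.

```python
import time

from llama_cpp import Llama

# Hypothetical GGUF path; substitute the actual Q3_K_M file.
MODEL_PATH = "mistral-medium-3.5-128b-q3_k_m.gguf"

# Load the quantized model, offloading every layer to GPU and
# splitting tensors across the three 3090s.
llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,               # offload all layers to VRAM
    tensor_split=[1.0, 1.0, 1.0],  # even split across 3 GPUs (assumed ratio)
    n_ctx=8192,                    # context window; adjust to available VRAM
    verbose=False,
)

prompt = "Write a Python function that draws a spiral as an SVG."

t0 = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - t0

generated = out["usage"]["completion_tokens"]
print(out["choices"][0]["text"])
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Note that this timing lumps prompt processing in with generation; for a cleaner generation-only number, pass stream=True and time just the token stream. Either way it gives a rough end-to-end figure of the kind shown in the post's screenshots.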