RTX 5080 with 16 GB VRAM, 64 GB RAM best quantized model for programming?

Reddit r/LocalLLaMA / 5/3/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post asks which quantized AI model users can run locally on an RTX 5080 with 16GB VRAM and 64GB system RAM for agentic programming tasks.
  • It frames the problem around fitting model size and quantization level to the hardware constraints to achieve usable performance.
  • The question is directed at the Local LLaMA community and seeks practical recommendations rather than a specific model announcement.
  • Overall, it’s a setup-and-requirements inquiry about choosing an appropriate local LLM configuration for agent-like coding workflows.
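The fit question above comes down to simple arithmetic: weight memory is roughly parameters × bits-per-weight ÷ 8, and whatever is left of the 16 GB goes to KV cache and runtime overhead. A minimal sketch (the helper name and the ~4.5 bits/weight figure for Q4-style quants are illustrative assumptions, not from the post):

```python
# Rough VRAM estimate for a quantized model's weights alone.
# Real usage also needs KV cache, activations, and framework overhead,
# so leave a few GB of headroom below the 16 GB card limit.
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB: params * bits / 8 bytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

# A ~14B model at Q4 (~4.5 bits/weight incl. quantization metadata)
# leaves room on a 16 GB card for context:
print(round(weight_vram_gb(14, 4.5), 1))  # ~7.3 GiB

# A ~32B model at the same quant exceeds 16 GB on its own, so layers
# would have to spill into the 64 GB of system RAM, costing speed:
print(round(weight_vram_gb(32, 4.5), 1))  # ~16.8 GiB
```

This is why answers to posts like this typically cluster around the largest model whose quantized weights plus KV cache stay under the VRAM budget, rather than the largest model the system RAM could technically hold.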

I have an RTX 5080 with 16 GB of VRAM and 64 GB of RAM. What's the best quantized model I can run locally on this setup for agentic programming?

submitted by /u/Additional-Ordinary2