Venturing into the world of local LLMs, would love some pointers!

Reddit r/LocalLLaMA / 4/20/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The author is transitioning from cloud-based models to running local LLMs on laptops/GPU setups, noting how far the ecosystem has progressed in a few years.
  • They’re evaluating new models like Gemma 4 and Qwen 3.6, and report running qwen3.6-35b-a3b on a MacBook Pro with 48GB RAM at roughly 50 tokens per second.
  • Their main motivation is to reduce friction from cloud/model limits at work, especially when Claude-based workflows hit usage caps.
  • They are looking for community guidance on best practices for deploying local LLMs in a work setting, including how to think about quantization and tools such as Unsloth.
  • Overall, the post is a practical inquiry and early experimentation rather than an announcement, focusing on “what’s possible” and implementation pointers.

Hi everyone!

Very exciting times we live in, where we can run models on laptops and GPUs that would've been SOTA four years ago.

I have been working with cloud models for years now, and I am now starting to dig into local models.

At work, I am leading a few different AI projects across the business, and for our devs (who all love Claude and have seen real value from it), the biggest pain point at the moment is usage limits.

SO, I have started to have a play to see what the art of the possible is with local models. I have been keeping an eye on the space for a while, but Gemma 4 piqued my interest, and then luckily the new Qwen 3.6 model popped out too.

We run MBPs for dev teams at work (mine has 48GB memory), so I am able to run the new qwen3.6-35b-a3b model at around 50 tok/s, which is great. I'd be keen to understand how others are considering using these at work to bridge the gap when Claude limits cap out.
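For anyone wondering why a 35B model with ~3B active parameters (the "a3b" suffix) decodes that quickly on a laptop: a common rule of thumb is that single-stream decode speed is bounded by memory bandwidth divided by the bytes read per token, which for an MoE model is roughly the *active* parameter count times bytes per weight. A minimal sketch of that estimate, where the bandwidth and quantization numbers are illustrative assumptions, not measured specs for any particular machine:

```python
# Back-of-envelope decode-speed ceiling for a mixture-of-experts model.
# Each generated token only reads the active parameters, so:
#   tok/s ≈ memory_bandwidth / (active_params * bytes_per_weight)
# All concrete numbers below are illustrative assumptions.

def est_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_b: float,
                       bits_per_weight: float) -> float:
    """Upper-bound tokens/sec from memory bandwidth alone."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# e.g. ~3B active params at 4-bit on a machine with ~400 GB/s unified memory
print(round(est_tokens_per_sec(400, 3, 4)))  # prints 267
```

Real-world throughput lands well below this ceiling (attention/KV-cache reads, compute, and overhead all cost extra), which is consistent with a ~50 tok/s observation.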

I also have a lot to learn about quantization, and Unsloth is a name I keep seeing bandied about.
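On the quantization point, the main practical effect is shrinking weight memory so a large model fits in RAM: weights take roughly params × bits ÷ 8 bytes. A back-of-envelope sketch (the "Q8_0"/"Q4_K_M" labels are llama.cpp-style quant names used purely for illustration; real quantized files carry some per-block overhead, and you also need headroom for activations and KV cache):

```python
def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits / 8.
    Ignores quantization block overhead, activations, and KV cache."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# Why a 35B model fits in 48GB only once quantized:
for bits, name in [(16, "fp16"), (8, "Q8_0"), (4, "Q4_K_M-ish")]:
    print(f"35B at {name}: ~{model_size_gb(35, bits):.1f} GB")
# 35B at fp16:       ~70.0 GB  (doesn't fit)
# 35B at Q8_0:       ~35.0 GB  (fits, tight with context)
# 35B at Q4_K_M-ish: ~17.5 GB  (comfortable)
```

This is why 4-bit quants are the usual starting point on 48GB unified-memory machines.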

submitted by /u/itsDitch