Hi everyone!
Very exciting times we live in, where we can run models on laptops and GPUs that would've been SOTA four years ago.
I have been working with cloud models for years now, and I am now starting to dig into local models.
At work, I am leading a few different AI projects across the biz, and with our devs (who all love Claude and have seen real value from it), our biggest pain point at the moment is the usage limits.
So, I have started to have a play to see what the art of the possible is with local models. I have been keeping an eye on the space for a while, but Gemma 4 piqued my interest, and then luckily the new Qwen 3.6 model popped out too.
We run MBPs for dev teams at work (mine has 48GB memory), so I am able to run the new qwen3.6-35b-a3b model at around 50 tok/s, which is great. I'd be keen to hear from others how they are thinking about using these at work to bridge the gap when Claude limits cap out.
I also have a lot to learn about quant(?), and unsloth is a thing I keep seeing bandied around.
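(For anyone else at my stage: quant = quantization, storing the model weights in fewer bits so the whole thing fits in memory. A rough back-of-envelope sketch of why it matters on a 48GB machine, assuming a hypothetical 35B-total-parameter model; real quant formats add some overhead on top of this:)

```python
# Rough weight-memory footprint at common quantization levels.
# 35e9 is the assumed total parameter count (not just active experts).
params = 35e9

for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4)]:
    gb = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name}: ~{gb:.0f} GB")
```

So fp16 weights alone would blow way past 48GB, while a 4-bit quant leaves headroom for the KV cache and the rest of the system, which is why basically everyone runs these locally quantized.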

