I’m running a Mac Studio Ultra (512GB RAM) and I’ve been experimenting with local LLMs on it over the past few months.
Most of my work is in data-heavy prototyping and small-scale model experimentation (mainly testing inference pipelines, working with embeddings, and occasionally running larger-context models for research-style analysis). I also do a lot of software development around AI tooling and automation workflows, but nothing at production training scale.
To be honest, I feel like the machine is way beyond what I actually need for my current workflow.
So I’m trying to understand how others are using similar setups more effectively.
A few things I’m curious about:
What are you realistically running on systems with this much RAM?
Are people actually benefiting from going beyond ~70B models in local setups?
At what point does GPU/compute become the real limitation instead of memory?
Any workflows where a setup like this actually shines (multi model pipelines, heavy context, parallel inference, etc.)?
Right now I mostly use tools like Ollama / MLX / Python-based inference stacks, but I feel like I’m not really leveraging the hardware properly.