AI Navigate

Almost 10,000 Apple Silicon benchmark runs submitted by the community — here's what the data actually shows

Reddit r/LocalLLaMA / 3/13/2026

📰 News · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The author built oMLX, an SSD-cached local inference server for Apple Silicon, with a benchmark submission feature to standardize results after frustration with scattered, hard-to-compare posts.
  • The dataset now contains nearly 10,000 benchmark runs, revealing patterns and cross-chip relationships that small samples miss.
  • Key findings include M5 Max reaching about 1,200 PP tok/s at 1k-8k context on Qwen 3.5 122b 4-bit, M3 Ultra around 893 PP tok/s at 1k but staying consistent through 8k, and M4 Max around the 500s across most contexts.
  • A direct comparison page (omlx.ai/c/jmxd8a4) lets you explore the data and compare chips side-by-side, making it practical for hardware and deployment decisions.
  • Submitting results is built into oMLX and takes about 30 seconds, encouraging broad participation to improve reliability.

This started with a frustration I think a lot of people here share.

The closest thing to a real reference has been llama.cpp GitHub discussion #4167. It's genuinely useful, but it's hundreds of comments spanning two years, with no way to filter by chip or compare models side by side. Beyond that, everything is scattered: Reddit posts from three months ago, someone's gist, one person reporting tok/s and another reporting "feels fast." None of it is comparable.

So I started keeping my own results in a spreadsheet. Then the spreadsheet got unwieldy.
Then I just built oMLX: SSD-cached local inference server for Apple Silicon with a benchmark submission built in.

It took off in a way I didn't expect: the app hit 3.8k GitHub stars in 3 days after going viral in some communities I wasn't even targeting. Benchmark submissions flooded in, and now there are nearly 10,000 runs in the dataset.

With that much data, patterns start to emerge that you just can't see from a handful of runs:

  • M5 Max hits ~1,200 PP tok/s at 1k-8k context on Qwen 3.5 122b 4-bit, then holds above 1,000 through 16k
  • M3 Ultra starts around 893 PP tok/s at 1k and stays consistent through 8k before dropping off
  • M4 Max sits in the 500s across almost all context lengths — predictable, but clearly in a different tier
  • The crossover points between chips at longer contexts tell a more interesting story than the headline numbers
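For anyone unfamiliar with the metric: PP (prompt processing) tok/s is prompt tokens divided by prefill time, while TG (token generation) tok/s is generated tokens divided by decode time. A minimal sketch of how throughput figures like the ones above are typically derived from raw timings (the class and field names here are illustrative, not oMLX's actual submission schema):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkRun:
    # Illustrative fields for a single benchmark run.
    prompt_tokens: int      # tokens in the prompt (e.g. 8192 for an 8k-context test)
    prefill_seconds: float  # wall-clock time to process the prompt
    gen_tokens: int         # tokens generated after the prompt
    decode_seconds: float   # wall-clock time spent generating

    @property
    def pp_tok_s(self) -> float:
        """Prompt-processing throughput (PP tok/s)."""
        return self.prompt_tokens / self.prefill_seconds

    @property
    def tg_tok_s(self) -> float:
        """Token-generation throughput (TG tok/s)."""
        return self.gen_tokens / self.decode_seconds

# Hypothetical 8k-context run that prefills in ~6.8 s,
# which lands in roughly the M5 Max territory quoted above.
run = BenchmarkRun(prompt_tokens=8192, prefill_seconds=6.8,
                   gen_tokens=256, decode_seconds=4.0)
print(f"PP: {run.pp_tok_s:.0f} tok/s, TG: {run.tg_tok_s:.0f} tok/s")
```

This separation matters for the crossover story: a chip can win on prefill (PP) at short contexts and still lose on decode (TG) or on prefill at long contexts, which is exactly the kind of pattern a large dataset surfaces.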

Here's a direct comparison you can explore: https://omlx.ai/c/jmxd8a4

Even if you're not on Apple Silicon, this is probably the most comprehensive community-sourced MLX inference dataset that exists right now. Worth a look if you're deciding between chips or just curious what real-world local inference ceilings look like at this scale.

If you are on Apple Silicon: every run makes the comparison more reliable for everyone. Submission is built into oMLX and takes about 30 seconds.

What chip are you on, and have you noticed how throughput behaves at longer contexts?

submitted by /u/cryingneko