I think I figured out why Apple's press release claims 4x the peak GPU AI compute. It's because they load the chip with a bunch of extra power for a few seconds. So it looks like half the performance comes from the AI accelerators and the other half from dumping more watts in (or the AI accelerators themselves use more watts). This is good for short, bursty prompts, but on longer ones I imagine the speed gains diminish. After doing more tests, the sweet spot is around 16K tokens; coincidentally, that is the prompt length Apple tested in its footnotes.
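The burst-then-throttle explanation can be made concrete with a toy two-phase model. Prefill runs at a boosted rate until a fixed time window expires, then drops to a lower sustained rate. This is a sketch under stated assumptions, not Apple's or the poster's method. `BurstProfile`, `speedup`, and every number below are hypothetical. The model also ignores the fact that attention cost per token grows with context length.

```python
from dataclasses import dataclass


@dataclass
class BurstProfile:
    burst_tps: float      # prefill tokens/s while the power boost lasts (assumed)
    sustained_tps: float  # prefill tokens/s after the boost window ends (assumed)
    burst_seconds: float  # length of the boost window in seconds (assumed)

    def prefill_seconds(self, tokens: int) -> float:
        """Seconds to prefill `tokens` under the two-phase model."""
        budget = self.burst_tps * self.burst_seconds  # tokens processed during the boost
        if tokens <= budget:
            return tokens / self.burst_tps
        # Boost window is used up; the remaining tokens run at the sustained rate.
        return self.burst_seconds + (tokens - budget) / self.sustained_tps


def speedup(tokens: int, chip: BurstProfile, baseline_tps: float) -> float:
    """Prefill speedup of `chip` over a baseline with a constant rate."""
    return (tokens / baseline_tps) / chip.prefill_seconds(tokens)


if __name__ == "__main__":
    # Invented numbers for illustration only, not measurements.
    chip = BurstProfile(burst_tps=4000, sustained_tps=2000, burst_seconds=4)
    for n in (4_000, 16_000, 64_000):
        print(n, round(speedup(n, chip, baseline_tps=1000), 2))
```

With these invented numbers, the speedup stays at 4.0x for any prompt that fits inside the boost window (up to burst_tps × burst_seconds = 16,000 tokens). It falls to about 2.29x at 64,000 tokens, which matches the shape of the diminishing gains described above.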
I also did some thermal testing, with a 10-second cooldown between inference runs, just for kicks.
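A cooldown-separated benchmark like this one might be set up as follows. `benchmark_prefill` and the `run_prefill` callable are hypothetical placeholders. Plug in whatever call processes your prompt and returns once the first token is ready, for example your own wrapper around an mlx-lm generation call that stops after one token. Timing uses wall-clock `time.perf_counter`, so the measured duration approximates time-to-first-token.

```python
import time
from typing import Callable, List


def benchmark_prefill(
    run_prefill: Callable[[], None],
    prompt_tokens: int,
    runs: int = 5,
    cooldown_s: float = 10.0,
) -> List[float]:
    """Return prefill throughput (tokens/s) for each run.

    `run_prefill` is a hypothetical callable that processes the prompt and
    returns as soon as the first token is available, so its wall-clock
    duration approximates time-to-first-token (TTFT)."""
    rates = []
    for i in range(runs):
        if i > 0 and cooldown_s > 0:
            time.sleep(cooldown_s)  # idle so the chip can shed heat before the next run
        start = time.perf_counter()
        run_prefill()
        ttft = time.perf_counter() - start
        rates.append(prompt_tokens / ttft)  # prefill tokens per second for this run
    return rates


if __name__ == "__main__":
    # Stand-in workload for demonstration only: it sleeps instead of running a model.
    fake = lambda: time.sleep(0.05)
    for rate in benchmark_prefill(fake, prompt_tokens=16_384, runs=3, cooldown_s=1.0):
        print(f"{rate:.0f} tok/s")
```

To check for throttling, run it once with `cooldown_s=10.0` and once with `cooldown_s=0.0`, then compare the per-run rates. If back-to-back runs slow down as the chip heats up, the two series will diverge.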
M5 Max Actual Pre-fill performance gains
Reddit r/LocalLLaMA / 3/24/2026
Key Points
- The article discusses why Apple’s claimed “over 4x peak GPU AI compute” for M5 Pro/M5 Max may reflect short, power-bursty performance rather than sustained throughput.
- It suggests that both AI accelerator behavior and increased power/thermal headroom contribute to the visible peak gains, making results strongest for short prompts.
- Based on further testing, the author places the performance “sweet spot” at around 16K tokens, which aligns with Apple’s own footnote testing conditions.
- The cited test setup measures time-to-first-token using a 14B-parameter model (4-bit weights, FP16 activations) on different MacBook Pro generations with MLX/mlx-lm, emphasizing prefill behavior.
- The discussion notes that any speed advantages may taper off for longer prompts as the workload extends beyond the initial high-power window.
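Because the cited setup reports time-to-first-token (TTFT), a TTFT improvement can be converted to prefill throughput at a given prompt length. The sketch below assumes TTFT is dominated by prefill and ignores the small cost of sampling the first token. The function names and timings are hypothetical, not Apple's or the poster's figures.

```python
def prefill_tps(prompt_tokens: int, ttft_s: float) -> float:
    """Prefill throughput implied by a TTFT measurement.

    Approximation: treats TTFT as pure prefill time."""
    return prompt_tokens / ttft_s


def ttft_speedup(old_ttft_s: float, new_ttft_s: float) -> float:
    """How many times faster the newer machine reaches the first token."""
    return old_ttft_s / new_ttft_s


if __name__ == "__main__":
    # Hypothetical timings for a single 16K-token prompt, NOT measured values.
    prompt = 16_384
    old, new = 8.0, 2.0
    print(f"old: {prefill_tps(prompt, old):.0f} tok/s")  # 2048 tok/s
    print(f"new: {prefill_tps(prompt, new):.0f} tok/s")  # 8192 tok/s
    print(f"speedup: {ttft_speedup(old, new):.1f}x")     # 4.0x
```

At a fixed prompt length, the TTFT ratio and the throughput ratio are the same number. A speedup quoted at one prompt length (such as 16K tokens) therefore does not automatically carry over to longer prompts that run past the high-power window.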