M5 Max just arrived - benchmarks incoming

Reddit r/LocalLLaMA / 3/11/2026

Key Points

The M5 Max 128GB 14" has arrived and is being benchmarked with large language models.
Initial tests with BatchGenerator were slower than expected, so the tests were re-run in a fresh Python virtual environment using mlx_lm's stream_generate, which delayed the results.
Benchmark results for four models, including Qwen3.5-122B-A10B-4bit and Qwen3-Coder-Next-8bit, report prompt and generation tokens per second and peak memory usage.
The post reports raw performance numbers only, with no video or lengthy writeup.
Qwen3.5-35B-A3B-4bit was not tested because the author did not have it downloaded.

The M5 Max 128GB 14" has just arrived. I've been looking forward to putting this through its paces. Testing begins now. Results will be posted as comments below — no video, no lengthy writeup, just the raw numbers. Clean and simple.

Apologies for the delay. I initially ran the tests using BatchGenerator, but the speeds weren't quite what I expected. I ended up setting up a fresh Python virtual environment and re-running everything with pure mlx_lm using stream_generate, which is what pushed the update back.
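For readers who want to try a similar measurement, here is a minimal sketch of timing one run through mlx_lm's Python API. It is not the author's script: the names RunStats, summarize, and run_benchmark are hypothetical, and it assumes the response objects yielded by stream_generate expose prompt_tokens, prompt_tps, generation_tokens, generation_tps, and peak_memory, as in recent mlx_lm releases. It also passes the prompt as-is, whereas the mlx_lm.generate command line may apply a chat template, so token counts can differ slightly.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class RunStats:
    prompt_tokens: int
    prompt_tps: float
    generation_tokens: int
    generation_tps: float
    peak_memory_gb: float


def summarize(responses: Iterable[Any]) -> RunStats:
    """Drain a response stream and keep the counters from the last item."""
    last = None
    for last in responses:
        pass  # each item carries running totals; only the final one matters
    if last is None:
        raise ValueError("stream produced no responses")
    return RunStats(
        prompt_tokens=last.prompt_tokens,
        prompt_tps=last.prompt_tps,
        generation_tokens=last.generation_tokens,
        generation_tps=last.generation_tps,
        peak_memory_gb=last.peak_memory,
    )


def run_benchmark(model_path: str, prompt: str, max_tokens: int = 128) -> RunStats:
    """Load a model and time one generation (hypothetical helper)."""
    # Imported lazily because mlx_lm targets Apple silicon only.
    from mlx_lm import load, stream_generate

    model, tokenizer = load(model_path)
    return summarize(stream_generate(model, tokenizer, prompt, max_tokens=max_tokens))
```

Calling run_benchmark with a local model directory and the contents of a prompt file would return one RunStats record per run, which makes it easy to repeat the measurement across prompt lengths.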

I know many of you have been waiting - I'm sorry for keeping you! I take it as a sign of just how much excitement there is around the M5 Max. (I was genuinely hyped for this one myself.) Personally, I'm really happy with the results. What do you all think?

Models Tested

Qwen3.5-122B-A10B-4bit
Qwen3-Coder-Next-8bit
Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit
gpt-oss-120b-MXFP4-Q8

As for Qwen3.5-35B-A3B-4bit — I don't actually have that one downloaded, so unfortunately I wasn't able to include it. Sorry about that!

Results were originally posted as comments and have since been compiled here in the main post for easier access.

Qwen3.5-122B-A10B-4bit

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-122B-A10B-4bit --prompt "$(cat /tmp/prompt_4096.txt)" --max-tokens 128
==========
Prompt: 4106 tokens, 881.466 tokens-per-sec
Generation: 128 tokens, 65.853 tokens-per-sec
Peak memory: 71.910 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-122B-A10B-4bit --prompt "$(cat /tmp/prompt_16384.txt)" --max-tokens 128
==========
Prompt: 16394 tokens, 1239.734 tokens-per-sec
Generation: 128 tokens, 60.639 tokens-per-sec
Peak memory: 73.803 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-122B-A10B-4bit --prompt "$(cat /tmp/prompt_32768.txt)" --max-tokens 128
==========
Prompt: 32778 tokens, 1067.824 tokens-per-sec
Generation: 128 tokens, 54.923 tokens-per-sec
Peak memory: 76.397 GB

Qwen3-Coder-Next-8bit

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3-Coder-Next-8bit --prompt "$(cat /tmp/prompt_4096.txt)" --max-tokens 128
==========
Prompt: 4105 tokens, 754.927 tokens-per-sec
Generation: 60 tokens, 79.296 tokens-per-sec
Peak memory: 87.068 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3-Coder-Next-8bit --prompt "$(cat /tmp/prompt_16384.txt)" --max-tokens 128
==========
Prompt: 16393 tokens, 1802.144 tokens-per-sec
Generation: 60 tokens, 74.293 tokens-per-sec
Peak memory: 88.176 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3-Coder-Next-8bit --prompt "$(cat /tmp/prompt_32768.txt)" --max-tokens 128
==========
Prompt: 32777 tokens, 1887.158 tokens-per-sec
Generation: 58 tokens, 68.624 tokens-per-sec
Peak memory: 89.652 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3-Coder-Next-8bit --prompt "$(cat /tmp/prompt_65536.txt)" --max-tokens 128
==========
Prompt: 65545 tokens, 1432.730 tokens-per-sec
Generation: 61 tokens, 48.212 tokens-per-sec
Peak memory: 92.605 GB

Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit --prompt "$(cat /tmp/prompt_4096.txt)" --max-tokens 128
==========
Prompt: 4107 tokens, 811.134 tokens-per-sec
Generation: 128 tokens, 23.648 tokens-per-sec
Peak memory: 25.319 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit --prompt "$(cat /tmp/prompt_16384.txt)" --max-tokens 128
==========
Prompt: 16395 tokens, 686.682 tokens-per-sec
Generation: 128 tokens, 20.311 tokens-per-sec
Peak memory: 27.332 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit --prompt "$(cat /tmp/prompt_32768.txt)" --max-tokens 128
==========
Prompt: 32779 tokens, 591.383 tokens-per-sec
Generation: 128 tokens, 14.908 tokens-per-sec
Peak memory: 30.016 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-6bit --prompt "$(cat /tmp/prompt_65536.txt)" --max-tokens 128
==========
Prompt: 65547 tokens, 475.828 tokens-per-sec
Generation: 128 tokens, 14.225 tokens-per-sec
Peak memory: 35.425 GB

gpt-oss-120b-MXFP4-Q8

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/gpt-oss-120b-MXFP4-Q8 --prompt "$(cat /tmp/prompt_4096.txt)" --max-tokens 128
==========
Prompt: 4164 tokens, 1325.062 tokens-per-sec
Generation: 128 tokens, 87.873 tokens-per-sec
Peak memory: 64.408 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/gpt-oss-120b-MXFP4-Q8 --prompt "$(cat /tmp/prompt_16384.txt)" --max-tokens 128
==========
Prompt: 16452 tokens, 2710.460 tokens-per-sec
Generation: 128 tokens, 75.963 tokens-per-sec
Peak memory: 64.857 GB

(mlx) cryingneko@MacBook-Pro mlx-lm % mlx_lm.generate --model /Volumes/SSD/Models/gpt-oss-120b-MXFP4-Q8 --prompt "$(cat /tmp/prompt_32768.txt)" --max-tokens 128
==========
Prompt: 32836 tokens, 2537.420 tokens-per-sec
Generation: 128 tokens, 64.469 tokens-per-sec
Peak memory: 65.461 GB
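To compare runs without reading every block by eye, the summary lines can be parsed into records. The sketch below is one way to do that with the standard library only; Run, RUN_RE, parse_runs, memory_growth_gb_per_1k, and decode_ratio are hypothetical names, and the sample log reuses the 4K and 32K figures reported for Qwen3.5-122B-A10B-4bit.

```python
import re
from typing import List, NamedTuple


class Run(NamedTuple):
    prompt_tokens: int
    prompt_tps: float
    gen_tokens: int
    gen_tps: float
    peak_gb: float


# Matches the three summary lines printed after each mlx_lm.generate run.
RUN_RE = re.compile(
    r"Prompt:\s+(\d+) tokens,\s+([\d.]+) tokens-per-sec\s+"
    r"Generation:\s+(\d+) tokens,\s+([\d.]+) tokens-per-sec\s+"
    r"Peak memory:\s+([\d.]+) GB"
)


def parse_runs(log: str) -> List[Run]:
    """Extract one Run per Prompt/Generation/Peak memory group in the log."""
    return [
        Run(int(pt), float(ptps), int(gt), float(gtps), float(mem))
        for pt, ptps, gt, gtps, mem in RUN_RE.findall(log)
    ]


def memory_growth_gb_per_1k(short: Run, long: Run) -> float:
    """Extra peak memory per 1,000 additional prompt tokens."""
    return (long.peak_gb - short.peak_gb) / (long.prompt_tokens - short.prompt_tokens) * 1000


def decode_ratio(short: Run, long: Run) -> float:
    """Generation speed at the long prompt relative to the short prompt."""
    return long.gen_tps / short.gen_tps


# Sample copied from the Qwen3.5-122B-A10B-4bit results (4K and 32K prompts).
LOG = """
Prompt: 4106 tokens, 881.466 tokens-per-sec
Generation: 128 tokens, 65.853 tokens-per-sec
Peak memory: 71.910 GB
Prompt: 32778 tokens, 1067.824 tokens-per-sec
Generation: 128 tokens, 54.923 tokens-per-sec
Peak memory: 76.397 GB
"""

runs = parse_runs(LOG)
print(f"{memory_growth_gb_per_1k(runs[0], runs[1]):.3f} GB per 1k prompt tokens")
print(f"{decode_ratio(runs[0], runs[1]):.2f}x generation speed at 32K vs 4K")
```

For these two runs the numbers come out to roughly 0.16 GB of additional peak memory per 1,000 prompt tokens and a generation rate about 17% lower at 32K than at 4K. Peak memory covers everything resident during the run, so the growth figure is only a rough indicator of how much of it is context cache.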

submitted by /u/cryingneko