M5 Pro LLM benchmark

Reddit r/LocalLLaMA / 3/12/2026

💬 OpinionTools & Practical UsageModels & Research

共有:

Key Points

The benchmark runs on an M5 Pro 18-core Mac with 24 GB RAM comparing several models, including gpt-oss-20B MXFP4 MoE and Qwen_Qwen3.5-9B-Q6_K.gguf, to assess on-device LLM performance.
The gpt-oss 20B MXFP4 MoE entry used 11.27 GiB and 20.91B parameters and achieved about 1,727.85 ± 5.51 t/s on the pp512 test and 84.07 ± 0.82 t/s on the tg128 test on an MTL0 device.
The author notes that with only 16 GB RAM on their M1 Pro they could load only one of the three models, highlighting RAM constraints for local benchmarking.
The report includes detailed Metal pipeline initialization and GPU capability information (MTL0, GPU families, unified memory, bfloat/tensor support), illustrating the hardware/software environment used.
The post aims to be useful for others evaluating hardware choices for LLM benchmarking on consumer hardware.

I thinking of upgrading my M1 Pro machine and went to the store tonight and ran a few benchmarks. I have seen almost nothing using about the Pro, all the reviews are on the Max. Here are a couple of llama-bench results for 3 models (and comparisons to my personal M1 Pro and work M2 Max). Sadly, my M1 Pro only has 16gb so only was able to load 1 of the 3 models. Hopefully this is useful for people!

M5 Pro 18 Core

========================================== Llama Benchmarking Report ========================================== OS: Darwin CPU: Apple_M5_Pro RAM: 24 GB Date: 20260311_195705 ========================================== --- Model: gpt-oss-20b-mxfp4.gguf --- --- Device: MTL0 --- ggml_metal_device_init: testing tensor API for f16 support ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel' ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103b730e0 | th_max = 1024 | th_width = 32 ggml_metal_device_init: testing tensor API for bfloat support ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel' ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103b728e0 | th_max = 1024 | th_width = 32 ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.005 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 19069.67 MB | model | size | params | backend | threads | dev | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------ | --------------: | -------------------: | | gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | MTL,BLAS | 6 | MTL0 | pp512 | 1727.85 ± 5.51 | | gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | MTL,BLAS | 6 | MTL0 | tg128 | 84.07 ± 0.82 | build: ec947d2b1 (8270) Status (MTL0): SUCCESS ------------------------------------------ --- Model: Qwen_Qwen3.5-9B-Q6_K.gguf --- --- Device: MTL0 --- ggml_metal_device_init: testing tensor API for f16 support ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel' ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x105886820 | th_max = 1024 | th_width = 32 ggml_metal_device_init: testing tensor API for bfloat support ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel' ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x105886700 | th_max = 1024 | th_width = 32 ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.008 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 19069.67 MB | model | size | params | backend | threads | dev | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------ | --------------: | -------------------: | | qwen35 9B Q6_K | 7.12 GiB | 8.95 B | MTL,BLAS | 6 | MTL0 | pp512 | 807.89 ± 1.13 | | qwen35 9B Q6_K | 7.12 GiB | 8.95 B | MTL,BLAS | 6 | MTL0 | tg128 | 30.68 ± 0.42 | build: ec947d2b1 (8270) Status (MTL0): SUCCESS ------------------------------------------ --- Model: Qwen3.5-35B-A3B-UD-IQ2_XXS.gguf --- --- Device: MTL0 --- ggml_metal_device_init: testing tensor API for f16 support ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel' ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x101c479a0 | th_max = 1024 | th_width = 32 ggml_metal_device_init: testing tensor API for bfloat support ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel' ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x101c476e0 | th_max = 1024 | th_width = 32 ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.005 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 19069.67 MB | model | size | params | backend | threads | dev | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------ | --------------: | -------------------: | | qwen35moe 35B.A3B IQ2_XXS - 2.0625 bpw | 9.91 GiB | 34.66 B | MTL,BLAS | 6 | MTL0 | pp512 | 1234.75 ± 5.75 | | qwen35moe 35B.A3B IQ2_XXS - 2.0625 bpw | 9.91 GiB | 34.66 B | MTL,BLAS | 6 | MTL0 | tg128 | 53.71 ± 0.24 | build: ec947d2b1 (8270) Status (MTL0): SUCCESS ------------------------------------------

M2 Max

========================================== Llama Benchmarking Report ========================================== OS: Darwin CPU: Apple_M2_Max RAM: 32 GB Date: 20260311_094015 ========================================== --- Model: gpt-oss-20b-mxfp4.gguf --- ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.014 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 22906.50 MB | model | size | params | backend | threads | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: | | gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | MTL,BLAS | 8 | pp512 | 1224.14 ± 2.37 | | gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | MTL,BLAS | 8 | tg128 | 88.01 ± 1.96 | build: 0beb8db3a (8250) Status: SUCCESS ------------------------------------------ --- Model: Qwen_Qwen3.5-9B-Q6_K.gguf --- ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.008 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 22906.50 MB | model | size | params | backend | threads | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: | | qwen35 9B Q6_K | 7.12 GiB | 8.95 B | MTL,BLAS | 8 | pp512 | 553.54 ± 2.74 | | qwen35 9B Q6_K | 7.12 GiB | 8.95 B | MTL,BLAS | 8 | tg128 | 31.08 ± 0.39 | build: 0beb8db3a (8250) Status: SUCCESS ------------------------------------------ --- Model: Qwen3.5-35B-A3B-UD-IQ2_XXS.gguf --- ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.007 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple8 (1008) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 22906.50 MB | model | size | params | backend | threads | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: | | qwen35moe 35B.A3B IQ2_XXS - 2.0625 bpw | 9.91 GiB | 34.66 B | MTL,BLAS | 8 | pp512 | 804.50 ± 4.09 | | qwen35moe 35B.A3B IQ2_XXS - 2.0625 bpw | 9.91 GiB | 34.66 B | MTL,BLAS | 8 | tg128 | 42.22 ± 0.35 | build: 0beb8db3a (8250) Status: SUCCESS ------------------------------------------

M1 Pro

========================================== Llama Benchmarking Report ========================================== OS: Darwin CPU: Apple_M1_Pro RAM: 16 GB Date: 20260311_100338 ========================================== --- Model: Qwen_Qwen3.5-9B-Q6_K.gguf --- --- Device: MTL0 --- ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.007 sec ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: MTL0 ggml_metal_device_init: GPU family: MTLGPUFamilyApple7 (1007) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = false ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 11453.25 MB | model | size | params | backend | threads | dev | test | t/s | | ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------ | --------------: | -------------------: | | qwen35 9B Q6_K | 7.12 GiB | 8.95 B | MTL,BLAS | 8 | MTL0 | pp512 | 204.59 ± 0.22 | | qwen35 9B Q6_K | 7.12 GiB | 8.95 B | MTL,BLAS | 8 | MTL0 | tg128 | 14.52 ± 0.95 | build: 96cfc4992 (8260) Status (MTL0): SUCCESS

submitted by /u/Fit-Later-389
[link] [comments]

パナソニックHD、シンガポール開発拠点の視覚検査向けAIプラットフォームをグローバル展開初のライセンス提供のサムネイル画像

Ledge.ai

AIと創作

note

働くライター｜AI×note

note

まな式AI活用術で、人生が動き出した人たち

note

【教えてAI】「カメラのいらないテレビ電話」「POPOPO」って何？

note

M5 Pro LLM benchmark

Key Points

Related Articles

パナソニックHD、シンガポール開発拠点の視覚検査向けAIプラットフォームをグローバル展開初のライセンス提供のサムネイル画像

AIと創作

働くライター｜AI×note

まな式AI活用術で、人生が動き出した人たち

【教えてAI】「カメラのいらないテレビ電話」「POPOPO」って何？

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

Key Points

Related Articles

パナソニックHD、シンガポール開発拠点の視覚検査向けAIプラットフォームをグローバル展開 初のライセンス提供 のサムネイル画像

AIと創作

働くライター｜AI×note

まな式AI活用術で、人生が動き出した人たち

【教えてAI】「カメラのいらないテレビ電話」「POPOPO」って何？

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

パナソニックHD、シンガポール開発拠点の視覚検査向けAIプラットフォームをグローバル展開初のライセンス提供のサムネイル画像