Has anyone run the standard llama-cpp llama2-7B q4_0 benchmark on an M5 Max?

Reddit r/LocalLLaMA / 3/24/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether anyone has benchmarked the standard llama.cpp/llama-bench setup for Llama 2 7B (q4_0) on an M5 Max Apple Silicon machine.
  • It specifically requests the PP (prompt processing) and TG (token generation) throughput figures from a particular llama-bench configuration: a 512-token prompt (-p 512), 128 generated tokens (-n 128), and full GPU offload (-ngl 99); a build-and-run sketch follows this list.
  • The motivation is that the author could not find any M5 Max results in the referenced llama.cpp Metal performance tracking GitHub issue.
  • It is essentially a call for community-provided performance data to fill a gap in tracked Metal backend benchmarks.
  • No new model release or benchmark result is presented in the post itself; it’s a request to collect measurements from others.
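For anyone with the hardware who wants to produce the requested numbers, here is a minimal build-and-run sketch, assuming a stock llama.cpp checkout (the Metal backend is compiled in by default when building on Apple Silicon); the model path below is a placeholder:

# clone and build llama.cpp; Metal support is on by default for Apple Silicon builds
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# run the requested benchmark; replace the model path with your local q4_0 GGUF
./build/bin/llama-bench \
  -m /path/to/llama-7b-v2/ggml-model-q4_0.gguf \
  -p 512 -n 128 -ngl 99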

Not seeing any reports in the llama.cpp Metal performance tracking GitHub issue.

If anyone has access to this machine, could you post the PP and TG results of:

./llama-bench \
  -m llama-7b-v2/ggml-model-q4_0.gguf \
  -p 512 -n 128 -ngl 99
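For reference, llama-bench prints its results as a table, and the requested PP and TG numbers are the t/s values on the pp512 and tg128 rows. A hypothetical illustration of the output shape (the ± throughput figures are placeholders, not measurements):

| model         |     size | params | backend | ngl |  test |           t/s |
| ------------- | -------: | -----: | ------- | --: | ----: | ------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal   |  99 | pp512 | NNN.NN ± N.NN |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal   |  99 | tg128 |  NN.NN ± N.NN |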
submitted by /u/ForsookComparison