Has anyone run the standard llama-cpp llama2-7B q4_0 benchmark on an M5 Max?

Reddit r/LocalLLaMA / 3/24/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post asks whether anyone has benchmarked the standard llama.cpp/llama-bench setup for Llama 2 7B (q4_0) on an M5 Max Apple Silicon machine.
  • It specifically requests the PP (prompt processing) and TG (token generation) throughput figures from a particular llama-bench configuration: a 512-token prompt (-p 512), 128 generated tokens (-n 128), and full GPU offload (-ngl 99); a build-and-run sketch follows this list.
  • The motivation is that the author could not find any M5 Max results in the referenced llama.cpp Metal performance tracking GitHub issue.
  • It is essentially a call for community-provided performance data to fill a gap in tracked Metal backend benchmarks.
  • No new model release or benchmark result is presented in the post itself; it’s a request to collect measurements from others.
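For anyone with the hardware who wants to produce the requested numbers, here is a minimal build-and-run sketch, assuming a stock llama.cpp checkout (the Metal backend is compiled in by default when building on Apple Silicon); the model path below is a placeholder:

# clone and build llama.cpp; Metal support is on by default for Apple Silicon builds
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# run the requested benchmark; replace the model path with your local q4_0 GGUF
./build/bin/llama-bench \
  -m /path/to/llama-7b-v2/ggml-model-q4_0.gguf \
  -p 512 -n 128 -ngl 99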

Not seeing any reports in the llama.cpp Metal performance tracking GitHub issue.

If anyone has access to this machine, could you post the PP and TG results of:

./llama-bench \
  -m llama-7b-v2/ggml-model-q4_0.gguf \
  -p 512 -n 128 -ngl 99
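For reference, llama-bench prints its results as a table, and the requested PP and TG numbers are the t/s values on the pp512 and tg128 rows. A hypothetical illustration of the output shape (the ± throughput figures are placeholders, not measurements):

| model         |     size | params | backend | ngl |  test |           t/s |
| ------------- | -------: | -----: | ------- | --: | ----: | ------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal   |  99 | pp512 | NNN.NN ± N.NN |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal   |  99 | tg128 |  NN.NN ± N.NN |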
submitted by /u/ForsookComparison