backend-agnostic tensor parallelism has been merged into llama.cpp

Reddit r/LocalLLaMA / 4/9/2026

📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • llama.cpp has merged “backend-agnostic tensor parallelism,” enabling faster model execution when multiple GPUs are available.
  • The update introduces a new option (-sm tensor) to try, while -sm layer remains the default behavior.
  • “Backend-agnostic” indicates you don’t need CUDA specifically to benefit from tensor parallelism.
  • The feature is marked experimental, with potentially poor results depending on the model and requiring trial and error across configurations.
If you have more than one GPU, your models can now run much faster.

-sm layer is the default behaviour; -sm tensor is the new thing to try.
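A sketch of how the two split modes compare on the command line. Only the -sm values come from the post; the model path, the llama-cli binary name, and the other flags (-ngl to offload layers to GPU, -p for the prompt) are illustrative and may differ in your build:

```shell
# Default: split the model layer-wise across available GPUs
./llama-cli -m model.gguf -ngl 99 -sm layer -p "Hello"

# Experimental: backend-agnostic tensor parallelism (new in this merge)
./llama-cli -m model.gguf -ngl 99 -sm tensor -p "Hello"
```

Since the feature is experimental, benchmarking both modes on your own hardware and models is the only reliable way to know which is faster for you.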

"backend-agnostic" means you don't need CUDA to enjoy this

This is experimental, and in your case the results may be poor (try different models). You have been warned!!!

submitted by /u/jacek2023