backend-agnostic tensor parallelism has been merged into llama.cpp
Reddit r/LocalLLaMA / 4/9/2026
> If you have more than one GPU, your models can now run much faster. -sm layer is the default behaviour; -sm tensor is the new thing to try. "Backend-agnostic" means you don't need CUDA to enjoy this. This is experimental, and in your case the results may be poor (try different models). You have been warned!
📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage
Key Points
- llama.cpp has merged "backend-agnostic tensor parallelism," enabling faster inference when multiple GPUs are available.
- The update adds a new split mode to try (-sm tensor), while -sm layer remains the default behaviour (see the sketch after this list).
- "Backend-agnostic" means the feature is not tied to CUDA: llama.cpp's other backends can benefit from tensor parallelism as well.
- The feature is marked experimental; results may be poor depending on the model, so expect some trial and error across configurations.
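
For context, a minimal launch sketch. The -sm (--split-mode) flag and the layer/tensor values come from the post itself; the binary name, model path, and the -ngl (GPU offload) value are illustrative assumptions, not settings confirmed by the source:

```bash
# Baseline: split the model by layers across available GPUs
# (the default behaviour, per the post).
./llama-server -m ./models/model.gguf -ngl 99 -sm layer

# Experimental: the newly merged tensor-parallel split mode.
# Backend-agnostic, so non-CUDA backends can use it too;
# results vary by model.
./llama-server -m ./models/model.gguf -ngl 99 -sm tensor
```

If -sm tensor underperforms on a given model, the post's advice is to fall back to the default -sm layer or try a different model.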
Related Articles
- Black Hat USA (AI Business)
- Black Hat Asia (AI Business)
- Meta Superintelligence Lab Releases Muse Spark: A Multimodal Reasoning Model With Thought Compression and Parallel Agents (MarkTechPost)
- I tested and ranked every AI companion app I tried and here's my honest breakdown (Reddit r/artificial)
- Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption. (Dev.to)