| I wanted to share an open-source app that I built for running LLMs locally on my setup.

My setup
Hardware:
Software: Ubuntu 25.10; llama.cpp built from source for CUDA, Vulkan, and ROCm.

How I use this app
I generally run two models in parallel on different llama.cpp backends simultaneously: Qwen 3.6 27B (UD-Q6-KXL or NVFP4) on CUDA, and Qwen 3.6 35B A3B (UD-Q6-KXL) in Strix Halo unified memory. I mostly use them with opencode for coding, where the built-in model router comes in handy.

What else can the app do
It does the basic things any llama.cpp wrapper can do, plus some extras. Overall, it's a convenience app for spinning up llama-server instances for any purpose. And it's open-source.
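The two-model workflow described above can be sketched as two independent llama-server launches, one per backend build. The model paths, ports, and build directories below are illustrative placeholders, not the author's actual configuration:

```shell
# Dense 27B on the CUDA build (RTX Pro), OpenAI-compatible API on port 8080
./build-cuda/bin/llama-server \
  -m ~/models/Qwen3.6-27B-UD-Q6-KXL.gguf \
  --host 127.0.0.1 --port 8080 -ngl 99 &

# 35B A3B MoE on the ROCm build, served from Strix Halo unified memory on port 8081
./build-rocm/bin/llama-server \
  -m ~/models/Qwen3.6-35B-A3B-UD-Q6-KXL.gguf \
  --host 127.0.0.1 --port 8081 -ngl 99 &
```

Each instance exposes its own OpenAI-compatible endpoint, so a coding client such as opencode (or a model router in front of both) can address them independently.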
More info is in the README, along with some guides. It's an early-stage alpha release, so expect some minor bugs; I have mostly fixed the major ones. Feature requests as well as bug reports are welcome.

---

Setting up ROCm on Strix Halo (Ubuntu 25.10)
Strix Halo on Linux needs some setup before ROCm works natively for gfx1151. I am aware of the Docker-based toolboxes for Strix Halo; they work and are a good option, but I wanted bare metal without containers. I am including the steps below for those interested in trying it out.
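For reference, once the ROCm userspace is in place, a HIP build of llama.cpp targeting gfx1151 comes down to a short CMake invocation. This is a minimal sketch of the standard upstream flags; the author's exact steps are not reproduced in this excerpt, and the compiler path may differ per ROCm install:

```shell
# HIP/ROCm build of llama.cpp for Strix Halo (gfx1151)
HIPCXX=/opt/rocm/llvm/bin/clang++ \
cmake -B build-rocm -S . \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1151 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-rocm --config Release -j "$(nproc)"
```

A CUDA build is analogous with -DGGML_CUDA=ON in a separate build tree, which is how two backend binaries can coexist and run side by side.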
Additionally, I build llama.cpp from source for CUDA 13.2 (for the RTX Pro 5000) with the standard ---

PS. Apple Mac: I don't own a Mac, so I am unable to test the app on macOS yet. Feel free to build from source, or share a build with me so I can add it to the releases on GitHub; I can shout out your GitHub handle in the README, thanks :)

[link] [comments] |
Warpdrv - my open-source Llama.cpp launcher for daily-driving Qwen 35b + 27b on Strix Halo + RTX Pro.
Reddit r/LocalLLaMA / 5/3/2026
💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- The author released Warpdrv, an open-source desktop launcher designed to run LLMs locally using llama.cpp and to conveniently manage multiple backend sessions.
- The setup focuses on running Qwen 3.6 27B and 35B in parallel, using CUDA for one model and Strix Halo unified memory with ROCm/Vulkan support for the other.
- Warpdrv includes features such as chat tool calling via MCP.json, a model router for coding-focused workflows, and experimental KV-cache checkpointing.
- The project does not bundle a prebuilt llama.cpp binary, but provides configurable “recipes” (bash scripts with UI) to build backends with one click.
- The post also shares early, bare-metal instructions for getting ROCm working on Strix Halo under Ubuntu 25.10, including kernel, BIOS, and configuration steps.
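The MCP.json mentioned above presumably follows the common Model Context Protocol client convention of an mcpServers map; the exact schema Warpdrv expects is not shown in the post, so the server entry below is a hypothetical example using the reference filesystem server:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
    }
  }
}
```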