Got hipfire running in Docker on my RX 7900 XTX alongside llamacpp

Reddit r/LocalLLaMA / 5/1/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The author reports successfully getting hipfire running in Docker on an AMD Radeon RX 7900 XTX alongside an existing llama.cpp setup without changing the existing stack.
  • They tested Qwen3.6 27B (MQ4) and observed that key components like the TriAttention sidecar and DFlash draft load correctly according to logs.
  • In their early results, throughput is around 40 tokens per second (AR), and the API behavior appears clean, though they haven’t confirmed whether DFlash is actively engaging.
  • They note a practical dockerization detail: hipfire is not a single executable binary, but a Bun/TypeScript HTTP server that launches the engine as a subprocess.
  • The author says they may publish a Dockerfile and docker-compose setup on GitHub soon and is open to questions.

Been dealing with long-context failures on Qwen3.6 27B and stumbled onto hipfire. Spent an evening dockerizing it so it runs alongside an existing llama.cpp stack without touching anything.

Running Qwen3.6 27B MQ4 on a 7900 XTX. The TriAttention sidecar and DFlash draft both load correctly per the logs. ~40 tok/s AR, though I haven't confirmed DFlash is actually engaging yet. Still early, but it responds correctly and the API is clean.

One thing that tripped me up: hipfire isn't a single binary you just run. The CLI is a Bun/TypeScript HTTP server that spawns the engine as a subprocess. Relevant if you're trying to dockerize it.
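For anyone trying the same thing, this is roughly the shape of Dockerfile that implies — a hedged sketch only. The image tag, file paths, entrypoint, and port below are my placeholders, not hipfire's actual layout:

```dockerfile
# Hypothetical sketch: a Bun-based image where the container's main
# process is the TypeScript HTTP server, which spawns the engine as a
# subprocess. Paths, entrypoint, and port are assumptions.
FROM oven/bun:1

WORKDIR /app

# Install the server's dependencies, then copy the source
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile
COPY . .

# The engine binary has to be reachable from inside the container
# (baked into the image or bind-mounted) so the server can spawn it.

EXPOSE 8080
# Run the server entrypoint, not a bare engine binary
CMD ["bun", "run", "src/server.ts"]
```

The point is just that CMD launches the Bun server rather than the engine directly; the server owns the engine's lifecycle.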

If there's interest I'll put the Dockerfile and compose setup on GitHub tomorrow. Happy to answer questions in the meantime.
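Until then, here's the general shape of the compose side, mainly the ROCm device passthrough a 7900 XTX needs. Service name, image, and port are placeholders; the device and group settings are the standard way to expose an AMD GPU to a container:

```yaml
# Hypothetical compose sketch. Only the ROCm passthrough lines are
# load-bearing; everything else is a placeholder.
services:
  hipfire:
    build: .
    ports:
      - "8080:8080"
    devices:
      - /dev/kfd      # ROCm compute interface
      - /dev/dri      # GPU render nodes
    group_add:
      - video         # on some distros you also need the render group
    security_opt:
      - seccomp=unconfined
```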

submitted by /u/AgentErgoloid