Been dealing with long-context failures on Qwen3.6 27B and stumbled onto hipfire. Spent an evening dockerizing it so it runs alongside an existing llama.cpp stack without touching anything.
Running Qwen3.6 27B MQ4 on a 7900 XTX. The TriAttention sidecar and DFlash draft both load correctly per the logs. Getting ~40 tok/s autoregressive (AR), but I haven't confirmed DFlash is actually engaging yet. Still early, but it responds correctly and the API is clean.
One thing that tripped me up: hipfire isn't a single binary you just run. The CLI is a Bun/TypeScript HTTP server that spawns the engine as a subprocess. Relevant if you're trying to dockerize it.
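To show why that matters in a container, here's a rough sketch of the general "wrapper spawns an engine subprocess" pattern. This is not hipfire's actual code: `superviseEngine`, `exitCodeFor`, and the command you pass in are all hypothetical names for illustration. It only uses Node-style built-ins (`node:child_process`, `node:os`), which Bun also provides. The point is that when the wrapper ends up as PID 1 in the container, it has to forward stop signals to the child and pass the child's exit status back out, otherwise `docker stop` can hang until the kill timeout.

```typescript
import { spawn } from "node:child_process";
import { constants } from "node:os";

// Map a child's exit status to the code the wrapper should exit with.
// Follows the common shell convention of 128 + signal number when the
// child was killed by a signal.
function exitCodeFor(code: number | null, signal: string | null): number {
  if (code !== null) return code;
  if (signal !== null) {
    const signals = constants.signals as Record<string, number>;
    return 128 + (signals[signal] ?? 0);
  }
  return 1; // neither code nor signal reported: treat as a generic failure
}

// Hypothetical supervisor: start the engine binary as a subprocess,
// forward stop signals to it, and exit with its status.
function superviseEngine(command: string, args: string[]) {
  // stdio: "inherit" keeps engine logs visible in `docker logs`.
  const child = spawn(command, args, { stdio: "inherit" });

  // As PID 1, SIGTERM/SIGINT only take effect if a handler is installed,
  // so relay them to the engine explicitly.
  for (const sig of ["SIGTERM", "SIGINT"] as const) {
    process.on(sig, () => {
      child.kill(sig);
    });
  }

  // When the engine exits, exit the wrapper with a matching status so
  // Docker restart policies see what actually happened.
  child.on("exit", (code, signal) => {
    process.exit(exitCodeFor(code, signal));
  });

  return child;
}
```

You'd call something like `superviseEngine("/path/to/engine", ["--some-flag"])` from the container entrypoint (path and flag are placeholders). If the real CLI already handles signals, running the container with Docker's `--init` flag is a simpler way to get child reaping and signal forwarding.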
If there's interest I'll put the Dockerfile and compose setup on GitHub tomorrow. Happy to answer questions in the meantime.



