6-GPU multiplexer from K80s: hot-swap between models in 0.3ms

Reddit r/LocalLLaMA / 3/18/2026

Key Points

The author built a system that multiplexes six GPU dies through a single PCIe slot using a custom Linux kernel module, enabling model hot-swapping in under 1 millisecond.
The hardware setup uses a BTC-S37 mining motherboard and three NVIDIA K80 cards, providing six GPU dies and about 72GB of VRAM for roughly $200.
Reported performance includes 38 tokens per second decode on RWKV-X 0.2B (INT8), a 0.3 ms average switch time between dies, and no degradation across 10 rapid swap cycles.
Each die retains its loaded model persistently, and the inference engine is implemented in pure C with zero Python dependencies.
The project aims to fill all eight slots on the board so models can be loaded and switched at will on dirt-cheap hardware, highlighting a practical path for budget AI inference.

So after working on boot AI, I bought some old Bitcoin mining hardware to see if I could run old NVIDIA cards on it. I ended up building a system that multiplexes 6 GPU dies through a single PCIe slot using a custom Linux kernel module, so I can switch between loaded models in under a millisecond.
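
The post does not publish the kernel module, so the following is only a user-space sketch, with hypothetical names (`mux_table`, `mux_register`, `mux_select`), of why switching between resident models can be cheap: every die keeps its model loaded, and a switch changes which die requests are routed to instead of moving weights. It does not model PCIe, the driver, or CUDA contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, NOT the author's kernel module: only the bookkeeping
 * that lets a "switch" avoid reloading weights. */
#define MUX_MAX_SLOTS 8   /* the post's stated goal: all 8 board slots */

typedef struct {
    int  die_id;      /* physical GPU die index */
    char model[32];   /* name of the model resident on that die */
} die_slot;

typedef struct {
    die_slot slots[MUX_MAX_SLOTS];
    size_t   count;   /* number of populated slots */
    int      active;  /* currently selected slot index, -1 if none */
} mux_table;

void mux_init(mux_table *t) {
    memset(t, 0, sizeof *t);
    t->active = -1;
}

/* Record that `model` was loaded onto `die_id` (done once, up front).
 * Returns the slot index, or -1 if every slot is taken. */
int mux_register(mux_table *t, int die_id, const char *model) {
    assert(t != NULL && model != NULL);
    if (t->count >= MUX_MAX_SLOTS) return -1;
    die_slot *s = &t->slots[t->count];
    s->die_id = die_id;
    strncpy(s->model, model, sizeof s->model - 1);
    s->model[sizeof s->model - 1] = '\0';
    return (int)t->count++;
}

/* Route subsequent requests to the die already holding `model`.
 * No weights are copied; only the `active` index changes.
 * Returns the die id, or -1 (leaving `active` unchanged) if not resident. */
int mux_select(mux_table *t, const char *model) {
    assert(t != NULL && model != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->slots[i].model, model) == 0) {
            t->active = (int)i;
            return t->slots[i].die_id;
        }
    }
    return -1;
}
```

In this model the expensive step, uploading weights to a die, happens once per die, and a switch is a table lookup. That is consistent with, though not proof of, the sub-millisecond switch times reported below.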

Hardware:

- BTC-S37 mining motherboard (picked up 6 on eBay from a total bro getting rid of his old GPU mining setup)

- 3x NVIDIA K80 cards = 6 dies, 72GB VRAM total

- Total: ~$200 for 72GB of GPU VRAM
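
The VRAM and price figures follow from simple arithmetic. This sketch assumes NVIDIA's published Tesla K80 spec of two GK210 dies with 12 GB each per card; `rig_totals` and `rig_compute` are illustrative names.

```c
#include <assert.h>

/* Assumption: Tesla K80 = 2 GK210 dies x 12 GB GDDR5 (24 GB per card). */
#define K80_DIES_PER_CARD 2
#define K80_GB_PER_DIE    12

typedef struct {
    int    dies;        /* total GPU dies across all cards */
    int    vram_gb;     /* total VRAM in GB */
    double usd_per_gb;  /* total cost divided by total VRAM */
} rig_totals;

rig_totals rig_compute(int cards, double total_cost_usd) {
    assert(cards > 0);
    rig_totals r;
    r.dies = cards * K80_DIES_PER_CARD;
    r.vram_gb = r.dies * K80_GB_PER_DIE;
    r.usd_per_gb = total_cost_usd / r.vram_gb;
    return r;
}
```

For `rig_compute(3, 200.0)` this gives 6 dies, 72 GB, and about $2.78 per GB of VRAM.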

Results:

- 38 tok/s decode on RWKV-X 0.2B (INT8)

- 0.3ms average switch time between dies

- 10 rapid swap cycles, zero degradation

- Each die holds its own model persistently
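
The post does not say how the 0.3 ms switch time was measured. One plausible approach, assumed here, is to time a batch of switches with a monotonic clock and average them; the sketch also converts the 38 tok/s decode rate into a per-token time budget for comparison. `noop_switch` is a hypothetical placeholder for whatever call actually triggers a die switch.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

typedef void (*switch_fn)(int die);

/* Hypothetical stand-in: replace with the real die-switch call. */
void noop_switch(int die) { (void)die; }

/* Monotonic time in milliseconds (immune to wall-clock adjustments). */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Average latency over `cycles` switches, alternating between two dies. */
double avg_switch_ms(switch_fn do_switch, int die_a, int die_b, int cycles) {
    assert(do_switch != NULL && cycles > 0);
    double total = 0.0;
    for (int i = 0; i < cycles; i++) {
        int target = (i % 2 == 0) ? die_a : die_b;
        double t0 = now_ms();
        do_switch(target);
        total += now_ms() - t0;
    }
    return total / cycles;
}

/* Time to decode one token, in ms, at a given decode rate. */
double ms_per_token(double tok_per_s) {
    assert(tok_per_s > 0.0);
    return 1000.0 / tok_per_s;
}

/* Switch latency as a fraction of one token's decode time. */
double switch_overhead(double switch_ms, double tok_per_s) {
    return switch_ms / ms_per_token(tok_per_s);
}
```

With the reported numbers, `ms_per_token(38.0)` is about 26.3 ms, so `switch_overhead(0.3, 38.0)` is about 0.011: one switch costs roughly 1.1% of the time to decode a single token.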

The inference engine is pure C with zero Python dependencies. It's still early, but the goal is to fill all 8 slots on the board so models can be loaded and switched at will on dirt-cheap hardware.
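
The post says only that the engine is pure C and runs RWKV-X 0.2B in INT8; it does not describe the quantization scheme. As an assumed illustration of what a dependency-free INT8 path can look like, here is symmetric per-row quantization with a matching matrix-vector product (`quantize_row_int8` and `matvec_int8` are hypothetical names, not the author's code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symmetric per-row quantization: w[i] ~= scale * q[i], q in [-127, 127].
 * Uses no libm so it builds without -lm. Returns the row scale. */
float quantize_row_int8(const float *w, int8_t *q, size_t n) {
    assert(w != NULL && q != NULL);
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = w[i] < 0.0f ? -w[i] : w[i];
        if (a > max_abs) max_abs = a;
    }
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
    for (size_t i = 0; i < n; i++) {
        float s = w[i] / scale;
        long v = (long)(s >= 0.0f ? s + 0.5f : s - 0.5f); /* round half away from zero */
        if (v > 127) v = 127;
        if (v < -127) v = -127;
        q[i] = (int8_t)v;
    }
    return scale;
}

/* y = W x, with W stored as int8 (rows x cols, row-major) plus one
 * float scale per row; accumulate the row, then apply its scale once. */
void matvec_int8(const int8_t *wq, const float *scale, const float *x,
                 float *y, size_t rows, size_t cols) {
    assert(wq && scale && x && y);
    for (size_t r = 0; r < rows; r++) {
        const int8_t *row = wq + r * cols;
        float acc = 0.0f;
        for (size_t c = 0; c < cols; c++)
            acc += (float)row[c] * x[c];
        y[r] = scale[r] * acc;
    }
}
```

Storing one float scale per row keeps the weights at roughly a quarter of their FP32 size; a production engine would likely also vectorize the inner loop.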

Why? Because I'm too broke to afford better hardware, and I'm capable enough to write the kernel objects needed to get it running. Off the shelf, this motherboard can't even run one of these cards. Super fun project. Now I need to optimize it and get better models running on it.

submitted by /u/Electrical_Ninja3805