After working on boot AI, I purchased some old Bitcoin mining hardware to see if I could run old NVIDIA cards on it. I ended up building a system that multiplexes 6 GPU dies through a single PCIe slot using a custom Linux kernel module. It switches between loaded models in under a millisecond.

Hardware:
- BTC-S37 mining motherboard (picked up 6 on eBay from a total bro getting rid of his old GPU mining setup)
- 3x NVIDIA K80 cards = 6 dies, 72GB VRAM total
- Total: ~$200 for 72GB of GPU VRAM

Results:
- 38 tok/s decode on RWKV-X 0.2B (INT8)
- 0.3ms average switch time between dies
- 10 rapid swap cycles, zero degradation
- Each die holds its own model persistently

The inference engine is pure C with zero Python dependencies. Still early, but the goal is to have all 8 slots filled on the board so models can be loaded and switched at will on dirt-cheap hardware.

Why? Because I'm too broke to afford better hardware, and I'm capable enough to write the kernel objects needed to get it running. Off the shelf, this motherboard can't even run one of these cards.

Super fun project. Now I need to optimize and get better models running on it.
6-GPU multiplexer from K80s: hot-swap between models in 0.3ms
Reddit r/LocalLLaMA / 3/18/2026
💬 Opinion, Developer Stack & Infrastructure, Tools & Practical Usage
Key Points
- The author built a system that multiplexes six GPU dies through a single PCIe slot using a custom Linux kernel module, enabling model hot-swapping in under 1 millisecond.
- The hardware setup uses BTC-S37 mining motherboards and three NVIDIA K80 cards, providing six dies and about 72GB VRAM for roughly $200.
- Performance reported includes 38 tokens per second decode on RWKV-X 0.2B (INT8) and a 0.3 ms average switch time between dies, with 10 rapid swap cycles and no degradation.
- Each die retains its loaded model persistently, and the inference engine is implemented in pure C with zero Python dependencies.
- The project aims to fill eight slots on the board so models can be loaded and switched on dirt-cheap hardware, highlighting a practical path for budget AI inference.
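
The post does not include any source, so the following is only a minimal userspace sketch in C of the bookkeeping the key points describe: each die keeps its own model resident, and a "switch" changes only which die is selected. It does not touch PCIe or any kernel interface. All type and function names (`mux`, `mux_load`, `mux_switch`, `mux_avg_switch_ms`) are hypothetical, the 12 GB per die figure is simply 72 GB divided by 6 dies, and the timing helper measures the simulated selector, not real hardware.

```c
/* Userspace sketch only: models per-die bookkeeping for a die multiplexer.
 * Hypothetical names throughout; this is not the author's kernel module. */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NUM_DIES 6          /* 3x K80, two dies per card (from the post) */
#define VRAM_PER_DIE_GB 12  /* assumption: 72 GB total / 6 dies */

typedef struct {
    char model[64];  /* name of the model resident on this die, "" if empty */
} die_slot;

typedef struct {
    die_slot dies[NUM_DIES];
    int active;      /* index of the currently selected die */
} mux;

void mux_init(mux *m) {
    memset(m, 0, sizeof *m);  /* all dies empty, die 0 selected */
}

/* Record that a model is resident on a die. Returns 0, or -1 on a bad index. */
int mux_load(mux *m, int die, const char *model) {
    if (die < 0 || die >= NUM_DIES) return -1;
    snprintf(m->dies[die].model, sizeof m->dies[die].model, "%s", model);
    return 0;
}

/* Select another die. Only the selector changes; resident models stay put. */
int mux_switch(mux *m, int die) {
    if (die < 0 || die >= NUM_DIES) return -1;
    m->active = die;
    return 0;
}

const char *mux_active_model(const mux *m) {
    assert(m->active >= 0 && m->active < NUM_DIES);
    return m->dies[m->active].model;
}

int mux_total_vram_gb(void) {
    return NUM_DIES * VRAM_PER_DIE_GB;
}

/* Average wall-clock time per switch in milliseconds over `cycles`
 * round-robin switches. Returns 0.0 if cycles is not positive. */
double mux_avg_switch_ms(mux *m, int cycles) {
    if (cycles <= 0) return 0.0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < cycles; i++) mux_switch(m, i % NUM_DIES);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / 1e6 / cycles;
}
```

In this sketch, a caller would run `mux_load` once per die and then call `mux_switch` whenever a different model is needed; nothing is reloaded on a switch, which is the property the post credits for sub-millisecond swaps. A real implementation would replace the body of `mux_switch` with whatever the kernel module does to route the slot to another die.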