Kimi K2.5 - running locally without GPU; splitting across multiple PCs?

Reddit r/LocalLLaMA / 3/28/2026


Key Points

  • The author reports early testing of Kimi K2.5 (Unsloth 4-bit UD K-XL quant, ~620GB) running locally on dual Xeon servers without a GPU, achieving about 1 token per second with power-saving RAM settings.
  • They observed that the setup is slow but usable for long prompts, and that the Xeons peaked at only 61 °C during inference under the low-power configuration.
  • The author plans to evaluate whether splitting the workload across two identical servers with large RAM (e.g., 512GB each) connected via Ethernet can improve performance.
  • Their hypothesis is that doubled memory capacity and increased core count/bandwidth could help, but they are concerned about whether the Ethernet link becomes the bottleneck compared to local memory access.
  • They ask for advice—particularly around networking/topology—mentioning available network hardware (10GbE and various 1GbE ports, plus some fiber-capable cards) on spare ISP servers.

I recently got some old servers and have done some early testing of Kimi K2.5. So far, I have tried running the Unsloth 4-bit UD K-XL quant (~620GB) on just one computer with 768GB RAM. I had max power saving mode on (memory forced down to 800MHz), and the Xeons only reached 61 degrees C! I got 1 token per second with this configuration … and it doesn’t sound like SkyNet is waking up whenever I run inference!

1 token/sec seems ‘uselessly slow’, but I can write a detailed prompt, go make a cup of tea, come back, and the task is completed :)

I am interested in linking multiple PCs together to see if it could improve performance. I bought 3 nearly identical servers (IBM X3650 M4): 2 working, one faulty. I got 32 sticks of ‘Hypercloud’ 32GB DDR3 RAM with the working servers, and 384GB of 16GB DIMMs with the broken server (you can’t mix memory types in one server). The broken server turned out to be fine apart from one bad stick of RAM, so its 384GB went down to 368GB!

I am wondering whether moving Kimi K2.5 to “2x servers, each with 512gb RAM, linked by ethernet”, might be faster than running everything on a single computer? The rationale being doubled memory bandwidth, and twice the number of cores … balanced against the speed of the ethernet link?
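A rough way to frame the "Ethernet link vs local memory" trade-off is to estimate how much of each token's time would be spent on the wire. This is only a sketch: the payload size, latency, and number of hops per token are hypothetical placeholders, not measurements, and real distributed inference backends may move very different amounts of data.

```python
def link_transfer_ms(payload_bytes: float, link_gbit_s: float, latency_ms: float) -> float:
    """Time to move one payload across the link: fixed latency plus serialisation time."""
    serialisation_ms = payload_bytes * 8 / (link_gbit_s * 1e9) * 1e3
    return latency_ms + serialisation_ms


def link_share_of_token(payload_bytes: float, link_gbit_s: float, latency_ms: float,
                        hops_per_token: int, token_ms: float) -> float:
    """Fraction of the per-token time budget spent on network transfers."""
    return hops_per_token * link_transfer_ms(payload_bytes, link_gbit_s, latency_ms) / token_ms


# Hypothetical placeholders (assumptions, not measured values):
PAYLOAD_BYTES = 16 * 1024   # data exchanged between the servers per hop
LATENCY_MS = 0.2            # one-way latency of the Ethernet link
HOPS_PER_TOKEN = 2          # there and back once per generated token
TOKEN_MS = 1000.0           # 1 token per second, as seen on the single server

for name, gbit in (("1GbE", 1.0), ("10GbE", 10.0)):
    share = link_share_of_token(PAYLOAD_BYTES, gbit, LATENCY_MS, HOPS_PER_TOKEN, TOKEN_MS)
    print(f"{name}: {share:.4%} of per-token time on the wire")
```

If the servers only exchange a small payload a few times per token, the link cost is tiny next to a one-second token. If the split needs many synchronisation points per token, raising HOPS_PER_TOKEN shows how quickly latency starts to matter, which is why measuring the real setup is the only reliable answer.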

I’m going to do this test soon (and I will increase the memory speed settings in the BIOS), but I’m wondering if anyone has experience or advice around this, especially networking? Two of the servers were unused spares from an ISP and have some fibre optic network cards, one had a 10Gb Ethernet card, and all have loads of 1Gb Ethernet ports :)

Summary of tests (will expand over time)

***** Test 1 (one PC, RAM set to slowest speed)

model : Kimi K2.5 unsloth UD 4-bit K-XL quant (~620gb IIRC)

platform : IBM X3650 M4, dual 8-core Xeon, 768GB HyperCloud DDR3 RAM, no GPU (note : I set the RAM to ‘minimal power usage’, 800MHz, for this test)

result : 1 token per second
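For context on this result, CPU-only token generation is usually limited by how fast weights can be read from RAM, so a back-of-envelope ceiling is peak memory bandwidth divided by the data read per token. The sketch below assumes the 800MHz setting means 800 MT/s, 4 memory channels per socket, and a 64-bit (8-byte) channel width. The 20 GB read per token and the 1333 MT/s setting are hypothetical placeholders, not figures from this test.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels_per_socket: int,
                        sockets: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s across all sockets."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels_per_socket * sockets / 1e9


def token_rate_ceiling(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/s if every token must stream gb_read_per_token from RAM."""
    return bandwidth_gb_s / gb_read_per_token


# Assumptions (not confirmed by the test): 4 channels per socket, 2 sockets.
CHANNELS_PER_SOCKET = 4
SOCKETS = 2
GB_READ_PER_TOKEN = 20.0    # hypothetical placeholder for the active weights per token

for mts in (800, 1333):     # 800 = the power-saving setting; 1333 = hypothetical faster setting
    bw = peak_bandwidth_gb_s(mts, CHANNELS_PER_SOCKET, SOCKETS)
    print(f"{mts} MT/s: {bw:.1f} GB/s peak, ceiling {token_rate_ceiling(bw, GB_READ_PER_TOKEN):.2f} tok/s")
```

Real throughput sits well below the theoretical peak (NUMA effects, compute limits, imperfect access patterns), so this only indicates how much headroom a faster RAM setting might offer.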

submitted by /u/Shipworms