Distributed Training of Local LLMs made easier with mDNS + ZeroConf for local hardware!

Reddit r/LocalLLaMA / 5/2/2026


Key Points

  • The article describes integrating “grove” into smolcluster to simplify distributed training on local hardware by automatically discovering nodes using mDNS/ZeroConf rather than manual network setup.
  • It highlights that grove removes the need for per-node SSH, static IPs, and network configuration, and provides a live terminal dashboard with per-rank metrics like loss, gradient norm, tokens/sec, and network I/O.
  • The post explains that on macOS nodes discover each other over mDNS, whereas on Linux/Jetson grove falls back to TCP alongside mDNS, aiming for a smoother cross-platform experience.
  • The author claims existing training approaches (e.g., FSDP, SyncPS, ClassicDP) can be run with “two commands” using grove within smolcluster, and demonstrates a simple 3-node workflow with “start” on the coordinator and “join” on workers.
  • Testing is underway on a three-Mac-Mini setup with planned verification on Jetson boards, and the author directs readers to smolcluster.com and credits the contributor who released grove.

just integrated grove into smolcluster and it's genuinely one of the cleanest pieces of infra I've plugged in

  • grove is a package built by a really sharp person; it handles zero-config node discovery and gives you a live terminal dashboard for distributed training.

I had been facing the same problem ever since I began using smolcluster for my own projects: having to set up SSH, networking, cables, etc. for every node I wanted to add to my training cluster. sigh... you know the pain, right?

the best I could do was search around and realize that what I needed was auto-discovery of nodes, aka mDNS!

It's what AirDrop uses for seamless discovery between Apple devices, and it's available on non-macOS systems through Zeroconf implementations. sadly, I couldn't come up with a working solution myself (skill issue, it seems, haha).

And that's where I found grove. I didn't build grove, I just integrated it.

  • what it does:

    • on Mac, nodes discover each other over mDNS: no IPs, no SSH config, nothing!
    • on Linux/Jetson it falls back to TCP + mDNS
    • gives you a live per-rank TUI showing rank, host, loss, grad norm, tokens/sec, and network I/O in real time
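To make the discovery part a bit more concrete, here is a minimal sketch of what an mDNS lookup looks like on the wire. It is not grove's code and makes no claim about how grove does it; it only builds the standard DNS "PTR" question that a node would multicast to ask "who offers this service?". The service type `_demo-train._tcp.local.` is a made-up name for illustration.

```python
import struct

MDNS_GROUP = "224.0.0.251"  # IPv4 multicast group used by mDNS (RFC 6762)
MDNS_PORT = 5353            # UDP port used by mDNS
QTYPE_PTR = 12              # PTR: "which instances of this service exist?"
QCLASS_IN = 1               # Internet class


def encode_name(name: str) -> bytes:
    """Encode a dotted DNS name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.strip(".").split("."):
        raw = label.encode("utf-8")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad DNS label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ptr_query(service_type: str) -> bytes:
    """Build a one-question mDNS query asking for instances of service_type."""
    # Header fields: id=0 (mDNS convention), flags=0 (standard query),
    # 1 question, 0 answer / authority / additional records.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    question = encode_name(service_type) + struct.pack("!2H", QTYPE_PTR, QCLASS_IN)
    return header + question


def decode_name(packet: bytes, offset: int) -> tuple[str, int]:
    """Decode an uncompressed name at offset; return (name, next_offset)."""
    labels = []
    while True:
        length = packet[offset]
        offset += 1
        if length == 0:
            break
        labels.append(packet[offset:offset + length].decode("utf-8"))
        offset += length
    return ".".join(labels) + ".", offset


# Hypothetical service type, chosen only for this example.
SERVICE = "_demo-train._tcp.local."
query = build_ptr_query(SERVICE)
name, end = decode_name(query, 12)  # the question starts right after the 12-byte header
print(name, len(query))
```

Actually sending it would be a single `sendto(query, (MDNS_GROUP, MDNS_PORT))` on a UDP socket; every node on the same link that advertises the service can then answer, which is why no IPs need to be typed in anywhere.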

  • the integration side:

every smolcluster training algorithm (FSDP, SyncPS, ClassicDP, etc.), which I reimplemented with plain Python sockets for educational purposes, can now be run without worrying about IPs, SSH, networking, etc., in just 2 commands! (before it was like 10 steps, ufff. well, it still is if you want some serious runs.)
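For readers wondering what "training algorithms over plain sockets" can look like, below is a rough sketch of one synchronous parameter-server step: each worker sends its gradient, the server averages them and sends the result back. This is an illustration of the general idea only, not smolcluster's implementation. The helper names, the JSON wire format, and the use of `socket.socketpair()` in place of real TCP connections are all assumptions made to keep the example self-contained.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send a JSON-serializable object with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock):
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def server_step(worker_socks):
    """One synchronous step: gather every gradient, average, broadcast."""
    grads = [recv_msg(s) for s in worker_socks]
    avg = [sum(vals) / len(grads) for vals in zip(*grads)]
    for s in worker_socks:
        send_msg(s, avg)
    return avg


def worker_step(sock, local_grad, out, rank):
    """Send this worker's gradient and store the averaged one it gets back."""
    send_msg(sock, local_grad)
    out[rank] = recv_msg(sock)


# In-process demo with made-up gradients; socketpair() stands in for TCP.
local_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairs = [socket.socketpair() for _ in local_grads]
results = {}
threads = [
    threading.Thread(target=worker_step, args=(worker_sock, grad, results, rank))
    for rank, ((_, worker_sock), grad) in enumerate(zip(pairs, local_grads))
]
for t in threads:
    t.start()
averaged = server_step([server_sock for server_sock, _ in pairs])
for t in threads:
    t.join()
for server_sock, worker_sock in pairs:
    server_sock.close()
    worker_sock.close()
print(averaged)
```

The length prefix matters because TCP is a byte stream: without it, the receiver cannot tell where one gradient message ends and the next begins.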

  • usage on a 3-node cluster:

    • run grove start <script> -n 3 on the coordinator
    • run grove join on each worker
    • the cluster forms itself

that's the whole setup. no static IPs, no config files, no manual port forwarding.
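As a mental model for "the cluster forms itself", here is a tiny sketch of the bookkeeping a coordinator might do: wait until the expected number of nodes has joined, handing out ranks in join order. This is a guess at the general pattern, not grove's internals. The `Roster` class, the hostnames, and the assumption that the coordinator itself takes rank 0 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Roster:
    """Tracks which hosts have joined and hands out ranks in join order."""

    world_size: int
    hosts: list = field(default_factory=list)

    @property
    def ready(self) -> bool:
        """True once every expected node has joined."""
        return len(self.hosts) == self.world_size

    def join(self, host: str) -> int:
        """Register a host and return its rank."""
        if host in self.hosts:
            return self.hosts.index(host)  # a rejoining host keeps its rank
        if self.ready:
            raise RuntimeError("cluster is already full")
        self.hosts.append(host)
        return len(self.hosts) - 1


# Hypothetical hostnames for a 3-node cluster.
roster = Roster(world_size=3)
roster.join("coordinator.local")  # assumption: the coordinator is rank 0
roster.join("mini-1.local")
print(roster.ready)
roster.join("mini-2.local")
print(roster.ready)
```

Training would only start once `ready` flips to true, which matches the idea of passing the node count up front with -n 3.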

been running this on my 3x Mac Minis, and I'll be testing on Jetson boards soon.

check it out today at smolcluster[dot]com!

PS: shoutout to @swar_ja for releasing grove!

submitted by /u/East-Muffin-6472