Covenant/Covenant-72B: Largest model so far to be trained on decentralized permissionless GPU nodes
Reddit r/LocalLLaMA / 3/17/2026
📰 News · Developer Stack & Infrastructure · Models & Research
Key Points
- Covenant AI announced Covenant-72B, the largest model yet trained on decentralized, permissionless GPU nodes, marking a milestone in distributed ML.
- Training uses SparseLoco, built on top of DiLoCo, which reduces synchronization frequency and relies on a local AdamW optimizer to cut communication overhead.
- It also employs aggressive top-K sparsification to address the bandwidth bottleneck in decentralized training setups.
- The announcement links to a Reddit post and a HuggingFace repository, illustrating community-driven experimentation.
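The key points above can be sketched in code: in a DiLoCo-style setup, each node runs many local optimizer steps (e.g. AdamW), and only the resulting change in weights (the pseudo-gradient) is synchronized, with top-K sparsification keeping just the largest-magnitude entries to save bandwidth. This is a minimal NumPy illustration of the idea only; the node count, keep fraction, and shapes are made up and are not Covenant's actual hyperparameters.

```python
import numpy as np

def topk_sparsify(delta, k_frac):
    """Keep only the k largest-magnitude entries of a pseudo-gradient,
    zeroing the rest before it is communicated."""
    flat = delta.ravel()
    k = max(1, int(k_frac * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of top-k magnitudes
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(delta.shape)

# Illustrative setup: 4 nodes, an 8-parameter "model".
# Each node would train locally (e.g. with AdamW) for many steps; here the
# resulting pseudo-gradients (weight deltas) are just random stand-ins.
rng = np.random.default_rng(0)
w_global = np.zeros(8)
local_deltas = [rng.standard_normal(8) for _ in range(4)]  # one per node

# Sparsify before communicating: only 25% of entries cross the network.
sparse = [topk_sparsify(d, k_frac=0.25) for d in local_deltas]

# Outer update: average the sparse deltas and apply them to the global model.
w_global += np.mean(sparse, axis=0)
```

With `k_frac=0.25`, each node transmits only 2 of its 8 delta entries, which is the bandwidth saving the sparsification targets; the infrequent outer synchronization is what DiLoCo contributes on top.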
Related Articles
The massive shift toward edge computing and local processing
Dev.to
Self-Refining Agents in Spec-Driven Development
Dev.to
The Three-Agent Protocol Is Transferable. The Discipline Isn't.
Dev.to
I built an abuse database for AI agents. It's free and open.
Dev.to

has anyone tried this? Flash-MoE: Running a 397B Parameter Model on a Laptop
Reddit r/LocalLLaMA