To reduce communication overhead, Covenant AI used SparseLoco, their method built on top of DiLoCo: it reduces synchronization frequency, uses a local AdamW optimizer, and adds aggressive top-K sparsification to relieve the bandwidth bottleneck.
Covenant/Covenant-72B: Largest model so far to be trained on decentralized permissionless GPU nodes
Reddit r/LocalLLaMA / 3/17/2026
📰 News · Developer Stack & Infrastructure · Models & Research
Key Points
- Covenant AI announced Covenant-72B as the largest model to be trained on decentralized permissionless GPU nodes, marking a milestone in distributed ML.
- The training uses SparseLoco built on top of DiLoCo to reduce synchronization frequency and relies on a local AdamW optimizer to cut communication overhead.
- It also employs aggressive top-K sparsification to address bandwidth bottlenecks in decentralized training setups.
- The information is linked to a Reddit post and a HuggingFace repository, illustrating community-driven experimentation.
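To make the top-K idea concrete: before synchronizing, each node can keep only the largest-magnitude entries of its local pseudo-gradient and send just those index/value pairs. The sketch below is a minimal illustration of that compression step, not Covenant AI's actual SparseLoco implementation; the function name, the 1% density, and the dense round-trip are assumptions for demonstration.

```python
import numpy as np

def top_k_sparsify(delta: np.ndarray, k_frac: float = 0.01):
    """Keep only the k_frac largest-magnitude entries of a local update.

    Returns (indices, values); all other entries are treated as zero,
    so only ~k_frac of the tensor needs to cross the network.
    """
    flat = delta.ravel()
    k = max(1, int(k_frac * flat.size))
    # argpartition puts the k largest |values| in the last k slots
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

# Toy outer step: a node ships a sparse update instead of the dense delta
rng = np.random.default_rng(0)
delta = rng.standard_normal(1000)
idx, vals = top_k_sparsify(delta, k_frac=0.01)  # 10 of 1000 entries survive

# Receiving side reconstructs a (mostly zero) dense tensor
sparse = np.zeros_like(delta)
sparse[idx] = vals
```

At 1% density this cuts per-round traffic by roughly 100x; the dropped mass is typically carried forward in an error-feedback buffer so it is not lost, though that detail is omitted here.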

