Macrocosmos has released a paper on ResBM (Residual Bottleneck Models), a new transformer-based architecture designed for low-bandwidth pipeline-parallel training.
https://arxiv.org/abs/2604.11947
ResBM introduces a residual encoder-decoder bottleneck at pipeline-stage boundaries to reduce inter-stage communication, while preserving an explicit low-rank identity path across the boundary. The paper reports SOTA 128× activation compression with no significant convergence loss relative to uncompressed baselines.
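For intuition, here's a rough PyTorch sketch of what a residual bottleneck across a stage boundary could look like. To be clear, this is my own loose reading of the description above, not the actual ResBM code; all names and sizes (BottleneckLink, d_model, ratio, rank) are placeholders I picked for illustration.

```python
import torch
import torch.nn as nn

class BottleneckLink(nn.Module):
    """Residual bottleneck at a pipeline-stage boundary (illustrative only).

    Only the small bottleneck code `z` and the low-rank skip coefficients
    would need to cross the network; the decoder runs on the receiving stage.
    """

    def __init__(self, d_model: int = 4096, ratio: int = 128, rank: int = 8):
        super().__init__()
        d_bottleneck = d_model // ratio   # e.g. 4096 -> 32 for a 128x smaller tensor
        self.encoder = nn.Linear(d_model, d_bottleneck)
        self.decoder = nn.Linear(d_bottleneck, d_model)
        # Explicit low-rank identity path: project to `rank` dims and back,
        # so a cheap residual of the input survives the compression.
        self.u = nn.Linear(d_model, rank, bias=False)
        self.v = nn.Linear(rank, d_model, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        z = self.encoder(h)          # compressed activation (what gets communicated)
        skip = self.v(self.u(h))     # low-rank identity/residual path
        return self.decoder(z) + skip  # reconstruction on the receiving stage

if __name__ == "__main__":
    link = BottleneckLink()
    h = torch.randn(2, 16, 4096)     # (batch, seq, d_model)
    print(link(h).shape)             # torch.Size([2, 16, 4096])
```

(The 128× figure here is just the encoder's dimensionality ratio in this sketch; the paper's actual compression scheme may differ.)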
In their experiments, the strongest compressed results use the Muon optimizer, and the paper positions ResBM as a step toward decentralized, internet-grade pipeline-parallel training.
Full disclosure: I work at Macrocosmos. Sharing this paper from the engineering team.


