ResBM: a new transformer-based architecture for low-bandwidth pipeline-parallel training, achieving 128× activation compression [R]

Reddit r/MachineLearning / 4/17/2026


Key Points

  • ResBM (Residual Bottleneck Models) is a new transformer-based architecture aimed at making pipeline-parallel training more efficient under low-bandwidth conditions by reducing inter-stage communication.
  • The model uses a residual encoder–decoder bottleneck across pipeline boundaries while maintaining an explicit low-rank identity path to preserve training behavior.
  • The paper reports state-of-the-art results, including 128× activation compression, with little reported impact on convergence compared to uncompressed baselines.
  • Experiments indicate the strongest compressed performance when using the Muon optimizer, and the work frames ResBM as relevant to decentralized or “internet-grade” pipeline-parallel training.

Macrocosmos has released a paper on ResBM (Residual Bottleneck Models), a new transformer-based architecture designed for low-bandwidth pipeline-parallel training.

https://arxiv.org/abs/2604.11947

ResBM introduces a residual encoder–decoder bottleneck across pipeline boundaries, reducing inter-stage communication while preserving an explicit low-rank identity path. The paper reports state-of-the-art 128× activation compression with no significant loss in convergence relative to uncompressed baselines.
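To make the idea concrete, here is a minimal sketch of what a boundary module like this could look like: a learned narrow projection carries most of the signal across the stage boundary, and a separate low-rank path approximates the identity map. All names (`W_enc`, `send`, `receive`), dimensions, and initialization choices are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of a ResBM-style pipeline-boundary bottleneck.
# Names and dimensions are assumptions for illustration, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

d_model = 1024  # hidden size at the pipeline boundary
d_bneck = 4     # bottleneck width
d_rank = 4      # width of the explicit low-rank identity path
# Together: d_model / (d_bneck + d_rank) = 1024 / 8 = 128x compression.

# Sender-side projections (stage i)
W_enc = rng.standard_normal((d_model, d_bneck)) * 0.02
U_id = rng.standard_normal((d_model, d_rank)) * 0.02
# Receiver-side projections (stage i + 1)
W_dec = rng.standard_normal((d_bneck, d_model)) * 0.02
V_id = rng.standard_normal((d_rank, d_model)) * 0.02

def send(x):
    """Compress activations before crossing the pipeline boundary.

    Only these two narrow tensors are transmitted between stages.
    """
    return x @ W_enc, x @ U_id

def receive(z, r):
    """Reconstruct on the next stage: decoded bottleneck + low-rank identity term."""
    return z @ W_dec + r @ V_id

x = rng.standard_normal((16, d_model))  # activations for a batch of 16 tokens
z, r = send(x)
y = receive(z, r)

sent = z.size + r.size           # floats actually crossing the boundary
ratio = x.size / sent            # achieved activation compression
print(y.shape, ratio)            # reconstructed shape matches input; ratio is 128.0
```

In this toy version the identity path is just a rank-4 linear map, so its transmission cost counts against the compression budget; how the real ResBM trades bottleneck width against identity-path rank is detailed in the paper.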

In their experiments, the strongest compressed results use the Muon optimizer, and the paper positions ResBM as a development in decentralized, “internet-grade” pipeline-parallel training.

submitted by /u/network-kai