ResBM: a new transformer-based architecture for low-bandwidth pipeline-parallel training, achieving 128× activation compression [R]

Reddit r/MachineLearning / 4/17/2026


Key Points

  • ResBM (Residual Bottleneck Models) is a new transformer-based architecture aimed at making pipeline-parallel training more efficient under low-bandwidth conditions by reducing inter-stage communication.
  • The model uses a residual encoder–decoder bottleneck across pipeline boundaries while maintaining an explicit low-rank identity path to preserve training behavior.
  • The paper reports state-of-the-art results, including 128× activation compression, with little reported impact on convergence compared to uncompressed baselines.
  • Experiments indicate the strongest compressed performance when using the Muon optimizer, and the work frames ResBM as relevant to decentralized or “internet-grade” pipeline-parallel training.

Macrocosmos has released a paper on ResBM (Residual Bottleneck Models), a new transformer-based architecture designed for low-bandwidth pipeline-parallel training.

https://arxiv.org/abs/2604.11947

ResBM introduces a residual encoder–decoder bottleneck across pipeline boundaries, reducing inter-stage communication while preserving an explicit low-rank identity path. The paper reports state-of-the-art 128× activation compression with no significant loss in convergence relative to uncompressed baselines.
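To make the idea concrete, here is a minimal sketch of what a boundary module like this could look like: a learned narrow projection carries most of the signal across the stage boundary, and a separate low-rank path approximates the identity map. All names (`W_enc`, `send`, `receive`), dimensions, and initialization choices are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of a ResBM-style pipeline-boundary bottleneck.
# Names and dimensions are assumptions for illustration, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

d_model = 1024  # hidden size at the pipeline boundary
d_bneck = 4     # bottleneck width
d_rank = 4      # width of the explicit low-rank identity path
# Together: d_model / (d_bneck + d_rank) = 1024 / 8 = 128x compression.

# Sender-side projections (stage i)
W_enc = rng.standard_normal((d_model, d_bneck)) * 0.02
U_id = rng.standard_normal((d_model, d_rank)) * 0.02
# Receiver-side projections (stage i + 1)
W_dec = rng.standard_normal((d_bneck, d_model)) * 0.02
V_id = rng.standard_normal((d_rank, d_model)) * 0.02

def send(x):
    """Compress activations before crossing the pipeline boundary.

    Only these two narrow tensors are transmitted between stages.
    """
    return x @ W_enc, x @ U_id

def receive(z, r):
    """Reconstruct on the next stage: decoded bottleneck + low-rank identity term."""
    return z @ W_dec + r @ V_id

x = rng.standard_normal((16, d_model))  # activations for a batch of 16 tokens
z, r = send(x)
y = receive(z, r)

sent = z.size + r.size           # floats actually crossing the boundary
ratio = x.size / sent            # achieved activation compression
print(y.shape, ratio)            # reconstructed shape matches input; ratio is 128.0
```

In this toy version the identity path is just a rank-4 linear map, so its transmission cost counts against the compression budget; how the real ResBM trades bottleneck width against identity-path rank is detailed in the paper.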

In their experiments, the strongest compressed results use the Muon optimizer, and the paper positions ResBM as a development in decentralized, “internet-grade” pipeline-parallel training.

submitted by /u/network-kai