A new transformer variant has been created to facilitate more efficient model training in distributed settings: 128× activation compression with no significant loss in convergence rates and no significant increase in memory or compute overhead.

Reddit r/LocalLLaMA / 4/17/2026


Key Points

  • Macrocosmos released a paper introducing ResBM (Residual Bottleneck Models), a new transformer architecture aimed at reducing inter-stage communication in low-bandwidth, pipeline-parallel distributed training.
  • ResBM adds a residual encoder-decoder bottleneck across pipeline boundaries while preserving an explicit low-rank identity path to maintain training effectiveness.
  • The paper reports state-of-the-art results showing 128× activation compression with no significant loss in convergence compared with uncompressed baselines.
  • The strongest results in experiments use Muon, and the work is positioned as useful for decentralized or “internet-grade” pipeline parallel training setups.
  • The poster discloses that they work at Macrocosmos and are sharing the paper on behalf of its engineering team, indicating close ties to the authorship and evaluation of the approach.

Macrocosmos has released a paper on ResBM (Residual Bottleneck Models), a new transformer-based architecture designed for low-bandwidth pipeline-parallel training.

https://arxiv.org/abs/2604.11947

ResBM introduces a residual encoder-decoder bottleneck across pipeline boundaries, with the goal of reducing inter-stage communication while preserving an explicit low-rank identity path. The paper reports SOTA 128× activation compression without significant loss in convergence relative to uncompressed baselines.
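To make the idea concrete, here is a minimal numpy sketch of a bottleneck across a pipeline boundary. All names, shapes, and the split between the bottleneck code and the low-rank identity coefficients are illustrative assumptions, not the paper's actual ResBM architecture; it only shows how compressing the transmitted activation while keeping a separate low-rank path can shrink inter-stage traffic by 128×.

```python
import numpy as np

# Illustrative sketch only; dimensions below are assumptions, not from the paper.
d_model = 4096       # hidden size of the activation at the pipeline boundary
d_bottleneck = 24    # width of the encoder-decoder bottleneck code
rank = 8             # rank of the explicit low-rank identity path

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d_model, d_bottleneck)) / np.sqrt(d_model)
W_dec = rng.standard_normal((d_bottleneck, d_model)) / np.sqrt(d_bottleneck)
# Low-rank identity path: U @ V is trained (here, random) to pass the
# activation through with a cheap rank-r correction alongside the bottleneck.
U = rng.standard_normal((d_model, rank)) / np.sqrt(d_model)
V = rng.standard_normal((rank, d_model)) / np.sqrt(rank)

x = rng.standard_normal((1, d_model))  # activation leaving pipeline stage k

# Stage k transmits only the bottleneck code plus the rank-r coefficients.
code = x @ W_enc           # (1, 24)
id_coeff = x @ U           # (1, 8)

# Stage k+1 reconstructs the activation from both paths.
x_hat = code @ W_dec + id_coeff @ V

sent = code.size + id_coeff.size  # 24 + 8 = 32 floats
print(f"floats sent: {sent} vs {x.size} (ratio {x.size // sent}x)")
```

With these illustrative sizes, each boundary sends 32 floats instead of 4096, a 128× reduction in communicated activation volume, while the rank-r path gives the decoder a direct route back toward the original activation.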

In their experiments, the strongest compressed results use Muon, and the paper positions ResBM as a development in decentralized / internet-grade pipeline parallel training.

Full disclosure: I work at Macrocosmos. Sharing this paper from the engineering team.

submitted by /u/network-kai