When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration

arXiv cs.LG / 4/16/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper addresses high memory and communication costs in latent multi-agent LLM systems that use full KV-cache relays for information exchange.
It proposes Orthogonal Backfill (OBF), an eviction-style KV compression method that adds a low-rank orthogonal residual from discarded KV states into the retained KV states to reduce information loss.
Experiments on nine benchmarks (mathematical reasoning, coding, and knowledge-intensive QA) show OBF matches full KV relay performance while cutting communication cost by about 79.8%–89.4%.
OBF achieves best results on 7 of 9 benchmarks and suggests that preserving the most useful latent information can outperform simply transmitting more.
The authors release a public codebase to support replication and further development of the approach.

Abstract

Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches. However, full KV relay incurs high memory and communication cost. We adapt eviction-style KV compression to this setting and introduce Orthogonal Backfill (OBF) to mitigate information loss from hard eviction. OBF injects a low-rank orthogonal residual from discarded KV states into the retained KV states. We evaluate proposed method against full KV relay on nine standard benchmarks spanning mathematical reasoning, coding, and knowledge-intensive QA. It achieves performance comparable to full KV relay while reducing communication cost by 79.8%--89.4%. OBF further improves the performance and achieves the best results on 7 of the 9 benchmarks. This suggests that more information does not necessarily lead to better communication; preserving the most useful information matters more. Our codebase is publicly available on https://github.com/markli404/When-Less-Latent-Leads-to-Better-Relay.

"The AI Agent's Guide to Sustainable Income: From Zero to Profitability"

Dev.to

"The Hidden Economics of AI Agents: Survival Strategies in Competitive Markets"

Dev.to

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Dev.to

"The Hidden Costs of AI Agent Deployment: A CFO's Guide to True ROI in Enterpris

Dev.to

"The Real Cost of AI Compute: Why Token Efficiency Separates Viable Agents from

Dev.to

When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration

Key Points

Abstract

Related Articles

"The AI Agent's Guide to Sustainable Income: From Zero to Profitability"

"The Hidden Economics of AI Agents: Survival Strategies in Competitive Markets"

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

"The Hidden Costs of AI Agent Deployment: A CFO's Guide to True ROI in Enterpris

"The Real Cost of AI Compute: Why Token Efficiency Separates Viable Agents from

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer