Moonshot AI and Tsinghua Researchers Propose PrfaaS: A Cross-Datacenter KVCache Architecture that Rethinks How LLMs are Served at Scale

MarkTechPost / 4/20/2026



Key Points

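Researchers at Moonshot AI and Tsinghua University propose "PrfaaS" to rethink the bottlenecks in LLM inference serving.
Until now, the reliance on high-bandwidth RDMA networks has confined LLM prefill and decode to the same datacenter (in some cases, the same rack).
The proposed approach aims for a cross-datacenter KVCache architecture that lets the KV cache be handled across different datacenters.
This could improve network placement and inference efficiency in large-scale LLM serving.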

For years, the way large language models handle inference has been stuck inside a box — literally. The high-bandwidth RDMA networks that make modern LLM serving work have confined both prefill and decode to the same datacenter, sometimes even the same rack. A team of researchers at Moonshot AI and Tsinghua University is making the […]
