SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling
arXiv cs.LG / 4/15/2026
Key Points
- SOLARIS is a proposed inference-scaling framework that uses speculative-decoding ideas to precompute latent representations for likely user-item pairs before they are requested in real time.
- By asynchronously generating foundation-model embeddings ahead of time, SOLARIS decouples expensive model inference from the latency-critical online serving path, reducing the need for knowledge-distillation-based quality compromises.
- The method predicts which user-item interactions will occur, then proactively computes the corresponding foundation-model representations so online serving can rely on precomputed outputs.
- The paper reports deployment at Meta across the advertising system, handling billions of daily requests, with an observed 0.67% gain in revenue-driving top-line metrics.
- Overall, SOLARIS reframes “too expensive to serve online” models as feasible through speculative offloading of intermediate representations rather than solely compressing models via distillation.
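The mechanism in the points above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name, the predictor input, and the cheap-model fallback are all assumptions. The core idea is that the expensive foundation model runs only in an asynchronous precompute pass over predicted user-item pairs, while the latency-critical serving path reads from a cache and falls back to a cheaper model on a miss.

```python
# Illustrative sketch of speculative offloading (all names are assumptions,
# not from the paper). An offline pass precomputes expensive embeddings for
# predicted user-item pairs; the online path only reads the cache.
from typing import Callable

class SpeculativeEmbeddingCache:
    def __init__(self,
                 expensive_embed: Callable[[str, str], list[float]],
                 cheap_embed: Callable[[str, str], list[float]]):
        self.expensive_embed = expensive_embed  # stands in for the foundation model
        self.cheap_embed = cheap_embed          # stands in for a distilled fallback
        self.cache: dict[tuple[str, str], list[float]] = {}
        self.hits = 0
        self.misses = 0

    def precompute(self, predicted_pairs: list[tuple[str, str]]) -> None:
        # Asynchronous/offline step: run the expensive model ahead of time
        # for the pairs an interaction predictor expects to see.
        for user, item in predicted_pairs:
            self.cache[(user, item)] = self.expensive_embed(user, item)

    def serve(self, user: str, item: str) -> list[float]:
        # Latency-critical online step: never invoke the expensive model here.
        emb = self.cache.get((user, item))
        if emb is not None:
            self.hits += 1
            return emb
        self.misses += 1
        return self.cheap_embed(user, item)

# Toy usage with stub models standing in for the real ones.
expensive = lambda u, i: [float(len(u) + len(i))]  # pretend heavy model
cheap = lambda u, i: [0.0]                         # pretend distilled model
cache = SpeculativeEmbeddingCache(expensive, cheap)
cache.precompute([("u1", "adA"), ("u1", "adB")])   # speculative offline pass
hit = cache.serve("u1", "adA")    # served from precomputed cache
miss = cache.serve("u2", "adC")   # unpredicted pair → cheap fallback
```

The design choice this illustrates is the decoupling: prediction accuracy governs the cache hit rate, and only misses pay the quality cost of the fallback, rather than every request paying a distillation penalty.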