Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

arXiv cs.AI / 4/25/2026

📰 News · Models & Research

Key Points

  • The paper proposes a framework that adaptively allocates test-time compute while simultaneously adjusting how the model generates outputs.
  • It uses a warm-up step to find easy queries and build an initial set of question–response pairs drawn from the test set itself.
  • In the adaptive phase, additional compute is concentrated on unresolved queries, whose generation distributions are reshaped via evolving in-context demonstrations.
  • The evolving demonstrations condition each generation on previously successful responses from semantically related queries, avoiding repeated sampling from a fixed distribution.
  • Experiments on math, coding, and reasoning benchmarks show consistent improvements over baselines while using substantially less inference-time compute.

Abstract

While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a warm-up phase that identifies easy queries and assembles an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates further computation on unresolved queries while reshaping their generation distributions through evolving in-context demonstrations -- conditioning each generation on successful responses from semantically related queries rather than resampling from a fixed distribution. Experiments across math, coding, and reasoning benchmarks demonstrate that our approach consistently outperforms existing baselines while consuming substantially less inference-time compute.
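The two-phase procedure described above can be sketched as a simple loop. This is a hypothetical illustration, not the paper's implementation: the `generate`, `verify`, and `similarity` functions are toy stand-ins (the paper does not specify how correctness is checked or how semantic relatedness is measured), and the budget parameters are arbitrary.

```python
def similarity(a: str, b: str) -> float:
    # Toy lexical (Jaccard) overlap standing in for a semantic embedding metric.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def adaptive_allocate(queries, generate, verify,
                      warmup_samples=2, budget=4, n_demos=2):
    """Hypothetical sketch of the two-phase allocation.

    Phase 1 (warm-up): spend a small, uniform budget on every query;
    queries whose responses pass `verify` seed a pool of
    (question, response) demonstration pairs drawn from the test set.

    Phase 2 (adaptive): concentrate the remaining budget on unresolved
    queries, prepending the most similar solved pairs as in-context
    demonstrations so each new generation is conditioned on successful
    responses rather than resampled from a fixed distribution.
    """
    solved = {}  # query -> verified response (the evolving demo pool)

    # Phase 1: uniform warm-up pass to identify easy queries.
    for q in queries:
        for _ in range(warmup_samples):
            r = generate(q, demos=[])
            if verify(q, r):
                solved[q] = r
                break

    # Phase 2: focus further compute on unresolved queries.
    for q in [q for q in queries if q not in solved]:
        for _ in range(budget):
            # Retrieve the most semantically related solved pairs as demos.
            demos = sorted(solved.items(),
                           key=lambda kv: similarity(q, kv[0]),
                           reverse=True)[:n_demos]
            r = generate(q, demos=demos)
            if verify(q, r):
                solved[q] = r  # newly solved queries enrich the pool
                break
    return solved
```

Because newly solved queries are fed back into the pool, the demonstration set evolves over the run, which is what lets later generations draw on answers to related queries resolved earlier.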