ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

arXiv cs.CL / 3/27/2026


Key Points

  • ReSum is proposed as a plug-and-play paradigm for LLM web agents to enable unbounded long-horizon exploration by periodically summarizing interaction histories into compact external context without retraining.
  • The work argues that standard agents are not naturally aligned to reason over compressed summaries, so it introduces ReSum-GRPO to improve long-horizon credit assignment via an advantage broadcasting adaptation of GRPO.
  • In training-free settings, experiments show ReSum improves performance by 4.5% over ReAct, while ReSum-GRPO provides a further 8.2% gain.
  • With only 1K training samples, a ReSum-enhanced 30B model reportedly reaches competitive performance versus leading open-source models, indicating strong sample efficiency.
  • Overall, the approach aims to preserve compatibility with existing agent architectures while addressing the context-window conflict that limits current web-agent strategies.
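The summarization loop in the first bullet can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the token counter, the `summarize` stand-in, and the budget constant are all invented for demonstration.

```python
# Hypothetical sketch of a ReSum-style control loop. The threshold,
# tokenizer, and summarizer below are illustrative stand-ins, not the
# actual tools described in the paper.

MAX_CONTEXT_TOKENS = 8  # tiny budget so the example triggers quickly

def count_tokens(messages):
    # Crude stand-in for a real tokenizer: one "token" per word.
    return sum(len(m.split()) for m in messages)

def summarize(messages):
    # Stand-in for the external summarization tool: replaces the full
    # history with a single compact digest entry.
    return [f"[summary of {len(messages)} steps]"]

def resum_loop(observations):
    history = []
    for obs in observations:
        history.append(obs)
        # Periodically condense the history once it exceeds the budget,
        # so exploration can continue from a fresh, compact context.
        if count_tokens(history) > MAX_CONTEXT_TOKENS:
            history = summarize(history)
    return history
```

Because the summary lives in ordinary context rather than in architectural memory tokens, this loop can wrap any existing ReAct-style agent without retraining, which is the "plug-and-play" property the bullets emphasize.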

Abstract

Large Language Model (LLM)-based web agents excel at knowledge-intensive tasks but face a fundamental conflict between the need for extensive exploration and the constraints of limited context windows. Current solutions typically rely on architectural modifications, e.g., internal memory tokens, which break compatibility with pre-existing agents and necessitate costly end-to-end retraining. To overcome these limitations, we introduce ReSum, a lightweight, plug-and-play paradigm that enables unbounded exploration by periodically invoking an external tool to condense interaction histories into compact summaries. Although this paradigm functions without training, standard agents are not inherently aligned to reason over such compressed contexts. To bridge this gap, we propose ReSum-GRPO, which adapts Group Relative Policy Optimization (GRPO) via advantage broadcasting to propagate final rewards across segmented trajectories, enabling credit assignment over long horizons. Extensive experiments show that ReSum achieves a 4.5% improvement over ReAct in training-free settings, with ReSum-GRPO yielding a further 8.2% gain. Notably, with only 1K training samples, a ReSum-enhanced 30B agent achieves competitive performance with leading open-source models, showing ReSum's effectiveness.
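The "advantage broadcasting" step can be made concrete with a small numeric sketch. Assuming the standard GRPO recipe of normalizing each rollout's final reward against its group (mean-centered, std-scaled), broadcasting simply copies that trajectory-level advantage to every summary-delimited segment of the trajectory. The function name, segment counts, and rewards below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of advantage broadcasting over segmented
# trajectories, assuming GRPO-style group-normalized advantages.

def broadcast_advantages(final_rewards, segments_per_traj):
    """Normalize final rewards across the rollout group, then copy each
    trajectory's advantage to all of its summary-segmented pieces."""
    n = len(final_rewards)
    mean = sum(final_rewards) / n
    var = sum((r - mean) ** 2 for r in final_rewards) / n
    std = var ** 0.5 or 1.0  # guard against zero variance
    advantages = [(r - mean) / std for r in final_rewards]
    # Broadcast: every segment of a trajectory shares its trajectory's
    # final advantage, so credit flows across summarization boundaries.
    return [[a] * k for a, k in zip(advantages, segments_per_traj)]
```

For example, with two rollouts earning rewards 1.0 and 0.0, split into 2 and 3 segments respectively, the first trajectory's segments all receive advantage +1.0 and the second's all receive -1.0; each segment is then optimized with the usual GRPO policy-gradient objective.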