A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

arXiv cs.AI / 3/23/2026


Key Points

The paper introduces a subgoal-driven framework that enables real-time online planning with subgoal decomposition to improve long-horizon LLM agents in dynamic environments such as web navigation.
It presents MiRA (Milestoning your Reinforcement Learning Enhanced Agent), an RL training framework that uses dense milestone-based rewards to guide learning for longer task sequences.
Empirical results show substantial gains on WebArena-Lite: the online planning mechanism raises Gemini's success rate (SR) by about 10 percentage points, and MiRA lifts Gemma3-12B from 6.4% to 43.0% SR, surpassing GPT-4-Turbo (17.6%), GPT-4o (13.9%), and the previous open-model state of the art, WebRL (38.4%).
The findings indicate that combining explicit inference-time planning with milestone-based rewards significantly enhances long-horizon capabilities, suggesting broad potential for robust autonomous systems.
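
The first key point, online planning with subgoal decomposition, can be pictured with a small sketch. This summary does not describe the paper's actual planner, so everything below is an assumption made for illustration: the `Subgoal` class, the `run_episode` loop, the hard-coded plan (which in the real framework would come from an LLM), and the toy environment are all hypothetical. The sketch only shows the general idea of re-checking a list of subgoals against each new observation so the agent always knows which one it is working on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Observation = Dict[str, object]


@dataclass
class Subgoal:
    # Hypothetical structure: a label plus a completion check on the observation.
    description: str
    is_met: Callable[[Observation], bool]


def next_open_subgoal(plan: List[Subgoal], obs: Observation) -> Optional[Subgoal]:
    """Return the first subgoal the current observation does not yet satisfy."""
    for subgoal in plan:
        if not subgoal.is_met(obs):
            return subgoal
    return None


def run_episode(plan, policy, step, obs, max_steps=10):
    """Re-evaluate the plan on every new observation, then act on the open subgoal."""
    trace = []
    for _ in range(max_steps):
        current = next_open_subgoal(plan, obs)
        if current is None:  # every subgoal is satisfied
            break
        action = policy(current, obs)
        trace.append(f"{current.description} -> {action}")
        obs = step(obs, action)
    return trace


# Toy stand-ins (assumptions): a lookup-table policy and a two-field "website".
def toy_policy(subgoal: Subgoal, obs: Observation) -> str:
    return {"log in": "submit_login", "open orders": "click_orders"}[subgoal.description]


def toy_step(obs: Observation, action: str) -> Observation:
    new_obs = dict(obs)
    if action == "submit_login":
        new_obs["logged_in"] = True
    elif action == "click_orders":
        new_obs["url"] = "/orders"
    return new_obs


plan = [
    Subgoal("log in", lambda o: o.get("logged_in") is True),
    Subgoal("open orders", lambda o: o.get("url") == "/orders"),
]
trace = run_episode(plan, toy_policy, toy_step, {"url": "/home"})
print(trace)
```

Because the open subgoal is recomputed from the latest observation rather than from a step counter, a page change that undoes earlier progress (for example, a logout) would automatically send the agent back to the affected subgoal.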

Abstract

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing LLM-based agents struggle with long-horizon planning in two main ways. During online execution, they often lose track as new information arrives, lacking a clear and adaptive path toward the final goal. This issue is further exacerbated during reinforcement learning (RL) fine-tuning, where sparse and delayed rewards make it difficult for agents to identify which actions lead to success, preventing them from maintaining coherent reasoning over extended tasks. To address these challenges, we propose two contributions. First, we introduce an agent framework that leverages proprietary models for online planning through subgoal decomposition. Second, we present MiRA (Milestoning your Reinforcement Learning Enhanced Agent), an RL training framework that uses dense, milestone-based reward signals. The real-time planning mechanism improves proprietary models such as Gemini by approximately a 10% absolute increase in success rate (SR) on the WebArena-Lite benchmark. Meanwhile, applying MiRA to the open Gemma3-12B model increases its success rate from 6.4% to 43.0%. This performance surpasses proprietary systems such as GPT-4-Turbo (17.6%) and GPT-4o (13.9%), as well as the previous open-model state of the art, WebRL (38.4%). Overall, our findings demonstrate that combining explicit inference-time planning with milestone-based rewards significantly improves an agent's long-horizon capabilities, paving the way for more robust and general-purpose autonomous systems.
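
The contrast the abstract draws between sparse, delayed rewards and dense, milestone-based rewards can be made concrete with a toy example. This is a minimal sketch under stated assumptions, not MiRA's reward design: the function names, the ordered-milestone rule, the `bonus` value of 0.2, and the example trajectory are all hypothetical. It shows only why intermediate milestones give the learner a signal at more steps than a single end-of-episode reward does.

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, object]
Check = Callable[[State], bool]


def sparse_rewards(trajectory: Sequence[State], is_success: Check) -> List[float]:
    """Reward 1.0 on the last step if the task succeeded, 0.0 everywhere else."""
    rewards = [0.0] * len(trajectory)
    if trajectory and is_success(trajectory[-1]):
        rewards[-1] = 1.0
    return rewards


def milestone_rewards(
    trajectory: Sequence[State],
    milestones: Sequence[Check],
    is_success: Check,
    bonus: float = 0.2,  # assumed value, chosen only for illustration
) -> List[float]:
    """Add a one-time bonus when each milestone is first reached, in order."""
    rewards = [0.0] * len(trajectory)
    next_idx = 0  # index of the next milestone still to be reached
    for t, state in enumerate(trajectory):
        while next_idx < len(milestones) and milestones[next_idx](state):
            rewards[t] += bonus
            next_idx += 1
    if trajectory and is_success(trajectory[-1]):
        rewards[-1] += 1.0
    return rewards


# Hypothetical web task: log in, open the orders page, then open order 42.
trajectory = [
    {"url": "/home"},
    {"url": "/login"},
    {"url": "/account", "logged_in": True},
    {"url": "/orders", "logged_in": True},
    {"url": "/orders/42", "logged_in": True},
]
milestones = [
    lambda s: s.get("url") == "/login",
    lambda s: s.get("logged_in") is True,
    lambda s: str(s.get("url")).startswith("/orders"),
]
is_success = lambda s: s.get("url") == "/orders/42"

print(sparse_rewards(trajectory, is_success))
print(milestone_rewards(trajectory, milestones, is_success))
```

With the sparse scheme only the final step carries a nonzero reward, so a failed five-step episode looks identical to doing nothing. With milestones, an episode that logs in and reaches the orders page but opens the wrong order still earns partial credit at the steps where progress was made.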



