WebUncertainty: Dual-Level Uncertainty Driven Planning and Reasoning For Autonomous Web Agent

arXiv cs.AI / 4/21/2026

Key Points

The paper presents WebUncertainty, a framework for autonomous web agents to better handle complex, long-horizon tasks on real, dynamic webpages.
It introduces a Task Uncertainty-Driven Adaptive Planning mechanism that switches planning modes based on uncertainty in unknown environments.
It adds an Action Uncertainty-Driven MCTS reasoning mechanism that uses the Confidence-induced Action Uncertainty (ConActU) strategy to quantify aleatoric and epistemic uncertainty and improve decision-making during search.
Experiments on the WebArena and WebVoyager benchmarks show WebUncertainty outperforms existing state-of-the-art methods.
The work targets two core failures of prior agents, rigid planning and hallucination-prone reasoning, by explicitly modeling uncertainty at both the planning and action levels.
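
The summary does not say how task uncertainty is measured or which planning modes exist, so the following is only a minimal sketch of the general idea of uncertainty-gated planning. It assumes task uncertainty can be approximated by the disagreement among several sampled first steps of a plan. The function names, the mode labels "upfront" and "stepwise", and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def normalized_entropy(samples):
    """Entropy of the empirical distribution over samples, scaled to [0, 1].

    0.0 means every sample agrees; 1.0 means every sample is different.
    """
    n = len(samples)
    if n <= 1:
        return 0.0
    counts = Counter(samples)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)  # log(n) is the maximum entropy for n samples


def choose_planning_mode(sampled_first_steps, threshold=0.5):
    """Pick a planning mode from disagreement among sampled plans.

    Hypothetical rule: when the samples mostly agree, commit to a full plan
    up front; when they disagree, plan one step at a time and re-observe.
    """
    uncertainty = normalized_entropy(sampled_first_steps)
    return "stepwise" if uncertainty > threshold else "upfront"


if __name__ == "__main__":
    agree = ["click search box"] * 4
    disagree = ["click search box", "open menu", "scroll down", "go to login"]
    print(choose_planning_mode(agree))
    print(choose_planning_mode(disagree))
```

In practice the samples would come from repeated queries to the planner model, and the threshold would need tuning per benchmark.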

Abstract

Recent advancements in large language models (LLMs) have empowered autonomous web agents to execute natural language instructions directly on real-world webpages. However, existing agents often struggle with complex tasks involving dynamic interactions and long-horizon execution due to rigid planning strategies and hallucination-prone reasoning. To address these limitations, we propose WebUncertainty, a novel autonomous agent framework designed to tackle dual-level uncertainty in planning and reasoning. Specifically, we design a Task Uncertainty-Driven Adaptive Planning Mechanism that adaptively selects planning modes to navigate unknown environments. Furthermore, we introduce an Action Uncertainty-Driven Monte Carlo tree search (MCTS) Reasoning Mechanism. This mechanism incorporates the Confidence-induced Action Uncertainty (ConActU) strategy to quantify both aleatoric uncertainty (AU) and epistemic uncertainty (EU), thereby optimizing the search process and guiding robust decision-making. Experimental results on the WebArena and WebVoyager benchmarks demonstrate that WebUncertainty achieves superior performance compared to state-of-the-art baselines.
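
The abstract does not give the ConActU formulas, so the sketch below is not the paper's method. It illustrates a common, generic way to separate aleatoric from epistemic uncertainty: sample several action distributions for the same state, treat the average entropy of the samples as aleatoric uncertainty, and treat the remainder of the entropy of the averaged distribution (the mutual information) as epistemic uncertainty. The scoring function is one assumed way such quantities could enter a UCT-style selection rule; its name, the weights `c`, `lam`, and `beta`, and the choice to penalize aleatoric uncertainty while widening exploration under epistemic uncertainty are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose_uncertainty(dists):
    """Split uncertainty over K sampled action distributions.

    dists: list of K distributions over the same ordered set of actions.
    Returns (aleatoric, epistemic), where
      aleatoric = mean entropy of the individual distributions,
      epistemic = entropy of the mean distribution minus aleatoric.
    """
    k = len(dists)
    n_actions = len(dists[0])
    mean = [sum(d[i] for d in dists) / k for i in range(n_actions)]
    aleatoric = sum(entropy(d) for d in dists) / k
    epistemic = max(0.0, entropy(mean) - aleatoric)  # clamp rounding noise
    return aleatoric, epistemic


def uncertainty_uct(q, n_parent, n_child, aleatoric, epistemic,
                    c=1.4, lam=0.5, beta=1.0):
    """Assumed UCT variant: value minus an aleatoric penalty, plus an
    exploration bonus that grows with epistemic uncertainty."""
    if n_child == 0:
        return math.inf  # always try unvisited children first
    explore = c * math.sqrt(math.log(n_parent) / n_child)
    return q - lam * aleatoric + (1.0 + beta * epistemic) * explore


if __name__ == "__main__":
    # Samples agree but each is unsure: uncertainty is mostly aleatoric.
    print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
    # Samples are confident but contradict each other: mostly epistemic.
    print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

The two demo cases show why the split matters for search: noise that is inherent to the page cannot be reduced by more visits, whereas disagreement between samples signals a branch where additional exploration may still change the estimate.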