From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation
arXiv cs.LG · April 15, 2026
Key Points
- The paper argues that text-based web agents struggle because real-world HTML is noisy and heterogeneous: standard SFT neither teaches models to discriminate against plausible-but-wrong elements nor generalizes well to unseen page layouts.
- It introduces the Triton dataset (590k instances) built with Structural-Semantic Hard Negative Mining and a Dual-Agent Consensus pipeline to generate hard distractors and cross-domain navigation tasks with verification.
- A progressive curriculum is used to train three 32B models targeting different abilities: imitation (Triton-SFT-32B), robust discrimination via Odds Ratio Preference Optimization (Triton-ORPO-32B), and long-horizon consistency via Group Relative Policy Optimization (Triton-GRPO-32B).
- On Mind2Web, Triton-GRPO-32B achieves state-of-the-art open-source performance with a 58.7% Step Success Rate and reportedly surpasses GPT-4.5 and Claude-4.5 by more than 16%, suggesting curriculum- and data-driven improvements can beat raw scale for web navigation.
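The discrimination stage above relies on Odds Ratio Preference Optimization, which augments the standard SFT loss with a penalty based on the odds ratio between a chosen (correct) and rejected (hard-negative) response. The sketch below is a minimal, hedged illustration of that loss on average per-token log-likelihoods; the function name, the `lam` weighting, and the scalar-input simplification are assumptions for illustration, not the paper's implementation.

```python
import math

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO-style loss (assumed form, not the paper's code).

    logp_chosen / logp_rejected: average per-token log-likelihoods the
    policy assigns to the correct action and a hard-negative distractor.
    """
    def log_odds(logp: float) -> float:
        # log odds(y) = log p(y) - log(1 - p(y))
        p = math.exp(logp)
        return logp - math.log(1.0 - p)

    # Log odds ratio between chosen and rejected responses.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # Preference term: -log sigmoid(ratio), small when chosen >> rejected.
    pref = math.log(1.0 + math.exp(-ratio))
    # Standard SFT negative log-likelihood on the chosen response.
    nll = -logp_chosen
    return nll + lam * pref
```

The key property is that the loss shrinks as the policy separates correct elements from structurally similar distractors, which is exactly the discrimination ability plain SFT lacks.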