Hello! We're excited to share our latest community-driven research project, WebHarbor: Docking Real Websites for Evolving GUI Agent Environments!
TL;DR: 15 popular websites (Amazon, GitHub, BBC News, arXiv, Booking, Hugging Face, etc.) packaged as self-contained Flask + SQLite apps in a single Docker image. A control plane resets each site to a byte-identical state in under 1 second. Everything was built by coding agents (e.g., Claude Code or Codex) with a human in the loop. All 643 WebVoyager tasks are supported out of the box.
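The post doesn't describe how the control plane works internally, so here is only a minimal sketch of one way a per-site reset could be done. It assumes each mirror keeps its state in a single SQLite file and that a pristine copy of that file ships in the image. `reset_site`, `make_pristine`, the `cart` table and the file names are all hypothetical, not WebHarbor's actual code or API.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reset_site(pristine_db: Path, live_db: Path) -> str:
    """Restore live_db to the exact bytes of pristine_db; return its digest.

    Hypothetical helper: assumes the site's app has closed its
    connections to live_db before this runs.
    """
    # Drop sidecar files a crashed or WAL-mode session might leave behind.
    for suffix in ("-journal", "-wal", "-shm"):
        Path(str(live_db) + suffix).unlink(missing_ok=True)
    tmp = live_db.with_name(live_db.name + ".tmp")
    shutil.copyfile(pristine_db, tmp)  # verbatim byte copy
    tmp.replace(live_db)               # atomic rename on the same filesystem
    return sha256_of(live_db)


def make_pristine(path: Path) -> None:
    """Build a tiny snapshot with a hypothetical `cart` table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE cart (item TEXT)")
    conn.commit()
    conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pristine, live = Path(d) / "pristine.db", Path(d) / "live.db"
        make_pristine(pristine)
        reset_site(pristine, live)

        # Simulate an agent episode that mutates the site's state.
        conn = sqlite3.connect(live)
        conn.execute("INSERT INTO cart VALUES ('book')")
        conn.commit()
        conn.close()
        print("dirty:", sha256_of(live) != sha256_of(pristine))  # dirty: True

        digest = reset_site(pristine, live)
        print("restored:", digest == sha256_of(pristine))  # restored: True
```

Copying a whole file and hashing it is the simplest way to get byte-for-byte equality with a snapshot. A real control plane might instead use SQLite's backup API or filesystem snapshots, and would also need to coordinate with the running Flask app.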
Call for contributions: our next goal is 100+ popular websites, covering all of Online-Mind2Web (147 sites) and beyond. There are two tracks:
- Contribute a new mirror site (build it with the coding-agent pipeline → verify it by hand → open a PR) → co-author on the final paper
- Review submitted PRs (5 reviews → co-author)
We've also released skills for your coding agent to use while building mirrors, so a new mirror typically takes less than a day. See the Contribution Guide for details.
Why WebHarbor: running web agent benchmarks on the live web is a nightmare: reCAPTCHA, geo-blocks, content drift, network flakiness, and tasks that go stale within months. On top of that, you can't reset the live web, which rules out heavy RL training. Web agents need lightweight, easy-to-reset, task-driven, evolving environments for both evaluation and training!
Related Resources:
| Name | Link |
|---|---|
| 🏠 WebHarbor Project Page | WebHarbor |
| 🤗 HuggingFace Dataset | ChilleD/WebHarbor |
| 💻 WebHarbor GitHub | Code Repo |
| 📊 Contribution Guide | Guide Details |
| 📝 Contribution Request Form | Google Form |
Suggestions and discussion are welcome!