Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations

arXiv cs.AI / 4/23/2026


Key Points

  • Forage V2 targets “denominator blindness” in open-world autonomous agents by extending V1’s co-evolving evaluation and method isolation into a learning-organization architecture.
  • The approach accumulates knowledge across multiple runs, transfers that knowledge across different model capabilities, and uses institutional safeguards to prevent degradation of stored evaluation heuristics.
  • Experiments across web scraping, API queries, and mathematical reasoning show knowledge entries growing from 0 to 54 over six runs, with denominator estimates stabilizing as domain understanding improves.
  • Knowledge transfer is demonstrated by seeding a weaker model (Sonnet) with knowledge gathered by a stronger model (Opus), which narrows the coverage gap from 6.6pp to 1.1pp, cuts cost from 9.40 to 5.13 USD, and reaches convergence in fewer rounds (mean 4.5 vs. 7.0).
  • V2’s key contribution is architectural: it proposes organizational “institutions” (audit separation, contract protocols, organizational memory) so future agents can inherit and rely on calibrated, readable knowledge independent of model provider.
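To put the transfer figures above on a common scale, the short sketch below converts each reported before/after pair into a relative reduction. The input numbers are the ones quoted in the key points; the helper name `relative_reduction` is ours, not something taken from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrinks going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Figures as reported in the summary (unseeded Sonnet vs. Sonnet seeded with Opus knowledge).
gap_reduction = relative_reduction(6.6, 1.1)      # coverage gap, percentage points
cost_reduction = relative_reduction(9.40, 5.13)   # cost, USD
rounds_reduction = relative_reduction(7.0, 4.5)   # mean rounds to convergence

for name, value in [
    ("coverage gap", gap_reduction),
    ("cost", cost_reduction),
    ("rounds", rounds_reduction),
]:
    print(f"{name}: {value * 100:.1f}% lower")
```

Read this way, the coverage gap shrinks by roughly five sixths, the cost by a little under half, and the mean number of rounds by about a third.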

Abstract

Autonomous agents operating in open-world tasks -- where the completion boundary is not given in advance -- face denominator blindness: they systematically underestimate the scope of the target space. Forage V1 addressed this through co-evolving evaluation (an independent Evaluator discovers what "complete" means) and method isolation (Evaluator and Planner cannot see each other's code). V2 extends the architecture from a single expedition to a learning organization: experience accumulates across runs, transfers across model capabilities, and institutional safeguards prevent knowledge degradation. We demonstrate two claims across three task types (web scraping, API queries, mathematical reasoning). Knowledge accumulation: over six runs, knowledge entries grow from 0 to 54, and denominator estimates stabilize as domain understanding deepens. Knowledge transfer: a weaker agent (Sonnet) seeded with a stronger agent's (Opus) knowledge narrows a 6.6pp coverage gap to 1.1pp, nearly halves cost (9.40 to 5.13 USD), and converges in fewer rounds (mean 4.5 vs. 7.0). Three independent seeded runs arrive at exactly the same denominator estimate (266), suggesting that organizational knowledge calibrates evaluation itself. V2's contribution is architectural: it designs institutions -- audit separation, contract protocols, organizational memory -- that make any agent more reliable upon entry. The accumulated experience is organizational, model-agnostic, and transferable, stored as readable documents that any future agent inherits regardless of provider or capability level.
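As a rough illustration of why denominator blindness matters, the sketch below computes coverage as items found divided by the estimated size of the target space. Only the denominator estimate of 266 comes from the abstract; the item count and the agent's own smaller estimate are hypothetical values chosen for the example, and the `coverage` helper is not the paper's code.

```python
def coverage(found: int, denominator: int) -> float:
    """Share of the target space collected, given an estimate of its size."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return found / denominator


found = 200               # hypothetical: items the agent actually collected
self_estimate = 210       # hypothetical: the agent's own, too-small guess at the total
evaluator_estimate = 266  # denominator estimate reported for the seeded runs

apparent = coverage(found, self_estimate)         # looks close to complete
calibrated = coverage(found, evaluator_estimate)  # same work, larger denominator

print(f"apparent coverage:   {apparent:.1%}")
print(f"calibrated coverage: {calibrated:.1%}")
print(f"overstatement:       {(apparent - calibrated) * 100:.1f}pp")
```

Under these assumed numbers, the same collected set looks about 95% complete against the underestimate but only about 75% complete against the larger denominator, which is the kind of gap an independently calibrated Evaluator is meant to expose.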