The World Leaks the Future: Harness Evolution for Future Prediction Agents

arXiv cs.AI / 4/20/2026

Key Points

  • The paper studies “future prediction” by LLM agents: forming a prediction for an unresolved question using only the public information available at prediction time, before the outcome is known.
  • It argues that most existing methods still improve mainly from final outcomes, because useful supervision arrives only after a question is resolved. Final outcomes are too coarse to guide earlier factor tracking, evidence gathering and interpretation, or uncertainty handling.
  • It introduces “internal feedback,” a signal obtained by revisiting the same unresolved question over time: contrasts between earlier and later predictions expose omissions in the earlier prediction process.
  • The authors propose Milkyway, a self-evolving agent system that keeps the base model fixed but updates a persistent “future prediction harness” using internal feedback during repeated predictions.
  • On FutureX and FutureWorld, Milkyway achieves the best overall score among the compared methods, improving FutureX from 44.07 to 60.90 and FutureWorld from 62.22 to 77.96.
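
As a quick check on the reported numbers, the sketch below computes the absolute and relative gains from the two score pairs quoted above. The names `reported` and `gains` are illustrative, and the calculation assumes both scores in each pair are on the same scale; this summary does not state the underlying metric.

```python
# Reported (before, after) scores, copied from the key points above.
reported = {
    "FutureX": (44.07, 60.90),
    "FutureWorld": (62.22, 77.96),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent), both rounded."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


for name, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

Under that assumption, the gains are 16.83 points on FutureX (about 38.2% relative) and 15.74 points on FutureWorld (about 25.3% relative).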

Abstract

Many consequential decisions must be made before the relevant outcome is known. Such problems are commonly framed as future prediction, where an LLM agent must form a prediction for an unresolved question using only the public information available at the prediction time. The setting is difficult because public evidence evolves while useful supervision arrives only after the question is resolved, so most existing approaches still improve mainly from final outcomes. Yet final outcomes are too coarse to guide earlier factor tracking, evidence gathering and interpretation, or uncertainty handling. When the same unresolved question is revisited over time, temporal contrasts between earlier and later predictions can expose omissions in the earlier prediction process; we call this signal internal feedback. We introduce Milkyway, a self-evolving agent system that keeps the base model fixed and instead updates a persistent future prediction harness for factor tracking, evidence gathering and interpretation, and uncertainty handling. Across repeated predictions on the same unresolved question, Milkyway extracts internal feedback and writes reusable guidance back into the harness, so later predictions on that question can improve before the outcome is known. After the question is resolved, the final outcome provides a retrospective check before the updated harness is carried forward to subsequent questions. On FutureX and FutureWorld, Milkyway achieves the best overall score among the compared methods, improving FutureX from 44.07 to 60.90 and FutureWorld from 62.22 to 77.96.
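
The loop described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the classes `Prediction` and `Harness`, the set-difference notion of an "omission," and the squared-error retrospective check are all assumptions made here for illustration. The abstract does not specify how internal feedback is extracted or how the retrospective check is scored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Prediction:
    """One snapshot of a prediction on an unresolved question (hypothetical)."""
    day: int
    probability: float       # predicted probability that the answer is "yes"
    factors: frozenset       # factors the agent tracked when predicting


@dataclass
class Harness:
    """Persistent guidance store; the base model itself is never changed."""
    guidance: list = field(default_factory=list)  # carried to later questions
    pending: list = field(default_factory=list)   # awaiting retrospective check


def internal_feedback(earlier: Prediction, later: Prediction) -> list:
    """Temporal contrast: factors the later prediction used but the earlier
    one missed. Assumed here as a stand-in for 'omissions'."""
    return sorted(later.factors - earlier.factors)


def update_harness(harness: Harness, earlier: Prediction, later: Prediction) -> None:
    """Write reusable guidance into the harness before the outcome is known."""
    for factor in internal_feedback(earlier, later):
        note = f"Track '{factor}' from the first prediction onward."
        if note not in harness.guidance and note not in harness.pending:
            harness.pending.append(note)


def retrospective_check(harness: Harness, earlier: Prediction,
                        later: Prediction, outcome: bool) -> bool:
    """After resolution, keep pending guidance only if the later prediction
    was closer to the outcome (squared error; an assumed criterion)."""
    target = 1.0 if outcome else 0.0
    improved = (later.probability - target) ** 2 < (earlier.probability - target) ** 2
    if improved:
        harness.guidance.extend(harness.pending)
    harness.pending.clear()
    return improved


# Usage: the same question is revisited a week later with one new factor.
harness = Harness()
first = Prediction(day=0, probability=0.50, factors=frozenset({"polls"}))
second = Prediction(day=7, probability=0.70,
                    factors=frozenset({"polls", "court ruling"}))
update_harness(harness, first, second)
kept = retrospective_check(harness, first, second, outcome=True)
print(kept, harness.guidance)
```

The sketch keeps two lists so that guidance written during repeated predictions stays separate from guidance that has passed the retrospective check and is carried forward to subsequent questions.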