Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

arXiv cs.AI / 4/2/2026


Key Points

  • The paper argues that developing proactive assistants is hindered by the lack of realistic user simulation, because prior methods treat apps as flat tool-calling APIs rather than stateful, sequential environments.
  • It introduces Proactive Agent Research Environment (Pare), which models applications as finite state machines so a user simulator can navigate statefully and generate state-dependent actions.
  • The framework is extended with Pare-Bench, a benchmark covering 143 tasks across communication, productivity, scheduling, and lifestyle apps.
  • Pare-Bench is designed to evaluate key capabilities such as context observation, goal inference, correct intervention timing, and coordinating actions across multiple apps.

Abstract

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
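The finite-state-machine framing can be made concrete with a small sketch. Note that Pare's actual API is not shown in this summary, so every name below (`AppFSM`, `add_transition`, `available_actions`, `step`) is illustrative, not the paper's interface. The point is the contrast with flat tool-calling: each state exposes only the actions valid there, so a simulated user must navigate statefully to reach a goal.

```python
class AppFSM:
    """Toy app modeled as a finite state machine (illustrative, not Pare's API).

    Unlike a flat tool-calling API, the action space is state-dependent:
    a user simulator can only take actions available in its current state.
    """

    def __init__(self, initial: str):
        self.state = initial
        # transitions[src][action] -> dst
        self.transitions: dict[str, dict[str, str]] = {}

    def add_transition(self, src: str, action: str, dst: str) -> None:
        self.transitions.setdefault(src, {})[action] = dst

    def available_actions(self) -> list[str]:
        # The state-dependent action space: only actions legal here.
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        # Reject actions that are not available in the current state,
        # forcing stateful navigation instead of arbitrary tool calls.
        if action not in self.transitions.get(self.state, {}):
            raise ValueError(f"{action!r} unavailable in state {self.state!r}")
        self.state = self.transitions[self.state][action]
        return self.state


# Example: a minimal messaging app with hypothetical states and actions.
app = AppFSM("inbox")
app.add_transition("inbox", "open_thread", "thread")
app.add_transition("thread", "reply", "composing")
app.add_transition("composing", "send", "thread")
app.add_transition("thread", "back", "inbox")
```

In this sketch, `send` is only reachable after `open_thread` and `reply`, so a simulator must traverse `inbox → thread → composing` in order, which is the stateful, sequential behavior the paper argues flat API models fail to capture.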