ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents
arXiv cs.AI / 4/6/2026
Key Points
- The paper introduces ESL-Bench, a new event-driven synthetic longitudinal benchmark designed for evaluating “health agents” that must reason over multi-source, time-extended patient trajectories.
- ESL-Bench generates 100 synthetic users with 1–5 year timelines combining continuous device streams, sparse clinical exams, and episodic life events, while providing explicit ground-truth indicator impact parameters.
- The framework models each health indicator as a baseline stochastic process driven by discrete events, whose effects follow a sigmoid onset and an exponential decay and are subject to physiological saturation/projection constraints.
- A hybrid pipeline uses LLM-based planning for sparse semantic artifacts and algorithmic simulation for dense indicator dynamics, enabling programmatically computable answers for evaluation queries.
- Experiments with 13 methods show DB-native agents outperform memory-augmented RAG (48–58% vs. 30–38%), with the biggest gains on Comparison and Explanation tasks requiring multi-hop evidence attribution.
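
To make the indicator model in the key points concrete, here is a minimal Python sketch of one indicator whose noisy baseline is shifted by discrete events, with a sigmoid onset and an exponential decay. It is an illustration under stated assumptions, not the paper's implementation: the function names (event_kernel, simulate_indicator, mean_shift), every parameter value, the Gaussian baseline noise, and the use of simple clipping as a stand-in for the saturation/projection constraints are all hypothetical.

```python
import math
import random


def event_kernel(t, t_event, onset_rate=1.0, onset_delay=3.0, decay_rate=0.05):
    """Weight of an event's effect at day t: sigmoid rise times exponential decay.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    dt = t - t_event
    if dt < 0:
        return 0.0  # the event has not happened yet
    rise = 1.0 / (1.0 + math.exp(-onset_rate * (dt - onset_delay)))
    decay = math.exp(-decay_rate * dt)
    return rise * decay


def simulate_indicator(days, events, baseline=70.0, noise_sd=1.0,
                       lower=40.0, upper=120.0, seed=0):
    """Simulate one daily indicator.

    events is a list of (start_day, impact) pairs; impact plays the role of a
    ground-truth impact parameter. Clipping to [lower, upper] is an assumed
    stand-in for physiological saturation constraints.
    """
    rng = random.Random(seed)
    series = []
    for t in range(days):
        value = baseline + rng.gauss(0.0, noise_sd)
        for t_event, impact in events:
            value += impact * event_kernel(t, t_event)
        series.append(min(max(value, lower), upper))
    return series


def mean_shift(series, t_event, window=7):
    """Mean over the window after the event minus the mean over the window before.

    Requires t_event >= window and t_event + window <= len(series).
    """
    before = series[t_event - window:t_event]
    after = series[t_event:t_event + window]
    return sum(after) / len(after) - sum(before) / len(before)


if __name__ == "__main__":
    # One hypothetical event on day 20 that raises the indicator by up to ~10 units.
    series = simulate_indicator(60, [(20, 10.0)], noise_sd=0.0)
    print(round(mean_shift(series, 20), 2))
```

Because the series is produced by a known simulator with known impact parameters, a query such as "how much did the indicator change in the week after the event?" has an answer that can be computed directly from the generated data, which is the property the benchmark relies on for scoring.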