PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
arXiv cs.AI · March 25, 2026
Key Points
- The paper introduces PERMA, a benchmark for evaluating long-term personalized memory agents by testing how well models maintain persona consistency over temporally ordered, multi-session interactions rather than relying on static preference recall.
- It addresses a limitation of prior evaluations, which blend preference-related dialogue with irrelevant conversation, by explicitly modeling how user preferences gradually emerge and accumulate across noisy contexts.
- PERMA incorporates simulated real-world input variability and linguistic alignment (idiolects) using temporally evolving event sequences with preference queries inserted over time.
- The benchmark includes both multiple-choice and interactive tasks to measure a model’s ability to track preferences along an interaction timeline, across multiple domains.
- Experiments suggest that event-linked memory systems can recover more precise preferences and reduce token usage compared with semantic retrieval, but still struggle with long-horizon persona coherence and cross-domain interference.
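The paper's actual memory architecture isn't detailed here, but the core idea behind event-linked preference tracking, that a preference is indexed by its position on the interaction timeline so later events override earlier ones, can be sketched minimally. All class names, domains, and the preference-drift example below are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PreferenceEvent:
    """One observed preference signal, tied to a point on the interaction timeline."""
    step: int        # position in the temporally ordered session stream
    domain: str      # e.g. "food"; domains are illustrative
    preference: str  # the stated or implied preference

@dataclass
class EventLinkedMemory:
    """Minimal event-linked store: a query returns the most recent preference
    for a domain, rather than whichever mention is most lexically similar
    (as a semantic-retrieval baseline might)."""
    events: list = field(default_factory=list)

    def observe(self, event: PreferenceEvent) -> None:
        self.events.append(event)

    def current_preference(self, domain: str, as_of: Optional[int] = None) -> Optional[str]:
        """Latest preference for `domain` at or before timeline step `as_of`."""
        candidates = [
            e for e in self.events
            if e.domain == domain and (as_of is None or e.step <= as_of)
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda e: e.step).preference

# A preference that drifts over time: the later event should win.
mem = EventLinkedMemory()
mem.observe(PreferenceEvent(step=1, domain="food", preference="loves spicy ramen"))
mem.observe(PreferenceEvent(step=7, domain="food", preference="now avoids spicy food"))

print(mem.current_preference("food"))           # prints "now avoids spicy food"
print(mem.current_preference("food", as_of=3))  # prints "loves spicy ramen"
```

The `as_of` parameter mirrors the benchmark's timeline-aware queries: the same question can have different correct answers depending on where along the interaction history it is asked.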