OmniMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory

arXiv cs.AI / 4/2/2026


Key Points

  • The arXiv paper introduces OmniMem, a unified multimodal “lifelong memory” framework aimed at helping AI agents retain, organize, and recall multimodal experiences over long time horizons.
  • It uses an autonomous autoresearch pipeline that runs ~50 experiments across two benchmarks, diagnosing failures, proposing architectural changes, and even fixing data-pipeline bugs without human involvement in the inner loop.
  • OmniMem achieves new state-of-the-art results, raising F1 from 0.117 to 0.598 on LoCoMo (+411%) and from 0.254 to 0.797 on Mem-Gallery (+214%) compared with the initial baseline.
  • The study finds that the major gains come not from hyperparameter tuning but from bug fixes (+175%), architectural changes (+44%), and targeted prompt engineering (+188% in certain categories), each of which individually exceeds the cumulative effect of all hyperparameter adjustments.
  • The authors provide a taxonomy of six discovery types and identify four properties that make multimodal memory especially well-suited to autoresearch, along with guidance for applying similar pipelines to other AI domains.
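The inner loop described above (propose a change, run an experiment, keep the change only if the benchmark score improves) can be sketched as a simple greedy search. This is an illustrative toy, not the paper's implementation: the change categories mirror the paper's taxonomy, but `run_experiment` and its scores are stand-ins for real benchmark evaluation.

```python
import random

def run_experiment(config):
    # Toy stand-in for a benchmark run (e.g. F1 on LoCoMo).
    # A real pipeline would build and evaluate the memory system here;
    # these per-change gains are illustrative, not the paper's numbers.
    base = 0.117
    gains = {"bug_fix": 0.20, "architecture": 0.15,
             "prompt": 0.18, "hyperparameter": 0.02}
    return base + sum(gains[c] for c in config)

def autoresearch(budget=50, seed=0):
    """Greedy autoresearch loop: propose a change, evaluate it,
    and keep it only if the score improves. Runs without human
    intervention for a fixed experiment budget."""
    rng = random.Random(seed)
    config, best = [], run_experiment([])
    candidates = ["bug_fix", "architecture", "prompt", "hyperparameter"]
    for _ in range(budget):
        change = rng.choice(candidates)
        if change in config:
            continue  # each change type applied at most once in this toy
        trial = config + [change]
        score = run_experiment(trial)
        if score > best:
            config, best = trial, score
    return config, best

config, f1 = autoresearch()
print(config, round(f1, 3))
```

Even this toy illustrates the paper's headline finding: under a fixed experiment budget, accepting a single bug fix or prompt change moves the score far more than the hyperparameter adjustment does.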

Abstract

AI agents increasingly operate over extended time horizons, yet their ability to retain, organize, and recall multimodal experiences remains a critical bottleneck. Building effective lifelong memory requires navigating a vast design space spanning architecture, retrieval strategies, prompt engineering, and data pipelines; this space is too large and interconnected for manual exploration or traditional AutoML to cover effectively. We deploy an autonomous research pipeline to discover OmniMem, a unified multimodal memory framework for lifelong AI agents. Starting from a naïve baseline (F1 = 0.117 on LoCoMo), the pipeline autonomously executes ~50 experiments across two benchmarks, diagnosing failure modes, proposing architectural modifications, and repairing data pipeline bugs, all without human intervention in the inner loop. The resulting system achieves state-of-the-art results on both benchmarks, improving F1 by +411% on LoCoMo (0.117 → 0.598) and +214% on Mem-Gallery (0.254 → 0.797) relative to the initial configurations. Critically, the most impactful discoveries are not hyperparameter adjustments: bug fixes (+175%), architectural changes (+44%), and prompt engineering (+188% on specific categories) each individually exceed the cumulative contribution of all hyperparameter tuning, demonstrating capabilities fundamentally beyond the reach of traditional AutoML. We provide a taxonomy of six discovery types and identify four properties that make multimodal memory particularly suited for autoresearch, offering guidance for applying autonomous research pipelines to other AI system domains. Code is available at https://github.com/aiming-lab/OmniMem.