M$^\star$: Every Task Deserves Its Own Memory Harness

arXiv cs.AI / 4/15/2026



Key Points

The paper proposes M$^\star$, an approach for LLM agents that automatically finds task-specific memory “harnesses” rather than using a fixed, one-size-fits-all memory architecture.
M$^\star$ represents an agent’s memory system as an executable Python memory program that bundles a data schema, storage logic, and workflow instructions, and then optimizes these components jointly.
It uses reflective code evolution with population-based search and feedback from evaluation failures to iteratively refine candidate memory programs.
Experiments across four benchmarks covering conversation, embodied planning, and expert reasoning show consistent performance gains over fixed-memory baselines.
The evolved memory programs develop structurally distinct processing mechanisms per domain, suggesting task specialization opens a broader design space than general-purpose memory paradigms.
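
To make the idea of a "memory program" concrete, here is a minimal sketch of what bundling a schema, storage logic, and workflow instructions in one Python object could look like. All names (`Note`, `MemoryProgram`, `write`, `read`) and the keyword-matching retrieval are illustrative assumptions, not the paper's actual interface or any evolved program.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """Schema: the shape of one stored memory item (hypothetical fields)."""
    topic: str
    text: str


@dataclass
class MemoryProgram:
    """Toy memory program bundling Schema, Logic, and Instructions."""
    # Instructions: workflow text handed to the agent (illustrative wording).
    instructions: str = "Store each new fact; retrieve by topic before answering."
    notes: list[Note] = field(default_factory=list)

    # Logic, part 1: how items are written into memory.
    def write(self, topic: str, text: str) -> None:
        self.notes.append(Note(topic=topic.lower(), text=text))

    # Logic, part 2: how items are read back (exact topic-word match).
    def read(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [note.text for note in self.notes if note.topic in words]


memory = MemoryProgram()
memory.write("Paris", "User visited Paris in May.")
memory.write("cats", "User has two cats.")
print(memory.read("what about paris"))  # prints ['User visited Paris in May.']
```

Because all three parts live in one executable unit, a search procedure can change the schema, the read/write logic, and the instructions together rather than tuning each in isolation.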

Abstract

Large language model agents rely on specialized memory systems to accumulate and reuse knowledge during extended interactions. Recent architectures typically adopt a fixed memory design tailored to specific domains, such as semantic retrieval for conversations or skills reused for coding. However, a memory system optimized for one purpose frequently fails to transfer to others. To address this limitation, we introduce M$^\star$, a method that automatically discovers task-optimized memory harnesses through executable program evolution. Specifically, M$^\star$ models an agent memory system as a memory program written in Python. This program encapsulates the data Schema, the storage Logic, and the agent workflow Instructions. We optimize these components jointly using a reflective code evolution method; this approach employs a population-based search strategy and analyzes evaluation failures to iteratively refine the candidate programs. We evaluate M$^\star$ on four distinct benchmarks spanning conversation, embodied planning, and expert reasoning. Our results demonstrate that M$^\star$ improves performance over existing fixed-memory baselines robustly across all evaluated tasks. Furthermore, the evolved memory programs exhibit structurally distinct processing mechanisms for each domain. This finding indicates that specializing the memory mechanism for a given task explores a broad design space and provides a superior solution compared to general-purpose memory paradigms.
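
The search procedure the abstract describes (a population of candidates, refined using feedback from evaluation failures) can be sketched as a toy loop. This is a simplified illustration under strong assumptions: candidates are sets of memory fields rather than Python programs, `CASES` is a made-up evaluation set, and `reflect` is a random stand-in for the reflective step that the paper applies to program code. None of these names come from the paper.

```python
import random

# Toy evaluation set: each case needs one memory field to be answered.
# (Hypothetical stand-in for a benchmark; not from the paper.)
CASES = ["date", "location", "person", "object"]


def evaluate(candidate: frozenset) -> tuple[float, list[str]]:
    """Return (score, failures); failures are the cases the candidate misses."""
    failures = [case for case in CASES if case not in candidate]
    return 1 - len(failures) / len(CASES), failures


def reflect(candidate: frozenset, failures: list[str], rng: random.Random) -> frozenset:
    """Stand-in for reflection: patch the candidate using one observed failure."""
    if not failures:
        return candidate
    return candidate | {rng.choice(failures)}


def evolve(generations: int = 5, population_size: int = 3, seed: int = 0) -> frozenset:
    rng = random.Random(seed)
    population = [frozenset()]  # start from an empty memory design
    for _ in range(generations):
        children = []
        for parent in population:
            _, failures = evaluate(parent)
            children.append(reflect(parent, failures, rng))
        # Population-based selection: keep the best distinct candidates.
        pool = set(population) | set(children)
        ranked = sorted(pool, key=lambda c: evaluate(c)[0], reverse=True)
        population = ranked[:population_size]
    return population[0]


best = evolve()
print(sorted(best), evaluate(best)[0])
# prints ['date', 'location', 'object', 'person'] 1.0
```

The point of the sketch is the control flow: evaluation returns not only a score but also the concrete failures, and those failures drive the next revision instead of blind mutation.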


