Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research

arXiv cs.AI / 4/1/2026


Key Points

  • The article proposes Mimosa, an evolving multi-agent framework for Autonomous Scientific Research that adapts workflows to changing tasks and environments rather than relying on fixed procedures.
  • Mimosa uses Model Context Protocol (MCP) for dynamic tool discovery, a meta-orchestrator to generate multi-agent workflow topologies, and code-generating agents to execute subtasks via scientific software libraries.
  • Execution quality is assessed by an LLM-based judge, whose feedback iteratively refines the workflow over repeated experimental cycles.
  • On ScienceAgentBench, Mimosa reaches a 43.1% success rate with DeepSeek-V3.2, outperforming both single-agent baselines and static multi-agent setups, while showing heterogeneous responses to decomposition and iteration.
  • The framework is released as a fully open-source, modular, tool-agnostic platform with logged execution traces and archived workflows to improve auditability and support extensibility by the research community.
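The propose–execute–judge cycle described above can be sketched as a simple loop. This is an illustrative sketch only, not the actual Mimosa API: the function names (`propose`, `execute`, `judge`) and the scalar-score-plus-feedback judge interface are assumptions made for clarity.

```python
def evolve_workflow(task, propose, execute, judge, max_cycles=5, target=0.9):
    """Iteratively refine a multi-agent workflow from judge feedback.

    propose(task, feedback) -> workflow   # meta-orchestrator step
    execute(workflow)       -> trace      # code-generating agents run subtasks
    judge(task, trace)      -> (score, feedback)  # LLM-based judge
    """
    feedback = None
    best_workflow, best_score = None, float("-inf")
    for _ in range(max_cycles):
        workflow = propose(task, feedback)   # synthesize a workflow topology
        trace = execute(workflow)            # run subtasks, log the trace
        score, feedback = judge(task, trace) # score and extract critique
        if score > best_score:
            best_workflow, best_score = workflow, score
        if score >= target:                  # good enough: stop evolving
            break
    return best_workflow, best_score
```

Keeping the best workflow seen so far (rather than only the last one) matches the article's point that refinement is iterative and that its benefit varies with the underlying execution model.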

Abstract

Current Autonomous Scientific Research (ASR) systems, despite leveraging large language models (LLMs) and agentic architectures, remain constrained by fixed workflows and toolsets that prevent adaptation to evolving tasks and environments. We introduce Mimosa, an evolving multi-agent framework that automatically synthesizes task-specific multi-agent workflows and iteratively refines them through experimental feedback. Mimosa leverages the Model Context Protocol (MCP) for dynamic tool discovery, generates workflow topologies via a meta-orchestrator, executes subtasks through code-generating agents that invoke available tools and scientific software libraries, and scores executions with an LLM-based judge whose feedback drives workflow refinement. On ScienceAgentBench, Mimosa achieves a success rate of 43.1% with DeepSeek-V3.2, surpassing both single-agent baselines and static multi-agent configurations. Our results further reveal that models respond heterogeneously to multi-agent decomposition and iterative learning, indicating that the benefits of workflow evolution depend on the capabilities of the underlying execution model. Beyond these benchmarks, Mimosa's modular architecture and tool-agnostic design make it readily extensible, and its fully logged execution traces and archived workflows support auditability by preserving every analytical step for inspection and potential replication. Combined with domain-expert guidance, the framework has the potential to automate a broad range of computationally accessible scientific tasks across disciplines. Released as a fully open-source platform, Mimosa aims to provide an open foundation for community-driven ASR.
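Dynamic tool discovery, as the abstract describes it, means the orchestrator binds tools at workflow-synthesis time rather than hard-coding a toolset. A minimal sketch of that idea, in the spirit of MCP but not using the real MCP SDK (the `list_tools` server interface and keyword matching here are hypothetical):

```python
def discover_tools(servers, subtask_keywords):
    """Query tool-providing servers and keep tools relevant to the subtask.

    Each server exposes list_tools() returning dicts with "name" and
    "description" keys; relevance here is naive keyword matching.
    """
    selected = []
    for server in servers:
        for tool in server.list_tools():
            text = (tool["name"] + " " + tool["description"]).lower()
            if any(kw.lower() in text for kw in subtask_keywords):
                selected.append((server, tool))  # remember where the tool lives
    return selected
```

Because selection happens per subtask, adding a new tool server extends the system without touching the workflow logic, which is the tool-agnostic, extensible design the article highlights.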