Cost-Penalized Fitness in FMA-Orchestrated Mixture of Experts: Experimental Evidence for Molecular Memory in Domain Adaptation

arXiv cs.LG / 4/2/2026


Key Points

  • The paper reports controlled experiments on nanoFMT, a Free-Market Algorithm (FMA)-orchestrated transformer using dynamic Mixture-of-Experts (MoE) expert management to handle shifting data distributions at full capacity.
  • It finds that using cost-penalized fitness with a linear grace period for newly created experts enables the model to accumulate domain expertise via diversification rather than frequent expert replacement.
  • In a round-trip domain shift test, the approach achieves 9–11× faster recovery when returning to a previously learned domain without requiring any expert births or replacements.
  • The authors term this behavior a “molecular memory” effect, arguing that dormant experts persist and reactivate when their original domain reappears, unlike existing MoE management strategies.
  • A preliminary cost/energy analysis estimates potential annual savings of $39.1M and a 27.1 GWh energy reduction for an OpenAI-scale provider under a moderate scenario.
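The cost-penalized fitness with a linear grace period described above can be sketched as follows. This is an illustrative assumption of how such a metric might look, not the paper's exact formula; the function name, `penalty_weight`, and the ramp shape are all hypothetical.

```python
def cost_penalized_fitness(revenue, cost, age, grace_period=100, penalty_weight=1.0):
    """Fitness = revenue minus a cost penalty that ramps in linearly with age.

    During the grace period, a newborn expert's cost penalty is scaled by
    age / grace_period, so it is not culled before it has time to specialize.
    After the grace period, the full cost penalty applies.
    """
    ramp = min(age / grace_period, 1.0)  # linear ramp from 0 to 1
    return revenue - penalty_weight * ramp * cost

# A newborn expert (age 0) pays no cost penalty; a mature one pays in full.
newborn = cost_penalized_fitness(revenue=10.0, cost=4.0, age=0)     # 10.0
mature = cost_penalized_fitness(revenue=10.0, cost=4.0, age=200)    # 6.0
```

Under this scheme, an expert's fitness degrades gradually rather than immediately, which is what allows new experts to survive long enough to diversify instead of being replaced.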

Abstract

We present experimental results from seven controlled runs of nanoFMT, a Free-Market Algorithm (FMA) orchestrated transformer with dynamic Mixture-of-Experts (MoE) management. The experiments address a fundamental question for advanced LLM development: how should an MoE system manage its expert pool when operating at full capacity under changing data distributions? We demonstrate that cost-penalized fitness metrics, combined with a linear grace period for newborn experts, produce a system that accumulates domain expertise through diversification rather than replacement. The central result is a round-trip domain shift experiment showing 9–11× faster recovery when returning to a previously learned domain, with zero expert births or replacements required. This "molecular memory" effect, in which dormant experts survive and reactivate when their domain returns, has no analogue in current MoE management approaches. A preliminary cost analysis estimates annual savings of $39.1M and 27.1 GWh energy reduction for an OpenAI-scale provider under a moderate scenario.
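The zero-birth recovery behavior can be illustrated with a minimal sketch. This is an assumed design, not the paper's implementation: dormant experts simply persist in the pool, so a returning domain re-selects its existing expert instead of triggering a birth or replacement. The `ExpertPool` class and its routing rule are hypothetical.

```python
class ExpertPool:
    """Toy expert pool where experts persist once born (assumed behavior)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.experts = {}  # expert_id -> domain it specialized on
        self.births = 0

    def route(self, domain):
        """Return an expert for `domain`, creating one only if none matches."""
        for expert_id, specialized_domain in self.experts.items():
            if specialized_domain == domain:
                # Dormant expert reactivates: no birth, no replacement.
                return expert_id
        if len(self.experts) >= self.capacity:
            raise RuntimeError("pool full: would require replacement")
        expert_id = f"expert_{self.births}"
        self.experts[expert_id] = domain
        self.births += 1
        return expert_id

pool = ExpertPool(capacity=8)
first = pool.route("code")    # birth: new domain appears
pool.route("prose")           # birth: distribution shifts
returned = pool.route("code") # round trip: the dormant "code" expert reactivates
```

Because `returned` is the same expert as `first`, the round trip incurs no births, which is the mechanism the summary attributes to the faster recovery.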
