Evolve: A Persistent Knowledge Lifecycle for Small Language Models

arXiv cs.LG / April 28, 2026


Key Points

  • Evolve proposes a persistent knowledge lifecycle for small local language models by pairing a 2B model with a teacher-compiled, semantically coherent knowledge store that is updated and consolidated over time.
  • Instead of fragment retrieval at query time, it stages new knowledge sections when acquired, consolidates them offline via teacher-mediated merging (“sleep consolidation”), and refreshes sections inline when they expire.
  • Experiments on 750 benchmark queries (specialist questions, NaturalQuestions, TriviaQA) show accuracy rising from a 20–33% baseline to 60–84% (+40–52 percentage points) while cutting teacher model invocations by more than 50% through cross-query knowledge reuse.
  • Consolidation compresses the store by 31–33.5% across the three benchmarks without sacrificing accuracy, and section-based retrieval outperforms chunk-based retrieval by 5–9 percentage points across all lifecycle conditions.
  • The system supports two generation modes—“suppress” (strict section-only, auditable) and “augment” (section-supplemented)—over the same underlying knowledge lifecycle.
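The staged/consolidated/refreshed lifecycle above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class and method names (`KnowledgeStore`, `stage`, `consolidate`, `retrieve`) and the hour-long expiry window are assumptions, and the teacher calls are stand-in callables.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Section:
    """A semantically coherent knowledge section (hypothetical schema)."""
    topic: str
    text: str
    staged: bool = True  # newly acquired sections start staged
    expires_at: float = field(default_factory=lambda: time.time() + 3600)

class KnowledgeStore:
    """Sketch of Evolve's lifecycle: stage on acquisition, consolidate
    offline via teacher-mediated merging, refresh inline on expiry."""

    def __init__(self, teacher_merge, teacher_compile):
        self.sections = {}                      # topic -> list[Section]
        self.teacher_merge = teacher_merge      # teacher call: merge section texts
        self.teacher_compile = teacher_compile  # teacher call: recompile a section
        self.teacher_calls = 0

    def stage(self, topic, text):
        # New knowledge is staged immediately; no teacher invocation yet.
        self.sections.setdefault(topic, []).append(Section(topic, text))

    def consolidate(self):
        # Offline "sleep consolidation": merge staged sections per topic.
        for topic, secs in self.sections.items():
            if len(secs) > 1 or any(s.staged for s in secs):
                merged = self.teacher_merge([s.text for s in secs])
                self.teacher_calls += 1
                self.sections[topic] = [Section(topic, merged, staged=False)]

    def retrieve(self, topic):
        # Inline refresh: recompile an expired section before serving it.
        secs = self.sections.get(topic)
        if not secs:
            return None
        sec = secs[0]
        if time.time() > sec.expires_at:
            sec.text = self.teacher_compile(topic)
            self.teacher_calls += 1
            sec.expires_at = time.time() + 3600
        return sec.text
```

Because merging and recompilation are the only operations that invoke a teacher, repeated retrieval of a consolidated section costs nothing extra, which is the cross-query reuse the paper credits for the >50% drop in teacher calls.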

Abstract

Evolve pairs a small local language model with a persistent, teacher-compiled knowledge store -- refined through sleep consolidation and usage-driven refresh -- to deliver substantial accuracy gains over the model's parametric baseline while amortizing teacher costs through cross-query knowledge reuse. Rather than retrieving document fragments at query time, Evolve constructs a store of semantically coherent sections compiled by teacher models at natural conceptual boundaries; new sections are staged on acquisition, consolidated offline through teacher-mediated merging, and refreshed inline when expired. A 2B-parameter local model handles classification and generation; large teacher models are invoked only for knowledge operations. Across 750 benchmark queries spanning custom specialist questions, NaturalQuestions, and TriviaQA, the 2B model augmented by Evolve improves from 20-33% baseline accuracy to 60-84% (+40-52pp) while reducing teacher invocations by over 50% through reuse. Consolidation compresses the knowledge store by 31-33.5% across three independent benchmarks while preserving accuracy; section-based retrieval outperforms chunk-based retrieval by 5-9pp across every lifecycle condition. The architecture supports two generation modes over the same lifecycle -- suppress (strict section-only grounding, auditable) and augment (section-supplemented responses).
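The two generation modes differ only in how the retrieved section is framed for the local model. A minimal sketch, assuming a plain prompt-assembly step (the function name and prompt wording are illustrative, not taken from the paper):

```python
def build_prompt(query: str, section: str, mode: str) -> str:
    """Assemble the local model's prompt under one of Evolve's two
    generation modes (hypothetical prompt templates)."""
    if mode == "suppress":
        # Strict section-only grounding: the model must answer from the
        # section alone, so every answer is auditable against the store.
        return ("Answer ONLY from the section below; "
                "reply 'unknown' if it does not contain the answer.\n"
                f"Section: {section}\nQuestion: {query}")
    if mode == "augment":
        # Section-supplemented: the section augments, rather than
        # replaces, the model's own parametric knowledge.
        return ("Use the section below together with what you already know.\n"
                f"Section: {section}\nQuestion: {query}")
    raise ValueError(f"unknown mode: {mode}")
```

Both modes share the same underlying store and lifecycle; only this final framing step changes, which is why the paper can report them over identical knowledge states.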