Dynamic Context Evolution for Scalable Synthetic Data Generation

arXiv cs.CL / 4/9/2026


Key Points

  • The paper introduces “cross-batch mode collapse” for LLM-based synthetic data generation, where repeated independent prompting gradually reduces output diversity.
  • It proposes Dynamic Context Evolution (DCE), combining verbalized tail sampling, semantic memory for near-duplicate rejection across batches, and adaptive prompt evolution that rebuilds prompts from memory to preserve diversity.
  • Experiments across three domains and two model families (gpt-5-mini and claude-haiku-4-5) show DCE reduces mode collapse to 0.0% (vs. about 5.6% for naive prompting) while yielding substantially more stable conceptual clusters.
  • The approach is validated using an independent embedding model (all-MiniLM-L6-v2) and remains robust under sensitivity sweeps of VTS threshold (tau) and dedup threshold (delta).
  • DCE reportedly improves candidate diversity without fine-tuning or custom architectures, costing roughly $0.50 per 1,000 candidates using standard API calls.
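The semantic-memory component described above can be sketched as a persistent embedding index with cosine-similarity rejection. The paper validates with all-MiniLM-L6-v2 embeddings; the toy `embed` below (character-trigram counts) is a dependency-free stand-in, and the `delta` value shown is illustrative rather than the paper's tuned threshold.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: character-trigram counts.
    (The paper uses all-MiniLM-L6-v2; this keeps the sketch self-contained.)"""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticMemory:
    """Persistent index that rejects near-duplicates across batches."""
    def __init__(self, delta: float = 0.8):  # delta: dedup threshold (illustrative value)
        self.delta = delta
        self.items: list[tuple[str, Counter]] = []

    def admit(self, candidate: str) -> bool:
        vec = embed(candidate)
        if any(cosine(vec, v) >= self.delta for _, v in self.items):
            return False  # too close to something generated in an earlier batch
        self.items.append((candidate, vec))
        return True
```

Only `admit`-ed candidates are kept across batches; per the paper, the same memory state also feeds the adaptive prompt-evolution step.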

Abstract

Large language models produce repetitive output when prompted independently across many batches, a phenomenon we term cross-batch mode collapse: the progressive loss of output diversity when a language model is prompted repeatedly without access to its prior generations. Practitioners have long mitigated this with ad hoc deduplication and seed rotation, but no principled framework exists. We introduce Dynamic Context Evolution (DCE), comprising three mechanisms: (1) verbalized tail sampling (VTS), which filters high-probability candidates via model self-assessment: the model labels each idea with an estimate of how obvious it is, and obvious ideas are discarded; (2) semantic memory, which maintains a persistent embedding index to reject near-duplicates across batches; and (3) adaptive prompt evolution, which reconstructs the generation prompt each batch from memory state and rotating diversity strategies. In experiments across three domains (sustainable packaging concepts, educational exam questions, and creative writing prompts) and two model families (gpt-5-mini and claude-haiku-4-5), a component ablation across 2-3 random seeds per method shows that DCE achieves 0.0 +/- 0.0% collapse versus 5.6 +/- 2.0% for naive prompting, while producing 17-18 HDBSCAN clusters per seed versus naive prompting's volatile 2-17, indicating reliably richer conceptual structure. These results are validated with an independent embedding model (all-MiniLM-L6-v2) and hold across sensitivity sweeps of the VTS threshold tau and the dedup threshold delta. Deduplication and prompt evolution are individually insufficient but jointly effective; the full method costs approximately $0.50 per 1,000 candidates using only standard API calls, with no fine-tuning or custom architectures required.
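A minimal sketch of how the three mechanisms might compose per batch follows. The `generate` stub stands in for an LLM API call that returns candidates with verbalized obviousness scores; `tau`, the strategy list, and the prompt template are illustrative assumptions, not the paper's exact values, and the memory here uses exact-match lookup where the paper uses an embedding index with threshold delta.

```python
import random

def generate(prompt: str, n: int, rng: random.Random) -> list[tuple[str, float]]:
    """Stub for an LLM call: returns (idea, self-assessed obviousness in [0, 1]).
    In the paper the model verbalizes this score itself (verbalized tail sampling)."""
    return [("idea-%d" % rng.randrange(20), rng.random()) for _ in range(n)]

def dce_batches(num_batches: int, batch_size: int, tau: float = 0.6, seed: int = 0):
    rng = random.Random(seed)
    memory: list[str] = []  # stands in for the paper's persistent embedding index
    strategies = ["analogy", "constraint", "inversion"]  # rotating strategies (illustrative)
    accepted: list[str] = []
    for b in range(num_batches):
        # adaptive prompt evolution: rebuild the prompt from memory + rotated strategy
        prompt = ("Generate ideas. Avoid: %s. Strategy: %s"
                  % (", ".join(memory[-5:]) or "none", strategies[b % len(strategies)]))
        for idea, obviousness in generate(prompt, batch_size, rng):
            if obviousness > tau:
                continue  # VTS: discard high-probability (obvious) candidates
            if idea in memory:
                continue  # semantic memory: reject cross-batch duplicates
            memory.append(idea)
            accepted.append(idea)
    return accepted
```

`accepted` then contains only novel, non-obvious candidates; naive prompting corresponds to dropping the memory check and the strategy rotation, which is exactly the ablation the paper reports as individually insufficient.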