Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

arXiv cs.CL / 4/10/2026


Key Points

  • The paper argues that verifier-free evolutionary inference suffers from a dual bottleneck: repeated evolution collapses diversity toward narrow modes, and using a uniformly expensive model wastes compute and becomes economically impractical.
  • It introduces Squeeze Evolve, a lightweight multi-model orchestration framework that preserves diversity and improves cost-efficiency by allocating model capacity based on marginal utility across evolution stages.
  • The approach assigns stronger (higher-cost) models to high-impact stages while delegating lower-impact steps to cheaper models, aiming to jointly address both effectiveness and cost.
  • Across multiple benchmarks (AIME 2025, HMMT 2025, LiveCodeBench V6, GPQA-Diamond, ARC-AGI-V2, and multimodal vision tasks like MMMU-Pro/BabyVision), Squeeze Evolve improves the cost–capability frontier versus single-model evolution and reports new state-of-the-art results on several tasks.
  • Empirical results claim up to ~3× API cost reduction and up to ~10× higher throughput under fixed budgets, and the method reportedly matches or exceeds verifier-based evolutionary methods on discovery tasks despite being verifier-free.
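The allocation principle in the key points above can be sketched as a simple stage-level router. This is a minimal illustration only: the stage names, relative costs, and marginal-utility scores below are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of marginal-utility model routing for verifier-free
# evolutionary inference. Stages, costs, and utility scores are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float  # relative API cost per invocation


STRONG = Model("strong-llm", cost_per_call=10.0)
CHEAP = Model("cheap-llm", cost_per_call=1.0)

# Assumed marginal utility of extra model capability at each evolution stage.
STAGE_UTILITY = {
    "seed_generation": 0.2,  # broad sampling: cheap model suffices
    "mutation": 0.3,         # local edits: low marginal utility
    "crossover": 0.9,        # recombining candidates: high impact
    "selection": 0.8,        # judging survivors without a verifier: high impact
}


def route(stage: str, threshold: float = 0.5) -> Model:
    """Reserve the strong model for stages where capability pays off most."""
    return STRONG if STAGE_UTILITY[stage] >= threshold else CHEAP


def generation_cost(stages: list[str]) -> float:
    """Total relative cost of one evolution generation under this routing."""
    return sum(route(s).cost_per_call for s in stages)


stages = ["seed_generation", "mutation", "crossover", "selection"]
mixed = generation_cost(stages)               # 1 + 1 + 10 + 10 = 22
uniform = len(stages) * STRONG.cost_per_call  # 4 * 10 = 40
print(f"mixed routing cost {mixed} vs uniform strong-model cost {uniform}")
```

Under these toy numbers, routing roughly halves per-generation cost while keeping the strong model on the high-impact crossover and selection stages; the paper's reported ~3× savings would depend on the real stage mix and model prices.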

Abstract

We show that verifier-free evolution is bottlenecked by both diversity and efficiency: without external correction, repeated evolution accelerates collapse toward narrow modes, while the uniform use of a high-cost model wastes compute and quickly becomes economically impractical. We introduce Squeeze Evolve, a unified multi-model orchestration framework for verifier-free evolutionary inference. Our approach is guided by a simple principle: allocate model capability where it has the highest marginal utility. Stronger models are reserved for high-impact stages, while cheaper models handle the other stages at much lower cost. This principle addresses diversity and cost-efficiency jointly while remaining lightweight. Squeeze Evolve naturally supports open-source, closed-source, and mixed-model deployments. Across AIME 2025, HMMT 2025, LiveCodeBench V6, GPQA-Diamond, ARC-AGI-V2, and multimodal vision benchmarks such as MMMU-Pro and BabyVision, Squeeze Evolve consistently improves the cost–capability frontier over single-model evolution and achieves new state-of-the-art results on several tasks. Empirically, Squeeze Evolve reduces API cost by up to ~3× and increases fixed-budget serving throughput by up to ~10×. Moreover, on discovery tasks, Squeeze Evolve is the first verifier-free evolutionary method to match, and in some cases exceed, the performance of verifier-based evolutionary methods.