Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

arXiv cs.CL / 4/10/2026


Key Points

  • The paper argues that verifier-free evolutionary inference suffers from a dual bottleneck: repeated evolution collapses diversity toward narrow modes, and using a uniformly expensive model wastes compute and becomes economically impractical.
  • It introduces Squeeze Evolve, a lightweight multi-model orchestration framework that preserves diversity and improves cost-efficiency by allocating model capacity based on marginal utility across evolution stages.
  • The approach assigns stronger (higher-cost) models to high-impact stages while delegating lower-impact steps to cheaper models, aiming to jointly address both effectiveness and cost.
  • Across multiple benchmarks (AIME 2025, HMMT 2025, LiveCodeBench V6, GPQA-Diamond, ARC-AGI-V2, and multimodal vision tasks like MMMU-Pro/BabyVision), Squeeze Evolve improves the cost–capability frontier versus single-model evolution and reports new state-of-the-art results on several tasks.
  • Empirical results claim up to ~3× API cost reduction and up to ~10× higher throughput under fixed budgets, and the method reportedly matches or exceeds verifier-based evolutionary methods on discovery tasks despite being verifier-free.
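The allocation principle in the key points above can be sketched as a simple stage-level router. This is a minimal illustration only: the stage names, relative costs, and marginal-utility scores below are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of marginal-utility model routing for verifier-free
# evolutionary inference. Stages, costs, and utility scores are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float  # relative API cost per invocation


STRONG = Model("strong-llm", cost_per_call=10.0)
CHEAP = Model("cheap-llm", cost_per_call=1.0)

# Assumed marginal utility of extra model capability at each evolution stage.
STAGE_UTILITY = {
    "seed_generation": 0.2,  # broad sampling: cheap model suffices
    "mutation": 0.3,         # local edits: low marginal utility
    "crossover": 0.9,        # recombining candidates: high impact
    "selection": 0.8,        # judging survivors without a verifier: high impact
}


def route(stage: str, threshold: float = 0.5) -> Model:
    """Reserve the strong model for stages where capability pays off most."""
    return STRONG if STAGE_UTILITY[stage] >= threshold else CHEAP


def generation_cost(stages: list[str]) -> float:
    """Total relative cost of one evolution generation under this routing."""
    return sum(route(s).cost_per_call for s in stages)


stages = ["seed_generation", "mutation", "crossover", "selection"]
mixed = generation_cost(stages)               # 1 + 1 + 10 + 10 = 22
uniform = len(stages) * STRONG.cost_per_call  # 4 * 10 = 40
print(f"mixed routing cost {mixed} vs uniform strong-model cost {uniform}")
```

Under these toy numbers, routing roughly halves per-generation cost while keeping the strong model on the high-impact crossover and selection stages; the paper's reported ~3× savings would depend on the real stage mix and model prices.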

Abstract

We show that verifier-free evolution is bottlenecked by both diversity and efficiency: without external correction, repeated evolution accelerates collapse toward narrow modes, while the uniform use of a high-cost model wastes compute and quickly becomes economically impractical. We introduce Squeeze Evolve, a unified multi-model orchestration framework for verifier-free evolutionary inference. Our approach is guided by a simple principle: allocate model capability where it has the highest marginal utility. Stronger models are reserved for high-impact stages, while cheaper models handle the other stages at much lower cost. This principle addresses diversity and cost-efficiency jointly while remaining lightweight. Squeeze Evolve naturally supports open-source, closed-source, and mixed-model deployments. Across AIME 2025, HMMT 2025, LiveCodeBench V6, GPQA-Diamond, ARC-AGI-V2, and multimodal vision benchmarks such as MMMU-Pro and BabyVision, Squeeze Evolve consistently improves the cost–capability frontier over single-model evolution and achieves new state-of-the-art results on several tasks. Empirically, Squeeze Evolve reduces API cost by up to ~3× and increases fixed-budget serving throughput by up to ~10×. Moreover, on discovery tasks, Squeeze Evolve is the first verifier-free evolutionary method to match, and in some cases exceed, the performance of verifier-based evolutionary methods.