FUSE: Ensembling Verifiers with Zero Labeled Data
arXiv stat.ML / 4/21/2026
Key Points
- The paper introduces FUSE (Fully Unsupervised Score Ensembling), which improves LLM output verification by ensembling multiple verifiers without using any ground-truth correctness labels.
- FUSE works by modeling and controlling conditional dependencies among verifiers, aiming to boost the unsupervised performance of spectral-ensemble-style methods from the verification/ensembling literature.
- Experiments show FUSE can match or outperform semi-supervised alternatives in test-time scaling setups across varied generator models, verifier types, and benchmarks.
- Validation spans both established academic benchmarks (e.g., GPQA Diamond) and more challenging frontier-style, label-light evaluation sets such as Humanity’s Last Exam and IMO Shortlist questions.
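To make the spectral-ensemble idea behind the key points concrete, here is a minimal sketch of the classic unsupervised baseline that FUSE builds on: under a conditional-independence assumption, the leading eigenvector of the verifiers' vote covariance matrix recovers per-verifier reliability weights without any labels. This is a generic illustration, not the FUSE algorithm itself (FUSE's contribution is precisely handling the case where the independence assumption fails); the function names and synthetic demo are invented for this example.

```python
import numpy as np

def spectral_weights(votes):
    """Estimate per-verifier reliability weights from unlabeled verdicts.

    votes: (n_verifiers, n_items) array of +/-1 votes. If verifiers are
    conditionally independent given the true label, the off-diagonal of
    the vote covariance matrix is approximately rank-1, and its leading
    eigenvector is proportional to each verifier's balanced accuracy
    (the spectral meta-learner idea).
    """
    cov = np.cov(votes)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                 # leading eigenvector
    if v.sum() < 0:                    # resolve sign ambiguity: assume most
        v = -v                         # verifiers are better than chance
    return v

def ensemble_verdicts(votes):
    """Weighted majority vote using the unsupervised spectral weights."""
    return np.sign(spectral_weights(votes) @ votes)

# Synthetic demo: three verifiers with accuracies 0.9, 0.85, 0.6.
rng = np.random.default_rng(0)
labels = rng.choice([-1, 1], size=5000)
accs = [0.9, 0.85, 0.6]
votes = np.array(
    [np.where(rng.random(5000) < a, labels, -labels) for a in accs]
)

weights = spectral_weights(votes)      # higher weight for better verifiers
preds = ensemble_verdicts(votes)
```

In this sketch the strongest verifier receives the largest weight and the weighted vote beats the naive unweighted majority, which is the behavior FUSE aims to preserve while relaxing the independence assumption.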