Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference
arXiv cs.CL / 3/16/2026
Key Points
- Pyramid MoA introduces a hierarchical Mixture-of-Agents architecture with a decision-theoretic router that escalates a query to a higher tier only when needed, trading off cost against accuracy.
- It formalizes a Probabilistic Anytime Property and proves that expected solution quality does not decrease with computational depth, provided the router meets certain precision conditions.
- It derives an escalation rule from Value of Computation theory that accounts for imperfect oracles, extending Hansen and Zilberstein's monitoring framework to stochastic LLM inference.
- Empirically, the router intercepts 81.6% of bugs on MBPP; matches the Oracle baseline on GSM8K and MMLU with up to 18.4% compute savings; achieves high accuracy with substantial cost savings on HumanEval; and preserves the Oracle ceiling on MATH 500.
- The framework adapts to the task: it acts as an aggressive cost-cutter on low-entropy tasks and as a safety net on high-entropy tasks.
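To see why a "router precision condition" is what makes deeper computation safe, here is a simplified one-step derivation under assumptions that are not taken from the paper: correctness is binary, tier k answers correctly with probability a_k, the router escalates a wrong answer with probability r and a correct one with probability f, and tier k+1 is correct with probability a_{k+1} independently of tier k's outcome. The symbols a_k, r, f, π_k and Δ_k are introduced here for illustration only; the paper's actual theorem may state its condition differently.

```latex
% Illustrative one-step analysis (assumed notation, not the paper's).
\begin{aligned}
\Pr[\text{correct after step } k] &= a_k\bigl[(1-f) + f\,a_{k+1}\bigr] + (1-a_k)\,r\,a_{k+1} \\
&= a_k - a_k f\,(1-a_{k+1}) + (1-a_k)\,r\,a_{k+1}, \\
\Delta_k &= (1-a_k)\,r\,a_{k+1} - a_k f\,(1-a_{k+1}), \\
\pi_k &= \frac{(1-a_k)\,r}{(1-a_k)\,r + a_k f}
\quad\Longrightarrow\quad
\Delta_k \ge 0 \iff \pi_k \ge 1 - a_{k+1}.
\end{aligned}
```

Here π_k is the router's precision (the fraction of escalated answers that were actually wrong), and the equivalence holds whenever the router escalates with nonzero probability. Under these assumptions, escalation cannot lower expected accuracy as long as the router's precision is at least the next tier's error rate; applying the same argument level by level gives expected quality that is non-decreasing with depth.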
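The escalation behavior summarized in the key points can be sketched as a simple cascade. This is a minimal illustration under stated assumptions, not the paper's router: the `Tier` class, `value_of_computation`, `pyramid_answer`, the tier costs and accuracies, and the `estimate_p_wrong` callback standing in for an imperfect oracle are all hypothetical, and the rule is a myopic one-step Value of Computation estimate in which a higher tier's answer simply replaces the lower one.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Tier:
    """One level of a hypothetical agent pyramid (names and numbers are illustrative)."""
    name: str
    cost: float                  # assumed compute cost of calling this tier
    accuracy: float              # assumed probability this tier answers correctly
    solve: Callable[[str], str]  # stand-in for an LLM / agent-ensemble call


def value_of_computation(p_wrong: float, next_accuracy: float,
                         reward: float, next_cost: float) -> float:
    """Myopic expected net value of escalating one tier (a simplified VoC, assumed here).

    p_wrong is the router's imperfect estimate that the current answer is wrong.
    Escalating gains `reward` if the current answer is wrong and the next tier is right,
    loses `reward` if the current answer is right and the next tier is wrong,
    and always pays `next_cost`.
    """
    gain = p_wrong * next_accuracy * reward
    loss = (1.0 - p_wrong) * (1.0 - next_accuracy) * reward
    return gain - loss - next_cost


def pyramid_answer(query: str, tiers: Sequence[Tier],
                   estimate_p_wrong: Callable[[str, str, int], float],
                   reward: float = 1.0,
                   budget: float = float("inf")) -> Tuple[str, float]:
    """Answer with the cheapest tier; escalate only while VoC is positive and budget allows.

    Stopping after any tier still yields an answer, which is what makes the loop "anytime".
    """
    answer = tiers[0].solve(query)
    spent = tiers[0].cost
    for level in range(1, len(tiers)):
        nxt = tiers[level]
        if spent + nxt.cost > budget:
            break  # out of budget: keep the best answer so far
        p_wrong = estimate_p_wrong(query, answer, level - 1)
        if value_of_computation(p_wrong, nxt.accuracy, reward, nxt.cost) <= 0:
            break  # further computation is not expected to pay for itself
        answer = nxt.solve(query)  # assumption: the higher tier's answer replaces the lower one
        spent += nxt.cost
    return answer, spent


tiers = [
    Tier("small", 0.01, 0.6, lambda q: f"small:{q}"),
    Tier("medium", 0.05, 0.8, lambda q: f"medium:{q}"),
    Tier("large", 0.20, 0.9, lambda q: f"large:{q}"),
]

# Low-entropy query: the router is confident, so the small tier's answer is kept.
print(pyramid_answer("q", tiers, lambda q, a, lvl: 0.05))
# Uncertain at first, confident after one escalation: the loop stops at the medium tier.
print(pyramid_answer("q", tiers, lambda q, a, lvl: 0.9 if lvl == 0 else 0.1))
```

A confident (low-entropy) router estimate returns the cheapest answer, while a persistently uncertain estimate pushes the query to the top of the pyramid, mirroring the cost-cutter versus safety-net behavior described above. Because the rule compares `p_wrong * next_accuracy` against `(1 - p_wrong) * (1 - next_accuracy)` plus the cost term, a noisy router estimate directly limits how much escalation can help.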




