Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Process and DualJudge
arXiv cs.AI / 4/7/2026
Key Points
- The paper proposes a structured evaluation method for large language models that adapts the Analytic Hierarchy Process (AHP) to decompose judgments into explicit criteria rather than relying on opaque direct scoring.
- It introduces a confidence-aware Fuzzy AHP (FAHP) that represents epistemic uncertainty using triangular fuzzy numbers and uses LLM-generated confidence scores to modulate uncertainty during aggregation.
- Evaluations on JudgeBench show that both crisp and fuzzy AHP approaches outperform direct scoring across model scales and dataset splits, with FAHP delivering more stable results when comparisons are uncertain.
- The authors further develop DualJudge, a hybrid framework that fuses holistic direct scores with AHP outputs using consistency-aware weighting inspired by Dual-Process Theory.
- The authors claim state-of-the-art performance for DualJudge and release code to support reproducibility and adoption.
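The summary describes crisp AHP and DualJudge's consistency-aware fusion only at a high level. The Python below is a minimal sketch of how those pieces could fit together, not the paper's implementation. It derives criterion weights from a pairwise comparison matrix with the standard row geometric-mean approximation of the principal eigenvector, computes Saaty's consistency ratio (CR), and blends the weighted AHP score with a holistic direct score. The criteria, the 1-10 score scale, and especially the `fuse_scores` rule (the AHP weight decays linearly as CR approaches 0.1) are hypothetical choices made for illustration.

```python
import math

# Saaty's random consistency index (RI) by matrix size n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix):
    """Criterion weights via the row geometric-mean approximation of the principal eigenvector."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    """Saaty's CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # reciprocal 1x1 and 2x2 matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return max(0.0, ci / RANDOM_INDEX[n])  # clamp tiny negative float error


def ahp_score(criterion_scores, weights):
    """Weighted sum of per-criterion scores (assumed here to be on a 1-10 scale)."""
    return sum(w * s for w, s in zip(weights, criterion_scores))


def fuse_scores(direct_score, ahp, cr, cr_threshold=0.1):
    """HYPOTHETICAL consistency-aware fusion: rely on the AHP score when CR is low,
    shifting toward the holistic direct score as CR approaches the threshold."""
    alpha = min(1.0, max(0.0, 1.0 - cr / cr_threshold))
    return alpha * ahp + (1.0 - alpha) * direct_score


# Illustrative criteria (assumed): correctness, helpfulness, clarity.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
ahp = ahp_score([8.0, 6.0, 9.0], weights)
final = fuse_scores(direct_score=7.0, ahp=ahp, cr=cr)
print([round(w, 3) for w in weights], round(cr, 3), round(final, 2))
```

Tying the blend weight to CR is one simple way to encode "trust the structured path when the judge's pairwise comparisons are coherent". The weighting used in DualJudge may differ, and the released code is the authoritative reference.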
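The summary says FAHP encodes uncertainty as triangular fuzzy numbers and uses LLM-generated confidence to modulate it, but it does not give the mapping or the aggregation rule. The sketch below assumes one plausible scheme. Each pairwise judgment `value` on Saaty's 1-9 scale becomes a triangle (l, m, u) whose width grows multiplicatively as confidence falls. The helper names `to_tfn` and `build_matrix` and the `max_factor` parameter are hypothetical. Weights are then computed with Buckley's fuzzy geometric mean and defuzzified by the centroid (l + m + u) / 3, which is a common FAHP choice but not necessarily the paper's.

```python
import math

SCALE_MIN, SCALE_MAX = 1.0 / 9.0, 9.0  # Saaty 1-9 scale and its reciprocals


def clamp(x):
    return min(SCALE_MAX, max(SCALE_MIN, x))


def to_tfn(value, confidence, max_factor=2.0):
    """HYPOTHETICAL mapping: a judgment with LLM confidence in [0, 1] becomes a
    triangular fuzzy number (l, m, u). Lower confidence gives a wider triangle;
    confidence 1.0 collapses it to the crisp value."""
    f = 1.0 + (1.0 - confidence) * (max_factor - 1.0)
    return (clamp(value / f), value, clamp(value * f))


def reciprocal(tfn):
    """Fuzzy reciprocal: 1 / (l, m, u) = (1/u, 1/m, 1/l)."""
    l, m, u = tfn
    return (1.0 / u, 1.0 / m, 1.0 / l)


def build_matrix(n, judgments):
    """judgments maps (i, j) with i < j to (value, confidence)."""
    one = (1.0, 1.0, 1.0)
    matrix = [[one] * n for _ in range(n)]
    for (i, j), (value, conf) in judgments.items():
        tfn = to_tfn(value, conf)
        matrix[i][j] = tfn
        matrix[j][i] = reciprocal(tfn)
    return matrix


def fuzzy_weights(matrix):
    """Buckley's fuzzy geometric mean, then centroid defuzzification."""
    n = len(matrix)
    geo = [tuple(math.prod(cell[k] for cell in row) ** (1.0 / n) for k in range(3))
           for row in matrix]
    sum_l = sum(g[0] for g in geo)
    sum_m = sum(g[1] for g in geo)
    sum_u = sum(g[2] for g in geo)
    # Fuzzy division (l, m, u) / (L, M, U) is approximated as (l/U, m/M, u/L).
    fuzzy = [(g[0] / sum_u, g[1] / sum_m, g[2] / sum_l) for g in geo]
    crisp = [(l + m + u) / 3.0 for l, m, u in fuzzy]
    total = sum(crisp)
    return fuzzy, [c / total for c in crisp]


confident = {(0, 1): (3.0, 0.9), (0, 2): (5.0, 0.9), (1, 2): (3.0, 0.9)}
unsure = {(0, 1): (3.0, 0.3), (0, 2): (5.0, 0.3), (1, 2): (3.0, 0.3)}
fuzzy_conf, crisp_conf = fuzzy_weights(build_matrix(3, confident))
fuzzy_unsure, crisp_unsure = fuzzy_weights(build_matrix(3, unsure))
print("confident:", [tuple(round(x, 3) for x in w) for w in fuzzy_conf])
print("unsure:   ", [tuple(round(x, 3) for x in w) for w in fuzzy_unsure])
```

With the same modal judgments, the low-confidence run yields visibly wider fuzzy weight intervals, which is the behavior the summary attributes to FAHP. At confidence 1.0 every triangle collapses and the method reduces to crisp geometric-mean AHP.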