End-to-End Chatbot Evaluation with Adaptive Reasoning and Uncertainty Filtering
arXiv cs.CL / 3/12/2026
Key Points
- The paper presents an end-to-end automatic evaluator for domain-specific chatbots. It generates Q&A pairs from the underlying knowledge base and uses LLMs to judge chatbot responses against the reference answers, which reduces manual review.
- It introduces confidence-based filtering to highlight uncertain cases, helping reviewers focus on the most ambiguous outputs.
- The method is demonstrated on a Vietnamese news dataset, where it achieves high agreement with human judgments while significantly lowering review overhead.
- The framework is modular and language-agnostic, enabling easy adaptation to diverse domains and deployment scenarios with minimal manual intervention.
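
The confidence-based filtering described in the key points can be pictured with a minimal Python sketch. This summary does not say how the paper computes confidence, which threshold it uses, or how the judge is prompted, so everything here is an assumption for illustration: the `Judgment` record, the `evaluate` function, the `threshold` value of 0.8, and `toy_judge`, a token-overlap stand-in for a real LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Judgment:
    question: str
    verdict: str       # "correct" or "incorrect", as decided by the judge
    confidence: float  # assumed: judge confidence in [0, 1]


# Hypothetical judge signature: (question, reference, response) -> (verdict, confidence)
Judge = Callable[[str, str, str], Tuple[str, float]]


def evaluate(
    cases: Iterable[Tuple[str, str, str]],
    judge: Judge,
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Judgment], List[Judgment]]:
    """Split judged cases into auto-accepted verdicts and cases for human review."""
    auto_accepted: List[Judgment] = []
    needs_review: List[Judgment] = []
    for question, reference, response in cases:
        verdict, confidence = judge(question, reference, response)
        judgment = Judgment(question, verdict, confidence)
        if confidence >= threshold:
            auto_accepted.append(judgment)
        else:
            # Low confidence: flag the case so a reviewer looks at it.
            needs_review.append(judgment)
    return auto_accepted, needs_review


def toy_judge(question: str, reference: str, response: str) -> Tuple[str, float]:
    """Stand-in for an LLM judge using token overlap. NOT the paper's method."""
    ref_tokens = set(reference.lower().split())
    res_tokens = set(response.lower().split())
    union = ref_tokens | res_tokens
    overlap = len(ref_tokens & res_tokens) / len(union) if union else 0.0
    verdict = "correct" if overlap >= 0.5 else "incorrect"
    # Confidence grows as the overlap moves away from the 0.5 decision boundary.
    confidence = abs(overlap - 0.5) * 2
    return verdict, confidence
```

In a real deployment, `toy_judge` would be replaced by a call to an LLM that compares the chatbot response with the reference answer. The design point is only that each verdict carries a confidence score, so reviewers can be handed the `needs_review` list rather than the full evaluation set.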