CHiL(L)Grader: Calibrated Human-in-the-Loop Short-Answer Grading
arXiv cs.CL / 3/13/2026
Key Points
- CHiL(L)Grader is a calibrated human-in-the-loop grading framework that combines uncertainty estimation with human review to improve trustworthiness in automated short-answer scoring.
- It employs post-hoc temperature scaling, confidence-based selective prediction, and continual learning to automatically grade only high-confidence responses and route uncertain cases to human graders.
- On three short-answer datasets, it automatically scores 35–65% of responses at expert-level quality (quadratic weighted kappa, QWK ≥ 0.80), demonstrating effective use of uncertainty quantification in educational AI.
- Each correction cycle feeds teacher feedback back into the model, strengthening its grading ability and helping it adapt to evolving rubrics and unseen questions.
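Post-hoc temperature scaling divides a trained model's logits by a single scalar T fitted on held-out data. This leaves the predicted score unchanged and only softens (T > 1) or sharpens (T < 1) the confidences. The sketch below assumes the grader emits one logit per score level and fits T by minimizing negative log-likelihood with a golden-section search. The summary does not say which optimizer CHiL(L)Grader uses, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logits_batch: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Average negative log-likelihood of the gold score levels at a given temperature."""
    total = 0.0
    for logits, label in zip(logits_batch, labels):
        prob = softmax(logits, temperature)[label]
        total -= math.log(max(prob, 1e-12))  # guard against log(0)
    return total / len(labels)


def fit_temperature(
    logits_batch: Sequence[Sequence[float]],
    labels: Sequence[int],
    low: float = 0.05,
    high: float = 10.0,
    iters: int = 80,
) -> float:
    """Fit one scalar T > 0 on validation data by golden-section search on NLL.

    NLL is convex in 1/T, hence unimodal in T, so the search converges to the
    minimizer as long as it lies inside [low, high].
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = low, high
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = mean_nll(logits_batch, labels, c), mean_nll(logits_batch, labels, d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = mean_nll(logits_batch, labels, c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = mean_nll(logits_batch, labels, d)
    return (a + b) / 2
```

For example, a model that outputs logits [4, 0] for four responses but is right on only three of them is overconfident. The fitted T comes out near 4 / ln 3 ≈ 3.64, at which point its stated confidence drops to 0.75 and matches its accuracy.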
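Confidence-based selective prediction can be read as a simple gate. The model's score is kept only when its calibrated top-class probability clears a threshold, and otherwise the response is routed to a human. The sketch below assumes the maximum softmax probability is the confidence signal. `GradingDecision`, `route`, and the threshold value are illustrative assumptions, not names from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class GradingDecision:
    response_id: str
    predicted_score: int
    confidence: float
    auto_graded: bool  # False means the response is sent to a human grader


def calibrated_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax over temperature-scaled logits (T fitted beforehand on held-out data)."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def route(
    logits_by_id: Dict[str, Sequence[float]], temperature: float, threshold: float
) -> List[GradingDecision]:
    """Accept the argmax score only when the calibrated max probability reaches the threshold."""
    decisions = []
    for response_id, logits in logits_by_id.items():
        probs = calibrated_probs(logits, temperature)
        score = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[score]
        decisions.append(
            GradingDecision(response_id, score, confidence, auto_graded=confidence >= threshold)
        )
    return decisions
```

Raising the threshold (or a larger fitted temperature) sends more responses to humans. This trades automation coverage for agreement on the responses that are graded automatically.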
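The "expert-level" bar is stated as QWK ≥ 0.80 on the automatically scored portion. One plausible way to turn that into a routing threshold is to rank validation responses by confidence and keep the largest high-confidence slice whose QWK still clears the target. This is an assumption: the summary does not say how CHiL(L)Grader sets its threshold, and `select_threshold` is a hypothetical helper. The QWK formula itself is the standard one, with quadratic weights (i − j)² / (K − 1)².

```python
from typing import Optional, Sequence, Tuple


def quadratic_weighted_kappa(gold: Sequence[int], pred: Sequence[int], num_classes: int) -> float:
    """QWK for integer scores in 0..num_classes-1 (num_classes >= 2); 1 = perfect, 0 = chance."""
    n = len(gold)
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gold, pred):
        observed[g][p] += 1
    hist_gold = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes)) for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_gold[i] * hist_pred[j] / n
    # Convention: if both raters used one identical level, agreement is trivially perfect.
    return 1.0 if den == 0 else 1.0 - num / den


def select_threshold(
    val: Sequence[Tuple[float, int, int]],  # (confidence, predicted_score, gold_score)
    num_classes: int,
    target_qwk: float = 0.80,
    min_accepted: int = 20,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, coverage) for the largest confident slice meeting target_qwk, or None."""
    ranked = sorted(val, key=lambda r: r[0], reverse=True)
    best = None
    for k in range(min_accepted, len(ranked) + 1):
        # Only cut between distinct confidences so "conf >= threshold" accepts exactly k items.
        if k < len(ranked) and ranked[k][0] == ranked[k - 1][0]:
            continue
        golds = [g for _, _, g in ranked[:k]]
        preds = [p for _, p, _ in ranked[:k]]
        if quadratic_weighted_kappa(golds, preds, num_classes) >= target_qwk:
            best = (ranked[k - 1][0], k / len(ranked))
    return best
```

The returned coverage is the fraction of responses that would be scored automatically. It is the validation-set analogue of the 35–65% figure above, and responses below the threshold go to human graders.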
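The summary says each correction cycle feeds teacher feedback back into the model, but it does not describe the update rule. The sketch below assumes the simplest version: grade a batch, collect teacher grades for the routed responses, then refit a trainer on every teacher label gathered so far. `CorrectionLoop`, `Predictor`, and `Trainer` are hypothetical names. A real system would more likely fine-tune incrementally and re-fit the temperature after each update.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces: a predictor maps an answer to (score, confidence);
# a trainer builds a fresh predictor from (answer, gold_score) pairs.
Predictor = Callable[[str], Tuple[int, float]]
Trainer = Callable[[List[Tuple[str, int]]], Predictor]


@dataclass
class CorrectionLoop:
    train: Trainer
    threshold: float
    labelled: List[Tuple[str, int]] = field(default_factory=list)
    predictor: Optional[Predictor] = None

    def run_cycle(self, answers: List[str], ask_teacher: Callable[[str], int]) -> Dict[str, int]:
        """Auto-grade confident answers, send the rest to a teacher, then retrain."""
        if self.predictor is None:
            self.predictor = self.train(self.labelled)
        grades: Dict[str, int] = {}
        for answer in answers:
            score, confidence = self.predictor(answer)
            if confidence >= self.threshold:
                grades[answer] = score
            else:
                gold = ask_teacher(answer)  # the human grade is final for routed answers
                grades[answer] = gold
                self.labelled.append((answer, gold))
        # Simplified continual-learning step: refit on all teacher-labelled data so far.
        self.predictor = self.train(self.labelled)
        return grades
```

With a toy trainer that simply memorizes teacher grades, a second cycle over the same answers plus one new answer asks the teacher only about the new one. This illustrates how routed cases shrink as corrections accumulate.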