Real-Time Trustworthiness Scoring for LLM Structured Outputs and Data Extraction
arXiv cs.CL / 3/20/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- CONSTRUCT introduces a real-time trustworthiness scoring method for LLM structured outputs, identifying outputs with a higher likelihood of errors and guiding human review toward them.
- The method scores trustworthiness at the level of individual fields within a structured output, enabling reviewers to focus on the fields most likely to be wrong rather than rechecking the entire record.
- It works with any LLM, including black-box APIs that do not expose logprobs, and requires neither labeled training data nor custom model deployment (a minimal sketch of one such logprob-free approach follows this list).
- Evaluation on four datasets shows higher precision and recall than competing scoring methods, with assessments run on models such as Gemini 3 and GPT-5.
- The work provides one of the first public benchmarks for LLM structured outputs with reliable ground-truth values, covering complex outputs with nested JSON schemas.
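The summary does not spell out how CONSTRUCT computes its per-field scores, but one standard way to score black-box outputs without logprobs is self-consistency: resample the extraction several times and measure per-field agreement. The sketch below assumes that approach purely for illustration; `flatten` and `field_trust_scores` are hypothetical helper names, not the paper's API.

```python
import json
from collections import Counter
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, str]:
    """Flatten nested JSON into dot-path keys so every leaf field
    (e.g. "buyer.name") can be scored on its own."""
    if isinstance(obj, dict):
        out: dict[str, str] = {}
        for key, val in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            out.update(flatten(val, path))
        return out
    # Lists and scalars are leaves; serialize so values compare cleanly.
    return {prefix: json.dumps(obj, sort_keys=True)}


def field_trust_scores(samples: list[dict]) -> dict[str, float]:
    """Score each field as the fraction of resampled extractions that
    agree with its modal value; low scores flag fields for review."""
    flat = [flatten(s) for s in samples]
    paths = {p for f in flat for p in f}
    scores: dict[str, float] = {}
    for path in paths:
        counts = Counter(f.get(path) for f in flat)  # missing field -> None
        scores[path] = counts.most_common(1)[0][1] / len(samples)
    return scores


if __name__ == "__main__":
    # Three resampled extractions of the same document (temperature > 0).
    samples = [
        {"invoice_no": "INV-114", "total": 1820.0, "buyer": {"name": "Acme"}},
        {"invoice_no": "INV-114", "total": 1820.0, "buyer": {"name": "Acme"}},
        {"invoice_no": "INV-114", "total": 1280.0, "buyer": {"name": "Acme"}},
    ]
    for path, score in sorted(field_trust_scores(samples).items()):
        print(f"{path}: {score:.2f}")  # "total" scores 0.67 -> review it
```

Fields every resample agrees on score 1.0; disagreement (the `total` field above) yields a lower score, which is the kind of per-field signal a reviewer could use to prioritize checks. The paper's actual method may differ, but the per-field, logprob-free framing matches the key points above.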
Related Articles

Easing veterans' burden of training junior engineers: AI-generated "ladder diagrams" for PLC control
日経XTECH

Your AI generated code is "almost right", and that is actually WORSE than it being "wrong".
Dev.to

Lessons from Academic Plagiarism Tools for SaaS Product Development
Dev.to

Windsurf’s New Pricing Explained: Simpler AI Coding or Hidden Trade-Offs?
Dev.to

Building Production RAG Systems with PostgreSQL: Complete Implementation Guide
Dev.to