CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems
arXiv cs.CL / April 27, 2026
Key Points
- The paper presents CLARITY, a framework for benchmarking interactive NL2SQL systems under realistic ambiguity and unanswerability, including cases where users provide incomplete clarifications.
- CLARITY's constraint-driven pipeline automatically generates benchmark data by transforming executable SQL queries into questions with multi-faceted ambiguities, paired with grounded conversational continuations and schema-level metadata.
- Experiments on Spider and BIRD show that top NL2SQL systems, including those using strong LLMs, experience substantial performance drops in multi-faceted ambiguity scenarios.
- The findings suggest that while current systems can often detect ambiguity, they have difficulty precisely identifying (localizing) and resolving the underlying schema-level causes.
- Overall, the work argues for more robust ambiguity detection and resolution capabilities tailored to industry-grade, interactive NL2SQL deployments.
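To make the notion of schema-level ambiguity concrete, here is a minimal, hypothetical sketch (not the paper's actual pipeline): a vague term in a natural-language question, such as "date", can map to several schema columns, so the question cannot be grounded to a single SQL query without a clarifying turn. The schema and function names below are invented for illustration.

```python
# Hypothetical illustration of schema-level ambiguity in NL2SQL.
# The schema and helper are invented; they do not come from CLARITY.

SCHEMA = {
    "employees": ["id", "first_name", "last_name", "hire_date"],
    "contracts": ["id", "employee_id", "start_date", "end_date"],
}

def ambiguous_column_candidates(term: str, schema: dict) -> list:
    """Return every (table, column) pair a vague term could refer to."""
    return [
        (table, col)
        for table, cols in schema.items()
        for col in cols
        if term in col
    ]

# "Show employees sorted by date" — which date? The term matches
# three columns, so an interactive system should detect more than
# one candidate and ask a clarifying question rather than guess.
candidates = ambiguous_column_candidates("date", SCHEMA)
print(candidates)
```

A benchmark in this spirit can then score systems on whether they detect the multiple candidates, localize the ambiguity to the right columns, and resolve it after the user's (possibly incomplete) clarification.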