IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
arXiv cs.CL / 4/16/2026
Key Points
- IndicDB is a new multilingual Text-to-SQL benchmark for real-world Indian-language settings; prior benchmarks mainly covered Western contexts and simplified schemas.
- The benchmark draws realistic relational database schemas from open government and administrative data platforms (NDAP, IDP). It comprises 20 databases with 237 tables and complex join structures, with join depths of up to six.
- An iterative three-agent pipeline (Architect, Auditor, Refiner) is used to convert denormalized data into richly structured schemas while enforcing join validity and calibrating task difficulty.
- IndicDB comprises 15,617 value-aware tasks across English, Hindi, and five other Indic languages, and is used to evaluate multiple state-of-the-art models on cross-lingual semantic parsing.
- Results reveal an “Indic Gap”: a 9.00% performance drop from English to Indic languages, which the authors attribute to harder schema linking, greater structural ambiguity, and limited external knowledge.
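
To make the propose, audit, and refine loop in the third point more concrete, here is a minimal Python sketch. It is not the authors' pipeline: the `Schema` class, the `architect`, `audit`, `refine`, and `run_pipeline` functions, the sample tables, and the `max_rounds` parameter are all hypothetical names chosen for illustration. "Join validity" is reduced here to the simple assumption that every foreign key must point at a table and column that actually exist.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# (child_table, child_column, parent_table, parent_column)
ForeignKey = tuple[str, str, str, str]


@dataclass
class Schema:
    """Hypothetical schema container: table name -> set of column names."""
    tables: dict[str, set[str]]
    foreign_keys: list[ForeignKey] = field(default_factory=list)


def architect() -> Schema:
    """Stand-in for the Architect: returns a hard-coded schema proposal.

    The second foreign key is deliberately broken (there is no `state` table).
    """
    return Schema(
        tables={
            "district": {"district_id", "name"},
            "school": {"school_id", "district_id"},
        },
        foreign_keys=[
            ("school", "district_id", "district", "district_id"),
            ("school", "state_id", "state", "state_id"),
        ],
    )


def audit(schema: Schema) -> list[ForeignKey]:
    """Stand-in for the Auditor: list foreign keys whose endpoints are missing."""
    problems = []
    for fk in schema.foreign_keys:
        child, child_col, parent, parent_col = fk
        child_ok = child_col in schema.tables.get(child, set())
        parent_ok = parent_col in schema.tables.get(parent, set())
        if not (child_ok and parent_ok):
            problems.append(fk)
    return problems


def refine(schema: Schema, problems: list[ForeignKey]) -> Schema:
    """Stand-in for the Refiner: drop the foreign keys the Auditor flagged."""
    kept = [fk for fk in schema.foreign_keys if fk not in problems]
    return Schema(tables=schema.tables, foreign_keys=kept)


def run_pipeline(proposal: Schema, max_rounds: int = 3) -> tuple[Schema, int]:
    """Audit and refine until the audit is clean or max_rounds is reached.

    Returns the final schema and the number of audit rounds that were run.
    """
    schema = proposal
    for round_number in range(1, max_rounds + 1):
        problems = audit(schema)
        if not problems:
            return schema, round_number
        schema = refine(schema, problems)
    return schema, max_rounds


if __name__ == "__main__":
    final_schema, rounds = run_pipeline(architect())
    print(f"audit rounds: {rounds}")
    for fk in final_schema.foreign_keys:
        print("valid join:", fk)
```

In this toy version the Refiner simply deletes invalid joins; a real system would more plausibly repair them, and would also need the difficulty calibration mentioned above, which the sketch does not attempt.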