DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

arXiv cs.AI / 4/22/2026

Key Points

  • The paper introduces DW-Bench, a new benchmark designed to test LLM graph-topology reasoning over data warehouse schemas.
  • DW-Bench explicitly models both foreign-key (FK) relationships and data-lineage edges to better reflect real warehouse graph structure.
  • The benchmark includes 1,046 automatically generated questions that are verifiably correct, spanning five different schemas.
  • Experiments indicate that tool-augmented LLM approaches substantially outperform static (non-tool) methods, though their performance plateaus on particularly hard compositional question subtypes.
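This summary does not describe how DW-Bench actually builds its graphs or generates its questions. The sketch below rests on stated assumptions and shows one way the idea can work. It stores a warehouse schema as a graph with two edge types, FK and lineage. It then templates a question whose gold answer is computed by graph traversal, so the answer can be checked mechanically rather than written by hand. The table names, edge lists, and functions (`build_graph`, `upstream_sources`, `make_question`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict, deque

# Hypothetical toy warehouse; table names are illustrative, not from DW-Bench.
FK_EDGES = [
    ("fact_sales", "dim_customer"),   # fact_sales.customer_id -> dim_customer.id
    ("fact_sales", "dim_product"),
    ("dim_product", "dim_category"),
]
LINEAGE_EDGES = [
    ("raw_orders", "stg_orders"),     # stg_orders is derived from raw_orders
    ("stg_orders", "fact_sales"),
    ("raw_customers", "dim_customer"),
]

def build_graph(fk_edges, lineage_edges):
    """Return adjacency sets keyed by edge type: {'fk': {...}, 'lineage': {...}}."""
    graph = {"fk": defaultdict(set), "lineage": defaultdict(set)}
    for src, dst in fk_edges:
        graph["fk"][src].add(dst)       # kept for join-path questions (unused below)
    for src, dst in lineage_edges:
        graph["lineage"][src].add(dst)
    return graph

def upstream_sources(graph, table):
    """All tables that transitively feed `table` via lineage (BFS over reversed edges)."""
    reverse = defaultdict(set)
    for src, dsts in graph["lineage"].items():
        for dst in dsts:
            reverse[dst].add(src)
    seen, queue = set(), deque([table])
    while queue:
        node = queue.popleft()
        for parent in reverse[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def make_question(graph, table):
    """Template a question whose gold answer is derived from the graph, so it is checkable."""
    question = f"Which tables are upstream of {table} in the lineage graph?"
    answer = sorted(upstream_sources(graph, table))
    return question, answer

graph = build_graph(FK_EDGES, LINEAGE_EDGES)
q, a = make_question(graph, "fact_sales")
print(q)
print(a)
```

The gold answer is computed from the graph rather than authored. If the schema changes, regenerating the question therefore also yields a consistent answer.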

Abstract

This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtypes.
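The abstract does not say which tools the tool-augmented methods use or how the compositional subtypes are defined. The sketch below is a hypothetical illustration. It exposes FK and lineage lookups as small deterministic functions that a tool-using model could call, and it answers a two-step question that chains both edge types. The schema, function names, and question wording are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy schema (not from DW-Bench). Each edge is (source, target, kind).
EDGES = [
    ("fact_sales", "dim_customer", "fk"),
    ("fact_sales", "dim_product", "fk"),
    ("dim_product", "dim_category", "fk"),
    ("raw_customers", "dim_customer", "lineage"),
    ("raw_products", "dim_product", "lineage"),
    ("raw_categories", "dim_category", "lineage"),
]

# Tools: deterministic graph primitives an agent could call one at a time.
def fk_targets(table):
    """Tables that `table` references directly through a foreign key."""
    return {dst for src, dst, kind in EDGES if kind == "fk" and src == table}

def lineage_parents(table):
    """Tables that `table` is directly derived from."""
    return {src for src, dst, kind in EDGES if kind == "lineage" and dst == table}

def fk_reachable(table):
    """Tables reachable from `table` by following FK edges any number of hops."""
    seen, queue = set(), deque([table])
    while queue:
        for nxt in fk_targets(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# A compositional question chains both edge types.
def raw_sources_behind_joins(table):
    """Which tables directly feed, via lineage, any table `table` can reach by FK joins?"""
    sources = set()
    for joined in fk_reachable(table):
        sources |= lineage_parents(joined)
    return sorted(sources)

print(raw_sources_behind_joins("fact_sales"))
```

Each hop here is an exact lookup. Answering the same question without tools would require tracking every intermediate table correctly in one pass.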

DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning | AI Navigate