Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs

arXiv cs.CL / 4/16/2026


Key Points

  • The paper tackles Open-World Question Answering over incomplete or evolving knowledge graphs by moving beyond the closed-world assumption of traditional KGQA systems.
  • It introduces GLOW, a hybrid LLM–GNN approach where a pre-trained GNN proposes top-k candidate answers from graph structure and an LLM reasons over serialized triples and those candidates for semantic grounding.
  • Unlike prior methods that often depend heavily on retrieval quality or fine-tuning, GLOW is designed to perform joint reasoning over symbolic (graph facts) and semantic signals without retrieval or model fine-tuning.
  • The authors also propose GLOW-BENCH, a 1,000-question benchmark designed to evaluate generalization on incomplete KGs across diverse domains.
  • Experiments show GLOW outperforms existing LLM–GNN systems on standard benchmarks and GLOW-BENCH, with improvements of up to 53.3% and an average of about 38%; code and data are released.
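The candidate-proposal step in the key points above can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the score dictionary, entity names, and the value of k are all hypothetical stand-ins for whatever interface GLOW's pre-trained GNN actually exposes.

```python
# Hypothetical top-k candidate selection from GNN entity scores.
# The score dict, entities, and k are illustrative; GLOW's real GNN
# interface may differ.
def top_k_candidates(entity_scores: dict[str, float], k: int = 3) -> list[str]:
    """Return the k entities with the highest structural scores."""
    ranked = sorted(entity_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [entity for entity, _ in ranked[:k]]

# Toy scores a GNN might assign to answer entities for one question.
scores = {"Poland": 0.91, "France": 0.42, "Warsaw": 0.13, "Berlin": 0.05}
print(top_k_candidates(scores))  # → ['Poland', 'France', 'Warsaw']
```

These top-k entities, rather than the full entity set, are what the LLM is then asked to reason over.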

Abstract

Open-world Question Answering (OW-QA) over knowledge graphs (KGs) aims to answer questions over incomplete or evolving KGs. Traditional KGQA assumes a closed world where answers must exist in the KG, limiting real-world applicability. In contrast, open-world QA requires inferring missing knowledge based on graph structure and context. Large language models (LLMs) excel at language understanding but lack structured reasoning. Graph neural networks (GNNs) model graph topology but struggle with semantic interpretation. Existing systems integrate LLMs with GNNs or graph retrievers. Some support open-world QA but rely on structural embeddings without semantic grounding. Most assume observed paths or complete graphs, making them unreliable under missing links or multi-hop reasoning. We present GLOW, a hybrid system that combines a pre-trained GNN and an LLM for open-world KGQA. The GNN predicts top-k candidate answers from the graph structure. These, along with relevant KG facts, are serialized into a structured prompt (e.g., triples and candidates) to guide the LLM's reasoning. This enables joint reasoning over symbolic and semantic signals, without relying on retrieval or fine-tuning. To evaluate generalization, we introduce GLOW-BENCH, a 1,000-question benchmark over incomplete KGs across diverse domains. GLOW outperforms existing LLM-GNN systems on standard benchmarks and GLOW-BENCH, achieving up to 53.3% and an average 38% improvement. Code and data are available on GitHub.
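A minimal sketch of the structured-prompt serialization the abstract describes, under the assumption of a simple triple-and-candidate template; the exact prompt format GLOW uses is not specified here, and the example triples, candidates, and question are invented for illustration:

```python
# Hypothetical serialization of KG triples and GNN candidates into a
# structured prompt for the LLM. The template is an assumption, not
# GLOW's actual format.
def serialize_prompt(triples: list[tuple[str, str, str]],
                     candidates: list[str],
                     question: str) -> str:
    """Build a structured prompt from KG facts, GNN candidates, and a question."""
    facts = "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return (
        "Knowledge graph facts:\n" + facts + "\n"
        "Candidate answers (GNN top-k): " + ", ".join(candidates) + "\n"
        "Question: " + question + "\n"
        "Answer:"
    )

# Toy two-hop example: the answer requires combining both triples.
triples = [("Marie Curie", "born_in", "Warsaw"),
           ("Warsaw", "capital_of", "Poland")]
prompt = serialize_prompt(triples, ["Poland", "France"],
                          "In which country was Marie Curie born?")
print(prompt)
```

The LLM then reasons over this text: the triples supply symbolic facts, the candidate list supplies the GNN's structural signal, and no retrieval step or fine-tuning is involved.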