KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning

arXiv cs.CL / 3/24/2026


Key Points

  • The paper introduces KG-Hopper, a reinforcement learning framework designed to improve compact open LLMs on knowledge graph multi-hop question answering by reducing brittle step-by-step pipelines.
  • Instead of executing reasoning in isolated sequential steps, KG-Hopper trains a 7B “Reasoning LLM” to embed the full knowledge-graph traversal and decision process into a single unified thinking stage with dynamic path exploration and backtracking.
  • Experiments across eight KG reasoning benchmarks show KG-Hopper outperforms larger multi-step systems (up to 70B) and achieves performance competitive with proprietary models such as GPT-3.5-Turbo and GPT-4o-mini.
  • The approach is reported to be compact, open, and data-efficient, and the authors provide public code via the linked GitHub repository.
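For intuition, the "dynamic path exploration with backtracking" described in the key points can be pictured as a depth-first search over relation paths in a knowledge graph. The sketch below uses a hypothetical toy graph and entity names of my own invention; it is not the authors' implementation, since KG-Hopper trains an LLM to carry out this kind of exploration inside a single unified reasoning pass rather than via explicit graph code.

```python
# Illustrative sketch: multi-hop KG traversal with backtracking.
# The graph, entities, and relations below are hypothetical examples,
# not drawn from the paper or its code.

# Each key is (head_entity, relation); values are lists of tail entities.
TOY_KG = {
    ("Alice", "born_in"): ["Paris"],
    ("Paris", "capital_of"): ["France"],
    ("Alice", "works_at"): ["Acme"],
    ("Acme", "located_in"): ["Berlin"],
}

def find_paths(kg, start, target, max_hops=3):
    """Return all relation paths from start to target within max_hops,
    backtracking out of dead-end branches."""
    paths = []

    def dfs(entity, path, visited):
        if entity == target:
            paths.append(list(path))
            return
        if len(path) >= max_hops:
            return  # hop budget exhausted; backtrack
        for (head, rel), tails in kg.items():
            if head != entity:
                continue
            for tail in tails:
                if tail in visited:
                    continue
                path.append((rel, tail))
                visited.add(tail)
                dfs(tail, path, visited)
                path.pop()            # backtrack: undo this hop
                visited.remove(tail)

    dfs(start, [], {start})
    return paths

# "In which country was Alice born?" needs two hops:
# Alice -born_in-> Paris -capital_of-> France.
print(find_paths(TOY_KG, "Alice", "France"))
# → [[('born_in', 'Paris'), ('capital_of', 'France')]]
```

Note that the dead-end branch (Alice → Acme → Berlin) is explored and abandoned; in a step-by-step pipeline such a wrong turn at an early step can cascade, which is the failure mode the paper's unified reasoning stage aims to avoid.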

Abstract

Large Language Models (LLMs) demonstrate impressive natural language capabilities but often struggle with knowledge-intensive reasoning tasks. Knowledge Base Question Answering (KBQA), which leverages structured Knowledge Graphs (KGs), exemplifies this challenge due to the need for accurate multi-hop reasoning. Existing approaches typically perform sequential reasoning steps guided by predefined pipelines, restricting flexibility and causing error cascades due to isolated reasoning at each step. To address these limitations, we propose KG-Hopper, a novel Reinforcement Learning (RL) framework that empowers compact open LLMs with the ability to perform integrated multi-hop KG reasoning within a single inference round. Rather than reasoning step-by-step, we train a Reasoning LLM that embeds the entire KG traversal and decision process into a unified "thinking" stage, enabling global reasoning over cross-step dependencies and dynamic path exploration with backtracking. Experimental results on eight KG reasoning benchmarks show that KG-Hopper, based on a 7B-parameter LLM, consistently outperforms larger multi-step systems (up to 70B) and achieves competitive performance with proprietary models such as GPT-3.5-Turbo and GPT-4o-mini, while remaining compact, open, and data-efficient. The code is publicly available at: https://github.com/Wangshuaiia/KG-Hopper.