Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

arXiv cs.LG / 5/5/2026


Key Points

  • The paper highlights the need to benchmark LLM reasoning limits beyond small, fully visible graphs, since real-world graph data is often much larger and only partially accessible.
  • It introduces a new large-graph benchmark dataset called EstGraph, along with four tasks aimed at estimating large-scale graph properties.
  • The researchers evaluate multiple LLMs on these tasks across a variety of graph datasets, focusing on how well models can infer global properties from limited context.
  • To address context-length constraints, the paper proposes task-specific prompt construction methods that use random-walk sampling from very large graphs (up to millions of nodes) to provide sufficient information to the LLMs.

Abstract

With the rapidly improving reasoning abilities of Large Language Models (LLMs), demand is rising to apply them across a wide variety of domains, which in turn creates the need to carefully evaluate the limits of these models' capabilities with rigorous tests and benchmarks. Graph structures are ubiquitous in real-world data and are often used to represent and analyze relational patterns. Many benchmarks have already been proposed in the graph literature to test the ability of LLMs to follow and execute graph algorithms. However, due to the limited context length of LLMs, these benchmarks consist of very small graphs. In real-world data, graphs can be significantly larger and, in many cases, not fully accessible. In this paper, we examine a class of problems that arises with very large graphs of limited accessibility. We propose a large-graph benchmark dataset, EstGraph, and introduce four distinct tasks designed to estimate properties of large graphs. We evaluate the reasoning abilities of LLMs on these tasks using a wide variety of graph datasets. In addition, we provide task-specific prompt constructions based on random-walk sampling of large graphs (up to millions of nodes) that effectively convey sufficient information to LLMs within the limits of the context length.
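The core idea, serializing random-walk samples of a large graph into a prompt that fits a context budget, can be sketched as follows. This is a minimal illustration, not the paper's actual construction: the walk parameters, the textual edge format, and the estimation question are all assumptions made for the example, and a character count stands in for a real token budget.

```python
import random

def random_walk(adj, start, length, rng):
    """Sample a random walk of up to `length` steps over adjacency dict `adj`."""
    walk = [start]
    node = start
    for _ in range(length):
        nbrs = adj.get(node)
        if not nbrs:  # dead end: stop the walk early
            break
        node = rng.choice(nbrs)
        walk.append(node)
    return walk

def build_prompt(adj, num_walks=3, walk_length=8, char_budget=500, seed=0):
    """Serialize several walks into a prompt, stopping before the budget is exceeded."""
    rng = random.Random(seed)
    nodes = list(adj)
    lines = ["Edges observed along random walks of a large graph:"]
    used = len(lines[0])
    for _ in range(num_walks):
        walk = random_walk(adj, rng.choice(nodes), walk_length, rng)
        line = " -> ".join(map(str, walk))
        if used + len(line) > char_budget:  # stay within the context budget
            break
        lines.append(line)
        used += len(line)
    # Hypothetical estimation question; the paper defines four concrete tasks.
    lines.append("Estimate the average degree of the full graph.")
    return "\n".join(lines)

# Toy graph: a 6-node cycle, standing in for a graph too large to serialize fully.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
prompt = build_prompt(adj)
print(prompt)
```

Because walks only touch a sample of nodes, the prompt size is controlled by `num_walks`, `walk_length`, and the budget rather than by the size of the graph, which is what makes the approach viable for graphs with millions of nodes.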