Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

arXiv cs.AI / April 27, 2026


Key Points

  • The paper addresses real-world document question answering, where evidence must be synthesized across many documents and document sections that can exceed any fixed LLM context window.
  • It argues that simple chunking introduces an “aggregation bottleneck” as the number of chunks grows, because systems must reason over an ever-larger set of extracted evidence.
  • SLIDERS is introduced as a framework that stores salient extracted information in a relational database and performs scalable reasoning over persistent structured state using SQL rather than concatenated text.
  • To ensure the locally extracted representations stay globally consistent, SLIDERS adds a data reconciliation stage that uses provenance, extraction rationales, and metadata to detect and repair duplicated, inconsistent, or incomplete records.
  • The framework outperforms all baselines on three existing long-context benchmarks, exceeding strong GPT-4.1 baselines by 6.6 points on average, and shows large gains on two newly introduced benchmarks at 3.9M and 36M tokens.
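
The core idea behind the structured-reasoning step can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `facts` schema, the toy records, and the revenue question are all hypothetical, chosen only to show how chunk-level extractions with provenance can be persisted in a relational store so that a cross-document question becomes a single SQL query rather than reasoning over concatenated text.

```python
import sqlite3

# Hypothetical schema for chunk-level extractions: each record keeps the
# extracted fact plus provenance (document and chunk IDs), so downstream
# reasoning runs as SQL over persistent structured state.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
        entity TEXT, attribute TEXT, value REAL,
        doc_id TEXT, chunk_id INTEGER
    )"""
)

# Toy extractions produced from three different documents/chunks.
rows = [
    ("AcmeCorp", "revenue_musd", 120.0, "report_2022.pdf", 4),
    ("AcmeCorp", "revenue_musd", 135.0, "report_2023.pdf", 2),
    ("BetaInc",  "revenue_musd",  80.0, "report_2023.pdf", 7),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", rows)

# A cross-document aggregate question ("which entity has the highest total
# reported revenue?") becomes one SQL query, independent of how many chunks
# produced the rows -- this is what sidesteps the aggregation bottleneck.
top = conn.execute(
    """SELECT entity, SUM(value) AS total
       FROM facts WHERE attribute = 'revenue_musd'
       GROUP BY entity ORDER BY total DESC LIMIT 1"""
).fetchone()
print(top)  # ('AcmeCorp', 255.0)
```

The key property is that the cost of answering scales with the size of the query result, not with the number of chunks that were extracted.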

Abstract

Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from chunk-level outputs, but this introduces an aggregation bottleneck: as the number of chunks grows, systems must still combine and reason over an increasingly large body of extracted evidence. We present SLIDERS, a framework for question answering over long document collections through structured reasoning. SLIDERS extracts salient information into a relational database, enabling scalable reasoning over persistent structured state via SQL rather than concatenated text. To make this locally extracted representation globally coherent, SLIDERS introduces a data reconciliation stage that leverages provenance, extraction rationales, and metadata to detect and repair duplicated, inconsistent, and incomplete records. SLIDERS outperforms all baselines on three existing long-context benchmarks, despite all of them fitting within the context window of strong base LLMs, exceeding GPT-4.1 by 6.6 points on average. It also improves over the next best baseline by ~19 and ~32 points on two new benchmarks at 3.9M and 36M tokens, respectively.
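
The reconciliation stage described in the abstract can also be sketched in miniature. Everything here is an assumed toy setup (the table, columns, and records are hypothetical): it shows how exact-duplicate records can be dropped using provenance, and how inconsistent records for the same entity and attribute can be flagged for a later repair step, e.g. a judgment over the stored extraction rationales.

```python
import sqlite3

# Hypothetical extraction table carrying provenance and rationale metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
        entity TEXT, attribute TEXT, value TEXT,
        doc_id TEXT, rationale TEXT
    )"""
)
rows = [
    ("AcmeCorp", "ceo", "J. Smith",   "doc_a", "named in intro"),
    ("AcmeCorp", "ceo", "J. Smith",   "doc_a", "named in intro"),  # exact duplicate
    ("AcmeCorp", "ceo", "Jane Smith", "doc_b", "board section"),   # inconsistent variant
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", rows)

# 1) Drop exact duplicates: identical fact AND identical provenance/rationale.
conn.execute(
    """DELETE FROM facts WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM facts
        GROUP BY entity, attribute, value, doc_id, rationale
    )"""
)

# 2) Flag inconsistencies: the same (entity, attribute) with conflicting
#    values across documents, left for a repair step to resolve using the
#    recorded rationales and metadata.
conflicts = conn.execute(
    """SELECT entity, attribute, COUNT(DISTINCT value)
       FROM facts GROUP BY entity, attribute
       HAVING COUNT(DISTINCT value) > 1"""
).fetchall()
print(conflicts)  # [('AcmeCorp', 'ceo', 2)]
```

Provenance matters here: without the `doc_id` and `rationale` columns, the two "J. Smith" rows from `doc_a` would be indistinguishable from genuinely independent corroborating evidence.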