CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents

arXiv cs.CL / 4/20/2026

Key Points

The paper introduces CHOP, a chunking-and-reconstruction framework for RAG that aims to prevent retrieval accuracy from degrading when similar documents coexist in a vector database.
CHOP uses an LLM-driven iterative process to assess chunk relevance and to rebuild document content by linking chunks to specific topics or query types.
It proposes two core modules: CNM-Extractor, which creates compact per-chunk signatures (categories, key nouns, and model names), and a Continuity Decision Module, which maintains contextual coherence by deciding whether consecutive chunks belong to the same document flow.
By prefixing each chunk with context-aware metadata, CHOP reduces semantic conflicts and improves retriever discrimination, leading to better ranking quality on benchmarks.
The experiments report strong performance, including a Top-1 Hit Rate of 90.77%, indicating improved retrieval correctness and fewer confusion-driven errors.
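
The two modules and the metadata prefix described in the points above can be illustrated with a minimal sketch. The summary does not give the signature schema, the prefix layout, or the LLM prompts, so everything here is an assumption: `ChunkSignature`, `prefix_chunk`, and `group_into_flows` are hypothetical names, the bracketed header format is invented for illustration, and the continuity decision is passed in as a plain callable where CHOP would ask an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ChunkSignature:
    """Hypothetical per-chunk signature (assumed layout, not the paper's schema)."""
    categories: list[str] = field(default_factory=list)
    key_nouns: list[str] = field(default_factory=list)
    model_names: list[str] = field(default_factory=list)


def prefix_chunk(text: str, sig: ChunkSignature) -> str:
    """Prepend the signature as a one-line header so it is embedded with the chunk."""
    header = (
        f"[category: {', '.join(sig.categories)}] "
        f"[nouns: {', '.join(sig.key_nouns)}] "
        f"[models: {', '.join(sig.model_names)}]"
    )
    return f"{header}\n{text}"


def group_into_flows(
    chunks: list[str],
    same_flow: Callable[[str, str], bool],
) -> list[list[str]]:
    """Group consecutive chunks into flows.

    `same_flow(previous, current)` stands in for the continuity decision;
    in CHOP this judgment would come from an LLM, here it is any callable.
    """
    flows: list[list[str]] = []
    for chunk in chunks:
        if flows and same_flow(flows[-1][-1], chunk):
            flows[-1].append(chunk)  # continues the current document flow
        else:
            flows.append([chunk])    # starts a new flow
    return flows


if __name__ == "__main__":
    # Hand-written signature; the product name "X100" is made up for the example.
    sig = ChunkSignature(["user manual"], ["battery", "charging"], ["X100"])
    print(prefix_chunk("Charge the battery for two hours before first use.", sig))
```

Because the header is part of the text that gets embedded, two near-identical passages from different product manuals end up with different vectors, which is the discrimination effect the summary attributes to CHOP.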

Abstract

Retrieval-Augmented Generation (RAG) systems lose retrieval accuracy when similar documents coexist in the vector database, causing unnecessary information, hallucinations, and factual errors. To alleviate this issue, we propose CHOP, a framework that iteratively evaluates chunk relevance with Large Language Models (LLMs) and progressively reconstructs documents by determining their association with specific topics or query types. CHOP integrates two key components: the CNM-Extractor, which generates compact per-chunk signatures capturing categories, key nouns, and model names, and the Continuity Decision Module, which preserves contextual coherence by deciding whether consecutive chunks belong to the same document flow. By prefixing each chunk with context-aware metadata, CHOP reduces semantic conflicts among similar documents and enhances retriever discrimination. Experiments on benchmark datasets show that CHOP alleviates retrieval confusion and provides a scalable approach for building high-quality knowledge bases, achieving a Top-1 Hit Rate of 90.77% and notable gains in ranking quality metrics.
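
For readers unfamiliar with the headline metric, the sketch below computes a Top-k Hit Rate under its usual definition: the fraction of queries for which a relevant item appears among the first k retrieved results, with k=1 giving the Top-1 Hit Rate. The abstract does not describe the evaluation protocol, so the single-relevant-item assumption and the function name `top_k_hit_rate` are illustrative rather than the authors' setup.

```python
def top_k_hit_rate(
    ranked_ids: list[list[str]],
    relevant_ids: list[str],
    k: int = 1,
) -> float:
    """Fraction of queries whose relevant id appears in the top-k ranked ids.

    ranked_ids[i] is the retriever's ranking for query i (best first);
    relevant_ids[i] is the single correct id assumed for that query.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("one ranking and one relevant id are needed per query")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant in ranked[:k]
    )
    return hits / len(ranked_ids)


if __name__ == "__main__":
    rankings = [["a", "b"], ["c", "d"], ["e", "f"]]
    gold = ["a", "d", "x"]
    print(top_k_hit_rate(rankings, gold, k=1))
    print(top_k_hit_rate(rankings, gold, k=2))
```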
