Bridging the Long-Tail Gap: Robust Retrieval-Augmented Relation Completion via Multi-Stage Paraphrase Infusion

arXiv cs.CL / 4/27/2026


Key Points

  • The paper addresses relation completion (RC) in cases where the needed information is rare or sparsely expressed, noting that LLMs often struggle even when using retrieval-augmented generation (RAG).
  • It introduces RC-RAG, a multi-stage paraphrase-guided framework that injects relation paraphrases at several points: during retrieval to broaden lexical coverage, in retrieval-based summarization to make summaries relation-aware, and during generation to guide reasoning.
  • RC-RAG improves robustness in long-tail settings without requiring any model fine-tuning, making it easier to adopt across different LLMs.
  • Experiments on two benchmark datasets using five LLMs show consistent gains over multiple RAG baselines, including a reported +40.6 EM improvement for the best LLM in long-tail scenarios.
  • The authors report low computational overhead while achieving these improvements, suggesting the approach can be practically deployed alongside existing RAG pipelines.

Abstract

Large language models (LLMs) struggle with relation completion (RC), both with and without retrieval-augmented generation (RAG), particularly when the required information is rare or sparsely represented. To address this, we propose a novel multi-stage paraphrase-guided relation-completion framework, RC-RAG, that systematically incorporates relation paraphrases across multiple stages. In particular, RC-RAG: (a) integrates paraphrases into retrieval to expand lexical coverage of the relation, (b) uses paraphrases to generate relation-aware summaries, and (c) leverages paraphrases during generation to guide reasoning for relation completion. Importantly, our method does not require any model fine-tuning. Experiments with five LLMs on two benchmark datasets show that RC-RAG consistently outperforms several RAG baselines. In long-tail settings, the best-performing LLM augmented with RC-RAG improves by 40.6 Exact Match (EM) points over its standalone performance and surpasses two strong RAG baselines by 16.0 and 13.8 EM points, respectively, while maintaining low computational overhead.
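To make the three stages concrete, below is a minimal sketch of a paraphrase-infused RAG pipeline in the spirit of RC-RAG. It assumes a generic retriever and an instruction-following LLM passed in as callables; all names (`rc_rag_complete`, `retrieve`, `llm`) and prompt wordings are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of a paraphrase-infused RAG pipeline in the spirit of RC-RAG.
# All function names and prompts are hypothetical placeholders, not the paper's
# actual code. The three stages mirror (a) retrieval, (b) summarization, and
# (c) generation as described in the abstract.

from typing import Callable, List


def rc_rag_complete(
    query: str,                             # e.g. "Marie Curie | place of death | ?"
    relation_paraphrases: List[str],        # e.g. ["died in", "passed away in"]
    retrieve: Callable[[str], List[str]],   # any lexical or dense retriever
    llm: Callable[[str], str],              # any instruction-following LLM
    top_k: int = 5,
) -> str:
    # Stage (a): paraphrase-expanded retrieval to widen lexical coverage.
    # Query once with the original query and once per paraphrase variant.
    passages: List[str] = []
    for variant in [query] + [f"{query} {p}" for p in relation_paraphrases]:
        passages.extend(retrieve(variant))
    # Deduplicate while preserving retrieval order, then truncate.
    passages = list(dict.fromkeys(passages))[:top_k]

    # Stage (b): relation-aware summarization conditioned on the paraphrases.
    summary = llm(
        "Summarize the passages below, keeping only facts relevant to the "
        f"relation expressed by any of: {', '.join(relation_paraphrases)}.\n\n"
        + "\n".join(passages)
    )

    # Stage (c): paraphrase-guided generation of the missing entity.
    return llm(
        f"Relation paraphrases: {', '.join(relation_paraphrases)}\n"
        f"Evidence summary: {summary}\n"
        f"Complete the relation: {query}\n"
        "Answer with the entity only."
    )
```

Note that nothing in this sketch fine-tunes a model: the paraphrases only reshape retrieval queries and prompts, which is consistent with the paper's claim that the method works across different LLMs with low overhead.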