Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval

arXiv cs.CL / 4/24/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The paper proposes Association-Augmented Retrieval (AAR), which reranks dense retrieval candidates using learned, corpus-specific associative relationships rather than relying solely on embedding similarity.
  • AAR uses a small 4.2M-parameter MLP trained with contrastive learning on co-occurrence annotations to score bidirectional associations between passages during inference.
  • On HotpotQA, AAR raises passage Recall@5 from 0.831 to 0.916 (+8.6 points) without evaluation-set tuning, with the largest gains on hard questions (+28.5 points); it also improves MuSiQue by +10.1 points in the transductive setting.
  • Experiments indicate the approach is not broadly transferable: an inductive variant trained on training-split associations shows no significant improvement on unseen validation associations, and ablations confirm that using true association pairs (not just semantic similarity) is critical.
  • The method is lightweight and practical, adding about 3.7ms per query, training in under two minutes on a single GPU, and requiring no LLM-based indexing, while retrieval improvements translate to +6.4 exact match in downstream QA.
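The training recipe described above (a small MLP scoring passage pairs, trained contrastively on co-occurrence annotations) can be sketched as follows. The architecture, hidden size, and in-batch InfoNCE loss are illustrative assumptions; the paper only specifies a 4.2M-parameter MLP with contrastive learning, not the exact layout.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AssociationMLP(nn.Module):
    """Hypothetical association head over a pair of passage embeddings.
    Scores are symmetrized by averaging both orderings, matching the
    paper's bidirectional association scoring at a high level."""

    def __init__(self, dim=768, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, a, b):
        s_ab = self.net(torch.cat([a, b], dim=-1)).squeeze(-1)
        s_ba = self.net(torch.cat([b, a], dim=-1)).squeeze(-1)
        return 0.5 * (s_ab + s_ba)

def contrastive_step(model, opt, anchors, positives):
    """One InfoNCE-style step: co-occurring passage pairs are positives,
    the other passages in the batch act as in-batch negatives."""
    n = anchors.size(0)
    # Score every anchor against every candidate in the batch (n x n grid).
    a = anchors.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1)
    p = positives.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1)
    scores = model(a, p).view(n, n)
    # The diagonal holds the true co-occurrence pairs.
    loss = F.cross_entropy(scores, torch.arange(n))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

With in-batch negatives, no explicit negative mining is needed, which is consistent with the reported sub-two-minute training time on a single GPU.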

Abstract

Dense retrieval systems rank passages by embedding similarity to a query, but multi-hop questions require passages that are associatively related through shared reasoning chains. We introduce Association-Augmented Retrieval (AAR), a lightweight transductive reranking method that trains a small MLP (4.2M parameters) to learn associative relationships between passages in embedding space using contrastive learning on co-occurrence annotations. At inference time, AAR reranks an initial dense retrieval candidate set using bi-directional association scoring. On HotpotQA, AAR improves passage Recall@5 from 0.831 to 0.916 (+8.6 points) without evaluation-set tuning, with gains concentrated on hard questions where the dense baseline fails (+28.5 points). On MuSiQue, AAR achieves +10.1 points in the transductive setting. An inductive model trained on training-split associations and evaluated on unseen validation associations shows no significant improvement, suggesting that the method captures corpus-specific co-occurrences rather than transferable patterns. Ablation studies support this interpretation: training on semantically similar but non-associated passage pairs degrades retrieval below the baseline, while shuffling association pairs causes severe degradation. A downstream QA evaluation shows retrieval gains translate to +6.4 exact match improvement. The method adds 3.7ms per query, trains in under two minutes on a single GPU, and requires no LLM-based indexing.
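The inference step in the abstract (rerank an initial dense candidate set using pairwise association scores) could look like the sketch below. The combination rule, the `alpha` weight, and max-pooling over partners are assumptions for illustration; the abstract does not specify how similarity and association scores are combined.

```python
import numpy as np

def rerank_with_associations(query_sim, assoc, alpha=0.5):
    """Rerank dense-retrieval candidates by augmenting each candidate's
    query similarity with its strongest learned association to any other
    candidate in the set.

    query_sim : (n,) dense query-passage similarities for the candidates
    assoc     : (n, n) symmetric association scores from the learned head
    alpha     : illustrative mixing weight (not from the paper)
    Returns candidate indices, best first.
    """
    assoc = assoc.copy()
    np.fill_diagonal(assoc, -np.inf)   # ignore self-association
    partner = assoc.max(axis=1)        # best associated partner per candidate
    final = (1 - alpha) * np.asarray(query_sim) + alpha * partner
    return np.argsort(-final)
```

For a two-hop question, a bridge passage with middling query similarity but a strong association to the top-ranked passage is pulled upward, which matches the reported gains on hard questions where embedding similarity alone fails.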