Spike Hijacking in Late-Interaction Retrieval

arXiv cs.LG / 4/8/2026


Key Points

  • Late-interaction retrieval models typically use hard MaxSim (winner-take-all) pooling to aggregate token/patch similarities, and the paper argues this can bias training dynamics structurally.
  • The study analyzes gradient routing in MaxSim-based retrieval and shows that MaxSim causes significantly higher patch-level gradient concentration than smoother aggregation methods like Top-k pooling or softmax.
  • In synthetic in-batch contrastive experiments, the authors find a sparsity–robustness tradeoff: sparse routing can improve early discrimination, but it makes MaxSim more sensitive to document length as the number of patches grows.
  • Document-length sweeps on a real-world multi-vector retrieval benchmark confirm that MaxSim degrades more sharply than mild smoothing alternatives, indicating brittleness linked to hard max pooling.
  • The work motivates replacing hard max pooling with more principled pooling/aggregation strategies to improve robustness in multi-vector late-interaction systems.
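To make the three aggregation rules concrete, here is a minimal numpy sketch (illustrative only, not the paper's implementation; the function names and the `k`/`tau` values are assumptions). Each takes a query-token × document-patch similarity matrix and pools over patches before summing over query tokens.

```python
import numpy as np

def maxsim_score(sim):
    """Hard MaxSim: each query token keeps only its single best patch (winner-take-all)."""
    return sim.max(axis=1).sum()

def topk_score(sim, k=2):
    """Top-k pooling: average each query token's k best patch similarities."""
    topk = np.sort(sim, axis=1)[:, -k:]
    return topk.mean(axis=1).sum()

def softmax_score(sim, tau=0.1):
    """Softmax aggregation: a temperature-weighted average over all patches.

    As tau -> 0 this approaches hard MaxSim; larger tau smooths across patches.
    """
    z = np.exp((sim - sim.max(axis=1, keepdims=True)) / tau)
    w = z / z.sum(axis=1, keepdims=True)
    return (w * sim).sum(axis=1).sum()

# Toy example: 2 query tokens, 3 document patches.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.4]])
maxsim_score(sim)  # → 1.7 (0.9 + 0.8)
```

Because the softmax score is a weighted mean over patches, it is always bounded above by the MaxSim score for the same similarity matrix; Top-k likewise lower-bounds MaxSim, which is the "mild smoothing" direction the paper compares against.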

Abstract

Late-interaction retrieval models rely on hard maximum similarity (MaxSim) to aggregate token-level similarities. Although effective, this winner-take-all pooling rule may structurally bias training dynamics. We provide a mechanistic study of gradient routing and robustness in MaxSim-based retrieval. In a controlled synthetic environment with in-batch contrastive training, we demonstrate that MaxSim induces significantly higher patch-level gradient concentration than smoother alternatives such as Top-k pooling and softmax aggregation. While sparse routing can improve early discrimination, it also increases sensitivity to document length: as the number of document patches grows, MaxSim degrades more sharply than mild smoothing variants. We corroborate these findings on a real-world multi-vector retrieval benchmark, where controlled document-length sweeps reveal similar brittleness under hard max pooling. Together, our results isolate pooling-induced gradient concentration as a structural property of late-interaction retrieval and highlight a sparsity-robustness tradeoff. These findings motivate principled alternatives to hard max pooling in multi-vector retrieval systems.
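The gradient-routing claim can be seen directly from the pooling rules: the gradient of a row-wise max puts all its mass on the argmax patch, while a softmax spreads it across every patch. A toy numpy sketch of this (hand-derived gradients under stated assumptions, not the paper's code; the concentration metric below is a per-row Herfindahl index chosen for illustration):

```python
import numpy as np

def maxsim_grad(sim):
    """Gradient of sum-of-row-max w.r.t. sim: one-hot on each row's argmax patch."""
    g = np.zeros_like(sim)
    g[np.arange(sim.shape[0]), sim.argmax(axis=1)] = 1.0
    return g

def softmax_grad(sim, tau=0.1):
    """Gradient of sum of tau*logsumexp(sim/tau) w.r.t. sim: the softmax weights.

    Every patch receives some gradient, with sharpness controlled by tau.
    """
    z = (sim - sim.max(axis=1, keepdims=True)) / tau
    w = np.exp(z)
    return w / w.sum(axis=1, keepdims=True)

def gradient_concentration(g):
    """Mean per-row Herfindahl index: 1.0 = all mass on one patch, 1/P = uniform."""
    return float((g ** 2).sum(axis=1).mean())

# 4 query tokens, 32 document patches.
sim = np.random.default_rng(0).normal(size=(4, 32))
gradient_concentration(maxsim_grad(sim))   # exactly 1.0: winner-take-all routing
gradient_concentration(softmax_grad(sim))  # < 1.0: gradient spread over patches
```

Both gradient matrices have rows summing to 1, so the comparison isolates how the same total gradient mass is distributed across patches, which is the structural property the paper measures.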