Spike Hijacking in Late-Interaction Retrieval
arXiv cs.LG / 4/8/2026
Key Points
- Late-interaction retrieval models typically use hard MaxSim (winner-take-all) pooling to aggregate token/patch similarities, and the paper argues that this winner-take-all routing structurally biases training dynamics.
- The study analyzes gradient routing in MaxSim-based retrieval and shows that MaxSim causes significantly higher patch-level gradient concentration than smoother aggregation methods like Top-k pooling or softmax.
- In synthetic in-batch contrastive experiments, the authors find a sparsity–robustness tradeoff: while sparse routing can improve early discrimination, MaxSim becomes more sensitive to document length.
- Document-length sweeps on a real-world multi-vector retrieval benchmark confirm that MaxSim degrades more sharply than mild smoothing alternatives, indicating brittleness linked to hard max pooling.
- The work motivates replacing hard max pooling with more principled pooling/aggregation strategies to improve robustness in multi-vector late-interaction systems.
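The gradient-routing contrast described in the key points can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: it compares the analytic gradient of hard MaxSim (a one-hot at the argmax patch) against an assumed softmax-weighted pooling as a stand-in for the "mild smoothing" alternatives, and measures how concentrated each gradient is on a single patch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy similarities between one query token and 8 document patches.
sims = rng.normal(size=8)

def maxsim_grad(s):
    """Gradient of score = max(s): a one-hot at the argmax,
    so all learning signal routes to a single patch."""
    g = np.zeros_like(s)
    g[np.argmax(s)] = 1.0
    return g

def softmax_pool_grad(s, temp=1.0):
    """Gradient of score = sum_j softmax(s/T)_j * s_j (an assumed
    smoother pooling); signal spreads across patches by softmax weight.
    d/ds_i = w_i * (1 + (s_i - score) / T)."""
    w = np.exp(s / temp)
    w /= w.sum()
    return w * (1.0 + (s - w @ s) / temp)

g_hard = maxsim_grad(sims)
g_soft = softmax_pool_grad(sims)

# Gradient concentration: fraction of total |gradient| on the top patch.
def conc(g):
    return np.abs(g).max() / np.abs(g).sum()

print(conc(g_hard))  # fully concentrated on one patch
print(conc(g_soft))  # spread across patches, so strictly less than 1
```

Both gradients sum to 1 (the softmax-pooling gradient terms telescope to 1), so the comparison isolates how the same total signal is distributed: hard MaxSim puts everything on one patch, while the smoothed variant shares it in proportion to the softmax weights.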