Learning Retrieval Models with Sparse Autoencoders
arXiv cs.LG / 3/17/2026
📰 News · Models & Research
Key Points
- The paper introduces SPLARE, a method for training SAE-based learned sparse retrieval (LSR) models that encode queries and documents into high-dimensional sparse representations in an SAE latent space rather than projecting them into the vocabulary space.
- Sparse autoencoders are used to decompose dense LLM representations into interpretable latent features, enabling retrieval signals that are more semantically structured and language-agnostic (a minimal sketch follows these key points).
- Empirical results show SPLARE-based LSR consistently outperforms vocabulary-based LSR in multilingual and out-of-domain settings, with SPLARE-7B achieving top results on MMTEB multilingual and English retrieval tasks.
- A lighter 2B-parameter variant retains retrieval performance at a much smaller footprint, highlighting the practical scalability of the approach.
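The sketch below illustrates the general idea of SAE-based sparse encoding for retrieval, assuming a top-k SAE variant; SPLARE's actual architecture, training objective, and hyperparameters are not described in this summary, and all names here (`TopKSAE`, `encode`, `score`, `d_model`, `d_latent`, `k`) are illustrative assumptions, not the paper's API.

```python
# Minimal sketch: encode dense LLM embeddings into sparse SAE latents and
# score query-document pairs in that latent space (assumed top-k SAE; the
# paper's actual design may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKSAE(nn.Module):
    """Sparse autoencoder with a top-k activation, one common SAE variant."""

    def __init__(self, d_model: int, d_latent: int, k: int):
        super().__init__()
        self.k = k
        self.enc = nn.Linear(d_model, d_latent)  # dense -> wide latent space
        self.dec = nn.Linear(d_latent, d_model)  # reconstruction head

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Project the dense embedding into a much wider latent space, then
        # keep only the k largest activations so the code is sparse.
        z = F.relu(self.enc(x))
        topk = torch.topk(z, self.k, dim=-1)
        return torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)

    def forward(self, x: torch.Tensor):
        z = self.encode(x)
        return self.dec(z), z


def score(q_sparse: torch.Tensor, d_sparse: torch.Tensor) -> torch.Tensor:
    # Dot product over sparse latent features: the same interaction
    # vocabulary-based LSR uses, but over SAE features instead of terms.
    return (q_sparse * d_sparse).sum(-1)


if __name__ == "__main__":
    sae = TopKSAE(d_model=768, d_latent=16384, k=32)  # illustrative sizes
    q = torch.randn(1, 768)  # stand-in for a dense query embedding
    d = torch.randn(1, 768)  # stand-in for a dense document embedding
    print(score(sae.encode(q), sae.encode(d)))
```

Because only k latents are nonzero per vector, documents can be served from an inverted index keyed by latent feature id, the same data structure vocabulary-based LSR uses for terms.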
Related Articles
[D] Matryoshka Representation Learning
Reddit r/MachineLearning
Two new Qwen3.5 “Neo” fine‑tunes focused on fast, efficient reasoning
Reddit r/LocalLLaMA
HKIC, Gobi Partners and HKU team up for fund backing university research start-ups
SCMP Tech
Yann LeCun’s New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling
MarkTechPost
Streaming experts
Simon Willison's Blog