LMEB: Long-horizon Memory Embedding Benchmark
arXiv cs.CL · March 16, 2026
Key Points
- LMEB is a new benchmark designed to evaluate embedding models on long-horizon memory retrieval tasks that involve fragmented, context-dependent, and temporally distant information.
- The framework spans 22 datasets and 193 zero-shot retrieval tasks across four memory types—episodic, dialogue, semantic, and procedural—using both AI-generated and human-annotated data.
- Evaluations of 15 embedding models show that LMEB is challenging: larger models do not always outperform smaller ones, and performance on LMEB is largely orthogonal to scores on the existing MTEB benchmark.
- By providing a standardized, reproducible evaluation framework, LMEB aims to drive progress in memory embeddings for long-term, context-dependent retrieval and highlights gaps in generalizing from traditional passage retrieval.
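To make the evaluation setup concrete, here is a minimal sketch of how a zero-shot retrieval task of the kind LMEB standardizes is typically scored: embed a query and a memory corpus, rank memories by cosine similarity, and measure recall@k against annotated relevant items. The toy `embed` function (character-bigram counts), the memory texts, and the IDs are illustrative stand-ins, not part of the LMEB benchmark; a real run would call the embedding model under test.

```python
import math

def embed(text):
    # Toy stand-in for an embedding model: character-bigram counts.
    # A real LMEB-style evaluation would call the model under test here.
    vec = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].lower()
        vec[bigram] = vec.get(bigram, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall_at_k(query, corpus, relevant_ids, k=2):
    # Rank all memories by similarity to the query, then check how many
    # of the annotated relevant memories appear in the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    top = {d["id"] for d in ranked[:k]}
    return len(top & relevant_ids) / len(relevant_ids)

# Hypothetical episodic-memory corpus with two relevant memories.
corpus = [
    {"id": "m1", "text": "We discussed the project deadline last Tuesday."},
    {"id": "m2", "text": "The cafeteria serves pasta on Fridays."},
    {"id": "m3", "text": "Deadline for the project was moved to Tuesday."},
]
score = recall_at_k("when is the project deadline", corpus, {"m1", "m3"}, k=2)
print(round(score, 2))
```

Averaging such per-task scores across tasks and datasets yields a single comparable number per model, which is how benchmark-wide leaderboard comparisons like those in the summary above are produced.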
