LLM2Vec-Gen: Generative Embeddings from Large Language Models
arXiv cs.CL / 3/12/2026
Key Points
- LLM2Vec-Gen presents a self-supervised approach to generate embeddings by learning to represent the model's potential response rather than directly encoding the input.
- It achieves this by adding trainable special tokens to the LLM's vocabulary, appending them to inputs, and optimizing them to encode the LLM's response while keeping the backbone frozen.
- Training uses the LLM's own completion as guidance along with an unsupervised embedding teacher that provides distillation targets, enabling learning from unlabeled queries.
- The method attains state-of-the-art self-supervised performance on MTEB (a 9.3% improvement over the best unsupervised embedding teacher), reduces harmful-content retrieval by up to 43.2%, improves reasoning performance by about 29.3%, and yields interpretable embeddings that can be decoded back into text.
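The training recipe in the points above can be sketched in a few lines of PyTorch. This is a hypothetical toy, not the paper's code: the "backbone" is a tiny frozen module standing in for the LLM, the appended trainable vectors stand in for the new special tokens, and a random vector stands in for the unsupervised teacher's distillation target. All sizes and the cosine-distillation loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, DIM, N_NEW = 100, 32, 4  # toy sizes, not from the paper

class FrozenBackbone(nn.Module):
    """Stand-in for the frozen LLM: token embeddings + one linear layer."""
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.proj = nn.Linear(DIM, DIM)

    def forward(self, embs):            # embs: (seq_len, DIM)
        h = self.proj(embs)
        return h[-N_NEW:].mean(dim=0)   # pool over the appended special tokens

backbone = FrozenBackbone()
for p in backbone.parameters():
    p.requires_grad_(False)             # backbone stays frozen

# Only the new special-token embeddings are trainable.
new_tokens = nn.Parameter(torch.randn(N_NEW, DIM) * 0.02)
opt = torch.optim.Adam([new_tokens], lr=1e-2)

query = torch.randint(0, VOCAB, (8,))   # an unlabeled query
teacher_emb = torch.randn(DIM)          # stand-in for the teacher's target

for _ in range(200):
    # Append the trainable tokens to the input, embed, and pool.
    inp = torch.cat([backbone.tok(query), new_tokens], dim=0)
    student_emb = backbone(inp)
    # Distill toward the teacher embedding via cosine distance.
    loss = 1 - F.cosine_similarity(student_emb, teacher_emb, dim=0)
    opt.zero_grad()
    loss.backward()
    opt.step()

final = 1 - F.cosine_similarity(
    backbone(torch.cat([backbone.tok(query), new_tokens], dim=0)),
    teacher_emb, dim=0)
```

Because only `new_tokens` receives gradients, the sketch mirrors the paper's parameter-efficiency claim: the query's own token embeddings and the backbone never change, yet the pooled output can still be steered toward the teacher target.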