LLM2Vec-Gen: Generative Embeddings from Large Language Models
arXiv cs.CL / 3/12/2026
Key Points
- LLM2Vec-Gen presents a self-supervised approach to generating embeddings by learning to represent the model's potential response rather than directly encoding the input.
- It achieves this by adding trainable special tokens to the LLM's vocabulary, appending them to inputs, and optimizing them to encode the LLM's response while keeping the backbone frozen (see the sketch after this list).
- Training uses the LLM's own completion as guidance along with an unsupervised embedding teacher that provides distillation targets, enabling learning from unlabeled queries.
- The method attains state-of-the-art self-supervised performance on MTEB (a 9.3% improvement over the best unsupervised embedding teacher), reduces harmful content retrieval by up to 43.2%, improves performance on reasoning tasks by about 29.3%, and yields interpretable embeddings that can be decoded back into text.
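
To make the mechanism concrete, here is a minimal PyTorch sketch of the setup the key points describe: trainable special-token embeddings appended to the query, a frozen backbone, and a distillation loss against a teacher's embedding of the backbone's own completion. The backbone name, the number of special tokens, the teacher dimension, and `teacher_encode` are illustrative assumptions, and the sketch injects the new tokens via `inputs_embeds` rather than literally resizing the vocabulary; it is not the paper's released code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in backbone; "gpt2", num_special=4, and teacher_dim=768 are
# illustrative assumptions, not the paper's actual configuration.
backbone_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(backbone_name)
tokenizer.pad_token = tokenizer.eos_token
backbone = AutoModelForCausalLM.from_pretrained(backbone_name)

# Freeze the backbone: only the new special-token embeddings (plus a small
# projection onto the teacher's embedding space) receive gradients.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

num_special = 4
hidden = backbone.config.hidden_size
teacher_dim = 768
special_embeds = torch.nn.Parameter(0.02 * torch.randn(num_special, hidden))
proj = torch.nn.Linear(hidden, teacher_dim)

def embed_query(input_ids: torch.Tensor) -> torch.Tensor:
    """Append the trainable special tokens to the query and pool their
    final hidden states into a single embedding vector."""
    tok = backbone.get_input_embeddings()(input_ids)                 # (B, T, H)
    extra = special_embeds.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    out = backbone(
        inputs_embeds=torch.cat([tok, extra], dim=1),
        output_hidden_states=True,
    )
    pooled = out.hidden_states[-1][:, -num_special:, :].mean(dim=1)  # (B, H)
    return proj(pooled)                                              # (B, D)

def train_step(input_ids, teacher_encode, optimizer):
    """One self-supervised step: the target is the teacher's embedding of
    the frozen LLM's own completion. `teacher_encode` is a hypothetical
    wrapper around any off-the-shelf unsupervised text embedder."""
    with torch.no_grad():
        completion = backbone.generate(input_ids, max_new_tokens=64)
        target = teacher_encode(completion)           # (B, D), no gradient
    student = embed_query(input_ids)
    loss = 1.0 - F.cosine_similarity(student, target, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

optimizer = torch.optim.AdamW([special_embeds, *proj.parameters()], lr=1e-4)
# Usage (sketch): ids = tokenizer("example query", return_tensors="pt").input_ids
```

Because the backbone stays frozen, only the handful of special-token rows and the projection head are updated, which matches the key points' claim that the method can train cheaply on unlabeled queries.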
Related Articles

Astral to Join OpenAI
Dev.to

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.
Reddit r/LocalLLaMA

Why Data is Important for LLM
Dev.to

Waymo hits 170 million miles while avoiding serious mayhem
The Verge

The Inference Market Is Consolidating. Agent Payments Are Still Nobody's Problem.
Dev.to