F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
arXiv cs.CL / 3/20/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- F2LLM-v2 is a new family of multilingual embedding models spanning 80M to 14B parameters, trained on a curated dataset of 60 million samples and supporting over 200 languages.
- Training follows a two-stage LLM-based embedding pipeline with Matryoshka representation learning, model pruning, and knowledge distillation to boost efficiency while preserving performance; F2LLM-v2-14B ranks first on 11 MTEB benchmarks.
- The release emphasizes open-source access, making all models, data, code, and intermediate checkpoints available to the research community.
- The smaller models set new state-of-the-art results for resource-constrained applications and advance support for underserved mid- and low-resource languages.
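The summary notes Matryoshka representation learning, which trains embeddings so that leading prefixes of the vector remain useful at reduced dimensionality. A minimal sketch of how such embeddings are typically consumed at inference time: truncate to the first `dim` components and re-normalize before computing cosine similarity. The function name and dimensions here are illustrative, not taken from the F2LLM-v2 release.

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components
    and re-normalize so cosine similarity stays well-defined."""
    sub = np.asarray(vec, dtype=np.float64)[:dim]
    norm = np.linalg.norm(sub)
    return sub / norm if norm > 0 else sub

# Example: shrink a hypothetical 1024-d embedding to 256 dimensions.
full = np.random.default_rng(0).normal(size=1024)
short = truncate_embedding(full, 256)
print(short.shape)  # (256,)
```

Truncating this way trades a small amount of retrieval quality for a proportional cut in index size and similarity-computation cost, which is the efficiency angle the release highlights for resource-constrained settings.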