ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation
arXiv cs.CL / 4/23/2026
📰 NewsModels & Research
Key Points
- ORPHEAS is a specialized Greek-English embedding model designed to improve cross-lingual retrieval-augmented generation in bilingual Greek–English scenarios.
- The model addresses limitations of existing multilingual embeddings by concentrating capacity on Greek-specific morphology and domain terminology rather than spreading representational power across many languages.
- ORPHEAS is trained using a high-quality dataset created via a knowledge-graph-based fine-tuning approach over a diverse multi-domain corpus.
- Experiments on monolingual and cross-lingual retrieval benchmarks show ORPHEAS outperforms current state-of-the-art multilingual embedding models without sacrificing cross-lingual retrieval performance.
- The results suggest that domain-specialized fine-tuning for morphologically complex languages can yield better bilingual semantic alignment for RAG systems.
Related Articles

Training ChatGPT on Private Data: A Technical Reference
Dev.to
AI as a Fascist Artifact
Dev.to
Sony Ace: el robot que ganó 3 de 5 a élites de ping-pong en Nature
Dev.to

OpenAI releases open-source model that strips personal data from text
THE DECODER

Researchers warn US politics is repeating its ChatGPT mistake with world models
THE DECODER