Maistros: A Greek Large Language Model Adapted Through Knowledge Distillation From Large Reasoning Models

arXiv cs.CL / 5/5/2026

Key Points

  • The paper introduces Maistros, an open-weights Greek large language model designed to improve question answering in Modern Greek, where available QA resources and training datasets are limited.
  • It addresses the practicality gap of large reasoning models (LRMs) by distilling their knowledge into a smaller, more deployable model, aiming to retain accuracy without the heavy inference cost; a minimal sketch of this recipe follows the list.
  • The work contributes CulturaQA, a high-quality dataset generated by large reasoning models and then human-curated for Greek LLM training and evaluation.
  • It also proposes a memory-efficient evaluation framework that can be adapted across languages and QA task types.
  • Maistros 8B is benchmarked via a comprehensive study evaluating nine LLMs on nine human-curated Greek QA datasets, showing the effectiveness of the distillation + fine-tuning approach for Greek QA.
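
In practice, this style of distillation typically means prompting the teacher LRM to answer with explicit step-by-step reasoning, curating the resulting traces, and using them as supervised fine-tuning targets for the smaller student. The article does not reproduce the paper's exact pipeline, so the sketch below is a generic illustration of that recipe: the checkpoint name, prompt template, and question list are placeholders, not the paper's artifacts.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER = "teacher-lrm"  # placeholder; not the paper's actual teacher model

tok = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(
    TEACHER, torch_dtype=torch.bfloat16, device_map="auto"
)

def generate_trace(question: str) -> str:
    """Have the teacher answer with an explicit step-by-step reasoning trace."""
    prompt = f"Question: {question}\nThink step by step, then give a final answer.\n"
    inputs = tok(prompt, return_tensors="pt").to(teacher.device)
    with torch.inference_mode():
        out = teacher.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Keep only the newly generated tokens (the reasoning trace plus answer).
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Toy question; in the paper's setting these would be Greek culture-specific
# questions whose teacher outputs are then human-curated into CulturaQA.
questions = ["Ποιος έγραψε την Οδύσσεια;"]
sft_pairs = [{"prompt": q, "completion": generate_trace(q)} for q in questions]
# After human curation, `sft_pairs` would serve as supervised fine-tuning
# targets for the smaller 8B student model.
```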

Abstract

Large Language Models (LLMs) have substantially advanced the field of Natural Language Processing (NLP), achieving state-of-the-art performance across a wide range of tasks. These improvements have been attributed, in part, to their emergent reasoning capabilities, which are enabled by large-scale training and increased model capacity. However, existing LLMs can generate erroneous responses when addressing complex queries that fall outside their training distribution, due to limited internal knowledge or the need for multi-step reasoning. To address these limitations, recent work has introduced large reasoning models (LRMs), which incorporate explicit internal reasoning processes to improve response accuracy. However, state-of-the-art LRMs often comprise hundreds of billions of parameters and require several seconds per inference, even on advanced multi-GPU systems. These characteristics limit their practicality for deployment in conventional computing environments. Meanwhile, NLP research on multilingual LLMs continues to prioritize high-resource languages. As a result, these models exhibit limited performance in under-resourced languages, primarily due to insufficient language- and culture-specific training data. In this paper, we focus on Modern Greek, for which only a limited number of question answering (QA) datasets have been proposed, most of which are intended for model evaluation. To address this research gap in Greek QA, we make the following contributions: (i) CulturaQA, a high-quality LRM-generated and human-curated dataset, for Greek LLM training and evaluation; (ii) a memory-efficient LLM evaluation framework adaptable to diverse languages and QA tasks; (iii) Maistros 8B, a state-of-the-art open-weights Greek LLM developed via knowledge distillation and fine-tuning on CulturaQA; and (iv) a comprehensive evaluation of nine LLMs across nine human-curated Greek QA datasets.
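
Contribution (ii), the memory-efficient evaluation framework, is described only at a high level here; the usual levers for fitting evaluation on modest hardware are quantized weights and greedy short-form decoding. The sketch below illustrates that general approach and is not the paper's framework: the checkpoint name, Greek prompt template, dataset fields, and exact-match metric are all assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL = "some-greek-llm"  # placeholder checkpoint name

# 4-bit quantization keeps an 8B model within a single consumer GPU.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, quantization_config=bnb, device_map="auto"
)

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().casefold() == gold.strip().casefold()

@torch.inference_mode()
def evaluate(examples) -> float:
    """Greedy-decode short answers and score them by exact match.

    Each example is assumed to be a dict with 'question' and 'answer' keys.
    """
    hits = 0
    for ex in examples:
        # "Ερώτηση"/"Απάντηση" = "Question"/"Answer" in Greek.
        prompt = f"Ερώτηση: {ex['question']}\nΑπάντηση:"
        ids = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=64, do_sample=False)
        pred = tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
        hits += exact_match(pred, ex["answer"])
    return hits / len(examples)
```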