AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese

arXiv cs.CL / March 30, 2026


Key Points

  • The AMALIA technical report introduces a fully open-source large language model designed to better serve European Portuguese (pt-PT), addressing the language’s underrepresentation in training and native evaluations.
  • The model is trained on more high-quality pt-PT data during both mid-training and post-training, reducing the gaps that arise from relying on machine-translated resources.
  • The authors release a suite of pt-PT-focused benchmarks, combining translated standard tasks with new datasets that target pt-PT generation, linguistic competence, and pt-PT/pt-BR bias measurement.
  • Experimental results indicate AMALIA performs comparably to strong baselines on translated benchmarks while delivering substantially improved results on pt-PT-specific evaluations, reinforcing the value of targeted training and native benchmarking.

Abstract

Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant's linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.