Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese

arXiv cs.CL, 27 March 2026


Key Points

  • The paper studies how to adapt small, efficient LLMs to Faroese by continuing pre-training on related Scandinavian languages (either individually or via model merging) before fine-tuning on Faroese.
  • It compares full fine-tuning against parameter-efficient adaptation using LoRA, evaluating impacts on general language modeling, linguistic accuracy, and text comprehension.
  • To compensate for limited Faroese evaluation resources, the authors create two minimal-pair probing benchmarks (linguistic acceptability and text comprehension) and add human evaluations by native Faroese linguists.
  • Findings indicate language transfer is crucial, but the best source language depends on the task: Icelandic helps linguistic accuracy while Danish improves reading comprehension.
  • Tradeoffs also emerge between adaptation strategies: LoRA performs better on linguistic acceptability and earns slightly higher human evaluation scores, whereas full fine-tuning yields stronger comprehension and more robust downstream fine-tuning. Merging multiple source languages under full fine-tuning (but not LoRA) improves general language modeling, though its gains on the probing benchmarks are less consistent.
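The model-merging step in the pipeline above can be sketched as simple parameter averaging: combine checkpoints continued-pretrained on different source languages into one set of weights before fine-tuning on Faroese. This is a hypothetical illustration with toy scalar "parameters"; the paper's exact merging recipe may differ.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Merge parameter dicts by (optionally weighted) averaging.

    Each dict maps a parameter name to its value; in a real model these
    values would be tensors of identical shape across checkpoints.
    """
    if weights is None:
        # Default to a uniform average over all checkpoints.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


# Toy example: scalars stand in for weight tensors of two
# hypothetical checkpoints adapted to Icelandic and Danish.
icelandic = {"w": 2.0, "b": 0.5}
danish = {"w": 4.0, "b": 1.5}
print(merge_state_dicts([icelandic, danish]))  # {'w': 3.0, 'b': 1.0}
```

The `weights` argument allows unequal mixing, e.g. favoring the source language closest to the target task.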

Abstract

We investigate strategies for adapting small, efficient language models to Faroese, a low-resource North Germanic language. Starting from English-pretrained models, we apply continued pre-training on related Scandinavian languages -- individually or combined via model merging -- before fine-tuning on Faroese. We compare full fine-tuning with parameter-efficient adaptation via LoRA, assessing their effects on general language modeling performance, linguistic accuracy, and text comprehension. To address the lack of existing Faroese evaluation resources, we construct two new minimal-pair probing benchmarks, one for linguistic acceptability and one for text comprehension, and complement them with human evaluations conducted by native Faroese linguists. Our results show that transfer from related languages is essential, but the optimal source language is task-dependent: Icelandic improves linguistic accuracy, while Danish boosts reading comprehension. The choice of adaptation method likewise depends on the target task: LoRA yields stronger linguistic acceptability and marginally higher human evaluation scores, whereas full fine-tuning produces better comprehension performance and more robust downstream fine-tuning. Merging multiple related languages under full fine-tuning (but not LoRA) improves general language modeling, though its benefits in the linguistic acceptability and comprehension probes are less consistent.
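The parameter-efficient alternative compared in the abstract, LoRA, keeps the pretrained weight matrix frozen and learns only a low-rank additive correction. A minimal NumPy sketch of the idea (illustrative only; real LoRA inserts these adapters inside attention and MLP layers of the transformer):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                              # hidden size, adapter rank (r << d)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

def lora_forward(x, alpha=16):
    # Effective weight is W + (alpha / r) * B @ A. Only A and B
    # (2*r*d parameters) are trained instead of all d*d entries of W.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal(d)
# With B initialized to zero, the adapted model starts out
# exactly equal to the frozen pretrained model.
assert np.allclose(lora_forward(x), x @ W.T)
```

The zero initialization of `B` is what makes LoRA a safe starting point: training begins from the pretrained model's behavior and only gradually moves away from it.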