Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese

arXiv cs.CL, 27 March 2026


Key Points

  • The paper studies how to adapt small, efficient LLMs to Faroese by continuing pre-training on related Scandinavian languages (either individually or via model merging) before fine-tuning on Faroese.
  • It compares full fine-tuning against parameter-efficient adaptation using LoRA, evaluating impacts on general language modeling, linguistic accuracy, and text comprehension.
  • To compensate for limited Faroese evaluation resources, the authors create two minimal-pair probing benchmarks (linguistic acceptability and text comprehension) and add human evaluations by native Faroese linguists.
  • Findings indicate language transfer is crucial, but the best source language depends on the task: Icelandic helps linguistic accuracy while Danish improves reading comprehension.
  • Tradeoffs also emerge between adaptation strategies: LoRA performs better on linguistic acceptability and earns slightly higher human evaluation scores, whereas full fine-tuning yields stronger comprehension and more robust downstream fine-tuning. Merging multiple source languages under full fine-tuning (but not LoRA) improves general language modeling, though its gains on the probing benchmarks are less consistent.
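The model-merging step in the pipeline above can be sketched as simple parameter averaging: combine checkpoints continued-pretrained on different source languages into one set of weights before fine-tuning on Faroese. This is a hypothetical illustration with toy scalar "parameters"; the paper's exact merging recipe may differ.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Merge parameter dicts by (optionally weighted) averaging.

    Each dict maps a parameter name to its value; in a real model these
    values would be tensors of identical shape across checkpoints.
    """
    if weights is None:
        # Default to a uniform average over all checkpoints.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


# Toy example: scalars stand in for weight tensors of two
# hypothetical checkpoints adapted to Icelandic and Danish.
icelandic = {"w": 2.0, "b": 0.5}
danish = {"w": 4.0, "b": 1.5}
print(merge_state_dicts([icelandic, danish]))  # {'w': 3.0, 'b': 1.0}
```

The `weights` argument allows unequal mixing, e.g. favoring the source language closest to the target task.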

Abstract

We investigate strategies for adapting small, efficient language models to Faroese, a low-resource North Germanic language. Starting from English-pretrained models, we apply continued pre-training on related Scandinavian languages -- individually or combined via model merging -- before fine-tuning on Faroese. We compare full fine-tuning with parameter-efficient adaptation via LoRA, assessing their effects on general language modeling performance, linguistic accuracy, and text comprehension. To address the lack of existing Faroese evaluation resources, we construct two new minimal-pair probing benchmarks, one for linguistic acceptability and one for text comprehension, and complement them with human evaluations conducted by native Faroese linguists. Our results show that transfer from related languages is essential, but the optimal source language is task-dependent: Icelandic improves linguistic accuracy, while Danish boosts reading comprehension. The choice of adaptation method likewise depends on the target task: LoRA yields stronger linguistic acceptability and marginally higher human evaluation scores, whereas full fine-tuning produces better comprehension performance and more robust downstream fine-tuning. Merging multiple related languages under full fine-tuning (but not LoRA) improves general language modeling, though its benefits in the linguistic acceptability and comprehension probes are less consistent.
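The parameter-efficient alternative compared in the abstract, LoRA, keeps the pretrained weight matrix frozen and learns only a low-rank additive correction. A minimal NumPy sketch of the idea (illustrative only; real LoRA inserts these adapters inside attention and MLP layers of the transformer):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                              # hidden size, adapter rank (r << d)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

def lora_forward(x, alpha=16):
    # Effective weight is W + (alpha / r) * B @ A. Only A and B
    # (2*r*d parameters) are trained instead of all d*d entries of W.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal(d)
# With B initialized to zero, the adapted model starts out
# exactly equal to the frozen pretrained model.
assert np.allclose(lora_forward(x), x @ W.T)
```

The zero initialization of `B` is what makes LoRA a safe starting point: training begins from the pretrained model's behavior and only gradually moves away from it.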