Translating Under Pressure: Domain-Aware LLMs for Crisis Communication

arXiv cs.AI / 4/30/2026


Key Points

  • The paper addresses the challenge of producing timely, reliable multilingual crisis communication despite a lack of curated parallel data.
  • It introduces a domain-adaptive pipeline that enlarges a small reference corpus by retrieving and filtering relevant data from larger general corpora.
  • The authors fine-tune a small language model on the crisis-domain dataset and then use preference optimization to steer translations toward CEFR A2-level English.
  • Evaluation (automatic and human) shows improved readability and maintained adequacy, suggesting simplified English plus domain adaptation can act as a workable emergency lingua franca.
  • The approach is positioned as practical for situations where full multilingual coverage is not feasible due to resource constraints.
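The corpus-expansion step described above can be sketched as similarity-based retrieval with a filtering threshold. The paper most likely uses embedding-based retrieval; the bag-of-words cosine measure, the `expand_corpus` function, the threshold value, and the sample sentences below are all illustrative assumptions, not the authors' implementation:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_corpus(reference, general, threshold=0.3):
    """Keep general-corpus sentences whose similarity to any reference
    sentence clears a threshold (a stand-in for the paper's
    retrieve-and-filter step; the real pipeline is not specified here)."""
    ref_vecs = [Counter(s.lower().split()) for s in reference]
    kept = []
    for sent in general:
        vec = Counter(sent.lower().split())
        if max(cosine(vec, r) for r in ref_vecs) >= threshold:
            kept.append(sent)
    return kept

# Hypothetical crisis-domain reference sentences and general-corpus candidates.
reference = ["evacuate the area immediately",
             "flood warning issued for the region"]
general = ["a flood warning was issued for coastal areas",
           "the stock market closed higher today",
           "residents were told to evacuate immediately"]
print(expand_corpus(reference, general))
```

In practice the threshold trades off domain purity against corpus size; a lower value admits more (noisier) sentences for fine-tuning.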

Abstract

Timely and reliable multilingual communication is critical during natural and human-induced disasters, but developing effective solutions for crisis communication is limited by the scarcity of curated parallel data. We propose a domain-adaptive pipeline that expands a small reference corpus by retrieving and filtering data from general corpora. We use the resulting dataset to fine-tune a small language model for crisis-domain translation and then apply preference optimization to bias outputs toward CEFR A2-level English. Automatic and human evaluations show that this approach improves readability while maintaining strong adequacy. Our results indicate that simplified English, combined with domain adaptation, can function as a practical lingua franca for emergency communication when full multilingual coverage is not feasible.