SeaAlert: Critical Information Extraction From Maritime Distress Communications with Large Language Models

arXiv cs.AI / 17 Apr 2026


Key Points

  • The paper introduces SeaAlert, an LLM-based framework for robustly extracting critical information from safety-critical maritime distress messages transmitted as VHF voice communications.
  • It targets real-world difficulties such as brief and noisy messages, deviations from standardized GMDSS procedures, and transcription errors introduced by ASR under channel noise and speaker stress.
  • To overcome limited labeled data, the authors build a synthetic data generation pipeline that uses an LLM to create realistic maritime distress messages, including hard cases where distress codewords are omitted or rephrased.
  • The pipeline then synthesizes speech from the generated utterances, degrades it with simulated VHF noise, and runs ASR to produce realistic noisy transcripts for training and evaluation.
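As a rough sketch of the noise-degradation step above, the snippet below adds white noise to a waveform at a target signal-to-noise ratio. This is a minimal stand-in for the paper's simulated VHF channel noise, whose exact model is not described in this summary; the function name and SNR parameterization are assumptions, and a realistic channel model would also band-limit, clip, and add bursts or fading.

```python
import math
import random

def degrade_with_vhf_noise(samples, snr_db, seed=0):
    """Add zero-mean Gaussian noise to `samples` at the given SNR in dB.

    A minimal stand-in for simulated VHF channel degradation; the paper's
    actual noise simulation is not specified in this summary.
    """
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    rng = random.Random(seed)  # fixed seed for reproducible corpora
    return [s + rng.gauss(0.0, sigma) for s in samples]

# Example: degrade one second of a 440 Hz tone at 8 kHz to ~10 dB SNR,
# as might be done to synthesized speech before running ASR on it.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
noisy = degrade_with_vhf_noise(clean, snr_db=10.0)
```

In a pipeline like the one described, the degraded audio would then be passed to an ASR system so that training transcripts carry realistic recognition errors.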

Abstract

Maritime distress communications transmitted over very high frequency (VHF) radio are safety-critical voice messages used to report emergencies at sea. Under the Global Maritime Distress and Safety System (GMDSS), such messages follow standardized procedures and are expected to convey essential details, including vessel identity, position, nature of the distress, and required assistance. In practice, however, automatic analysis remains difficult because distress messages are often brief, noisy, and produced under stress, may deviate from the prescribed format, and are further degraded by automatic speech recognition (ASR) errors caused by channel noise and speaker stress. This paper presents SeaAlert, an LLM-based framework for robust analysis of maritime distress communications. To address the scarcity of labeled real-world data, we develop a synthetic data generation pipeline in which an LLM produces realistic and diverse maritime messages, including challenging variants in which standard distress codewords are omitted or replaced with less explicit expressions. The generated utterances are synthesized into speech, degraded with simulated VHF noise, and transcribed by an ASR system to obtain realistic noisy transcripts.
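For concreteness, the essential details the abstract says a GMDSS distress message should convey (vessel identity, position, nature of the distress, and required assistance) could be modeled as a simple extraction record. The class name, field names, and `missing_fields` helper below are illustrative assumptions, not the paper's actual output schema.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class DistressReport:
    """Illustrative container for the GMDSS-mandated details; not the
    paper's actual extraction schema."""
    vessel_identity: Optional[str] = None      # e.g. vessel name or call sign
    position: Optional[str] = None             # e.g. lat/lon as spoken
    nature_of_distress: Optional[str] = None   # e.g. "taking on water"
    assistance_required: Optional[str] = None  # e.g. "immediate evacuation"

    def missing_fields(self):
        """Fields an extractor failed to fill, e.g. when a brief or
        non-standard message omits them."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is None]

# A partially filled report, as might result from a noisy transcript.
report = DistressReport(vessel_identity="MV Northwind",
                        nature_of_distress="engine fire")
```

A record like this also makes the evaluation concrete: `missing_fields()` marks exactly which mandated details an extractor failed to recover from a given transcript.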