PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision

arXiv cs.AI / 3/31/2026


Key Points

  • The paper introduces PReD, described as the first LLM-based foundation multimodal model targeted specifically at electromagnetic (EM) perception, recognition, and decision-making in a closed loop.
  • To address EM-domain data scarcity and limited domain knowledge integration, the authors built the PReD-1.3M multitask dataset and a corresponding evaluation benchmark, PReD-Bench.
  • PReD is trained on multiple signal representations—time-domain waveforms, frequency-domain spectrograms, and constellation diagrams—covering communication and radar signal features.
  • The model supports tasks spanning detection, modulation and protocol recognition, parameter estimation, RF fingerprint recognition, and even anti-jamming decision-making.
  • Experiments report state-of-the-art results on PReD-Bench, suggesting vision-aligned foundation-model approaches can significantly improve EM-signal understanding and reasoning.
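The multi-view signal representations listed above (time-domain waveforms, frequency-domain spectrograms, constellation diagrams) are standard in signal processing and can all be derived from the same raw IQ samples. Below is a minimal NumPy sketch using a hypothetical QPSK test signal; every function name and parameter here is illustrative, not taken from the paper or its dataset pipeline:

```python
import numpy as np

def qpsk_waveform(n_symbols=256, sps=8, snr_db=20, seed=0):
    """Generate a noisy QPSK baseband IQ waveform (toy example signal)."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 4, n_symbols)
    symbols = np.exp(1j * (np.pi / 4 + np.pi / 2 * bits))  # four diagonal points
    iq = np.repeat(symbols, sps)  # rectangular pulse shaping, for simplicity
    noise = rng.standard_normal(iq.size) + 1j * rng.standard_normal(iq.size)
    noise *= 10 ** (-snr_db / 20) / np.sqrt(2)
    return iq + noise

def spectrogram(iq, nfft=64, hop=32):
    """Magnitude spectrogram via a short-time FFT (frequency-domain view)."""
    frames = [iq[i:i + nfft] for i in range(0, iq.size - nfft + 1, hop)]
    window = np.hanning(nfft)
    stft = np.fft.fft(np.stack(frames) * window, axis=1)
    return np.abs(np.fft.fftshift(stft, axes=1))

iq = qpsk_waveform()
time_view = np.stack([iq.real, iq.imag])  # time-domain waveform view (I/Q channels)
spec_view = spectrogram(iq)               # frequency-domain spectrogram view
const_view = iq[::8]                      # symbol-rate samples -> constellation view
```

In a dataset like PReD-1.3M, views of this kind would presumably be rendered as images so a vision-aligned model can consume them, but the article does not specify the rendering details.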

Abstract

Multimodal large language models have demonstrated powerful cross-modal understanding and reasoning capabilities in general domains. However, in the electromagnetic (EM) domain, they still face challenges such as data scarcity and insufficient integration of domain knowledge. This paper proposes PReD, the first foundation model for the EM domain that covers the intelligent closed loop of perception, recognition, and decision-making. We constructed a high-quality multitask EM dataset, PReD-1.3M, and an evaluation benchmark, PReD-Bench. The dataset encompasses multi-perspective representations such as raw time-domain waveforms, frequency-domain spectrograms, and constellation diagrams, covering typical features of communication and radar signals. It supports a range of core tasks, including signal detection, modulation recognition, parameter estimation, protocol recognition, radio frequency fingerprint recognition, and anti-jamming decision-making. PReD adopts a multi-stage training strategy that unifies multiple EM-signal tasks, achieving closed-loop optimization from end-to-end signal understanding to language-driven reasoning and decision-making, and significantly enhancing EM domain expertise while preserving general multimodal capabilities. Experimental results show that PReD achieves state-of-the-art performance on PReD-Bench, which is constructed from both open-source and self-collected signal datasets. These results collectively validate the feasibility and potential of vision-aligned foundation models in advancing the understanding and reasoning of EM signals.
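The "perception, recognition, decision-making" closed loop the abstract describes can be made concrete with a toy pipeline. This is a hedged sketch only: the energy detector, phase-histogram classifier, and action table below are illustrative stand-ins, not PReD's actual components, which the article attributes to a language-model backbone rather than hand-written rules.

```python
import numpy as np

def perceive(iq, threshold=0.1):
    """Perception stage: toy energy detector deciding whether a signal is present."""
    return float(np.mean(np.abs(iq) ** 2)) > threshold

def recognize(iq):
    """Recognition stage: toy modulation classifier (BPSK vs. QPSK).
    BPSK phases collapse into one bin mod pi; QPSK fills both bins."""
    phases = np.mod(np.angle(iq), np.pi)
    hist, _ = np.histogram(phases, bins=2, range=(0.0, np.pi))
    return "QPSK" if hist.min() > 0.2 * iq.size else "BPSK"

def decide(label):
    """Decision stage: map the recognized signal to a hypothetical anti-jamming action."""
    actions = {"BPSK": "hop_frequency", "QPSK": "apply_notch_filter"}
    return actions.get(label, "observe")

def closed_loop(iq):
    """Chain the stages, mirroring the perception -> recognition -> decision loop."""
    if not perceive(iq):
        return "idle"
    return decide(recognize(iq))
```

For a clean QPSK burst, `closed_loop` returns the notch-filter action; when no energy is detected it short-circuits to "idle". The point of a foundation model like PReD is to replace each of these brittle hand-crafted stages with learned, language-driven reasoning over the signal views.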