AirFM-DDA: Air-Interface Foundation Model in the Delay-Doppler-Angle Domain for AI-Native 6G

arXiv cs.AI / 5/4/2026


Key Points

  • The paper proposes AirFM-DDA, a wireless “air-interface” foundation model for AI-native 6G that operates in the Delay-Doppler-Angle (DDA) domain rather than the usual space-time-frequency (STF) domain.
  • By reparameterizing CSI from the STF domain into the DDA domain, the model explicitly separates multipath components along physically meaningful axes, making it easier to learn universal channel representations.
  • AirFM-DDA uses window-based attention together with frame-structure-aware positional encoding to capture locally clustered multipath dependencies while avoiding the prohibitive cost of global attention.
  • Experiments on channel prediction and estimation show stronger zero-shot generalization to unseen scenarios/datasets than baseline approaches.
  • Compared with global attention, the window-based design reduces training and inference costs by nearly an order of magnitude and remains robust under high mobility, large delay spreads, severe noise, and extreme aliasing.

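The STF-to-DDA reparameterization described above amounts to per-axis Fourier transforms of the CSI tensor: frequency maps to delay, time to Doppler, and the spatial (antenna) axis to angle. A minimal sketch of this idea, with illustrative shapes and axis/transform conventions that are assumptions rather than the paper's exact setup:

```python
import numpy as np

# Hypothetical STF-domain CSI tensor with (frequency, time, space) axes.
# Dimensions are illustrative, not the paper's configuration.
n_sub, n_sym, n_ant = 64, 14, 8
rng = np.random.default_rng(0)
H_stf = rng.standard_normal((n_sub, n_sym, n_ant)) \
      + 1j * rng.standard_normal((n_sub, n_sym, n_ant))

def stf_to_dda(H):
    """Map STF CSI to the delay-Doppler-angle domain via per-axis DFTs.

    frequency -> delay   (inverse DFT along axis 0)
    time      -> Doppler (DFT along axis 1)
    space     -> angle   (DFT along axis 2)
    """
    H_delay = np.fft.ifft(H, axis=0)    # frequency -> delay
    H_dd = np.fft.fft(H_delay, axis=1)  # time -> Doppler
    H_dda = np.fft.fft(H_dd, axis=2)    # space -> angle
    return H_dda

H_dda = stf_to_dda(H_stf)
print(H_dda.shape)  # (64, 14, 8): same size, different axes
```

Because each transform acts on an independent axis, the mapping is lossless and invertible; its value is that discrete multipath components, entangled in STF, become localized peaks along the delay, Doppler, and angle axes.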
Abstract

The success of large foundation models is catalyzing a new paradigm for AI-native 6G network design: wireless foundation models for physical-layer design. However, existing models often operate on channel state information (CSI) in the space-time-frequency (STF) domain, where distinct multipath components are inherently superimposed and structurally entangled. This hinders the learning of universal channel representations. Meanwhile, their reliance on global attention mechanisms incurs prohibitive computational overhead. In this paper, we propose AirFM-DDA, an Air-interface Foundation Model operating in the Delay-Doppler-Angle (DDA) domain for physical-layer tasks. Specifically, AirFM-DDA reparameterizes CSI from the STF domain into the DDA domain to explicitly resolve multipath components along physically meaningful axes. It employs a window-based attention module augmented with frame-structure-aware positional encoding (FS-PE). This window-based attention aligns with locally clustered multipath dependencies while avoiding quadratic-complexity global attention, and FS-PE injects frame-structure priors into the network. Extensive experiments demonstrate that AirFM-DDA achieves superior zero-shot generalization across unseen scenarios and datasets, consistently outperforming the baselines on channel prediction and estimation tasks. Compared to global attention, its window-based attention reduces training and inference costs by nearly an order of magnitude. Moreover, AirFM-DDA maintains robustness under high mobility, large delay spreads, severe noise, and extreme aliasing conditions.
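The complexity argument behind window-based attention can be seen in a toy sketch. The version below uses non-overlapping windows over a flat token sequence; it omits FS-PE, multiple heads, and whatever window layout AirFM-DDA actually uses, so treat it as an illustration of the cost structure, not the paper's module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v, window):
    """Self-attention restricted to non-overlapping windows of length `window`.

    q, k, v: arrays of shape (N, d) with N divisible by `window`.
    Each window forms its own (window, window) score matrix, so the cost
    scales as O(N * window * d) rather than O(N^2 * d) for global attention.
    """
    n, d = q.shape
    qw = q.reshape(-1, window, d)
    kw = k.reshape(-1, window, d)
    vw = v.reshape(-1, window, d)
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(d)  # (n/window, w, w)
    out = softmax(scores) @ vw
    return out.reshape(n, d)

rng = np.random.default_rng(1)
N, d, w = 256, 16, 8
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
y = window_attention(q, k, v, w)
print(y.shape)  # (256, 16)
# Score entries computed: N*w = 2048, versus N*N = 65536 for global attention.
```

Restricting attention to local windows is exactly the inductive bias the abstract motivates: in the DDA domain, multipath energy clusters locally, so most informative dependencies fall inside a window and the quadratic global score matrix is largely wasted computation.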