Learning Spatial Structure from Pre-Beamforming Per-Antenna Range-Doppler Radar Data via Visibility-Aware Cross-Modal Supervision

arXiv cs.CV / 4/3/2026

Key Points

  • The paper examines whether automotive radar models can learn meaningful spatial structure directly from pre-beamforming per-antenna range-Doppler (RD) measurements, bypassing the explicit angle-domain beamforming stage.
  • Using a 6-TX x 8-RX commodity automotive radar with an A/B chirp-sequence FMCW (CS-FMCW) scheme, in which the effective transmit aperture varies between chirps (single-TX vs. multi-TX), the authors analyze how chirp-dependent transmit configurations affect spatial recoverability.
  • A dual-chirp, shared-weight encoder is trained end-to-end on pre-beamforming per-antenna RD tensors and evaluated via bird's-eye-view (BEV) occupancy, used as a geometry-focused probe rather than as a performance-driven objective (a minimal architecture sketch follows this list).
  • Supervision is visibility-aware and cross-modal: LiDAR-derived labels account for the radar field-of-view and for occlusion-aware LiDAR observability via ray-based visibility modeling (see the visibility sketch after the abstract).
  • Chirp ablations (A-only, B-only, A+B) and range-band analyses, compared against physics-aligned baselines, support the conclusion that spatial structure can be recovered without hand-crafted signal-processing stages or explicit angle-domain construction.

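The paper does not include code; the following is a minimal PyTorch sketch of what a dual-chirp, shared-weight encoder over pre-beamforming per-antenna RD tensors could look like. All names, shapes, and layer choices (`RDEncoder`, the real/imaginary channel stacking, the naive RD-to-BEV resampling) are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (assumptions, not the authors' code): a shared-weight encoder
# applied to chirp-A and chirp-B per-antenna range-Doppler (RD) tensors, fused
# and decoded into a BEV occupancy logit map used as a geometric probe.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RDEncoder(nn.Module):
    """Encoder over one chirp's per-antenna RD tensor.

    Assumed input layout: (batch, 2 * n_ant, range_bins, doppler_bins), with
    the complex RD spectrum stacked as real/imaginary channels per virtual
    antenna (48 = 6 TX x 8 RX) -- no beamforming applied beforehand.
    """
    def __init__(self, n_ant: int = 48, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * n_ant, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.BatchNorm2d(width), nn.ReLU(),
        )

    def forward(self, rd: torch.Tensor) -> torch.Tensor:
        return self.net(rd)

class DualChirpBEVProbe(nn.Module):
    """One RDEncoder instance reused for both chirps (weight sharing);
    features are concatenated and decoded into BEV occupancy logits."""
    def __init__(self, n_ant: int = 48, width: int = 64, bev_hw=(128, 128)):
        super().__init__()
        self.encoder = RDEncoder(n_ant, width)   # shared across chirps A and B
        self.bev_hw = bev_hw
        self.head = nn.Sequential(
            nn.Conv2d(2 * width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 1),              # per-cell occupancy logits
        )

    def forward(self, rd_a=None, rd_b=None):
        # Chirp ablations: pass A only, B only (rd_a=None), or both (A+B).
        feats = [self.encoder(x) for x in (rd_a, rd_b) if x is not None]
        if len(feats) == 1:
            feats = feats * 2                    # keep the head's input width fixed
        f = torch.cat(feats, dim=1)
        # Naive resampling from the RD feature grid to the BEV grid; the
        # paper's actual RD-to-BEV mapping is not specified here.
        f = F.interpolate(f, size=self.bev_hw, mode="bilinear", align_corners=False)
        return self.head(f)

if __name__ == "__main__":
    rd_a = torch.randn(2, 96, 256, 64)  # 48 antennas x (re, im), 256 range x 64 Doppler bins
    rd_b = torch.randn(2, 96, 256, 64)
    model = DualChirpBEVProbe()
    print(model(rd_a, rd_b).shape)      # torch.Size([2, 1, 128, 128])
```

The single `RDEncoder` reused for both chirps is what "shared-weight" refers to; calling the model with A only, B only, or A+B mirrors the paper's chirp ablations.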
Abstract

Automotive radar perception pipelines commonly construct angle-domain representations via beamforming before applying learning-based models. This work instead investigates a representational question: can meaningful spatial structure be learned directly from pre-beamforming per-antenna range-Doppler (RD) measurements? Experiments are conducted on a 6-TX x 8-RX (48 virtual antennas) commodity automotive radar employing an A/B chirp-sequence frequency-modulated continuous-wave (CS-FMCW) transmit scheme, in which the effective transmit aperture varies between chirps (single-TX vs. multi-TX), enabling controlled analysis of chirp-dependent transmit configurations. We operate on pre-beamforming per-antenna RD tensors using a dual-chirp shared-weight encoder trained in an end-to-end, fully data-driven manner, and evaluate spatial recoverability using bird's-eye-view (BEV) occupancy as a geometric probe rather than a performance-driven objective. Supervision is visibility-aware and cross-modal, derived from LiDAR with explicit modeling of the radar field-of-view and occlusion-aware LiDAR observability via ray-based visibility. Through chirp ablations (A-only, B-only, A+B), range-band analysis, and physics-aligned baselines, we assess how transmit configurations affect geometric recoverability. The results indicate that spatial structure can be learned directly from pre-beamforming per-antenna RD tensors without explicit angle-domain construction or hand-crafted signal-processing stages.
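
To make the visibility-aware supervision concrete: a common way to build an occlusion-aware observability mask is to bin LiDAR returns by azimuth and mark BEV cells as observed only out to the first return along each ray. Below is an illustrative NumPy sketch under assumed conventions (polar BEV binning, a single first-return depth per ray, a hypothetical `slack` margin); the paper's ray-based visibility model may differ in its details.

```python
# Illustrative sketch (assumed conventions, not the authors' exact model):
# derive a polar BEV observability mask from LiDAR by casting rays in the
# ground plane, so only cells the LiDAR could actually observe contribute
# to the occupancy labels.
import numpy as np

def bev_visibility_mask(points_xy, fov_deg=120.0, max_range=50.0,
                        n_rays=240, n_range_bins=100, slack=1.0):
    """For each azimuth ray inside the sensor FoV, cells are 'observed' out
    to the first LiDAR return (plus a small slack); cells behind the first
    return are treated as occluded and should be ignored in the loss.

    points_xy : (N, 2) LiDAR returns in the sensor frame (x forward, y left).
    Returns a (n_rays, n_range_bins) boolean mask in polar BEV coordinates.
    """
    az = np.degrees(np.arctan2(points_xy[:, 1], points_xy[:, 0]))
    rng = np.linalg.norm(points_xy, axis=1)
    half = fov_deg / 2.0
    keep = (np.abs(az) <= half) & (rng <= max_range)
    az, rng = az[keep], rng[keep]

    ray_idx = ((az + half) / fov_deg * n_rays).astype(int).clip(0, n_rays - 1)
    first_hit = np.full(n_rays, max_range)   # rays with no return see to max_range
    np.minimum.at(first_hit, ray_idx, rng)   # nearest return per ray

    bin_centers = (np.arange(n_range_bins) + 0.5) * (max_range / n_range_bins)
    # A cell is visible if it lies at or just beyond the first return.
    return bin_centers[None, :] <= (first_hit[:, None] + slack)
```

Intersecting such a mask with the radar field-of-view and dropping masked-out cells from the occupancy loss yields labels that only penalize the model where the LiDAR could actually verify occupancy, which is the visibility-aware, cross-modal supervision the abstract describes.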