Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization

arXiv cs.LG · March 19, 2026

Key Points

  • The paper proposes Auto-PGD, converting the iterative proximal gradient descent algorithm into a trainable deep unfolded network by learning the parameters of each layer.
  • It introduces a hybrid layer that performs a learnable linear gradient transformation before the proximal projection to enhance performance and interpretability.
  • Hyperparameter optimization is performed with AutoGluon and a tree-structured Parzen estimator (TPE), exploring depth, initialization, optimizer, scheduler, layer type, and post-gradient activation.
  • The Auto-PGD approach achieves 98.8% of the spectral efficiency of a traditional 200-iteration PGD solver using only five unrolled layers and requires only 100 training samples, indicating reduced data and inference costs.
  • The work resolves a gradient-normalization issue so that training and evaluation behave consistently, and adds per-layer sum-rate logging to improve transparency.
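The unrolling idea in the first two bullets can be sketched in a few lines of numpy. The toy problem below is least squares with a unit-norm constraint, not the paper's beamforming objective; the per-layer step sizes, the linear gradient transforms (the "hybrid layer"), the normalization constant, and the projection are all illustrative stand-ins for the learned quantities described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def prox_unit_ball(w):
    # Proximal step for the indicator of the unit-norm ball
    # (a stand-in for the paper's power-style constraint).
    n = np.linalg.norm(w)
    return w / n if n > 1.0 else w

def unrolled_pgd(A, b, thetas, Ws):
    """Forward pass through L unrolled PGD layers for min ||A w - b||^2.

    thetas: per-layer step sizes; Ws: per-layer linear gradient transforms
    (the hybrid layer). In Auto-PGD both are trained; here they are fixed.
    """
    w = np.zeros(A.shape[1])
    log = []  # per-layer objective, mirroring the paper's per-layer logging
    for theta, W in zip(thetas, Ws):
        g = A.T @ (A @ w - b)                 # gradient of the smooth term
        g = g / (np.linalg.norm(g) + 1e-12)   # gradient normalization for stability
        w = prox_unit_ball(w - theta * (W @ g))  # transform, step, then project
        log.append(float(np.sum((A @ w - b) ** 2)))
    return w, log

d = 4
A = rng.standard_normal((8, d))
b = rng.standard_normal(8)
L = 5                        # five unrolled layers, as in the paper
thetas = [0.2] * L           # learned per layer during training
Ws = [np.eye(d)] * L         # identity recovers plain PGD; learned in the hybrid layer
w, log = unrolled_pgd(A, b, thetas, Ws)
```

Because the layer count is fixed at five, inference cost is a small constant, which is the source of the speedup over a 200-iteration solver.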
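The six-dimensional search space from the HPO bullet can be written down directly. The sketch below uses plain random search as a stand-in for AutoGluon's TPE sampler, and both the candidate values and the scoring function are invented for illustration; only the six dimension names come from the paper.

```python
import random

# The six dimensions the AutoGluon/TPE search explores in the paper.
# The candidate values are illustrative guesses, not the paper's ranges.
SEARCH_SPACE = {
    "depth":         [3, 5, 7, 10],
    "step_init":     [0.01, 0.05, 0.1, 0.5],
    "optimizer":     ["adam", "sgd", "rmsprop"],
    "scheduler":     ["constant", "cosine", "step"],
    "layer_type":    ["plain_pgd", "hybrid"],
    "post_grad_act": ["none", "tanh", "relu"],
}

def sample_config(rng):
    # Random search stand-in for a TPE trial suggestion.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def toy_objective(cfg):
    # Hypothetical validation score standing in for spectral efficiency.
    score = 0.9 if cfg["layer_type"] == "hybrid" else 0.8
    return score + 0.01 * cfg["depth"]

rng = random.Random(0)
best = max((sample_config(rng) for _ in range(50)), key=toy_objective)
```

A real TPE sampler differs from random search by fitting density models to past trials and proposing promising configurations, but the search-space definition is the same shape.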

Abstract

This study explores the combination of automated machine learning (AutoML) with model-based deep unfolding (DU) for optimizing wireless beamforming and waveforms. We convert the iterative proximal gradient descent (PGD) algorithm into a deep neural network, wherein the parameters of each layer are learned instead of being predetermined. Additionally, we enhance the architecture by incorporating a hybrid layer that performs a learnable linear gradient transformation prior to the proximal projection. By utilizing AutoGluon with a tree-structured Parzen estimator (TPE) for hyperparameter optimization (HPO) across an expanded search space, which includes network depth, step-size initialization, optimizer, learning rate scheduler, layer type, and post-gradient activation, the proposed auto-unrolled PGD (Auto-PGD) achieves 98.8% of the spectral efficiency of a traditional 200-iteration PGD solver using only five unrolled layers, while requiring only 100 training samples. We also address a gradient normalization issue to ensure consistent performance during training and evaluation, and we illustrate per-layer sum-rate logging as a tool for transparency. These contributions highlight a notable reduction in the amount of training data and inference cost required, while maintaining high interpretability compared to conventional black-box architectures.
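For context on the 200-iteration baseline the abstract compares against, a classic projected-gradient loop on a standard log-det sum-rate objective looks like the following. The channel dimensions, step size, and power constraint are illustrative, and the paper's exact objective and constraint set may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def sum_rate(H, W):
    # Spectral efficiency log2 det(I + H W W^T H^T), a standard sum-rate metric.
    HW = H @ W
    M = np.eye(H.shape[0]) + HW @ HW.T
    _, logdet = np.linalg.slogdet(M)
    return logdet / np.log(2)

def project_power(W, p=1.0):
    # Projection onto the total-power constraint ||W||_F^2 <= p.
    n2 = np.linalg.norm(W) ** 2
    return W * np.sqrt(p / n2) if n2 > p else W

H = rng.standard_normal((4, 6))              # illustrative 4x6 channel
W = project_power(rng.standard_normal((6, 2)))
step = 0.05
rates = []
for _ in range(200):                          # the 200-iteration PGD baseline
    HW = H @ W
    M = np.eye(4) + HW @ HW.T
    G = (2 / np.log(2)) * H.T @ np.linalg.solve(M, HW)  # gradient of log2 det
    W = project_power(W + step * G)           # ascent step, then project
    rates.append(sum_rate(H, W))
```

Auto-PGD replaces this fixed 200-step loop with five trained layers while retaining, per the abstract, 98.8% of the resulting spectral efficiency.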