Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization

arXiv cs.LG · March 19, 2026

Key Points

  • The paper proposes Auto-PGD, converting the iterative proximal gradient descent algorithm into a trainable deep unfolded network by learning the parameters of each layer.
  • It introduces a hybrid layer that performs a learnable linear gradient transformation before the proximal projection to enhance performance and interpretability.
  • Hyperparameter optimization is performed with AutoGluon and a tree-structured Parzen estimator (TPE), exploring depth, initialization, optimizer, scheduler, layer type, and post-gradient activation.
  • The Auto-PGD approach achieves 98.8% of the spectral efficiency of a traditional 200-iteration PGD solver using only five unrolled layers and requires only 100 training samples, indicating reduced data and inference costs.
  • The work resolves a gradient-normalization issue so that training and evaluation behave consistently, and adds per-layer sum-rate logging to improve transparency.
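The unrolling idea in the first two bullets can be sketched in a few lines of numpy. The toy problem below is least squares with a unit-norm constraint, not the paper's beamforming objective; the per-layer step sizes, the linear gradient transforms (the "hybrid layer"), the normalization constant, and the projection are all illustrative stand-ins for the learned quantities described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def prox_unit_ball(w):
    # Proximal step for the indicator of the unit-norm ball
    # (a stand-in for the paper's power-style constraint).
    n = np.linalg.norm(w)
    return w / n if n > 1.0 else w

def unrolled_pgd(A, b, thetas, Ws):
    """Forward pass through L unrolled PGD layers for min ||A w - b||^2.

    thetas: per-layer step sizes; Ws: per-layer linear gradient transforms
    (the hybrid layer). In Auto-PGD both are trained; here they are fixed.
    """
    w = np.zeros(A.shape[1])
    log = []  # per-layer objective, mirroring the paper's per-layer logging
    for theta, W in zip(thetas, Ws):
        g = A.T @ (A @ w - b)                 # gradient of the smooth term
        g = g / (np.linalg.norm(g) + 1e-12)   # gradient normalization for stability
        w = prox_unit_ball(w - theta * (W @ g))  # transform, step, then project
        log.append(float(np.sum((A @ w - b) ** 2)))
    return w, log

d = 4
A = rng.standard_normal((8, d))
b = rng.standard_normal(8)
L = 5                        # five unrolled layers, as in the paper
thetas = [0.2] * L           # learned per layer during training
Ws = [np.eye(d)] * L         # identity recovers plain PGD; learned in the hybrid layer
w, log = unrolled_pgd(A, b, thetas, Ws)
```

Because the layer count is fixed at five, inference cost is a small constant, which is the source of the speedup over a 200-iteration solver.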
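The six-dimensional search space from the HPO bullet can be written down directly. The sketch below uses plain random search as a stand-in for AutoGluon's TPE sampler, and both the candidate values and the scoring function are invented for illustration; only the six dimension names come from the paper.

```python
import random

# The six dimensions the AutoGluon/TPE search explores in the paper.
# The candidate values are illustrative guesses, not the paper's ranges.
SEARCH_SPACE = {
    "depth":         [3, 5, 7, 10],
    "step_init":     [0.01, 0.05, 0.1, 0.5],
    "optimizer":     ["adam", "sgd", "rmsprop"],
    "scheduler":     ["constant", "cosine", "step"],
    "layer_type":    ["plain_pgd", "hybrid"],
    "post_grad_act": ["none", "tanh", "relu"],
}

def sample_config(rng):
    # Random search stand-in for a TPE trial suggestion.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def toy_objective(cfg):
    # Hypothetical validation score standing in for spectral efficiency.
    score = 0.9 if cfg["layer_type"] == "hybrid" else 0.8
    return score + 0.01 * cfg["depth"]

rng = random.Random(0)
best = max((sample_config(rng) for _ in range(50)), key=toy_objective)
```

A real TPE sampler differs from random search by fitting density models to past trials and proposing promising configurations, but the search-space definition is the same shape.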

Abstract

This study explores the combination of automated machine learning (AutoML) with model-based deep unfolding (DU) for optimizing wireless beamforming and waveforms. We convert the iterative proximal gradient descent (PGD) algorithm into a deep neural network, wherein the parameters of each layer are learned instead of being predetermined. Additionally, we enhance the architecture by incorporating a hybrid layer that performs a learnable linear gradient transformation prior to the proximal projection. By utilizing AutoGluon with a tree-structured Parzen estimator (TPE) for hyperparameter optimization (HPO) across an expanded search space, which includes network depth, step-size initialization, optimizer, learning rate scheduler, layer type, and post-gradient activation, the proposed auto-unrolled PGD (Auto-PGD) achieves 98.8% of the spectral efficiency of a traditional 200-iteration PGD solver using only five unrolled layers, while requiring only 100 training samples. We also address a gradient normalization issue to ensure consistent performance during training and evaluation, and we illustrate per-layer sum-rate logging as a tool for transparency. These contributions highlight a notable reduction in the amount of training data and inference cost required, while maintaining high interpretability compared to conventional black-box architectures.
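For context on the 200-iteration baseline the abstract compares against, a classic projected-gradient loop on a standard log-det sum-rate objective looks like the following. The channel dimensions, step size, and power constraint are illustrative, and the paper's exact objective and constraint set may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def sum_rate(H, W):
    # Spectral efficiency log2 det(I + H W W^T H^T), a standard sum-rate metric.
    HW = H @ W
    M = np.eye(H.shape[0]) + HW @ HW.T
    _, logdet = np.linalg.slogdet(M)
    return logdet / np.log(2)

def project_power(W, p=1.0):
    # Projection onto the total-power constraint ||W||_F^2 <= p.
    n2 = np.linalg.norm(W) ** 2
    return W * np.sqrt(p / n2) if n2 > p else W

H = rng.standard_normal((4, 6))              # illustrative 4x6 channel
W = project_power(rng.standard_normal((6, 2)))
step = 0.05
rates = []
for _ in range(200):                          # the 200-iteration PGD baseline
    HW = H @ W
    M = np.eye(4) + HW @ HW.T
    G = (2 / np.log(2)) * H.T @ np.linalg.solve(M, HW)  # gradient of log2 det
    W = project_power(W + step * G)           # ascent step, then project
    rates.append(sum_rate(H, W))
```

Auto-PGD replaces this fixed 200-step loop with five trained layers while retaining, per the abstract, 98.8% of the resulting spectral efficiency.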