Complex SGD and Directional Bias in Reproducing Kernel Hilbert Spaces

arXiv cs.LG / 4/28/2026


Key Points

  • The paper introduces a new variant of Stochastic Gradient Descent (complex SGD) that supports complex-valued parameters for complex-valued optimization problems, including those arising in complex neural networks.
  • It provides convergence guarantees for complex SGD (and also extends the analysis to complex gradient descent) under assumptions analogous to the standard real-valued theory, without requiring analyticity constraints.
  • The authors show that certain directional bias phenomena known in the real setting also carry over to the complex setting for kernel regression tasks.
  • Experiments on kernel regression with complex reproducing kernel Hilbert spaces demonstrate that complex SGD can recover superoscillation functions and Blaschke products from the Fock space and the Hardy space, respectively, as optimal solutions for a chosen loss function.

Abstract

Stochastic Gradient Descent (SGD) is a stochastic iterative method popular for large-scale convex optimization problems due to its simple implementation and scalability. Some objectives, such as those arising in complex-valued neural networks, benefit from SGD and Gradient Descent (GD) updates driven by a newly defined "gradient" that allows for complex parameters. This complex variant of SGD/GD has already been proposed, but convergence guarantees without analyticity constraints had not yet been provided. We propose a variant of SGD (complex SGD) that allows for complex parameters, and we provide convergence guarantees under assumptions that parallel those from the real setting. These results extend to GD as well, and under the same set of assumptions we confirm that some directional bias results carry over from the real to the complex setting for kernel regression problems. We provide empirical results demonstrating the efficacy of complex SGD on kernel regression problems that use complex reproducing kernel Hilbert spaces. In particular, we demonstrate that we can recover superoscillation functions and Blaschke products from the Fock space and the Hardy space, respectively, as the optimal functions for a particular choice of loss function.
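For real-valued losses of complex parameters, the "gradient" in updates of this kind is usually taken to be the conjugate (Wirtinger) gradient, which does not require the loss to be analytic. As a minimal sketch only (not the paper's algorithm; the least-squares objective, step size, and all variable names here are illustrative assumptions), complex SGD on a complex least-squares problem might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
# Illustrative complex data: a consistent linear system A @ w_true = b.
A = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
w_true = rng.standard_normal(d) + 1j * rng.standard_normal(d)
b = A @ w_true

def complex_sgd(A, b, lr=0.01, epochs=200):
    """SGD on L(w) = (1/n) * sum_i |a_i^T w - b_i|^2 with complex w.

    L is real-valued but non-analytic in w, so the descent direction
    is the Wirtinger gradient dL_i/d(conj(w)) = conj(a_i) * r_i.
    """
    w = np.zeros(A.shape[1], dtype=complex)
    for _ in range(epochs):
        for i in rng.permutation(len(b)):
            r = A[i] @ w - b[i]          # complex residual of sample i
            w -= lr * np.conj(A[i]) * r  # w <- w - lr * dL_i/d(conj(w))
    return w

w_hat = complex_sgd(A, b)
# On this noiseless system, the iterates approach w_true.
```

The same update with the full-batch Wirtinger gradient gives the complex GD variant the abstract mentions; only the sampling of `i` changes.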