Abstract
This paper presents Sparse Gradient Descent (Sp-GD) for variable selection in convex piecewise linear regression, where the model is given as the maximum of $k$ affine functions, $x \mapsto \max_{j \in [k]} \langle a_j^\star, x \rangle + b_j^\star$. Here, $\{a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies sub-Gaussianity and anti-concentration properties. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2, 1)\, s\log(d/s))$ observations, where $\sigma_z^2$ denotes the noise variance. In particular, this implies exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. The proposed initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{a_j^\star\}_{j=1}^k$, then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presented for this initialization scheme when the covariates and noise samples follow Gaussian distributions. When the model order and parameters are fixed, the initialization scheme provides an $\epsilon$-accurate estimate given $\mathcal{O}(\epsilon^{-2}\max(\sigma_z^4, \sigma_z^2, 1)\, s^2\log^4(d))$ observations. A new transformation, named Real Maslov Dequantization (RMD), is proposed to convert sparse generalized polynomials into sparse max-affine models. The approximation error of RMD is shown to be exponentially small in its temperature parameter. Furthermore, the theoretical guarantees for Sp-GD are extended to the bounded noise model induced by RMD. Monte Carlo simulations corroborate the theoretical findings for Sp-GD and the initialization scheme.
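To make the estimation setting concrete, the following is a minimal sketch of one Sp-GD iteration on the max-affine model, assuming a gradient step on the empirical squared loss followed by hard-thresholding each weight vector to its $s$ largest-magnitude entries; the exact gradient convention and sparsification rule in the paper may differ, and all names here are illustrative.

```python
import numpy as np

def max_affine(X, A, b):
    """Evaluate y_i = max_j <a_j, x_i> + b_j for each row x_i of X."""
    return np.max(X @ A.T + b, axis=1)

def sp_gd_step(X, y, A, b, s, lr=0.1):
    """One sparse gradient descent step (illustrative sketch).

    X : (n, d) covariates, y : (n,) responses,
    A : (k, d) current weight estimates, b : (k,) intercepts,
    s : number of nonzero entries kept per weight vector.
    """
    n = len(y)
    scores = X @ A.T + b                      # (n, k) affine values
    j = np.argmax(scores, axis=1)             # active piece per sample
    r = scores[np.arange(n), j] - y           # residuals of the max-affine fit
    grad_A = np.zeros_like(A)
    grad_b = np.zeros_like(b)
    for m in range(A.shape[0]):
        mask = (j == m)                       # samples where piece m attains the max
        grad_A[m] = X[mask].T @ r[mask] / n
        grad_b[m] = r[mask].sum() / n
    A_new, b_new = A - lr * grad_A, b - lr * grad_b
    # hard-threshold: zero out all but the s largest-magnitude entries per vector
    for m in range(A_new.shape[0]):
        A_new[m, np.argsort(np.abs(A_new[m]))[:-s]] = 0.0
    return A_new, b_new
```

Iterating such a step from the sparse-PCA-plus-covering initialization corresponds to the local convergence regime analyzed in the paper.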