LassoFlexNet: Flexible Neural Architecture for Tabular Data

arXiv stat.ML / 3/24/2026


Key Points

  • The paper argues that deep neural networks lag behind tree-based models on tabular data and proposes LassoFlexNet to close this gap by adding five inductive biases: robustness to irrelevant features, axis alignment, localized irregularities, feature heterogeneity, and training stability.
  • LassoFlexNet uses Per-Feature Embeddings to assess each input’s linear and nonlinear marginal contributions, and applies a Tied Group Lasso mechanism for sparse variable selection with Lasso-like interpretability.
  • To address optimization instability introduced by these components, the authors develop a Sequential Hierarchical Proximal Adaptive Gradient optimizer with exponential moving averages (EMA) to improve convergence stability.
  • Experiments across 52 datasets from three benchmarks show LassoFlexNet matches or outperforms leading tree-based models, with reported relative gains up to 10%, supported by ablation studies and theoretical proofs.
  • Theoretical analysis claims improved expressivity and “structural breaking” of undesired rotational invariance, aiming to better align neural representations with tabular data structure.

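The paper's exact Tied Group Lasso formulation isn't reproduced here, but the core idea behind group-lasso sparse variable selection can be sketched with the standard group proximal operator (block soft-thresholding): all weights tied to one input feature form a group, and the whole group is zeroed at once, dropping that feature from the model. The function name and toy values below are illustrative, not from the paper.

```python
import numpy as np

def group_soft_threshold(W, lam):
    """Proximal operator of the group-lasso penalty lam * sum_j ||W[j]||_2.

    W: (n_features, d) matrix whose j-th row holds all embedding weights
       tied to feature j. Rows whose L2 norm is <= lam are set exactly to
       zero, removing that feature; larger rows shrink toward zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale

# Toy example: feature 0 and feature 2 have small tied weights and are
# eliminated entirely; feature 1 survives, shrunk by a factor of 0.8.
W = np.array([[0.05, -0.02],
              [1.50,  2.00],
              [0.30,  0.10]])
W_sparse = group_soft_threshold(W, lam=0.5)
```

This all-or-nothing behavior at the feature level is what gives group-lasso methods their Lasso-like interpretability: the surviving rows directly name the selected variables.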
Abstract

Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant features, axis alignment, localized irregularities, feature heterogeneity, and training stability. We propose LassoFlexNet, an architecture that evaluates the linear and nonlinear marginal contribution of each input via Per-Feature Embeddings, and sparsely selects relevant variables using a Tied Group Lasso mechanism. Because these components introduce optimization challenges that destabilize standard proximal methods, we develop a Sequential Hierarchical Proximal Adaptive Gradient optimizer with exponential moving averages (EMA) to ensure stable convergence. Across 52 datasets from three benchmarks, LassoFlexNet matches or outperforms leading tree-based models, achieving up to a 10% relative gain, while maintaining Lasso-like interpretability. We substantiate these empirical results with ablation studies and theoretical proofs confirming the architecture's enhanced expressivity and structural breaking of undesired rotational invariance.
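The optimizer itself is not specified beyond its name, but one way to picture a proximal adaptive gradient update with EMAs is an Adam-style step (EMAs of the gradient and its square) followed by a group proximal projection. The sketch below is a minimal illustration under those assumptions; the function name, update order, and hyperparameters are guesses, not the authors' method.

```python
import numpy as np

def prox_adaptive_step(W, grad, m, v, lam,
                       lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One hypothetical proximal-adaptive update on weight matrix W.

    First an Adam-like adaptive gradient step using EMAs m (gradients)
    and v (squared gradients), then a group-lasso proximal projection
    that zeroes rows of W whose L2 norm falls below lr * lam.
    """
    m = b1 * m + (1 - b1) * grad           # EMA of gradients
    v = b2 * v + (1 - b2) * grad ** 2      # EMA of squared gradients
    W = W - lr * m / (np.sqrt(v) + eps)    # adaptive gradient step
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W * np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    return W, m, v
```

Interleaving the smoothed gradient step with an exact proximal projection is a common recipe for keeping sparsity-inducing penalties stable during training, which matches the paper's stated motivation for the optimizer.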