Benchmarking Optimizers for MLPs in Tabular Deep Learning
arXiv cs.LG / April 17, 2026
Key Points
- The paper addresses a gap in tabular deep learning by systematically benchmarking optimizers for training MLPs, rather than relying on AdamW as the default choice.
- Under a shared experimental protocol spanning multiple tabular datasets and standard supervised tasks, the Muon optimizer consistently outperforms AdamW.
- The authors recommend Muon as a strong practical choice, provided its extra per-step training overhead is acceptable (see the Muon sketch after this list).
- They also find that an exponential moving average (EMA) of model weights can improve AdamW performance on vanilla MLPs, though the benefit is less consistent for other model variants (see the EMA sketch below).
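For readers unfamiliar with Muon: it updates 2-D weight matrices with orthogonalized momentum, approximately orthogonalizing the momentum-averaged gradient via a quintic Newton-Schulz iteration before applying the step. Below is a minimal single-matrix sketch of that update rule, not the paper's benchmarking code; the Newton-Schulz coefficients follow the public Muon reference implementation, while the function names, hyperparameter defaults, and the simplified (non-Nesterov) momentum are illustrative assumptions.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix with a quintic
    Newton-Schulz iteration (coefficients from the public Muon
    reference implementation)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)        # scale so singular values are <= 1
    transposed = X.size(0) > X.size(1)
    if transposed:                   # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(weight: torch.Tensor, momentum_buf: torch.Tensor,
              lr: float = 0.02, momentum: float = 0.95) -> None:
    """One Muon-style update for a single 2-D weight matrix:
    momentum accumulation, then orthogonalization, then the step."""
    momentum_buf.mul_(momentum).add_(weight.grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    weight.add_(update, alpha=-lr)

# Toy usage on one linear layer's weight (illustrative only):
W = torch.randn(64, 32, requires_grad=True)
buf = torch.zeros_like(W)
loss = (W @ torch.randn(32, 8)).pow(2).mean()
loss.backward()
muon_step(W, buf)
```

Because the orthogonalization applies only to matrix-shaped parameters, practical Muon implementations typically fall back to AdamW for biases, embeddings, and other non-2-D tensors; the extra matrix multiplications in the Newton-Schulz loop are also the source of the per-step overhead the authors mention.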
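The EMA finding is simple to reproduce in a training loop: keep a shadow copy of the weights, update it as an exponential moving average after every optimizer step, and evaluate with the averaged weights. A minimal PyTorch sketch under those assumptions follows; the `EMA` class, the decay of 0.999, and the dummy training loop are illustrative, not the paper's exact settings.

```python
import copy
import torch

class EMA:
    """Maintains an exponential moving average of a model's parameters:
    shadow <- decay * shadow + (1 - decay) * current."""
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> None:
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

# Usage inside a standard AdamW training loop (illustrative):
model = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
ema = EMA(model)
for _ in range(100):
    x = torch.randn(32, 16)
    loss = model(x).pow(2).mean()    # dummy objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema.update(model)                # track averaged weights after each step
# Evaluate with ema.shadow instead of model.
```

Note that this sketch averages only parameters, not buffers such as BatchNorm running statistics; for the plain MLPs studied here that distinction usually does not matter.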