Learning Tree-Based Models with Gradient Descent
arXiv cs.LG / 3/13/2026
Key Points
- The thesis introduces a method to learn hard, axis-aligned decision trees via gradient descent by applying backpropagation with a straight-through operator to a dense decision tree (DT) representation, enabling differentiable training of tree structures (see the sketch after this list).
- It enables joint optimization of all tree parameters, avoiding the greedy, locally optimal splits that traditional DT methods such as CART use to cope with the combinatorial, non-differentiable nature of tree induction.
- The approach is designed to integrate with existing gradient-descent-based ML pipelines, including multimodal and reinforcement learning tasks.
- The authors report state-of-the-art results across several domains: interpretable trees for small tabular datasets, models for complex tabular data, and improvements in multimodal and interpretable reinforcement learning, without information loss.
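To make the straight-through idea concrete, the sketch below shows one possible way to train a hard, axis-aligned tree over a dense (complete binary) representation with ordinary backpropagation in PyTorch. The `DenseHardTree` class, its parameter shapes, and the level-by-level routing scheme are illustrative assumptions rather than the thesis's exact formulation: the forward pass uses a hard argmax feature choice and a hard threshold comparison at every internal node, while gradients flow through softmax and sigmoid surrogates.

```python
import torch
import torch.nn as nn


def straight_through(hard: torch.Tensor, soft: torch.Tensor) -> torch.Tensor:
    """Forward pass returns the hard value; gradients flow through the soft surrogate."""
    return soft + (hard - soft).detach()


class DenseHardTree(nn.Module):
    """Minimal sketch (not the paper's exact API) of a depth-d, axis-aligned hard
    decision tree whose parameters are all trained jointly by gradient descent."""

    def __init__(self, n_features: int, depth: int, n_outputs: int = 1):
        super().__init__()
        self.depth = depth
        n_internal = 2 ** depth - 1          # dense (complete) tree: all internal nodes
        n_leaves = 2 ** depth
        # Per internal node: logits over which feature to split on, and a threshold.
        self.feature_logits = nn.Parameter(torch.randn(n_internal, n_features) * 0.1)
        self.thresholds = nn.Parameter(torch.zeros(n_internal))
        # Per leaf: a learned prediction (regression value or class logit).
        self.leaf_values = nn.Parameter(torch.zeros(n_leaves, n_outputs))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch = x.shape[0]
        # Hard one-hot feature choice, with a softmax surrogate for gradients.
        soft_choice = torch.softmax(self.feature_logits, dim=-1)        # (n_internal, n_features)
        hard_choice = torch.zeros_like(soft_choice).scatter_(
            -1, soft_choice.argmax(-1, keepdim=True), 1.0)
        choice = straight_through(hard_choice, soft_choice)

        # Selected feature value per sample and node, then a hard threshold comparison
        # with a sigmoid surrogate (the straight-through step for the split itself).
        selected = x @ choice.t()                                       # (batch, n_internal)
        soft_split = torch.sigmoid(selected - self.thresholds)          # "go right" probability
        hard_split = (soft_split > 0.5).float()
        go_right = straight_through(hard_split, soft_split)             # (batch, n_internal)

        # Route each sample to exactly one leaf by multiplying hard decisions along its path;
        # internal nodes are stored level by level in breadth-first order.
        leaf_prob = torch.ones(batch, 1, device=x.device)
        node_offset = 0
        for level in range(self.depth):
            n_nodes = 2 ** level
            d = go_right[:, node_offset:node_offset + n_nodes]          # (batch, n_nodes)
            # Each node sends its mass either left (1 - d) or right (d).
            leaf_prob = torch.stack([leaf_prob * (1 - d), leaf_prob * d], dim=-1)
            leaf_prob = leaf_prob.reshape(batch, 2 * n_nodes)
            node_offset += n_nodes

        return leaf_prob @ self.leaf_values                             # (batch, n_outputs)


# Usage: fit every threshold, feature choice, and leaf value jointly with a standard optimizer.
if __name__ == "__main__":
    torch.manual_seed(0)
    X = torch.randn(256, 10)
    y = (X[:, 0] > 0.3).float().unsqueeze(-1)        # a single axis-aligned rule to recover
    tree = DenseHardTree(n_features=10, depth=3)
    opt = torch.optim.Adam(tree.parameters(), lr=0.05)
    for step in range(300):
        loss = nn.functional.binary_cross_entropy_with_logits(tree(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final loss: {loss.item():.4f}")
```

Because every split decision is hard in the forward pass, the fitted model remains an ordinary axis-aligned decision tree at inference time; the straight-through surrogates only exist so that gradient descent can update the whole structure jointly instead of fixing it with greedy recursive partitioning.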