AI Navigate

Learning Tree-Based Models with Gradient Descent

arXiv cs.LG / 3/13/2026

📰 News · Models & Research

Key Points

  • The thesis introduces a method to learn hard, axis-aligned decision trees via gradient descent by applying backpropagation with a straight-through operator on a dense DT representation, enabling differentiable training of tree structures.
  • It enables joint optimization of all tree parameters, overcoming the combinatorial and non-differentiable limitations of traditional DT methods like CART that rely on greedy splits.
  • The approach is designed to integrate with existing gradient-descent-based ML pipelines, including multimodal and reinforcement learning tasks.
  • The authors report state-of-the-art results across multiple domains: interpretable trees for small tabular datasets, advanced models for complex tabular data, multimodal learning, and interpretable reinforcement learning without information loss.

Abstract

Tree-based models are widely recognized for their interpretability and have proven effective in various application areas, particularly in high-stakes domains. However, learning decision trees (DTs) poses a significant challenge due to their combinatorial complexity and discrete, non-differentiable nature. As a result, traditional methods such as CART, which rely on greedy search procedures, remain the most widely used approaches. These methods make locally optimal decisions at each node, constraining the search space and often leading to suboptimal tree structures. Additionally, their need for custom training procedures precludes seamless integration into modern machine learning (ML) pipelines. In this thesis, we propose a novel method for learning hard, axis-aligned DTs through gradient descent. Our approach applies backpropagation with a straight-through operator to a dense DT representation, enabling the joint optimization of all tree parameters and thereby addressing the two primary limitations of traditional DT algorithms. First, gradient-based training is not constrained by the sequential selection of locally optimal splits but instead jointly optimizes all tree parameters. Second, by leveraging gradient descent for optimization, our approach integrates seamlessly into existing ML approaches, e.g., multimodal and reinforcement learning tasks, which inherently rely on gradient descent. These advancements allow us to achieve state-of-the-art results across multiple domains, including interpretable DTs for small tabular datasets, advanced models for complex tabular data, multimodal learning, and interpretable reinforcement learning without information loss. By bridging the gap between DTs and gradient-based optimization, our method significantly enhances the performance and applicability of tree-based models across various ML domains.
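The straight-through mechanism summarized above can be illustrated on a toy problem. The following is a minimal, hypothetical NumPy sketch (not the thesis implementation, and all names and hyperparameters are invented for illustration): a depth-1 tree with one axis-aligned split and two leaf values, where the forward pass routes samples with the hard indicator `x > t`, while the backward pass substitutes the gradient of a soft sigmoid routing, so that the threshold and both leaves are optimized jointly by gradient descent.

```python
import numpy as np

# Toy 1-D regression problem with a step at x = 0.3.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = (x > 0.3).astype(float)

t = 0.5                        # split threshold (learned)
leaf = np.array([0.2, 0.8])    # leaf values [left, right] (learned)
a = 10.0                       # steepness of the soft sigmoid surrogate
lr = 0.2                       # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(1000):
    h = (x > t).astype(float)          # hard routing (forward pass)
    s = sigmoid(a * (x - t))           # soft routing (used for gradients only)
    pred = (1.0 - h) * leaf[0] + h * leaf[1]
    err = pred - y                     # dMSE/dpred up to a factor 2/n
    # Straight-through: backpropagate as if routing were the soft sigmoid.
    ds_dt = -a * s * (1.0 - s)         # derivative of soft routing w.r.t. t
    grad_t = np.mean(2.0 * err * (leaf[1] - leaf[0]) * ds_dt)
    grad_leaf = np.array([np.mean(2.0 * err * (1.0 - h)),
                          np.mean(2.0 * err * h)])
    t -= lr * grad_t
    leaf -= lr * grad_leaf

# After training, t is close to the true split (0.3) and the leaf values
# approach the per-side means of y (about 0 and 1).
```

The steepness `a` trades off gradient quality: a sharper surrogate matches the hard split more closely but yields vanishing gradients away from the threshold, which is one reason the forward pass keeps the exact hard routing while only the backward pass is softened.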