Gradient Regularized Newton Boosting Trees with Global Convergence

arXiv stat.ML / 5/4/2026


Key Points

  • The paper tackles a gap in understanding the global convergence behavior of Newton boosting for Gradient Boosting Decision Trees (GBDTs), which are widely used in tools like XGBoost, LightGBM, and CatBoost.
  • It introduces “Restricted Newton Descent,” an optimization framework for Newton’s method on Hilbert spaces with inexact iterates, using concepts such as cosine angle and weak gradient edge.
  • For smooth, strongly convex losses with a Hessian-dominance condition, the authors prove that vanilla Newton boosting converges linearly.
  • For more general convex losses with Lipschitz Hessians, they propose a gradient-regularized Newton scheme for the restricted weak learner setting that adds an adaptive L2 regularization term proportional to the square root of the gradient norm at each iteration (see the sketch after this list).
  • The resulting second-order GBDT algorithm achieves a global convergence rate of O(1/k^2), matching first-order boosting with Nesterov momentum, and experiments show it converges in cases where vanilla Newton boosting may diverge.
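
To make the regularized step concrete, below is a minimal sketch of Newton boosting with a gradient-adaptive L2 term. It assumes squared loss and scikit-learn regression trees as the weak learners; the function name, the `reg_scale` constant, and the loss choice are illustrative assumptions, with only the lambda_k ∝ sqrt(||g_k||) scaling taken from the paper's description.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_regularized_newton_boost(X, y, n_rounds=50, learning_rate=0.1,
                                      reg_scale=1.0, max_depth=3):
    """Sketch of Newton boosting with an adaptive l2 term lambda_k ~ sqrt(||g_k||).

    Squared loss is used for illustration, so the per-example gradient is
    g = F - y and the Hessian is h = 1.
    """
    y = np.asarray(y, dtype=float)
    F = np.zeros(len(y), dtype=float)   # current ensemble prediction
    ensemble = []                        # list of (tree, leaf -> weight) pairs

    for _ in range(n_rounds):
        g = F - y                        # first derivatives of 1/2 (F - y)^2
        h = np.ones_like(g)              # second derivatives (constant here)

        # Adaptive regularization proportional to the square root of the gradient norm.
        lam = reg_scale * np.sqrt(np.linalg.norm(g))

        # Grow a tree on the plain Newton targets to fix the partition of the data.
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, -g / h)

        # Recompute each leaf weight with the regularized Newton formula
        #   w_j = - sum_{i in leaf j} g_i / (sum_{i in leaf j} h_i + lambda_k).
        leaf_ids = tree.apply(X)
        weights = {}
        for leaf in np.unique(leaf_ids):
            mask = leaf_ids == leaf
            weights[leaf] = -g[mask].sum() / (h[mask].sum() + lam)

        F += learning_rate * np.array([weights[l] for l in leaf_ids])
        ensemble.append((tree, weights))

    return ensemble
```

In a full implementation the tree structure itself would be grown against the regularized second-order objective; fixing the partition with plain Newton targets and only re-solving the leaf weights is a simplification to keep the sketch short.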

Abstract

Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations such as XGBoost, LightGBM, and CatBoost built on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, a framework for studying convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with Lipschitz Hessians, we extend a recent gradient-regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies the classical algorithm by introducing an adaptive ℓ2-regularization term proportional to the square root of the gradient norm at each iteration. We establish an O(1/k^2) rate for this scheme, thereby obtaining a globally convergent second-order GBDT algorithm with a rate matching that of first-order boosting with Nesterov momentum. In numerical experiments, we show that our scheme converges while vanilla Newton boosting may diverge.
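
Written out in the function-space notation suggested by the abstract, one iteration of the gradient-regularized restricted Newton scheme could take the following form. This is a sketch: the constant c > 0 and the exact form of the restricted minimization over the tree class are assumptions, with only the square-root-of-gradient-norm scaling taken from the abstract.

```latex
% Sketch of one gradient-regularized restricted Newton step for a loss L and
% ensemble F_k; c > 0 and the restricted minimization form are illustrative.
\lambda_k = c\,\sqrt{\lVert \nabla L(F_k) \rVert}, \qquad
f_{k+1} \in \operatorname*{arg\,min}_{f \in \mathcal{F}_{\mathrm{trees}}}
  \big\langle \nabla L(F_k), f \big\rangle
  + \tfrac{1}{2} \big\langle \nabla^2 L(F_k)\, f, f \big\rangle
  + \tfrac{\lambda_k}{2} \lVert f \rVert^2, \qquad
F_{k+1} = F_k + f_{k+1}.
```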