First-Mover Bias in Gradient Boosting Explanations: Mechanism, Detection, and Resolution

arXiv cs.AI / 3/25/2026


Key Points

  • The paper identifies “first-mover bias” in gradient boosting explanations as a mechanistic, path-dependent concentration of feature importance caused by sequential residual fitting when correlated features compete for early splits.
  • It explains that the feature chosen first gains a self-reinforcing advantage because later trees inherit residuals that favor the incumbent, leading SHAP-based rankings to become unstable under multicollinearity.
  • The authors show that scaling to a “Large Single Model” (with the same total tree count) produces the worst SHAP explanation stability among tested workflows, making the bias more pronounced in that setting.
  • They demonstrate that breaking the sequential dependency via model independence resolves the issue in the linear regime and remains the most effective mitigation under nonlinear data-generating processes.
  • Two approaches—DASH (Diversified Aggregation of SHAP) and simple seed-averaging (Stochastic Retrain)—restore stability (e.g., at ρ=0.9, stability reaches 0.977 for both), and the paper also introduces diagnostic tools (FSI and IS Plot) to detect the bias without ground truth.
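The sequential mechanism described above can be probed with a small toy. This is an illustrative sketch, not the paper's code: it uses squared loss, depth-1 stumps with a single median threshold per feature, and a per-feature split tally as a crude stand-in for SHAP importance. With two correlated features that contribute equally to the target, which feature accumulates splits is path-dependent on the early rounds of residual fitting.

```python
# Toy probe of path-dependent feature selection in boosting.
# Assumptions (not from the paper): squared loss, depth-1 stumps,
# one median threshold per feature, split counts as importance proxy.
import numpy as np

def boost_split_counts(X, y, n_rounds=50, lr=0.3):
    """Greedy stump boosting; returns how often each feature wins a round."""
    n, d = X.shape
    resid = y.astype(float).copy()
    counts = np.zeros(d, dtype=int)
    for _ in range(n_rounds):
        best_sse, best_pred, best_j = np.inf, None, -1
        for j in range(d):
            t = np.median(X[:, j])            # single candidate threshold
            left = X[:, j] <= t
            m_left, m_right = resid[left].mean(), resid[~left].mean()
            pred = np.where(left, m_left, m_right)
            sse = ((resid - pred) ** 2).sum()
            if sse < best_sse:
                best_sse, best_pred, best_j = sse, pred, j
        counts[best_j] += 1                   # winner takes the round
        resid -= lr * best_pred               # later rounds inherit residuals

    return counts

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=n)  # corr(x1, x2) ~ 0.9
y = x1 + x2 + 0.1 * rng.normal(size=n)                  # both features matter equally
counts = boost_split_counts(np.column_stack([x1, x2]), y)
print("split counts per feature:", counts)
```

Rerunning with a different seed can shift the tally between the two features even though the data-generating process treats them symmetrically, which is the instability that seed-averaging over independent retrains smooths out.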

Abstract

We isolate and empirically characterize first-mover bias -- a path-dependent concentration of feature importance caused by sequential residual fitting in gradient boosting -- as a specific mechanistic cause of the well-known instability of SHAP-based feature rankings under multicollinearity. When correlated features compete for early splits, gradient boosting creates a self-reinforcing advantage for whichever feature is selected first: subsequent trees inherit modified residuals that favor the incumbent, concentrating SHAP importance on an arbitrary feature rather than distributing it across the correlated group. Scaling up a single model amplifies this effect -- a Large Single Model with the same total tree count as our method produces the worst explanations of any approach tested. We demonstrate that model independence is sufficient to resolve first-mover bias in the linear regime, and remains the most effective mitigation under nonlinear data-generating processes. Both our proposed method, DASH (Diversified Aggregation of SHAP), and simple seed-averaging (Stochastic Retrain) restore stability by breaking the sequential dependency chain, confirming that the operative mechanism is independence between explained models. At ρ=0.9, both achieve stability=0.977, while the single-best workflow degrades to 0.958 and the Large Single Model to 0.938. On the Breast Cancer dataset, DASH improves stability from 0.32 to 0.93 (+0.61) against a tree-count-matched baseline. DASH additionally provides two diagnostic tools -- the Feature Stability Index (FSI) and Importance-Stability (IS) Plot -- that detect first-mover bias without ground truth, enabling practitioners to audit explanation reliability before acting on feature rankings. Software and reproducible benchmarks are available at https://github.com/DrakeCaraker/dash-shap.
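The stability scores quoted in the abstract compare feature rankings across independently retrained models. The paper's exact metric is not reproduced here; the following is a hedged sketch in the same spirit, scoring mean pairwise Spearman rank agreement between per-run importance vectors (assumes no ties in the importances).

```python
# Hypothetical rank-agreement score across retrains (not the paper's
# exact stability definition): mean pairwise Spearman correlation of
# the feature-importance rankings produced by each run.
import numpy as np

def spearman(a, b):
    """Spearman correlation via rank transform (assumes no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

def stability(importance_runs):
    """Mean pairwise Spearman correlation over all pairs of runs."""
    runs = np.asarray(importance_runs, dtype=float)
    k = len(runs)
    pairs = [spearman(runs[i], runs[j])
             for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(pairs))

# Two runs that agree on the ranking score 1.0; a reversed ranking scores -1.0.
print(stability([[0.5, 0.3, 0.2], [0.6, 0.25, 0.15]]))  # → 1.0
print(stability([[0.5, 0.3, 0.2], [0.1, 0.2, 0.3]]))    # → -1.0
```

Under this kind of metric, a score near 1.0 (as DASH and Stochastic Retrain reach at ρ=0.9) means every retrain orders the features the same way, while lower scores indicate rankings that shuffle between runs.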