When Alpha Breaks: Two-Level Uncertainty for Safe Deployment of Cross-Sectional Stock Rankers

arXiv cs.AI / 3/17/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisIndustry & Market MovesModels & Research

共有:

Key Points

Under non-stationarity, cross-sectional rankers can fail during regime shifts, motivating deployment decisions beyond mere point predictions.
The AI Stock Forecaster uses a LightGBM ranker that performs well at a 20-day horizon, but the 2024 holdout coincided with an AI thematic rally and sector rotation that weakened signals at longer horizons.
The authors adapt Direct Epistemic Uncertainty Prediction (DEUP) to ranking by predicting rank displacement and defining an epistemic uncertainty signal ehat relative to a PIT-safe baseline.
They find that ehat is structurally coupled with signal strength (median correlation about 0.6 across 1,865 dates), so inverse-uncertainty sizing can dampen strong signals and degrade performance.
A two-level deployment policy is proposed: a strategy-level regime-trust gate G(t) deciding whether to trade (AUROC around 0.72 overall, ~0.75 in FINAL) and a position-level epistemic tail-risk cap that reduces exposure for the most uncertain predictions; with operational rules (trade only when G(t) ≥ 0.2, apply volatility sizing on active dates, cap the top epistemic tail), this improves risk-adjusted performance and suggests DEUP mainly serves as a tail-risk guard.

Abstract

Cross-sectional ranking models are often deployed as if point predictions were sufficient: the model outputs scores and the portfolio follows the induced ordering. Under non-stationarity, rankers can fail during regime shifts. In the AI Stock Forecaster, a LightGBM ranker performs well overall at a 20-day horizon, yet the 2024 holdout coincides with an AI thematic rally and sector rotation that breaks the signal at longer horizons and weakens 20d. This motivates treating deployment as two decisions: (i) whether the strategy should trade at all, and (ii) how to control risk within active trades. We adapt Direct Epistemic Uncertainty Prediction (DEUP) to ranking by predicting rank displacement and defining an epistemic uncertainty signal ehat relative to a point-in-time (PIT-safe) baseline. Empirically, ehat is structurally coupled with signal strength (median correlation between ehat and absolute score is about 0.6 across 1,865 dates), so inverse-uncertainty sizing de-levers the strongest signals and degrades performance. To address this, we propose a two-level deployment policy: a strategy-level regime-trust gate G(t) that decides whether to trade (AUROC around 0.72 overall and 0.75 in FINAL) and a position-level epistemic tail-risk cap that reduces exposure only for the most uncertain predictions. The operational policy, trade only when G(t) is at least 0.2, apply volatility sizing on active dates, and cap the top epistemic tail, improves risk-adjusted performance in the 20d policy comparison and indicates DEUP adds value mainly as a tail-risk guard rather than a continuous sizing denominator.

I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).

Dev.to

Interesting loop

Reddit r/LocalLLaMA

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

Reddit r/LocalLLaMA

A supervisor or "manager" Al agent is the wrong way to control Al

Reddit r/artificial

FeatherOps: Fast fp8 matmul on RDNA3 without native fp8

Reddit r/LocalLLaMA

When Alpha Breaks: Two-Level Uncertainty for Safe Deployment of Cross-Sectional Stock Rankers

Key Points

Abstract

Related Articles

I Was Wrong About AI Coding Assistants. Here's What Changed My Mind (and What I Built About It).

Interesting loop

Qwen3.5-122B-A10B Uncensored (Aggressive) — GGUF Release + new K_P Quants

A supervisor or "manager" Al agent is the wrong way to control Al

FeatherOps: Fast fp8 matmul on RDNA3 without native fp8

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer