When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies
arXiv cs.CL / 4/14/2026
Key Points
- The paper proposes a pipeline where a frozen LLM converts daily news/filings into fixed-dimensional numerical features that feed a PPO reinforcement learning trading agent.
- It uses an automated prompt-optimization loop that tunes the extraction prompt as a discrete hyperparameter against Information Coefficient (Spearman rank correlation) rather than standard NLP losses.
- While the optimized prompts can yield genuinely predictive features (IC above 0.15 on held-out data), those same features can fail to improve trading performance under the distribution shift induced by a macroeconomic shock.
- In the stressed regime the LLM-derived features add noise and the augmented agent underperforms a price-only baseline, though performance can recover in calmer periods.
- The study emphasizes a “feature-level validity vs policy-level robustness” gap under distribution shift, with macroeconomic state variables remaining the most robust drivers of improvement.
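The Information Coefficient used as the prompt-optimization objective is just the Spearman rank correlation between a feature's values and subsequent forward returns. A minimal sketch of that metric (the function name and tie-free ranking are illustrative assumptions, not the paper's implementation):

```python
def _ranks(xs):
    # Positional ranks; assumes no ties, which is enough for a sketch.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = float(r)
    return ranks

def information_coefficient(signal, forward_returns):
    """Spearman rank correlation between a feature and forward returns.

    Computed as the Pearson correlation of the two rank vectors.
    Returns a value in [-1, 1]; the paper reports IC > 0.15 as predictive.
    """
    rs, rr = _ranks(signal), _ranks(forward_returns)
    n = len(rs)
    mu_s, mu_r = sum(rs) / n, sum(rr) / n
    cov = sum((a - mu_s) * (b - mu_r) for a, b in zip(rs, rr))
    var_s = sum((a - mu_s) ** 2 for a in rs)
    var_r = sum((b - mu_r) ** 2 for b in rr)
    return cov / (var_s * var_r) ** 0.5
```

A perfectly monotone feature scores 1.0, a perfectly inverted one -1.0; in practice the optimizer would evaluate candidate prompts by the cross-sectional IC of their extracted features on held-out dates.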