ValueAlpha: Agreement-Gated Stress Testing of LLM-Judged Investment Rationales Before Returns Are Observable
arXiv cs.AI / 4/29/2026
Key Points
- The paper highlights a “pre-realization evaluation” problem in long-horizon investing where realized returns arrive too late and are too noisy to guide AI-finance development and governance decisions.
- It argues that unvalidated LLM judges may reward superficial behaviors (verbosity, confidence, rubric mimicry) rather than true financial judgment, motivating a more rigorous protocol.
- ValueAlpha is introduced as a preregistered, agreement-gated stress-testing method that decides whether LLM-judged investment-rationale claims are publishable, qualified, or invalid.
- In a controlled capital-allocation prototype (1,000 honest cycles plus adversarial controls), the method passes the overall agreement gate (κ̄w = 0.7168) while blocking several overclaims. It also identifies failure modes, such as a per-dimension collapse on constraint_awareness and family-dependent rankings.
- The authors position ValueAlpha as a pre-calibration “metrology” layer for AI-finance evaluation rather than a leaderboard or a measure of genuine investment skill.
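
To make the idea of an agreement gate concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes that κ̄w denotes a mean pairwise weighted Cohen's kappa across judges, uses quadratic weights, and maps the result to the three outcomes with two thresholds (0.6 and 0.4) that are purely illustrative. The function names `weighted_kappa`, `mean_pairwise_kappa`, and `agreement_gate` are hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, n_levels):
    """Quadratic-weighted Cohen's kappa for two raters scoring on 0..n_levels-1."""
    if len(a) != len(b) or not a:
        raise ValueError("raters need equal-length, non-empty score lists")
    if n_levels < 2:
        raise ValueError("need at least two score levels")
    n = len(a)
    # Observed joint distribution of (rater A score, rater B score).
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for x, y in zip(a, b):
        observed[x][y] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = (i - j) ** 2 / (n_levels - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]               # observed disagreement
            den += w * row[i] * col[j]              # disagreement expected by chance
    if den == 0.0:
        raise ValueError("kappa is undefined when both raters use one identical level")
    return 1.0 - num / den


def mean_pairwise_kappa(scores_by_judge, n_levels):
    """Average weighted kappa over all judge pairs (assumed meaning of the mean)."""
    pairs = list(combinations(sorted(scores_by_judge), 2))
    if not pairs:
        raise ValueError("need at least two judges")
    total = sum(
        weighted_kappa(scores_by_judge[p], scores_by_judge[q], n_levels)
        for p, q in pairs
    )
    return total / len(pairs)


def agreement_gate(mean_kappa, pass_threshold=0.6, fail_threshold=0.4):
    """Map mean agreement to an outcome; both thresholds are illustrative assumptions."""
    if mean_kappa >= pass_threshold:
        return "publishable"
    if mean_kappa >= fail_threshold:
        return "qualified"
    return "invalid"
```

Under these assumed thresholds, `agreement_gate(0.7168)` returns `"publishable"`. The same check could be repeated per rubric dimension, so that a claim passing overall is still qualified if agreement collapses on one dimension such as constraint_awareness.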


