MemGuard-Alpha: Detecting and Filtering Memorization-Contaminated Signals in LLM-Based Financial Forecasting via Membership Inference and Cross-Model Disagreement
arXiv cs.LG · March 31, 2026
Key Points
- MemGuard-Alpha is presented as a zero-cost, post-generation framework to detect and filter memorization-contaminated signals that can cause look-ahead bias in LLM-based financial forecasting.
- The approach combines two components: the MemGuard Composite Score (MCS), which aggregates multiple membership inference attack signals with temporal proximity features, and Cross-Model Memorization Disagreement (CMMD), which exploits the differing training cutoff dates of LLMs to flag outputs that are likely memorized rather than forecast.
- Experiments across seven LLMs, 50 S&P 100 stocks, and 42,800 prompts over 2019–2024 show substantially improved trading performance after filtering, including a higher Sharpe ratio (4.11 vs 2.76) and much larger average daily returns for “clean” signals.
- The paper reports a clear memorization signature: in-sample accuracy increases with contamination while out-of-sample accuracy declines, directly illustrating that memorization inflates apparent model performance.
- The authors argue prior mitigations like retraining or input anonymization are costly or information-losing, positioning MemGuard-Alpha as a practical real-time filtering alternative for quantitative strategies.
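The CMMD idea, as described above, can be illustrated with a minimal sketch. This is not the paper's implementation: the class, function names, and the 0.5 disagreement threshold are all hypothetical, and directional predictions are simplified to +1/-1. The core logic is that models whose training data covers the target date should only diverge systematically from earlier-cutoff models if the former are recalling memorized outcomes.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class ModelSignal:
    cutoff: str      # training-data cutoff date, "YYYY-MM-DD"
    prediction: int  # +1 bullish, -1 bearish

def cmmd_flag(signals: list[ModelSignal], target_date: str,
              threshold: float = 0.5) -> bool:
    """Hypothetical sketch of Cross-Model Memorization Disagreement.

    Split models into those whose training cutoff covers the target
    date (and so may have memorized the outcome) and those blind to
    it. Large disagreement between the two groups suggests the
    covered models are recalling, not forecasting.
    """
    covered = [s.prediction for s in signals if s.cutoff >= target_date]
    blind = [s.prediction for s in signals if s.cutoff < target_date]
    if not covered or not blind:
        return False  # cannot compare across the cutoff boundary
    # Mean predictions lie in [-1, 1]; normalize the gap to [0, 1].
    disagreement = abs(mean(covered) - mean(blind)) / 2
    return disagreement > threshold
```

A signal flagged by this check would be dropped before it reaches the trading strategy, mirroring the paper's post-generation filtering step.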