One Size Fits None: Heuristic Collapse in LLM Investment Advice

arXiv cs.CL / 4/28/2026


Key Points

  • The research examines whether frontier LLMs provide genuinely context-aware advice in high-stakes domains, or whether they simplify multi-factor judgments via “heuristic collapse.”
  • In investment-advice tasks, the study finds that LLM-driven allocation decisions are dominated by the user’s self-reported risk tolerance, while other legally relevant factors have minimal influence.
  • The authors use interpretable surrogate models to measure input sensitivity, providing evidence of a systematic reduction from complex individualized reasoning to a few dominant inputs.
  • Web-search augmentation partially attenuates heuristic collapse but does not eliminate it, implying that retrieval augmentation and model scaling alone are insufficient.
  • The paper concludes that organizations deploying LLMs as advisors must audit input sensitivity (how strongly outputs depend on each input factor), not just output quality.
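The input-sensitivity audit the key points describe can be illustrated with a minimal surrogate-model sketch. Everything below is a hypothetical construction, not the paper's method: the client features, the stand-in allocation function (playing the role of LLM outputs), and the plain least-squares surrogate are all illustrative assumptions. The idea is simply that fitting an interpretable model to the advisor's outputs exposes which inputs dominate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical client profiles: each row is (risk_tolerance, age, income, horizon),
# drawn standardized so the surrogate's coefficients are directly comparable.
n = 500
features = ["risk_tolerance", "age", "income", "horizon"]
X = rng.normal(size=(n, len(features)))

# Stand-in for LLM equity allocations: dominated by risk tolerance, with only
# weak dependence on the other factors (synthetic data, mimicking "collapse").
y = (0.9 * X[:, 0] + 0.05 * X[:, 1] + 0.03 * X[:, 2] + 0.02 * X[:, 3]
     + rng.normal(scale=0.1, size=n))

# Linear surrogate fit by least squares; |coefficient| serves as an
# input-sensitivity score for each profile feature.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
sensitivity = dict(zip(features, np.abs(coef)))

dominant = max(sensitivity, key=sensitivity.get)
print(dominant)  # with this synthetic data, risk_tolerance dominates
```

An audit in this spirit would flag heuristic collapse whenever one feature's sensitivity score dwarfs the rest, regardless of how reasonable any individual recommendation looks in isolation.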

Abstract

Large language models are increasingly deployed as advisors in high-stakes domains -- answering medical questions, interpreting legal documents, recommending financial products -- where good advice requires integrating a user's full context rather than responding to salient surface features. We investigate whether frontier LLMs actually do this, or whether they instead exhibit heuristic collapse: a systematic reduction of complex, multi-factor decisions to a small number of dominant inputs. We study the phenomenon in investment advice, where legal standards explicitly require individualized reasoning over a client's full circumstances. Applying interpretable surrogate models to LLM outputs, we find systematic heuristic collapse: investment allocation decisions are largely determined by self-reported risk tolerance, while other relevant factors contribute minimally. We further find that web search partially attenuates heuristic collapse but does not resolve it. These findings suggest that heuristic collapse is not resolved by web search augmentation or model scale alone, and that deploying LLMs as advisors requires auditing input sensitivity, not just output quality.