Alignment has a Fantasia Problem

arXiv cs.AI / 4/25/2026


Key Points

  • The paper argues that many alignment failures occur when users’ goals are not fully formed, creating “Fantasia interactions” where AI treats prompts as complete intent.
  • It contends that conventional alignment research, by assuming users are rational providers of intent, misses this reality and can yield systems that seem convenient but are not truly aligned with user needs.
  • The authors propose shifting from only interpreting prompts to actively providing cognitive support that helps users form and refine their intent over time.
  • They synthesize mechanisms and failure modes by bridging machine learning, interface design, and behavioral science, and evaluate why current interventions do not adequately address the problem.
  • The paper concludes with a research agenda focused on designing and evaluating AI systems that help humans manage uncertainty in their tasks.

Abstract

Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, they can appear useful or convenient without being aligned with users' actual needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent over time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize the mechanisms and failure modes of Fantasia interactions. We then show why existing interventions are insufficient, and propose a research agenda for designing and evaluating AI systems that better help humans navigate uncertainty in their tasks.