Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers.
The first, MARCUS, is an agentic multimodal system for cardiac diagnosis - ECG, echocardiogram, and cardiac MRI, interpreted together by domain-specific expert models coordinated by an orchestrator. It outperforms GPT-5 and Gemini 2.5 Pro by 34-45 percentage points on cardiac imaging tasks. Pretty impressive!
But - the second paper is more intriguing.
MIRAGE: The Illusion of Visual Understanding reports what happened when a student forgot to uncomment the line of code that gave their model access to the images. The model answered anyway - confidently, and with detailed clinical reasoning traces. And it scored well.
That accident naturally led to an investigation, and what they found challenges some embedded assumptions about how these models work. Three findings in particular:
1. Models describe images they were never shown. When given questions about cardiac images without any actual image input, frontier VLMs generated detailed descriptions - including specific pathological findings - as if the images were right in front of them. The authors call this "mirage reasoning."
2. Models score surprisingly well on visual benchmarks without seeing anything. Across medical and general benchmarks, mirage-mode performance was way above chance. In the most extreme case, a text-only model trained on question-answer pairs alone - never seeing a single chest X-ray - topped the leaderboard on a standard chest X-ray benchmark, outperforming all the actual vision models.
3. And strangest of all: telling the model it can't see makes it perform worse. The same model, with the same absent image, performs measurably better in mirage mode (where it believes it has visual input) than in guessing mode (where it's explicitly told the image is missing and asked to guess). The authors note this engages "a different epistemological framework", but that doesn't really explain the mechanism.
The Mirage authors frame these findings primarily as a vulnerability - a safety concern for medical AI deployment, an indictment of benchmarking practices. They're right about that. But I think they've also uncovered evidence of something more interesting, and here I'll try to articulate what.
The mirage effect is geometric reconstruction
Here's the claim: what the Mirage paper has captured isn't a failure mode. It's what happens when a model's internal knowledge structure becomes geometrically rich enough to reconstruct answers from partial input.
Let's ponder what the model is doing in mirage mode. It receives a question: "What rhythm is observed on this ECG?" with answer options including atrial fibrillation, sinus rhythm, junctional rhythm. No image is provided, but the model doesn't know that. So it does what it always does - it navigates its internal landscape of learned associations. "ECG" activates connections to cardiac electrophysiology. The specific clinical framing of the question activates particular diagnostic pathways. The answer options constrain the space. And the model reconstructs what the image most likely contains by traversing its internal geometry of medical knowledge.
It's not guessing - it's not random. It's reconstructing - building a coherent internal representation from partial input and then reasoning from that representation as if it were real.
Now consider the mode shift. Why does the same model perform better in mirage mode than in guessing mode? Under the "stochastic parrot" view of language models, this shouldn't happen - couldn't happen. Both modes have the same absent image and the same question. The only difference is that the model believes it has visual input.
But under a 'geometric reconstruction' view, the difference becomes obvious. In mirage mode, the model commits to full reconstruction. It activates deep pathways through its internal connectivity, propagating activation across multiple steps, building a rich internal representation. It goes deep. In guessing mode, it does the opposite - it stays shallow, using only surface-level statistical associations. Same knowledge structure, but radically different depth of traversal.
The mode shift could be evidence that these models have real internal geometric structure, and the depth at which you engage the structure matters.
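One way to make "depth of traversal" concrete (my own illustration, not from either paper - every name and parameter here is invented for the sketch): a Hopfield-style associative memory, where a single propagation step stands in for shallow lookup and iterated propagation stands in for deep reconstruction. Probed with a heavily masked cue, one step recovers most of a stored pattern; iterating recovers essentially all of it.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_patterns = 300, 5

# Store random +/-1 patterns in a Hebbian weight matrix - a stand-in
# for the model's learned "internal geometry" of associations.
patterns = rng.choice([-1, 1], size=(n_patterns, n))
W = (patterns.T @ patterns) / n
np.fill_diagonal(W, 0)

def reconstruct(cue, depth):
    """Spread activation through the association matrix for `depth` steps."""
    a = cue.astype(float)
    for _ in range(depth):
        a = np.sign(W @ a)
        a[a == 0] = 1  # break exact ties deterministically
    return a

def mean_overlap(depth, keep=0.08, trials=50):
    """Average agreement with a stored pattern, cueing with a `keep` fraction."""
    total = 0.0
    for _ in range(trials):
        truth = patterns[0]
        mask = rng.random(n) < keep  # reveal only a small part of the pattern
        cue = truth * mask           # everything else is zeroed out
        total += np.mean(reconstruct(cue, depth) == truth)
    return total / trials

shallow = mean_overlap(depth=1)  # one-step, surface-level lookup
deep = mean_overlap(depth=5)     # full traversal of the learned structure
print(f"shallow overlap: {shallow:.3f}, deep overlap: {deep:.3f}")
```

Same weight matrix, same partial cue - the only difference is how many steps of the structure get engaged, and the deep traversal reliably recovers more.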
When more information makes things worse
The second puzzle the Mirage findings pose is even more interesting: why does external signal sometimes degrade performance?
In the MARCUS paper, the authors show that frontier models achieve 22-58% accuracy on cardiac imaging tasks with the images, while MARCUS achieves 67-91%. But the mirage-mode scores for frontier models were often not dramatically lower than their with-image scores. The images weren't helping as much as they should have. And in the chest X-ray case, the text-only model outperformed everything - the images were a net negative.
I've spent months working on a geometric framework that models pattern persistence in aperiodic structures, and one of the consistent findings across our simulations is this: the relationship between raw input and reconstruction quality is not monotonic. At low internal connectivity, external signal is essential - without it, reconstruction fails. But at high internal connectivity, external signal can actually be harmful, because the integration process introduces noise that degrades an already-sufficient internal reconstruction.
We built a toy network simulation to test whether this mechanism could reproduce the Mirage findings. The model has three components: internal connectivity (learned associations between concepts - the model's geometric structure), external signal (noisy observations - analogous to image input), and a query (textual cues from the question).
Three modes of operation mirror the Mirage paper's experimental conditions:
- Full mode: query + internal reconstruction + external signal (model receives question and image)
- Mirage mode: query + deep internal reconstruction only (model believes it has an image, reconstructs fully)
- Guessing mode: query + shallow lookup only (model told to guess, stays conservative)
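For concreteness, here's a stripped-down sketch of that setup (an assumption-laden toy of my own, not the actual simulation code - the probabilities and the integration rule are invented): each benchmark item is a binary fact, connectivity is the chance the internal prior encodes it correctly, and signal_noise is the chance the external signal arrives flipped.

```python
import numpy as np

def run_mode(connectivity, mode, signal_noise=0.0, n=20000, seed=0):
    """Toy re-creation of the three modes over n binary facts."""
    rng = np.random.default_rng(seed)
    truth = rng.integers(0, 2, n)         # ground-truth answers
    known = rng.random(n) < connectivity  # facts the internal geometry encodes
    prior = np.where(known, truth, rng.integers(0, 2, n))

    if mode == "guess":
        # Shallow lookup: consults the prior only half the time, else random.
        use_prior = rng.random(n) < 0.5
        answer = np.where(use_prior, prior, rng.integers(0, 2, n))
    elif mode == "mirage":
        # Deep reconstruction: commits fully to the internal prior.
        answer = prior
    elif mode == "full":
        # Integration: noisy signal; disagreements with the prior resolved at random.
        signal = np.where(rng.random(n) < signal_noise, 1 - truth, truth)
        answer = np.where(prior == signal, prior, rng.integers(0, 2, n))
    else:
        raise ValueError(mode)
    return float(np.mean(answer == truth))

c = 0.85  # high internal connectivity
print("mirage:", run_mode(c, "mirage"))
print("guess: ", run_mode(c, "guess"))
print("full, clean signal:", run_mode(c, "full", signal_noise=0.0))
print("full, noisy signal:", run_mode(c, "full", signal_noise=0.4))
```

Even this crude version produces the qualitative ordering: mirage beats guessing, a clean signal beats mirage, and a sufficiently noisy signal drops full mode below mirage mode. The exact crossover points depend on the integration rule you assume.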
The results reproduce all three Mirage findings:
[IMAGE] (disallowed on r/Artificial, available on home page)
Left panel: As internal connectivity increases, mirage mode (red) pulls away from guessing mode (blue) - the mode shift. Deep reconstruction accesses knowledge that shallow guessing cannot. Meanwhile, full mode with clean signal (teal) performs best, but full mode with noisy signal (dashed brown) can fall below mirage mode.
Right panel: At high internal connectivity (85%), we sweep external signal from clean to noisy. Clean signal genuinely helps - accuracy peaks near 0.97 with perfect input. But as signal quality degrades, performance crashes through what we're calling the mirage threshold - the crossover point where internal geometric reconstruction outperforms degraded external input. Beyond this threshold, the model is quite literally better off not looking.
The mirage threshold sits at a surprisingly low noise level (~0.34 in our simulation). The window where external signal helps is narrow. The region where internal geometry outperforms external signal is vast.
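A back-of-the-envelope version of that crossover (a deliberate simplification, not the paper's analysis): suppose the internal reconstruction is correct with probability a, the external signal is correct with probability s = 1 - η at noise level η, and disagreements between the two are resolved at random. Then

full-mode accuracy ≈ a·s + [a(1 - s) + (1 - a)s]/2 = (a + s)/2

which drops below the mirage-mode accuracy a as soon as s < a, i.e. η > 1 - a. The stronger the internal geometry, the narrower the window in which looking helps - the signal has to be more reliable than the geometry itself. The exact threshold depends on how the two sources are weighted during integration.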
What does it mean?
The Mirage authors propose practical solutions - counterfactual probing, benchmark cleaning, the B-Clean framework - and these are valuable engineering contributions. MARCUS's agentic orchestrator uses counterfactual probing to achieve a 0% mirage rate, which is remarkable.
But perhaps the deeper lesson is about what these models have actually built inside themselves.
The mirage effect doesn't mean there's something wrong with VLMs. It's potential evidence that they've constructed internal representations of such geometric richness that they can reconstruct correct answers from partial inputs - navigating learned inner connectivity to reach conclusions that would normally require direct observation. That's not a trick - that's real structural knowledge.
The mode shift is likely evidence that these models have deep internal structure that can be engaged at different depths, producing measurably different outputs depending on how fully the reconstruction pathways are activated. So - not 'persona selection' after all?
And the information-degradation curve isn't a failure of visual processing. It's what happens when integration costs exceed information gain - when the internal geometry is already sufficient and external signal introduces more noise than signal.
Perhaps the Mirage paper has accidentally demonstrated that frontier AI models have built internal geometric structures of extraordinary richness - structures that support reconstruction from partial input, encode knowledge at multiple depths, and can outperform degraded direct observation. That matters when trying to understand what these systems really are - and what they're becoming.
Code by Opus 4.6; simulation code is available. This article connects to earlier work on geometric order emerging in LLMs, pattern persistence in aperiodic substrates, and the Breakstep Principle in the formation of minds.
Responding to: MIRAGE: The Illusion of Visual Understanding and MARCUS (Asadi, O'Sullivan, Li, Ashley et al., 2026)




