QualiaNet: An Experience-Before-Inference Network

arXiv cs.CV / April 17, 2026


Key Points

  • The paper proposes a two-stage computational framework for 3D vision that mirrors human processing: an Experience Module that generates a disparity map relative to fixation, followed by an Inference Module that interprets that experience to infer 3D properties.
  • It argues that even though human stereo experience does not directly convey distance, it still shapes beliefs about scale, and the proposed method leverages this effect.
  • The Inference Module is built on a natural scene statistic: disparity gradients are typically stronger for near objects and become flatter for distant ones, enabling distance estimation without explicit depth cues.
  • QualiaNet implements this pipeline by feeding simulated human-like stereo disparity maps into a CNN trained to estimate distance, and the results show it can recover distance from disparity gradients alone.
  • Overall, the work validates an “experience-before-inference” architecture as a plausible mechanism for distance and 3D estimation based primarily on disparity gradient patterns.
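The scene statistic behind the third key point is easy to see in a toy example: for the same physical depth range, a near surface spans a much larger disparity interval than a far one. The sketch below is not the paper's code; the baseline `B`, focal length `f`, and the slanted-surface geometry are illustrative assumptions.

```python
import numpy as np

# Illustrative constants (assumed, not from the paper):
B, f = 0.065, 800.0   # interocular baseline (m), focal length (px)

def disparity_map(fixation_dist, depth_range=0.5, n=64):
    """Disparity relative to fixation for a surface spanning
    [fixation_dist, fixation_dist + depth_range] metres."""
    z = np.linspace(fixation_dist, fixation_dist + depth_range, n)
    d = B * f / z          # absolute disparity (px), d = B*f/Z
    return d - d[0]        # experience is relative to fixation

near = disparity_map(fixation_dist=0.5)   # scene at arm's length
far  = disparity_map(fixation_dist=5.0)   # scene across the room

# Mean disparity-gradient magnitude: vivid for near, flat for far.
g_near = np.abs(np.gradient(near)).mean()
g_far  = np.abs(np.gradient(far)).mean()
print(f"near gradient {g_near:.3f} px/sample, far {g_far:.4f} px/sample")
assert g_near > g_far     # steeper gradients signal a nearer scene
```

Because disparity falls off as 1/Z, the gradient of the disparity map over a fixed depth extent shrinks roughly as 1/Z², which is what makes it a usable distance cue even without absolute depth information.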

Abstract

Human 3D vision involves two distinct stages: an Experience Module, where stereo depth is extracted relative to fixation, and an Inference Module, where this experience is interpreted to estimate 3D scene properties. Paradoxically, although our experience of stereo vision does not provide us with distance information, it does affect our inferences about visual scale. We propose the Inference Module exploits a natural scene statistic: near scenes produce vivid disparity gradients, while far scenes appear comparatively flat. QualiaNet implements this two-stage architecture computationally: disparity maps simulating human stereo experience are passed to a CNN trained to estimate distance. The network can recover distance from disparity gradients alone, validating this approach.
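As a rough sanity check on the experience-before-inference pipeline, the toy model below replaces the paper's CNN with a closed-form inversion of the same gradient cue, under an assumed slanted-surface geometry. All names and constants here are illustrative, not the authors' implementation.

```python
import numpy as np

# Assumed constants for the toy geometry (not from the paper):
B, f = 0.065, 800.0        # baseline (m), focal length (px)
DEPTH_RANGE, N = 0.5, 64   # surface depth extent (m), samples across it

def experience_module(fixation_dist):
    """Stage 1: stereo disparity relative to fixation (the 'experience')."""
    z = np.linspace(fixation_dist, fixation_dist + DEPTH_RANGE, N)
    d = B * f / z
    return d - d[0]

def inference_module(rel_disparity):
    """Stage 2: infer viewing distance Z from the disparity gradient alone.
    For this geometry, mean|grad d| ~ B*f*DEPTH_RANGE / (Z*(Z+DEPTH_RANGE)*(N-1));
    solving the resulting quadratic in Z recovers distance."""
    g = np.abs(np.gradient(rel_disparity)).mean()
    c = B * f * DEPTH_RANGE / (g * (N - 1))   # approximates Z*(Z+DEPTH_RANGE)
    return (-DEPTH_RANGE + np.sqrt(DEPTH_RANGE**2 + 4 * c)) / 2

for true_z in (0.5, 2.0, 5.0):
    est = inference_module(experience_module(true_z))
    print(f"true {true_z:.1f} m -> inferred {est:.2f} m")
```

The closed form works only because this toy scene's geometry is known exactly; the point of training a CNN, as QualiaNet does, is to learn the same mapping from disparity-gradient patterns across varied natural scenes where no such formula exists.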