Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games

arXiv cs.LG / 4/7/2026

Key Points

  • The paper investigates how central (gaze-focused), peripheral, and temporal (past-state) visual information sources each contribute to human decision-making in dynamic Atari environments.
  • Using the Atari-HEAD dataset with synchronized eye-tracking, the authors apply a controlled ablation framework and train action-prediction networks under different combinations of included/excluded information sources.
  • Results across 20 Atari games show that peripheral visual information is the dominant contributor: removing it causes the largest median drops in action-prediction accuracy (about 35–44%).
  • Gaze-derived information produces smaller accuracy reductions (~2–3%), while past-state information has a wider effect range (~2–16%); the upper end of that range is likely the more informative estimate because it is less affected by “peripheral-information leakage.”
  • By clustering states according to the probability each model configuration assigns to the true (human) action, the analysis identifies coarse behavioral regimes such as focus-dominated, periphery-dominated, and more contextual decisions, and the paper proposes a general framework for estimating the contributions of visual information sources from behavior.
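
To make the ablation idea in the points above concrete, here is a minimal sketch of how a network input could be assembled with each information source switched on or off. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gaze_map`, `mask_periphery`, `build_input`), the Gaussian gaze map, the circular mask, the 84×84 frame size, and the choice to mask the previous frame at the current gaze position are all hypothetical, and the summary does not say which six combinations the paper actually uses.

```python
import numpy as np

def gaze_map(shape, gaze_xy, sigma=8.0):
    """Gaussian heat map centred on the gaze position (x, y), peak value 1.
    Assumption: the source only says gaze is provided 'in the form of gaze maps'."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    gx, gy = gaze_xy
    return np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2.0 * sigma ** 2))

def mask_periphery(frame, gaze_xy, radius=12):
    """Zero out every pixel farther than `radius` from the gaze position.
    Assumption: a hard circular window stands in for 'central' vision."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gx, gy = gaze_xy
    inside = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
    return np.where(inside, frame, 0.0)

def build_input(frame, prev_frame, gaze_xy, use_periphery, use_gaze, use_past):
    """Stack the selected information sources as channels of one input array."""
    channels = [frame if use_periphery else mask_periphery(frame, gaze_xy)]
    if use_gaze:
        channels.append(gaze_map(frame.shape, gaze_xy))
    if use_past:
        # Assumption: the past frame is masked at the current gaze position.
        past = prev_frame if use_periphery else mask_periphery(prev_frame, gaze_xy)
        channels.append(past)
    return np.stack(channels, axis=0)

# Toy usage with random frames (hypothetical data, not Atari-HEAD).
rng = np.random.default_rng(0)
frame, prev_frame = rng.random((84, 84)), rng.random((84, 84))
x = build_input(frame, prev_frame, gaze_xy=(40, 30),
                use_periphery=False, use_gaze=True, use_past=True)
print(x.shape)  # (3, 84, 84)
```

Training one action-prediction model per setting and comparing its accuracy against the setting that includes the source in question would then give the kind of accuracy drop the key points report.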

Abstract

We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction networks under six settings that selectively include or exclude these information sources. Across 20 games, peripheral information shows by far the strongest contribution, with median prediction-accuracy drops in the range of 35.27–43.90% when removed. Gaze information yields smaller drops of 2.11–2.76%, while past-state information shows a broader range of 1.52–15.51%, with the upper end likely more informative due to reduced peripheral-information leakage. To complement aggregate accuracies, we cluster states by true-action probabilities assigned by the different model configurations. This analysis identifies coarse behavioral regimes, including focus-dominated, periphery-dominated, and more contextual decision situations. These results suggest that human decision making in Atari depends strongly on information beyond the current focus of gaze, while the proposed framework provides a way to estimate such information-source contributions from behavior.