Do vision models perceive illusory motion in static images like humans?

arXiv cs.CV / 4/14/2026


Key Points

  • The paper investigates whether DNN-based vision/optical-flow models can perceive illusory motion from static images, specifically testing the Rotating Snakes illusion against human motion perception.
  • Most evaluated optical-flow models fail to produce flow fields consistent with human perception, revealing a significant mismatch between how machines and humans process such illusions.
  • Under simulated saccadic eye-movement conditions, only a human-inspired Dual-Channel model shows the expected rotational motion, with the best correspondence occurring during the saccade simulation.
  • Ablation studies suggest that both luminance signals and higher-order color/feature-based motion cues matter, and that recurrent attention is critical for integrating local cues to form the illusion-consistent motion interpretation.
  • The findings point to a gap between current motion-estimation systems and human visual motion processing, offering design directions for more human-aligned computer vision models.
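The saccade simulation mentioned above can be approximated by turning the single static stimulus into a two-frame input via a small global shift, loosely mimicking the retinal displacement produced by an eye movement. This is a minimal sketch under that assumption; the shift size, the wrap-around behavior of `np.roll`, and the function name `saccade_pair` are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def saccade_pair(img, shift=(4, 0)):
    """Build a two-frame input for an optical-flow model from one static image.

    A small global shift of the image stands in for the retinal displacement
    of a saccadic eye movement (illustrative assumption, not the paper's
    exact procedure). Wrap-around at the borders comes from np.roll.

    img:   (H, W) or (H, W, C) array.
    shift: (dy, dx) displacement in pixels.
    Returns (frame1, frame2) to feed to a flow estimator.
    """
    dy, dx = shift
    frame2 = np.roll(img, shift=(dy, dx), axis=(0, 1))
    return img, frame2
```

A flow model fed such a pair should, for an ordinary image, report roughly uniform translation; the question the paper asks is whether, for Rotating Snakes, any model instead reports the rotational motion humans see.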

Abstract

Understanding human motion processing is essential for building reliable, human-centered computer vision systems. Although deep neural networks (DNNs) achieve strong performance in optical flow estimation, they remain less robust than humans and rely on fundamentally different computational strategies. Visual motion illusions provide a powerful probe into these mechanisms, revealing how human and machine vision align or diverge. While recent DNN-based motion models can reproduce dynamic illusions such as reverse-phi, it remains unclear whether they can perceive illusory motion in static images, exemplified by the Rotating Snakes illusion. We evaluate several representative optical flow models on Rotating Snakes and show that most fail to generate flow fields consistent with human perception. Under simulated conditions mimicking saccadic eye movements, only the human-inspired Dual-Channel model exhibits the expected rotational motion, with the closest correspondence emerging during the saccade simulation. Ablation analyses further reveal that both luminance-based and higher-order color/feature-based motion signals contribute to this behavior and that a recurrent attention mechanism is critical for integrating local cues. Our results highlight a substantial gap between current optical-flow models and human visual motion processing, and offer insights for developing future motion-estimation systems with improved correspondence to human perception and human-centric AI.
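Checking whether a predicted flow field "exhibits the expected rotational motion" can be made concrete by projecting each flow vector onto the tangential direction around the image center. The sketch below assumes an `(H, W, 2)` flow array of `(dx, dy)` vectors, as dense flow estimators typically output; the scalar score and its sign convention are a hypothetical metric for illustration, not necessarily the paper's evaluation protocol.

```python
import numpy as np

def rotational_score(flow):
    """Mean tangential component of a dense flow field around the image center.

    Each flow vector is projected onto the local unit tangent of a circle
    centered on the image; a large positive mean indicates coherent rotation
    in one direction, while uniform translation averages out to near zero.
    This is an illustrative metric, not the paper's exact one.

    flow: (H, W, 2) array of (dx, dy) vectors.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xs - w / 2.0, ys - h / 2.0          # offsets from center
    r = np.hypot(dx, dy) + 1e-8                  # avoid divide-by-zero at center
    tx, ty = dy / r, -dx / r                     # unit tangential direction
    return float(np.mean(flow[..., 0] * tx + flow[..., 1] * ty))
```

On a synthetic purely rotational field the score approaches 1, while a uniform-translation field scores near 0, so thresholding this value is one simple way to flag illusion-consistent predictions.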