Appearance-free Action Recognition: Zero-shot Generalization in Humans and a Two-Pathway Model
arXiv cs.CV, April 21, 2026
Key Points
- The study investigates whether humans can perform zero-shot action recognition on appearance-free transformations of real-world videos, where static body shape cues are removed.
- In a lab experiment with 22 participants, people trained on naturalistic UCF5 videos still recognized actions above chance on two appearance-free variants (AFD5 dense-noise motion videos and random-dot motion videos), though accuracy dropped.
- The authors propose a two-pathway 3D CNN model with separate RGB (form) and optical-flow (motion) streams plus a coherence-gating mechanism inspired by Gestalt “common-fate” grouping.
- The model matches humans' generalization pattern on both appearance-free datasets and outperforms contemporary video-classification models; motion cues prove critical for zero-shot appearance-free generalization, while form cues help on naturalistic videos.
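The coherence-gating idea can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's actual mechanism: `coherence_gate` scores how strongly a flow field moves in one shared direction (the Gestalt "common fate" cue), and that score linearly weights the motion pathway's logits against the form pathway's before classification.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def coherence_gate(flow):
    """Hypothetical common-fate score in [0, 1].
    flow: (H, W, 2) array of per-pixel motion vectors.
    Returns 1.0 when every pixel moves identically, ~0 for random motion."""
    vecs = flow.reshape(-1, 2)
    mean_mag = np.linalg.norm(vecs, axis=1).mean()
    if mean_mag == 0:
        return 0.0
    return float(np.linalg.norm(vecs.mean(axis=0)) / mean_mag)

def fuse(form_logits, motion_logits, flow):
    """Gate the two pathways: coherent motion upweights the motion stream."""
    g = coherence_gate(flow)
    return softmax(g * motion_logits + (1.0 - g) * form_logits)

# Toy example: a fully coherent flow field drives the gate to 1,
# so the fused prediction follows the motion pathway.
coherent_flow = np.ones((8, 8, 2))        # every pixel moves the same way
form_logits = np.array([2.0, 0.0, 0.0])   # form stream prefers class 0
motion_logits = np.array([0.0, 0.0, 2.0]) # motion stream prefers class 2
probs = fuse(form_logits, motion_logits, coherent_flow)
print(probs.argmax())  # → 2: coherent motion, so the motion stream wins
```

In the paper the two streams are 3D CNNs over RGB and optical-flow inputs; here scalar logits stand in for their outputs so the gating logic is visible on its own.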