Cortical Policy: A Dual-Stream View Transformer for Robotic Manipulation
arXiv cs.RO / 3/24/2026
Key Points
- The paper proposes “Cortical Policy,” a dual-stream view transformer for robotic manipulation that jointly reasons from static-view and dynamic-view inputs rather than using view-specific static features alone.
- A static-view stream improves 3D spatial understanding by aligning the features of geometrically consistent keypoints, which are extracted with the help of a pretrained 3D foundation model.
- A dynamic-view stream uses position-aware pretraining of an egocentric gaze estimation model to enable adaptive, motion-relevant reasoning, inspired by the human cortical dorsal pathway.
- The integrated representations from both streams produce language-conditioned actions, and experiments on RLBench, COLOSSEUM, and real-world tasks show substantial gains over state-of-the-art baselines.
- The authors argue that the cortex-inspired dual-stream design addresses prior limitations in 3D spatial reasoning and dynamic adaptation, with potential for wider vision-based robot control applications.
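To make the dual-stream idea above more concrete, here is a minimal NumPy sketch of how two streams of view tokens might be fused and conditioned on language to produce an action. It is not the paper's implementation. Every name, shape, and design choice in it is an assumption made for illustration: `dual_stream_policy`, `keypoint_alignment_loss`, the single-head attention without learned projections, the cosine-based alignment loss, the token counts, and the 7-dimensional action are all hypothetical, and the random arrays stand in for real encoder outputs.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def keypoint_alignment_loss(feats_a, feats_b):
    """Hypothetical alignment loss: mean (1 - cosine similarity) between
    features of matched keypoints seen from two views (row i matches row i)."""
    a = feats_a / np.linalg.norm(feats_a, axis=-1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))


def dual_stream_policy(static_tokens, dynamic_tokens, lang_tokens, w_action):
    """Fuse both streams, condition on language, and map to an action vector."""
    # Assumption: fusion is plain concatenation along the token axis.
    fused = np.concatenate([static_tokens, dynamic_tokens], axis=0)
    # Language tokens query the fused visual tokens.
    conditioned = cross_attention(lang_tokens, fused)
    # Mean-pool over language tokens, then apply a linear action head.
    pooled = conditioned.mean(axis=0)
    return pooled @ w_action


# Random placeholders for encoder outputs (assumed sizes).
rng = np.random.default_rng(0)
D, ACTION_DIM = 32, 7  # 7 is an assumed action size, e.g. pose plus gripper
static_tokens = rng.normal(size=(16, D))   # static-view stream tokens
dynamic_tokens = rng.normal(size=(8, D))   # dynamic-view stream tokens
lang_tokens = rng.normal(size=(5, D))      # instruction tokens
w_action = rng.normal(size=(D, ACTION_DIM)) * 0.1

action = dual_stream_policy(static_tokens, dynamic_tokens, lang_tokens, w_action)
print(action.shape)  # (7,)
```

In this sketch the alignment loss is zero when matched keypoint features point in the same direction and grows as they diverge, which is one simple way to express the cross-view consistency objective described for the static-view stream. A real system would use learned projections, multiple attention layers, and trained encoders in place of the random arrays.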