Pedestrian Crossing Intent Prediction via Psychological Features and Transformer Fusion
arXiv cs.CV / 3/23/2026
Key Points
- The paper introduces a lightweight, socially informed architecture for pedestrian intention prediction that fuses four behavioral streams (attention, position, situation, and interaction) using highway encoders and a compact 4-token Transformer.
- It incorporates uncertainty estimation via a variational bottleneck and a Mahalanobis distance detector to provide calibrated probabilities and actionable risk scores.
- On PSI 1.0, using only structured features, it outperforms recent vision-language models with 0.9 F1, 0.94 AUC-ROC, and 0.78 MCC. On PSI 2.0, it establishes a strong baseline of 0.78 F1 and 0.79 AUC-ROC, and selective prediction improves accuracy at 80% coverage.
- The approach is modality-agnostic, easy to integrate with vision-language pipelines, and suitable for risk-aware intent prediction on resource-constrained platforms.
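
To make the selective-prediction point concrete, here is a minimal Python sketch of evaluating accuracy at a fixed coverage. It is an illustration under stated assumptions, not the paper's implementation: the function name `selective_accuracy` is hypothetical, confidence is assumed to be the distance of the predicted crossing probability from 0.5 (the summary does not say which uncertainty score the authors rank by), and the probabilities and labels are made-up toy values.

```python
from typing import List, Sequence, Tuple


def selective_accuracy(
    probs: Sequence[float],
    labels: Sequence[int],
    coverage: float = 0.8,
) -> Tuple[float, List[int]]:
    """Accuracy on the most confident `coverage` fraction of samples.

    Assumption: confidence = |p - 0.5|, where p is the predicted
    probability that the pedestrian will cross. A model with an explicit
    uncertainty score could rank by that score instead.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Rank sample indices from most to least confident.
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5), reverse=True)

    # Keep the top fraction; the remainder is treated as "abstain".
    n_keep = max(1, int(round(coverage * len(probs))))
    kept = order[:n_keep]

    # Threshold at 0.5 to get a hard prediction, then score only kept samples.
    correct = sum(1 for i in kept if (probs[i] >= 0.5) == bool(labels[i]))
    return correct / n_keep, sorted(kept)


# Toy data (hypothetical): predicted crossing probabilities and true labels.
probs = [0.95, 0.10, 0.55, 0.80, 0.45, 0.02, 0.90, 0.30, 0.65, 0.52]
labels = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

full_acc, _ = selective_accuracy(probs, labels, coverage=1.0)
sel_acc, kept = selective_accuracy(probs, labels, coverage=0.8)
print(f"accuracy at 100% coverage: {full_acc:.3f}")
print(f"accuracy at 80% coverage:  {sel_acc:.3f} (kept {len(kept)} of {len(probs)})")
```

In this toy set the two dropped samples are among the least confident ones, so accuracy on the retained 80% is higher than on the full set. In a deployed system the abstained cases would typically be handed to a fallback policy rather than discarded.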