MOSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation
arXiv cs.CV / 4/22/2026
Key Points
- The paper introduces MOSA, a motion-guided semantic alignment approach for Dynamic Scene Graph Generation that targets shortcomings in fine-grained and tail relationship modeling.
- It encodes object-pair motion attributes (e.g., distance, velocity, motion persistence, directional consistency) with a Motion Feature Extractor and fuses them with spatial relationship features via a Motion-guided Interaction Module (see the first sketch after this list).
- To improve semantic discrimination, MOSA uses a cross-modal Action Semantic Matching mechanism that aligns visual relationship features with text embeddings of the relationship category names (second sketch below).
- It adds a category-weighted loss that emphasizes infrequent ("tail") relationships (third sketch below), and reports state-of-the-art results on the Action Genome benchmark.
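The motion attributes named in the second bullet can all be computed from per-frame bounding boxes of a subject-object pair. The sketch below is a minimal illustration, assuming (x1, y1, x2, y2) boxes; the function and module names, the feature dimensions, and the gated fusion design are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: per-pair motion attributes + motion-guided fusion.
# Box format assumed: (x1, y1, x2, y2) per frame. Names are illustrative.
import torch
import torch.nn as nn


def pair_motion_attributes(sub_boxes: torch.Tensor, obj_boxes: torch.Tensor) -> torch.Tensor:
    """sub_boxes, obj_boxes: (T, 4) boxes for one subject-object pair over T frames."""
    sub_c = (sub_boxes[:, :2] + sub_boxes[:, 2:]) / 2           # (T, 2) box centres
    obj_c = (obj_boxes[:, :2] + obj_boxes[:, 2:]) / 2
    dist = (sub_c - obj_c).norm(dim=-1, keepdim=True)           # inter-object distance
    vel = torch.zeros_like(sub_c)
    vel[1:] = sub_c[1:] - sub_c[:-1]                            # frame-to-frame velocity
    speed = vel.norm(dim=-1, keepdim=True)
    moving = (speed > 1e-3).float()
    steps = torch.arange(1, len(moving) + 1, dtype=moving.dtype).unsqueeze(1)
    persistence = moving.cumsum(dim=0) / steps                  # fraction of frames in motion so far
    unit = vel / (speed + 1e-6)
    dir_cons = torch.ones_like(speed)                           # frame 0 has no previous direction
    dir_cons[1:] = (unit[1:] * unit[:-1]).sum(dim=-1, keepdim=True)  # cosine of consecutive directions
    return torch.cat([dist, speed, persistence, dir_cons], dim=-1)   # (T, 4)


class MotionGuidedFusion(nn.Module):
    """Illustrative stand-in for the Motion-guided Interaction Module:
    a learned gate mixes motion cues into spatial relation features."""

    def __init__(self, motion_dim: int = 4, spatial_dim: int = 256):
        super().__init__()
        self.motion_mlp = nn.Sequential(nn.Linear(motion_dim, spatial_dim), nn.ReLU())
        self.gate = nn.Linear(2 * spatial_dim, spatial_dim)

    def forward(self, motion_attrs: torch.Tensor, spatial_feats: torch.Tensor) -> torch.Tensor:
        m = self.motion_mlp(motion_attrs)                       # (T, D) embedded motion cues
        g = torch.sigmoid(self.gate(torch.cat([m, spatial_feats], dim=-1)))
        return g * m + (1 - g) * spatial_feats                  # motion-gated mixture
```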
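For Action Semantic Matching, one plausible reading is a CLIP-style alignment: project visual relationship features into the text-embedding space and score them against frozen embeddings of the relationship class names. The module below is a hypothetical sketch under that assumption; the class name, projection layer, and learnable logit scale are illustrative, not confirmed details of the paper.

```python
# Hypothetical sketch of cross-modal semantic matching: visual relationship
# features scored against text embeddings of relationship category names.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ActionSemanticMatching(nn.Module):
    def __init__(self, vis_dim: int, txt_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, txt_dim)        # map visual features into text space
        self.logit_scale = nn.Parameter(torch.tensor(10.0))

    def forward(self, rel_feats: torch.Tensor, class_text_emb: torch.Tensor) -> torch.Tensor:
        """rel_feats: (N, vis_dim) visual relationship features.
        class_text_emb: (C, txt_dim) frozen embeddings of the C relationship
        class names (e.g., from a pretrained text encoder)."""
        v = F.normalize(self.proj(rel_feats), dim=-1)
        t = F.normalize(class_text_emb, dim=-1)
        return self.logit_scale * v @ t.T              # (N, C) alignment logits
```

Trained with a standard classification loss over these logits, the text embeddings act as semantic anchors, which is what lets visually similar but semantically distinct relationships be pulled apart.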
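The category-weighted loss is described only at a high level, so the sketch below substitutes one standard choice, effective-number class weighting (Cui et al., 2019), which upweights rare relationship classes; the paper's actual weighting scheme may differ.

```python
# Hypothetical class-frequency weighting for the relationship loss;
# effective-number weighting is one plausible choice, not the paper's.
import torch
import torch.nn.functional as F


def category_weighted_loss(logits: torch.Tensor, targets: torch.Tensor,
                           class_counts: torch.Tensor, beta: float = 0.999) -> torch.Tensor:
    """logits: (N, C); targets: (N,); class_counts: (C,) training-set
    frequency of each relationship class. Tail classes get larger weights."""
    eff_num = 1.0 - beta ** class_counts.clamp(min=1).float()
    weights = (1.0 - beta) / eff_num
    weights = weights / weights.sum() * weights.numel()  # normalise to mean 1
    return F.cross_entropy(logits, targets, weight=weights)
```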