Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
arXiv cs.CV / 4/28/2026
Key Points
- The paper targets text-based person anomaly search in surveillance archives and notes that pose-aware methods still suffer from a fundamental Pose-Semantic Gap: different actions can look geometrically similar.
- It argues that although Multimodal LLMs could resolve part of this ambiguity, they are too computationally expensive for large-scale retrieval.
- The proposed Structure-Semantic Decoupled Cascade (SSDC) framework splits retrieval into two stages: structure-aware coarse filtering using skeletal similarity, followed by multi-agent semantic verification.
- The “Detective Squad” multi-agent system includes a Detective for binary candidate filtering, an Analyst for evidence extraction, and a Writer for semantic synthesis, after which candidates are re-ranked by combining synthesized captions with structural priors.
- Experiments on the PAB benchmark report state-of-the-art performance, balancing retrieval efficiency with stronger semantic reasoning.
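
The two-stage flow described in the key points can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not say how skeletal similarity is computed or how the scores are combined, so the cosine similarity, the linear weight `alpha`, the cutoff `k`, and every function and parameter name below (`coarse_filter`, `cascade_search`, `detective`, `analyst`, `writer`, `text_score`) are hypothetical. The three agents are passed in as plain callables standing in for whatever models the authors actually use.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_filter(query_pose: Vector, gallery_poses: Sequence[Vector],
                  k: int) -> List[Tuple[int, float]]:
    """Stage 1 (assumed form): keep the k gallery items whose pose
    embedding is most similar to the query's pose embedding."""
    scored = [(idx, cosine(query_pose, pose))
              for idx, pose in enumerate(gallery_poses)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def cascade_search(query_text: str, query_pose: Vector,
                   gallery_poses: Sequence[Vector],
                   detective: Callable[[str, int], bool],
                   analyst: Callable[[str, int], str],
                   writer: Callable[[str], str],
                   text_score: Callable[[str, str], float],
                   k: int = 10, alpha: float = 0.5) -> List[Tuple[int, float]]:
    """Stage 2 (assumed form): run the agents only on stage-1 survivors,
    then re-rank by a weighted sum of semantic and structural scores."""
    results = []
    for idx, structural in coarse_filter(query_pose, gallery_poses, k):
        # Detective: cheap binary accept/reject of the candidate.
        if not detective(query_text, idx):
            continue
        # Analyst extracts evidence; Writer turns it into a caption.
        caption = writer(analyst(query_text, idx))
        semantic = text_score(query_text, caption)
        # Hypothetical fusion rule: linear blend with the structural prior.
        results.append((idx, alpha * semantic + (1 - alpha) * structural))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

The point of the cascade shape is cost control: the expensive agent calls run on at most `k` candidates rather than on the whole archive, which is the efficiency argument the summary attributes to the paper.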