SHOE: Semantic HOI Open-Vocabulary Evaluation Metric
arXiv cs.CV / 4/3/2026
Key Points
- The paper argues that conventional HOI evaluation metrics like mAP inadequately assess open-vocabulary HOI detection because they treat HOI labels as discrete strings and ignore semantic equivalence.
- It introduces SHOE, a semantic evaluation framework that decomposes each predicted HOI into verb and object components and computes semantic similarity between prediction and ground truth.
- SHOE estimates semantic similarity by averaging scores from multiple large language models (LLMs), yielding a graded similarity score instead of requiring an exact lexical match.
- Experiments on standard benchmarks such as HICO-DET show SHOE better matches human judgments than existing metrics, reporting 85.73% agreement with average human ratings.
- The authors state they will release the SHOE evaluation metric publicly to support future research on semantically grounded, open-ended multimodal interaction understanding.
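The component-wise, averaged scoring described in the key points can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `lexical_score` function is a hypothetical stand-in for an LLM similarity judgment (SHOE queries actual LLMs), and the equal-weight averaging of verb and object similarities is an assumption for illustration.

```python
from difflib import SequenceMatcher


def lexical_score(a: str, b: str) -> float:
    """Hypothetical stand-in for one LLM's semantic-similarity judgment.

    SHOE uses real LLM scorers; a character-level ratio is used here
    only so the sketch is self-contained and runnable.
    """
    return SequenceMatcher(None, a, b).ratio()


def shoe_score(pred, gt, scorers) -> float:
    """Score a predicted HOI against ground truth.

    pred and gt are (verb, object) pairs; each scorer rates the verb
    and object components separately, and the per-scorer component
    averages are then averaged across all scorers.
    """
    pred_verb, pred_obj = pred
    gt_verb, gt_obj = gt
    per_scorer = []
    for score in scorers:
        verb_sim = score(pred_verb, gt_verb)
        obj_sim = score(pred_obj, gt_obj)
        per_scorer.append((verb_sim + obj_sim) / 2)
    return sum(per_scorer) / len(per_scorer)


# A semantically close but lexically different prediction still earns
# partial credit, unlike an exact-string-match metric.
print(shoe_score(("ride", "bicycle"), ("riding", "bike"), [lexical_score]))
```

With several LLM scorers in the `scorers` list, the final value is the mean of their judgments, which is what lets the metric reward semantically equivalent labels that an exact-match mAP would count as wrong.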