Towards Unconstrained Human-Object Interaction

arXiv cs.CV / April 16, 2026


Key Points

  • The paper addresses human-object interaction (HOI) detection as a computer vision problem and argues that current methods are constrained by fixed interaction vocabularies used at both training and inference.
  • It proposes the new Unconstrained HOI (U-HOI) task, which removes the need for predefined interaction lists, targeting more realistic “in-the-wild” settings.
  • The authors leverage multimodal large language models (MLLMs) to perform interaction recognition in this open-ended setting, evaluating multiple MLLM options for the task.
  • They introduce a processing pipeline that includes test-time inference and language-to-graph conversion to extract structured interaction representations from free-form text.
  • The authors release code for the proposed approach; their findings expose the limitations of existing HOI detectors and show that MLLMs are better suited to unconstrained HOI recognition.
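The language-to-graph step described above is not detailed here, but its core idea, turning free-form interaction descriptions into structured (subject, interaction, object) triplets, can be illustrated with a minimal sketch. The class and function names below, as well as the regex-based extraction, are illustrative assumptions and do not reflect the authors' actual pipeline, which presumably relies on an MLLM rather than pattern matching:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionEdge:
    """One edge of an interaction graph: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def text_to_graph(description: str) -> list[InteractionEdge]:
    """Extract interaction triplets from simple free-form sentences
    such as 'a person is riding a bicycle' (toy heuristic only)."""
    # Naive pattern: optional article, subject, 'is/are', a gerund,
    # optional article, object.
    pattern = re.compile(
        r"(?:(?:the|an|a)\s+)?(\w+)\s+(?:is|are)\s+(\w+ing)\s+"
        r"(?:(?:the|an|a)\s+)?(\w+)",
        re.IGNORECASE,
    )
    edges = []
    for match in pattern.finditer(description):
        subject, predicate, obj = (g.lower() for g in match.groups())
        edges.append(InteractionEdge(subject, predicate, obj))
    return edges
```

For example, `text_to_graph("A person is riding a bicycle.")` yields one edge, `InteractionEdge("person", "riding", "bicycle")`; a real system would replace the regex with MLLM-driven parsing to handle arbitrary phrasing.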

Abstract

Human-Object Interaction (HOI) detection is a longstanding computer vision problem concerned with predicting the interactions between humans and objects. Current HOI models rely on a fixed vocabulary of interactions at training and inference time, limiting their applicability to static environments. With the advent of Multimodal Large Language Models (MLLMs), it has become feasible to explore more flexible paradigms for interaction recognition. In this work, we revisit HOI detection through the lens of MLLMs and apply them to in-the-wild HOI detection. We define the Unconstrained HOI (U-HOI) task, a novel HOI domain that removes the requirement for a predefined list of interactions at both training and inference. We evaluate a range of MLLMs on this setting and introduce a pipeline that includes test-time inference and language-to-graph conversion to extract structured interactions from free-form text. Our findings highlight the limitations of current HOI detectors and the value of MLLMs for U-HOI. Code will be available at https://github.com/francescotonini/anyhoi.