HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

arXiv cs.CV · April 30, 2026

📰 News · Models & Research

Key Points

  • The paper introduces AdaAct, an HOI-aware adaptive network for weakly-supervised action segmentation that uses human-object interactions (HOI) as video-level prior knowledge.
  • Instead of relying on a fixed model across all videos, AdaAct dynamically adapts its temporal encoding at test time based on the HOI sequence to reduce ambiguity between similar actions.
  • The method first builds a video HOI encoder to extract, select, and integrate the most representative HOI signals over the full video.
  • It then uses a two-branch HyperNetwork to generate and adjust the temporal encoder's parameters on the fly from the HOI information, enabling per-video adaptation.
  • Experiments on Breakfast and 50Salads show that the approach improves performance across multiple evaluation metrics.
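The "extract, select, integrate" step of the video HOI encoder can be pictured as scoring per-frame HOI features, keeping the top-k most representative ones, and pooling them into a single video-level vector. The sketch below is a minimal NumPy illustration of that idea under assumed dimensions and a random scoring vector (`score_w` would be learned in the actual model); it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

T, D = 20, 32                    # hypothetical: T frame-level HOI features of dim D
hoi_feats = rng.normal(size=(T, D))

# "Select": score each frame's HOI feature and keep the top-k most representative.
score_w = rng.normal(0, 0.1, size=(D,))   # stand-in for a learned scoring vector
scores = hoi_feats @ score_w
top_k = 5
idx = np.argsort(scores)[-top_k:]
selected = hoi_feats[idx]

# "Integrate": softmax-weighted pooling of the selected features
# into one video-level HOI embedding.
w = np.exp(scores[idx] - scores[idx].max())
w /= w.sum()
video_hoi = w @ selected                  # shape (D,)
print(video_hoi.shape)  # (32,)
```

The resulting `video_hoi` vector is what a downstream module (here, the HyperNetwork) would condition on, so the selection step matters: it filters out frames whose HOI signal is noisy or uninformative before pooling.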

Abstract

In this paper, we propose an HOI-aware adaptive network named AdaAct for weakly-supervised action segmentation. Most existing methods learn a fixed network to predict the action of each frame from its neighboring frames. However, this results in ambiguity when estimating similar actions, such as pouring juice and pouring coffee. To address this, we exploit temporally global but spatially local human-object interactions (HOI) as video-level prior knowledge for action segmentation. The long-term HOI sequence provides crucial contextual information for distinguishing ambiguous actions, and our network dynamically adapts to the given HOI sequence at test time. More specifically, we first design a video HOI encoder that extracts, selects, and integrates the most representative HOIs throughout the video. Then, we propose a two-branch HyperNetwork that learns an adaptive temporal encoder, automatically adjusting its parameters on the fly based on each video's HOI information. Extensive experiments on two widely used datasets, Breakfast and 50Salads, demonstrate the effectiveness of our method under different evaluation metrics.
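The "adaptive temporal encoder" idea, where a HyperNetwork emits the encoder's parameters per video instead of sharing one fixed set, can be sketched as a small generator that maps the video-level HOI embedding to the weights and biases of a temporal convolution. The code below is a minimal NumPy sketch under assumed dimensions; the generator matrices (`W_gen`, `b_gen`) stand in for the paper's two learned branches and are randomly initialized here, not trained.

```python
import numpy as np

rng = np.random.default_rng(0)

D_HOI, D_FEAT, K = 32, 64, 3   # hypothetical dims: HOI embedding, frame features, kernel size

# Stand-ins for the two HyperNetwork branches (learned in the real model):
W_gen = rng.normal(0, 0.02, size=(D_HOI, K * D_FEAT * D_FEAT))  # branch 1: conv weights
b_gen = rng.normal(0, 0.02, size=(D_HOI, D_FEAT))               # branch 2: conv biases

def adaptive_temporal_conv(frames, hoi_emb):
    """Temporal conv whose parameters are generated per video from the HOI embedding."""
    T = frames.shape[0]
    w = (hoi_emb @ W_gen).reshape(K, D_FEAT, D_FEAT)  # video-specific kernel
    b = hoi_emb @ b_gen                               # video-specific bias
    pad = np.pad(frames, ((K // 2, K // 2), (0, 0)))  # same-length output
    return np.stack([sum(pad[t + k] @ w[k] for k in range(K)) + b for t in range(T)])

frames = rng.normal(size=(10, D_FEAT))   # T=10 frames of per-frame features
hoi_emb = rng.normal(size=(D_HOI,))      # video-level HOI embedding
out = adaptive_temporal_conv(frames, hoi_emb)
print(out.shape)  # (10, 64)
```

Because the kernel `w` is a function of `hoi_emb`, two videos with different HOI sequences are effectively processed by two different temporal encoders, which is the mechanism the abstract credits with disambiguating similar actions.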