IMPACT-Scribe: Interactive Temporal Action Segmentation with Boundary Scribbles and Query Planning

arXiv cs.CV / 5/5/2026

Key Points

  • IMPACT-Scribe is a correction-driven framework for dense temporal action segmentation in procedural activity videos, aiming to reduce the labor cost of annotation.
  • Instead of treating each correction as an isolated edit, it reuses the information each correction carries about annotator uncertainty and model reliability to improve future human–machine collaboration.
  • The method combines uncertainty-aware boundary scribble supervision, local proposal modeling, cost-aware query planning, and structured propagation to guide labeling efficiently.
  • Experiments and a human study indicate the closed-loop approach improves labeling quality per unit effort and enhances boundary accuracy over time.
  • The authors plan to publicly release the code to support adoption and further research (GitHub link provided).
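To make "cost-aware query planning" concrete, here is a minimal sketch of one common way to frame it: rank candidate queries by model uncertainty per unit of annotator cost and pick greedily under an effort budget. This is an illustration only, not the authors' algorithm. The `Candidate` fields, the `plan_queries` function, the greedy ratio rule, and all numbers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical query the tool could ask the annotator to resolve."""
    frame: int          # frame index near a suspected action boundary
    uncertainty: float  # assumed model uncertainty in [0, 1]
    cost: float         # assumed annotator effort, e.g. seconds


def plan_queries(candidates, budget):
    """Greedily pick queries with the best uncertainty-per-cost ratio.

    Assumption: uncertainty stands in for the expected benefit of a
    correction. A real planner could use a learned gain estimate instead.
    """
    ranked = sorted(candidates, key=lambda c: c.uncertainty / c.cost, reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        # Skip any query that would exceed the remaining effort budget.
        if spent + c.cost <= budget:
            chosen.append(c)
            spent += c.cost
    return chosen


candidates = [
    Candidate(frame=120, uncertainty=0.9, cost=3.0),
    Candidate(frame=480, uncertainty=0.4, cost=1.0),
    Candidate(frame=910, uncertainty=0.2, cost=2.0),
]
picked = plan_queries(candidates, budget=4.0)
print([c.frame for c in picked])
```

With a budget of 4.0, the cheap, moderately uncertain query at frame 480 is chosen first, followed by the expensive but highly uncertain one at frame 120. The low-value query at frame 910 no longer fits the budget.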

Abstract

Dense temporal annotation of procedural activity videos is vital for action understanding and embodied intelligence, but it remains labor-intensive because existing annotation tools are reactive: each correction is treated as an isolated edit, which limits reuse of the information it carries about annotator uncertainty and model reliability. We introduce IMPACT-Scribe, a correction-driven framework for dense labeling that uses each correction to improve future human–machine collaboration. IMPACT-Scribe combines uncertainty-aware boundary scribble supervision, local proposal modeling, cost-aware query planning, structured propagation, and correction-driven adaptation. Experiments and a human study show that this closed-loop design improves labeling quality per unit of effort, enhances boundary accuracy, and fosters better human–machine interaction over time. The code will be made publicly available at https://github.com/BanzQians/IMPACT_AS.
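One way to picture "uncertainty-aware boundary scribble supervision" is to treat a scribble as a short frame interval in which the annotator believes a boundary lies, and to turn it into a soft per-frame target whose spread grows with the scribble's width. The sketch below does this with a normalized Gaussian. It is a guess at the general idea, not the paper's formulation. The function name `scribble_to_soft_target`, the Gaussian shape, and the choice of half the scribble width as the standard deviation are all assumptions.

```python
import math


def scribble_to_soft_target(start, end, num_frames):
    """Convert a boundary scribble [start, end] into a soft target over frames.

    Assumptions (not from the paper): the target is a Gaussian centered on
    the scribble midpoint, with standard deviation equal to half the scribble
    width (at least 0.5 frames), normalized to sum to 1.
    """
    center = (start + end) / 2.0
    sigma = max((end - start) / 2.0, 0.5)
    weights = [
        math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(num_frames)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# A wide scribble (frames 4 to 8) expresses more annotator uncertainty
# than a single-frame scribble at frame 6.
wide = scribble_to_soft_target(start=4, end=8, num_frames=12)
narrow = scribble_to_soft_target(start=6, end=6, num_frames=12)
peak_frame = max(range(len(wide)), key=wide.__getitem__)
```

Both targets peak at frame 6, but the narrow scribble concentrates far more probability mass there, so a loss computed against it would penalize boundary errors more sharply than one computed against the wide scribble.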