All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams

arXiv cs.LG / 4/27/2026

Key Points

  • The paper proposes SnapLog, a method for extracting event logs from video streams to support process mining and business process management.
  • SnapLog converts video frames into feature vectors via image embeddings, then uses temporal segmentation based on frame-wise similarity to identify event-relevant sub-sequences.
  • It then applies generalized few-shot classification to label the video segments, producing timestamped logs whose entries are interpretable as events.
  • The authors report that the resulting logs accurately reflect the underlying processes shown in the videos and can be analyzed with conventional process mining techniques.
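The segmentation step in the key points can be illustrated with a small sketch. The summary does not say how SnapLog derives boundaries from its similarity matrices, so this example assumes a simple rule: start a new segment wherever the cosine similarity between consecutive frame embeddings falls below a threshold. The function names, the threshold value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(frames):
    """Frame-by-frame cosine similarity matrix for a list of embeddings."""
    return [[cosine(u, v) for v in frames] for u in frames]

def segment(frames, threshold=0.8):
    """Return (first_index, last_index) pairs covering all frames.

    Assumed rule, not taken from the paper: a boundary is placed between
    frames i-1 and i when their similarity is below `threshold`.
    """
    if not frames:
        return []
    sim = similarity_matrix(frames)
    segments, start = [], 0
    for i in range(1, len(frames)):
        if sim[i - 1][i] < threshold:
            segments.append((start, i - 1))
            start = i
    segments.append((start, len(frames) - 1))
    return segments

# Toy embeddings (assumed): three frames pointing one way, then two another.
frames = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9]]
segments = segment(frames)
print(segments)  # [(0, 2), (3, 4)]
```

Only the entries just off the diagonal are used here. A method that works on the whole matrix could also detect recurring activities, which this simplified rule cannot.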

Abstract

Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.
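To make the last step of the abstract concrete, the sketch below turns labeled segments into event-log rows. It is not the paper's classifier: as a stand-in for generalized few-shot classification, it uses a nearest-prototype rule, where each label is represented by the mean of a few example embeddings. The activity names, frame rate, start time, case identifier, and field names are hypothetical.

```python
import math
from datetime import datetime, timedelta

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_event_log(frames, segments, support, fps, start_time, case_id="video-1"):
    """Label each segment with its nearest prototype and emit event rows.

    `support` maps a label to a few example embeddings (the "shots").
    Nearest-prototype labeling is an assumed substitute, not SnapLog's method.
    """
    prototypes = {label: mean_vector(shots) for label, shots in support.items()}
    log = []
    for first, last in segments:
        centroid = mean_vector(frames[first:last + 1])
        activity = max(prototypes, key=lambda lab: cosine(centroid, prototypes[lab]))
        log.append({
            "case_id": case_id,
            "activity": activity,
            # Timestamps derived from frame indices; "end" is exclusive.
            "start": start_time + timedelta(seconds=first / fps),
            "end": start_time + timedelta(seconds=(last + 1) / fps),
        })
    return log

# Toy inputs (all assumed): five frame embeddings already split into two segments.
frames = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9]]
segments = [(0, 2), (3, 4)]
support = {
    "pick part": [[1.0, 0.0], [0.9, 0.2]],
    "assemble": [[0.0, 1.0], [0.2, 0.9]],
}
event_log = build_event_log(frames, segments, support, fps=1,
                            start_time=datetime(2026, 1, 1, 9, 0, 0))
for event in event_log:
    print(event["case_id"], event["activity"], event["start"], event["end"])
```

Rows of this shape (case identifier, activity, timestamp) are the minimum that conventional process mining tools expect from an event log.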