
Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents

arXiv cs.LG / 3/13/2026


Key Points

  • The paper introduces Knowledge-Guided Time Series Event Detection (TSED), a setting in which a model is given a natural-language event description and must ground it to intervals in multivariate signals using little or no labeled data.
  • It proposes the Event Logic Tree (ELT), a knowledge representation that connects linguistic descriptions to time series data by modeling the intrinsic temporal-logic structure of events.
  • It presents a neuro-symbolic VLM agent framework that iteratively instantiates primitives from signal visualizations and composes them under ELT constraints, producing both detected intervals and explanations in the form of instantiated trees.
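  • It releases a benchmark built on real-world time series data with expert knowledge and annotations. Experiments and human evaluation show the method outperforming supervised fine-tuning baselines and existing zero-shot LLM/VLM time series reasoning frameworks, and the authors report that ELT is critical to mitigating VLM hallucination when matching signal morphology to event semantics.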

Abstract

Time Series Event Detection (TSED) has long been an important task with critical applications across many high-stakes domains. Unlike statistical anomalies, events are defined by semantics with complex internal structures, which are difficult to learn inductively from scarce labeled data in real-world settings. In light of this, we introduce Knowledge-Guided TSED, a new setting where a model is given a natural-language event description and must ground it to intervals in multivariate signals with little or no training data. To tackle this challenge, we introduce Event Logic Tree (ELT), a novel knowledge representation framework to bridge linguistic descriptions and physical time series data via modeling the intrinsic temporal-logic structures of events. Based on ELT, we present a neuro-symbolic VLM agent framework that iteratively instantiates primitives from signal visualizations and composes them under ELT constraints, producing both detected intervals and faithful explanations in the form of instantiated trees. To validate the effectiveness of our approach, we release a benchmark based on real-world time series data with expert knowledge and annotations. Experiments and human evaluation demonstrate the superiority of our method compared to supervised fine-tuning baselines and existing zero-shot time series reasoning frameworks based on LLMs/VLMs. We also show that ELT is critical in mitigating VLMs' inherent hallucination in matching signal morphology with event semantics.
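
The abstract describes composing instantiated primitives under tree constraints but gives no implementation details. The sketch below shows one way such a composition could look. It is not the paper's ELT definition: the node types `Primitive`, `Overlap`, and `Sequence`, the `max_gap` parameter, and the example signal names are all hypothetical, and the candidate intervals that a VLM would propose from signal plots are supplied by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) sample indices, inclusive


@dataclass
class Primitive:
    """Leaf node: a named signal pattern with candidate intervals.

    Assumption: in a real system these candidates would come from a VLM
    reading a signal plot; here they are hard-coded stand-ins.
    """
    name: str
    candidates: List[Interval]

    def instantiate(self) -> List[Interval]:
        return sorted(self.candidates)


@dataclass
class Overlap:
    """Hypothetical operator: all children hold at once (interval intersection)."""
    children: list

    def instantiate(self) -> List[Interval]:
        results = self.children[0].instantiate()
        for child in self.children[1:]:
            results = [
                (max(s1, s2), min(e1, e2))
                for (s1, e1) in results
                for (s2, e2) in child.instantiate()
                if max(s1, s2) <= min(e1, e2)  # keep non-empty intersections
            ]
        return sorted(set(results))


@dataclass
class Sequence:
    """Hypothetical operator: children occur in order, each starting
    at most max_gap samples after the previous one ends."""
    children: list
    max_gap: int = 0

    def instantiate(self) -> List[Interval]:
        results = self.children[0].instantiate()
        for child in self.children[1:]:
            results = [
                (s1, e2)  # span from first child's start to this child's end
                for (s1, e1) in results
                for (s2, e2) in child.instantiate()
                if 0 <= s2 - e1 <= self.max_gap
            ]
        return sorted(set(results))


# Illustrative event: "pressure rises while flow drops, then pressure plateaus".
pressure_rise = Primitive("pressure rises", [(10, 20), (50, 60)])
flow_drop = Primitive("flow drops", [(12, 18), (80, 90)])
plateau = Primitive("pressure plateaus", [(22, 40), (100, 120)])

event = Sequence([Overlap([pressure_rise, flow_drop]), plateau], max_gap=5)
detected = event.instantiate()
print(detected)
```

Because each node returns only the intervals consistent with its children, a spurious primitive candidate such as the `(50, 60)` pressure rise is discarded once no matching flow drop exists. This is the general intuition behind using structural constraints to filter unreliable low-level matches, though the paper's actual mechanism may differ.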