Learning to Predict, Discover, and Reason in High-Dimensional Discrete Event Sequences

arXiv cs.AI / 3/18/2026

Key Points

  • The work reframes vehicle diagnostic sequences as a high-dimensional language and proposes Transformer-based architectures for predictive maintenance.
  • It unifies event sequence modeling, causal discovery, and large language models to scale to high-cardinality, long-sequence automotive data.
  • It introduces scalable causal discovery frameworks and a multi-agent system to automate the synthesis of Boolean error-pattern rules.
  • It emphasizes that tens of thousands of unique DTCs create a vocabulary-scale challenge akin to natural language.
  • It presents a three-part progression from prediction to causal understanding to reasoning in vehicle diagnostics, with implications for safety-critical systems.

Abstract

Electronic control units (ECUs) embedded within modern vehicles generate a large number of asynchronous events known as diagnostic trouble codes (DTCs). These discrete events form complex temporal sequences that reflect the evolving health of the vehicle's subsystems. In the automotive industry, domain experts manually group these codes into higher-level error patterns (EPs) using Boolean rules to characterize system faults and ensure safety. However, as vehicle complexity grows, this manual process becomes increasingly costly, error-prone, and difficult to scale. Notably, the number of unique DTCs in a modern vehicle is on the same order of magnitude as the vocabulary of a natural language, often numbering in the tens of thousands. This observation motivates a paradigm shift: treating diagnostic sequences as a language that can be modeled, predicted, and ultimately explained. Traditional statistical approaches fail to capture the rich dependencies among events and do not scale to high-dimensional datasets characterized by thousands of nodes, large sample sizes, and long sequence lengths. In particular, the high cardinality of categorical event spaces in industrial logs poses a significant challenge, necessitating new machine learning architectures tailored to such event-driven systems. This thesis addresses automated fault diagnostics by unifying event sequence modeling, causal discovery, and large language models (LLMs) into a coherent framework for high-dimensional event streams. It is structured in three parts, reflecting a progressive transition from prediction to causal understanding and finally to reasoning for vehicle diagnostics. Accordingly, we introduce several Transformer-based architectures for predictive maintenance, scalable sample- and population-level causal discovery frameworks, and a multi-agent system that automates the synthesis of Boolean EP rules.
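
To make the two central ideas of the abstract concrete, the following minimal Python sketch shows (a) how a DTC sequence could be mapped to integer token ids, much like words in a language model vocabulary, and (b) how a Boolean EP rule could be evaluated over the set of observed DTCs. This is an illustration under stated assumptions, not the thesis's implementation: the DTC codes, the rule, and the names `build_vocab`, `encode`, and `ep_misfire_with_comm_loss` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical time-ordered DTC sequence for one vehicle (codes are made up).
events: List[str] = ["P0301", "U0100", "P0301", "B1234"]


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign each unique DTC an integer id; id 0 is reserved for unknown codes."""
    vocab: Dict[str, int] = {"<unk>": 0}
    for seq in sequences:
        for code in seq:
            if code not in vocab:
                vocab[code] = len(vocab)
    return vocab


def encode(seq: List[str], vocab: Dict[str, int]) -> List[int]:
    """Turn a DTC sequence into token ids, mapping unseen codes to 0."""
    return [vocab.get(code, 0) for code in seq]


def ep_misfire_with_comm_loss(observed: Set[str]) -> bool:
    """Hypothetical EP rule: P0301 AND U0100 AND NOT C0035."""
    return "P0301" in observed and "U0100" in observed and "C0035" not in observed


vocab = build_vocab([events])
token_ids = encode(events, vocab)
ep_active = ep_misfire_with_comm_loss(set(events))

print(token_ids)   # [1, 2, 1, 3]
print(ep_active)   # True
```

With tens of thousands of unique DTCs, the vocabulary built this way reaches the scale of a natural-language vocabulary, which is the property the abstract uses to motivate Transformer-based sequence models. The hand-written rule function stands in for the kind of Boolean EP rule that domain experts currently author manually.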