Extending Tabular Denoising Diffusion Probabilistic Models for Time-Series Data Generation

arXiv cs.LG / 4/8/2026


Key Points

  • The paper proposes a temporal extension of TabDDPM that generates synthetic time-series data while preserving temporal dependencies, which standard tabular diffusion models cannot capture because they treat samples as independent.
  • It adds sequence awareness through lightweight temporal adapters and context-aware embedding modules that encode temporal context via timestep embeddings, conditional activity labels, and observed/missing masks.
  • The method reformulates sensor inputs into windowed sequences and explicitly models temporal context to produce more temporally coherent synthetic sequences.
  • Experiments using bigram transition matrices and autocorrelation analysis show improved temporal realism, diversity, and coherence versus baseline and interpolation approaches.
  • On the WISDM accelerometer dataset, the generated sequences achieve competitive downstream classification performance (macro F1 0.64, accuracy 0.71); the approach is especially useful for minority-class augmentation and preserves statistical alignment with real data distributions.
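The windowing and masking steps summarised above can be illustrated with a small, dependency-free sketch. It assumes a stream of per-timestep accelerometer readings (for example x, y, z), with missing readings encoded as None or NaN, and one activity label per timestep. The function name make_windows, the zero-fill for missing values, and the majority-vote window label are hypothetical illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

Reading = Optional[float]


def _is_missing(v: Reading) -> bool:
    # A reading counts as missing if it is absent (None) or NaN.
    return v is None or math.isnan(v)


def make_windows(
    samples: Sequence[Sequence[Reading]],
    labels: Sequence[Hashable],
    window: int,
    stride: int,
) -> Tuple[List[List[List[float]]], List[List[List[int]]], List[Hashable]]:
    """Hypothetical helper: slice a multi-channel sensor stream into windows.

    Returns (values, masks, window_labels). Masks use 1 = observed and
    0 = missing. Missing values are zero-filled (an assumption, not the
    paper's documented choice). Each window's activity label is the most
    frequent per-timestep label inside it. Trailing samples that do not
    fill a complete window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    values, masks, window_labels = [], [], []
    for start in range(0, len(samples) - window + 1, stride):
        rows = samples[start:start + window]
        values.append([[0.0 if _is_missing(v) else float(v) for v in row] for row in rows])
        masks.append([[0 if _is_missing(v) else 1 for v in row] for row in rows])
        # Ties go to the label encountered first (Counter.most_common ordering).
        window_labels.append(Counter(labels[start:start + window]).most_common(1)[0][0])
    return values, masks, window_labels


# Usage: five (x, y, z) readings, one of them missing, windows of 3 with stride 2.
samples = [[0.1, 9.8, 0.0], [0.2, None, 0.1], [0.3, 9.7, 0.2], [0.4, 9.6, 0.3], [0.5, 9.5, 0.4]]
labels = ["Walking", "Walking", "Walking", "Jogging", "Jogging"]
values, masks, window_labels = make_windows(samples, labels, window=3, stride=2)
print(window_labels)  # ['Walking', 'Jogging']
print(masks[0][1])    # [1, 0, 1]
```

A stride smaller than the window length yields overlapping windows, a common choice in activity recognition; the abstract does not state the window length or stride actually used.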

Abstract

Diffusion models are increasingly being utilised to create synthetic tabular and time-series data for privacy-preserving augmentation. Tabular Denoising Diffusion Probabilistic Models (TabDDPM) generate high-quality synthetic data from heterogeneous tabular datasets but assume independence between samples, limiting their applicability to time-series domains where temporal dependencies are critical. To address this, we propose a temporal extension of TabDDPM that introduces sequence awareness through lightweight temporal adapters and context-aware embedding modules. By reformulating sensor data into windowed sequences and explicitly modeling temporal context via timestep embeddings, conditional activity labels, and observed/missing masks, our approach enables the generation of temporally coherent synthetic sequences. Validation using bigram transition matrices and autocorrelation analysis shows enhanced temporal realism, diversity, and coherence compared with baseline and interpolation techniques. On the WISDM accelerometer dataset, the proposed system produces synthetic time series that closely resemble real-world sensor patterns and achieves comparable classification performance (macro F1-score 0.64, accuracy 0.71). The approach is especially advantageous for minority-class representation and for preserving statistical alignment with real distributions. These results demonstrate that diffusion-based models, when equipped for temporal reasoning, provide effective and adaptable solutions for sequential data synthesis. Future work will explore scaling to longer sequences and integrating stronger temporal architectures.
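The two temporal-fidelity checks named in the abstract can be sketched as follows. The abstract does not say how continuous accelerometer values were turned into discrete states for the bigram transition matrix, so this sketch assumes simple quantisation against fixed bin edges; the helper names (discretize, bigram_transition_matrix, transition_gap, autocorrelation) and the mean-absolute-difference gap score are hypothetical. Autocorrelation uses the standard sample estimator: the sum over t of (x_t - mean)(x_{t+k} - mean), divided by the total sum of squared deviations.

```python
import bisect
import math
from typing import List, Sequence


def discretize(x: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Map each value to a bin index using sorted interior bin edges.

    With k edges there are k + 1 bins (indices 0..k). Hypothetical binning.
    """
    return [bisect.bisect_right(edges, v) for v in x]


def bigram_transition_matrix(states: Sequence[int], n_states: int) -> List[List[float]]:
    """Row-normalised counts of consecutive (state_t, state_t+1) pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States with no outgoing transitions keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def transition_gap(p: List[List[float]], q: List[List[float]]) -> float:
    """Hypothetical score: mean absolute difference between two equal-size matrices."""
    diffs = [abs(a - b) for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q)]
    return sum(diffs) / len(diffs)


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return math.nan  # undefined for a constant series
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Usage: compare an illustrative real and synthetic single-axis sequence.
real = [0.2, 0.9, 1.4, 0.7, -0.1, -0.6, 0.3, 1.2]
synthetic = [0.1, 1.0, 1.3, 0.5, -0.2, -0.4, 0.6, 1.1]
edges = [0.0, 1.0]  # three hypothetical bins: < 0, [0, 1), >= 1
gap = transition_gap(
    bigram_transition_matrix(discretize(real, edges), 3),
    bigram_transition_matrix(discretize(synthetic, edges), 3),
)
print(gap, autocorrelation(real, 1), autocorrelation(synthetic, 1))
```

A smaller gap and closer lag-k autocorrelations suggest that the synthetic sequence reproduces the real sequence's short-range dynamics more faithfully; in practice such scores would be averaged over many windows and channels.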