SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces

arXiv cs.AI / 4/17/2026

Key Points

  • SynHAT is a new two-stage coarse-to-fine diffusion framework designed to synthesize realistic, privacy-preserving human activity traces (HATs) for applications like mobility modeling and POI recommendation.
  • It targets the irregularity of HATs and their long, varying time intervals with a spatio-temporal denoising diffusion model whose Latent Spatio-Temporal U-Net uses dual Drift-Jitter branches to jointly model smooth spatial transitions and temporal variations.
  • Stage 1 (Coarse-HADiff) models the overall spatio-temporal dependencies of coarse-grained latent traces. Stage 2 turns the Stage 1 outputs into fine-grained latent traces through a three-step pipeline: Behavior Pattern Extraction, Fine-HADiff (which shares Coarse-HADiff's architecture), and Semantic Alignment.
  • Experiments on real-world HAT datasets from four cities across three countries show SynHAT substantially outperforming state-of-the-art baselines, with 52% and 33% improvements on spatial and temporal metrics, respectively. The evaluation covers data fidelity, utility, privacy, robustness, and scalability.
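
To make the coarse-to-fine idea concrete, here is a minimal sketch of how an irregular trace might be reduced to a coarse-grained one. It is an illustration only: the summary does not say how SynHAT builds its coarse traces, so the fixed time windows, the 6-hour window size, the mean-position aggregation, and the function names `coarsen_trace` and `time_gaps` are all assumptions.

```python
from collections import defaultdict


def coarsen_trace(trace, window_hours=6.0):
    """Bucket an irregular trace into fixed time windows (assumed scheme).

    trace: list of (timestamp_hours, lat, lon) tuples.
    Returns one (window_start_hours, mean_lat, mean_lon, n_points) tuple
    per non-empty window, ordered by time.
    """
    buckets = defaultdict(list)
    for t, lat, lon in trace:
        buckets[int(t // window_hours)].append((lat, lon))

    coarse = []
    for idx in sorted(buckets):
        pts = buckets[idx]
        mean_lat = sum(p[0] for p in pts) / len(pts)
        mean_lon = sum(p[1] for p in pts) / len(pts)
        coarse.append((idx * window_hours, mean_lat, mean_lon, len(pts)))
    return coarse


def time_gaps(trace):
    """Gaps in hours between consecutive records, after sorting by time."""
    times = sorted(t for t, _, _ in trace)
    return [b - a for a, b in zip(times, times[1:])]


# Hypothetical check-ins: (hours since start, latitude, longitude).
example = [
    (0.5, 40.0, -74.0),
    (1.5, 40.2, -74.2),
    (9.0, 40.4, -74.0),
    (30.0, 40.0, -74.0),
]
coarse = coarsen_trace(example)  # three non-empty windows
gaps = time_gaps(example)        # uneven gaps between consecutive check-ins
```

The uneven values in `gaps` are the kind of irregularity the key points describe. The coarse version keeps only one summary point per window, which is what makes a separate fine-grained refinement stage necessary.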

Abstract

Human activity traces (HATs) are critical for many applications, including human mobility modeling and point-of-interest (POI) recommendation. However, growing privacy concerns have severely limited access to authentic large-scale HAT datasets. Recent advances in generative AI provide new opportunities to synthesize realistic and privacy-preserving HATs for such applications. Yet two major challenges remain: (i) HATs are highly irregular and dynamic, with long and varying time intervals, making it difficult to capture their complex spatio-temporal dependencies and underlying distributions; and (ii) generative models are often computationally expensive, making long-term, fine-grained HAT synthesis inefficient. To address these challenges, we propose SynHAT, a computationally efficient coarse-to-fine HAT synthesis framework built on a novel spatio-temporal denoising diffusion model. In Stage 1, we develop Coarse-HADiff, which models the overall spatio-temporal dependencies of coarse-grained latent spatio-temporal traces. It incorporates a novel Latent Spatio-Temporal U-Net with dual Drift-Jitter branches to jointly model smooth spatial transitions and temporal variations during denoising. In Stage 2, we introduce a three-step pipeline consisting of Behavior Pattern Extraction, Fine-HADiff, which shares the same architecture as Coarse-HADiff, and Semantic Alignment to generate fine-grained latent spatio-temporal traces from the Stage 1 outputs. We extensively evaluate SynHAT in terms of data fidelity, utility, privacy, robustness, and scalability. Experiments on real-world HAT datasets from four cities across three countries show that SynHAT substantially outperforms state-of-the-art baselines, achieving 52% and 33% improvements on spatial and temporal metrics, respectively.
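
For readers unfamiliar with denoising diffusion, the sketch below shows the standard DDPM forward (noising) process applied to a small trace, i.e. sampling x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with eps drawn from a standard normal. This is the generic textbook formulation, not SynHAT's specific model: the abstract does not state the noise schedule or the latent encoding, so the linear schedule, the 1000 steps, the (x, y, dt) row layout, and all function names here are assumptions.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_trace(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) for a trace given as a list of feature rows."""
    signal = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    return [[signal * v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in x0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
abar = alpha_bars(linear_betas(1000))

# Hypothetical normalized rows: (x, y, time gap to next record).
x0 = [[0.1, -0.3, 0.5], [0.2, -0.1, 0.9]]

x_early = noise_trace(x0, 0, abar, rng)    # almost identical to x0
x_late = noise_trace(x0, 999, abar, rng)   # close to pure Gaussian noise
```

A denoising network is trained to reverse this process step by step. In SynHAT that role is played by the Latent Spatio-Temporal U-Net described above, whose internals are not reproduced here.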