Synthetic Tabular Generators Fail to Preserve Behavioral Fraud Patterns: A Benchmark on Temporal, Velocity, and Multi-Account Signals

arXiv cs.LG / 4/16/2026


Key Points

  • The paper proposes “behavioral fidelity” as a new evaluation dimension for synthetic tabular data, focusing on whether generators preserve temporal, sequential, and structural fraud signals used in real detection systems.
  • It defines four behavioral fraud pattern types (P1–P4) including inter-event timing, burst structure, multi-account graph motifs, and velocity-rule trigger rates, along with a degradation-ratio metric calibrated to a real-data noise floor.
  • The authors prove that row-independent synthetic generators cannot reproduce multi-account graph motifs (P3) and can yield only non-positive within-entity inter-event-time autocorrelation, implying that the core burst fingerprints of fraud are unattainable regardless of model architecture or training-data size.
  • Benchmarks on IEEE-CIS Fraud Detection and the Amazon Fraud Dataset show multiple popular generators (CTGAN, TVAE, GaussianCopula, TabularARGN) fail badly, with degradation ratios up to ~39x on IEEE-CIS and 81.6–99.7x for row-independent methods on Amazon, while TabularARGN performs better (17.2x) but still degrades substantially.
  • The work releases an open-source evaluation framework and claims the P1–P4 behavioral-pattern framework generalizes to other domains with entity-level sequential tabular data (e.g., healthcare and network security).
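
The degradation-ratio metric above compares a behavioral statistic measured on synthetic data against the variability of that same statistic on real data. The paper's exact formula is not given here, so the sketch below is a hypothetical reading of "calibrated to a real-data noise floor": the noise floor is estimated as the bootstrap spread of the statistic on real data, and the ratio reports how many noise-floor units the synthetic value deviates by (0 = identical, ~1 = within real variability, k = k-times worse).

```python
import numpy as np

def degradation_ratio(real_samples, synthetic_value, rng=None, n_boot=1000):
    """Hypothetical sketch of a noise-floor-calibrated degradation ratio.

    The statistic here is the mean; the paper's P1-P4 statistics
    (IET quantiles, burst rates, motif counts, rule trigger rates)
    would be substituted in practice.
    """
    rng = np.random.default_rng(rng)
    real_samples = np.asarray(real_samples, dtype=float)
    # Noise floor: spread of the statistic across bootstrap resamples
    # of the real data alone.
    boots = np.array([
        rng.choice(real_samples, size=real_samples.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    noise_floor = boots.std(ddof=1)
    # How many noise-floor units the synthetic statistic is off by.
    return abs(synthetic_value - real_samples.mean()) / noise_floor
```

Under this reading, a generator that reproduces the statistic exactly scores 0, and a score of 39 means the synthetic statistic sits 39 real-data noise-floor units away from the real value.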

Abstract

We introduce behavioral fidelity -- a third evaluation dimension for synthetic tabular data that measures whether generated data preserves the temporal, sequential, and structural behavioral patterns that distinguish real-world entity activity. Existing frameworks evaluate statistical fidelity (marginal distributions and correlations) and downstream utility (classifier AUROC on synthetic-trained models), but neither tests for the behavioral signals that operational detection and analysis systems actually rely on. We formalize a taxonomy of four behavioral fraud patterns (P1-P4) covering inter-event timing, burst structure, multi-account graph motifs, and velocity-rule trigger rates; define a degradation ratio metric calibrated to a real-data noise floor (1.0 = matches real variability, k = k-times worse); and prove that row-independent generators -- the dominant paradigm -- are structurally incapable of reproducing P3 graph motifs (Proposition 1) and produce non-positive within-entity IET autocorrelation (Proposition 2), making the positive burst fingerprint of fraud sequences unachievable regardless of architecture or training data size. We benchmark CTGAN, TVAE, GaussianCopula, and TabularARGN on IEEE-CIS Fraud Detection and the Amazon Fraud Dataset. All four fail severely: on IEEE-CIS composite degradation ratios range from 24.4x (TVAE) to 39.0x (GaussianCopula); on Amazon FDB, row-independent generators score 81.6-99.7x, while TabularARGN achieves 17.2x. We document generator-specific failure modes and their resolutions. The P1-P4 framework extends to any domain with entity-level sequential tabular data, including healthcare and network security. We release our evaluation framework as open source.
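
The intuition behind Proposition 2 can be checked numerically. The simulation below is a minimal sketch, not the authors' code: the uniform-timestamp model of a row-independent generator and the two-regime exponential model of a bursty entity are illustrative assumptions. When rows are sampled independently, an entity's sorted timestamps are order statistics, and the lag-1 autocorrelation of the resulting inter-event times (IETs) is non-positive on average; a bursty sequence, with runs of short gaps followed by long idle gaps, shows the positive IET autocorrelation that real fraud sequences exhibit.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D sequence."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = (x * x).sum()
    return (x[:-1] * x[1:]).sum() / denom if denom > 0 else 0.0

rng = np.random.default_rng(42)

# Row-independent "generator": each row's timestamp is drawn i.i.d.;
# within an entity we sort timestamps and take inter-event times.
iid_acs = []
for _ in range(2000):
    ts = np.sort(rng.uniform(0.0, 1.0, size=50))
    iid_acs.append(lag1_autocorr(np.diff(ts)))

# Bursty entity: a run of short gaps (burst) then long gaps (idle),
# producing runs of similar IETs and hence positive autocorrelation.
burst_acs = []
for _ in range(2000):
    gaps = np.concatenate([rng.exponential(0.1, 25), rng.exponential(5.0, 25)])
    burst_acs.append(lag1_autocorr(gaps))

print(f"i.i.d. rows:  mean lag-1 IET autocorr = {np.mean(iid_acs):+.3f}")
print(f"bursty rows:  mean lag-1 IET autocorr = {np.mean(burst_acs):+.3f}")
```

No amount of extra training data changes the first result: the non-positive autocorrelation follows from the row-independence assumption itself, which is the structural point of Proposition 2.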