EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection
arXiv stat.ML / 5/1/2026
Key Points
- Imbalanced fraud datasets often lead to biased classifiers, so the paper introduces EmDT to generate synthetic fraudulent transactions as a mitigation strategy.
- EmDT first applies UMAP-based clustering to identify distinct fraud patterns, then trains a diffusion model whose denoising network is a Transformer with sinusoidal positional embeddings, so that it learns relationships between features during generation.
- Once the synthetic samples are generated, the method trains a standard decision-tree-based classifier (such as XGBoost) on the augmented data for the final fraud prediction task.
- Experiments on a credit card fraud dataset show that EmDT improves downstream classification performance over prior oversampling and generative approaches while keeping privacy protection comparable and preserving original feature correlations.
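
The sinusoidal positional embeddings mentioned in the key points can be illustrated with a minimal sketch. It assumes the standard Transformer formulation (sine and cosine pairs at geometrically spaced frequencies) applied to the column index of each tabular feature; the summary does not say exactly how EmDT computes or applies its embeddings, so the function names `sinusoidal_embedding` and `embed_feature_positions`, the `base` of 10000, and the embedding width are illustrative assumptions rather than the paper's implementation.

```python
import math


def sinusoidal_embedding(position, dim, base=10000.0):
    """Standard sinusoidal embedding for a single integer position.

    Even slots hold sin(position * freq_i), odd slots hold cos(position * freq_i),
    with freq_i = 1 / base ** (2 * i / dim). Assumed formulation, not taken
    from the EmDT paper.
    """
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even integer")
    embedding = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        embedding.append(math.sin(position * freq))
        embedding.append(math.cos(position * freq))
    return embedding


def embed_feature_positions(num_features, dim):
    """Build one embedding vector per feature (column) index: hypothetical helper."""
    return [sinusoidal_embedding(j, dim) for j in range(num_features)]


# Example: 5 tabular features, each tagged with an 8-dimensional position code.
table = embed_feature_positions(num_features=5, dim=8)
```

In a Transformer denoiser, vectors like these would typically be added to (or concatenated with) each feature's token representation so that attention layers can tell columns apart; whether EmDT adds or concatenates them is not stated in the summary.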