Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis

arXiv cs.AI / 3/20/2026

Key Points

  • The study presents a reproducible GAN-based simulation framework that links crime occurrence to police contact to quantify racial bias in predictive policing.
  • It uses 145,000+ Baltimore Part 1 crime records (2017-2019) and 233,000+ Chicago records (2022), augmented with US Census ACS demographics, to compute four monthly bias metrics (Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini coefficient, and a composite Bias Amplification Score) across 264 city-year-mode observations.
  • Results reveal extreme, year-variant bias in Baltimore's detected mode (mean annual DIR up to 15714 in 2019), moderate under-detection of Black residents in Chicago (DIR ≈ 0.22), and persistent Gini coefficients of 0.43–0.62 across all conditions.
  • A CTGAN-based debiasing approach partially redistributes detection rates but cannot eliminate structural disparities without accompanying policy interventions.
  • The analysis finds strong correlations between neighborhood racial composition and detection likelihood (Pearson r ≈ 0.83 for percent White; r ≈ −0.81 for percent Black) and shows outcomes are most sensitive to officer deployment levels; code and data are publicly available.
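
As a rough illustration of three of the metrics named above, the sketch below uses common textbook definitions. The summary does not give the study's exact formulas, so the function names, the "group rate divided by reference rate" form of the DIR, and the choice of inputs are assumptions. The composite Bias Amplification Score is not defined here and is left out.

```python
from typing import Sequence


def disparate_impact_ratio(rate_group: float, rate_reference: float) -> float:
    """Ratio of two detection rates (assumed form: group / reference)."""
    if rate_reference == 0:
        return float("inf")  # undefined ratio; reported as infinite here
    return rate_group / rate_reference


def demographic_parity_gap(rate_group: float, rate_reference: float) -> float:
    """Absolute difference between two detection rates."""
    return abs(rate_group - rate_reference)


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

Under these assumed definitions, a DIR below 1 means the group is detected less often than the reference group, and a Gini of 0 means detections are spread evenly across the units being compared (for example, neighborhoods).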

Abstract

Predictive policing systems that direct patrol resources based on algorithmically generated crime forecasts have been widely deployed across US cities, yet their tendency to encode and amplify racial disparities remains poorly understood in quantitative terms. We present a reproducible simulation framework that couples a Generative Adversarial Network (GAN) with a Noisy-OR patrol detection model to measure how racial bias propagates through the full enforcement pipeline, from crime occurrence to police contact. Using 145,000+ Part 1 crime records from Baltimore (2017–2019) and 233,000+ records from Chicago (2022), augmented with US Census ACS demographic data, we compute four monthly bias metrics across 264 city-year-mode observations: the Disparate Impact Ratio (DIR), Demographic Parity Gap, Gini Coefficient, and a composite Bias Amplification Score. Our experiments reveal extreme and year-variant bias in Baltimore's detected mode, with mean annual DIR up to 15714 in 2019; moderate under-detection of Black residents in Chicago (DIR = 0.22); and persistent Gini coefficients of 0.43 to 0.62 across all conditions. We further demonstrate that a Conditional Tabular GAN (CTGAN) debiasing approach partially redistributes detection rates but cannot eliminate structural disparity without accompanying policy intervention. Socioeconomic regression analysis confirms strong correlations between neighborhood racial composition and detection likelihood (Pearson r = 0.83 for percent White and r = −0.81 for percent Black). A sensitivity analysis over patrol radius, officer count, and citizen reporting probability reveals that outcomes are most sensitive to officer deployment levels. The code and data are publicly available at this repository.
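
The Noisy-OR idea mentioned in the abstract can be sketched as follows: an event is detected unless every independent cause fails to detect it. This is a minimal sketch, not the paper's model. The per-officer detection probability, the distance rule, and every function and parameter name below are hypothetical, chosen only to mirror the three sensitivity parameters the abstract lists (patrol radius, officer count, citizen reporting probability).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def noisy_or(probabilities: Sequence[float]) -> float:
    """P(at least one independent cause fires) = 1 - prod(1 - p_i)."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p
    return 1.0 - miss


def detection_probability(
    crime: Point,
    officers: Sequence[Point],
    patrol_radius: float,
    p_officer: float,
    p_citizen_report: float,
) -> float:
    """Hypothetical detection model.

    Each officer within patrol_radius of the crime detects it independently
    with probability p_officer; a citizen report is one more independent cause.
    """
    causes = [p_citizen_report]
    for officer in officers:
        if math.dist(crime, officer) <= patrol_radius:
            causes.append(p_officer)
    return noisy_or(causes)
```

In this toy form, each additional officer in range multiplies the miss probability by (1 − p_officer), so detection probability rises quickly with local officer density. Whether that mechanism accounts for the study's sensitivity result is not something this sketch can establish.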