SPARK-IL: Spectral Retrieval-Augmented RAG for Knowledge-driven Deepfake Detection via Incremental Learning

arXiv cs.CV / 4/7/2026

Key Points

  • The paper introduces SPARK-IL, a knowledge-driven deepfake detector that targets poor cross-generator generalization by using more consistent frequency-domain (spectral) signatures rather than pixel-level artifacts.
  • SPARK-IL runs two parallel paths (semantic features from a partially frozen ViT-L/14 encoder and raw RGB pixel embeddings), decomposes each into four Fourier frequency bands, processes each band with Kolmogorov-Arnold Networks (KAN) with mixture-of-experts, and fuses the resulting spectral embeddings via cross-attention with residual connections.
  • During inference, fused spectral embeddings retrieve the k nearest labeled signatures from a Milvus vector database (cosine similarity) and produce predictions via majority voting to leverage stored “knowledge” about known generators.
  • The framework uses incremental learning to expand the labeled signature database over time while applying elastic weight consolidation to reduce catastrophic forgetting of previously learned transformations.
  • On the UniversalFakeDetect benchmark covering 19 generative models (including GANs, face-swapping, and diffusion methods), SPARK-IL reports 94.6% mean accuracy, and the authors plan to publicly release code.
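
The retrieval-and-vote step described in the key points can be illustrated with a small sketch. This is not the authors' implementation: it replaces the Milvus database with an in-memory Python list, and the names `cosine_similarity`, `knn_majority_vote`, and `toy_db`, as well as the three-dimensional toy embeddings, are assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def knn_majority_vote(query, database, k=3):
    """Return the most common label among the k stored signatures
    that are most similar to the query.

    `database` is a list of (embedding, label) pairs; a plain list
    stands in for the vector database here (assumption).
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical labeled signatures (toy values, not real embeddings).
toy_db = [
    ([1.0, 0.1, 0.0], "fake"),
    ([0.9, 0.2, 0.1], "fake"),
    ([0.0, 1.0, 0.2], "real"),
    ([0.1, 0.9, 0.0], "real"),
    ([0.8, 0.0, 0.3], "fake"),
]

prediction = knn_majority_vote([1.0, 0.0, 0.1], toy_db, k=3)
```

With two classes, an odd k avoids tied votes. In this sketch, expanding the stored "knowledge" amounts to appending new (embedding, label) pairs to the list, which mirrors the role the signature database plays in the incremental setting.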

Abstract

Detecting AI-generated images remains a significant challenge because detectors trained on specific generators often fail to generalize to unseen models; however, while pixel-level artifacts vary across models, frequency-domain signatures exhibit greater consistency, providing a promising foundation for cross-generator detection. To address this, we propose SPARK-IL, a retrieval-augmented framework that combines dual-path spectral analysis with incremental learning by utilizing a partially frozen ViT-L/14 encoder for semantic representations alongside a parallel path for raw RGB pixel embeddings. Both paths undergo multi-band Fourier decomposition into four frequency bands, which are individually processed by Kolmogorov-Arnold Networks (KAN) with mixture-of-experts for band-specific transformations before the resulting spectral embeddings are fused via cross-attention with residual connections. During inference, this fused embedding retrieves the k nearest labeled signatures from a Milvus database using cosine similarity to facilitate predictions via majority voting, while an incremental learning strategy expands the database and employs elastic weight consolidation to preserve previously learned transformations. Evaluated on the UniversalFakeDetect benchmark across 19 generative models -- including GANs, face-swapping, and diffusion methods -- SPARK-IL achieves a 94.6% mean accuracy, with the code to be publicly released at https://github.com/HessenUPHF/SPARK-IL.
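
To make the multi-band Fourier decomposition concrete, here is a minimal NumPy sketch that splits a 2D array into four frequency bands. The abstract does not state how the band boundaries are chosen or what exactly is decomposed, so the equal-width radial bands, the single-channel input, and the function name `split_into_bands` are all assumptions for illustration rather than the paper's method.

```python
import numpy as np


def split_into_bands(image, num_bands=4):
    """Split a 2D array into radial frequency bands.

    Each band is the inverse FFT of the spectrum masked to one ring of
    spatial frequencies. Equal-width rings are an assumption; the
    masks partition the frequency plane, so the bands sum back to the
    input.
    """
    h, w = image.shape
    # Centre the zero-frequency (DC) component.
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Radial distance of every frequency bin from DC.
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    bands = []
    for i in range(num_bands):
        lo, hi = edges[i], edges[i + 1]
        if i == num_bands - 1:
            mask = (radius >= lo) & (radius <= hi)  # include the outer edge
        else:
            mask = (radius >= lo) & (radius < hi)
        band = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
        bands.append(band)
    return bands


rng = np.random.default_rng(0)
toy_image = rng.random((16, 16))  # stand-in for one image channel
bands = split_into_bands(toy_image)
```

The first band carries the low frequencies (including the mean intensity) and the last carries the finest detail. In a pipeline like the one described, each band would then go to its own band-specific transformation before fusion.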
