Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

arXiv cs.CV / 5/1/2026


Key Points

  • The paper tackles the difficulty of generalizing AI-generated image detectors to unseen generators, a setting in which existing methods that fuse semantic and frequency-artifact cues degrade significantly.
  • It identifies two main causes of poor generalization: a “frequency shortcut bias” toward cues that are easily distinguishable but tied to specific generators, and a cross-domain conflict between high-level semantic features and low-level frequency patterns.
  • The proposed Frequency-aware Gated Injection Network (FGINet) improves robustness with a Band-Masked Frequency Encoder (BMFE) that masks frequency bands to reduce dependence on generator-specific artifacts (see the sketch after this list).
  • FGINet also introduces Layer-wise Gated Frequency Injection (LGFI) to inject frequency cues progressively into a vision foundation model via adaptive gating, easing representation conflicts at different abstraction levels.
  • A Hyperspherical Compactness Learning (HCL) training objective encourages compact, well-separated embeddings, and experiments report state-of-the-art results with strong generalization across multiple datasets.
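
To make cross-band masking concrete, below is a minimal PyTorch sketch: the 2D spectrum is partitioned into concentric radial bands and a random subset is zeroed per sample. The function name, band partition, and drop probability are illustrative assumptions; the paper's BMFE may use a different masking scheme.

```python
import math
import torch

def cross_band_mask(images: torch.Tensor, num_bands: int = 8,
                    drop_prob: float = 0.3) -> torch.Tensor:
    """Zero out random radial frequency bands of an image batch.

    Hypothetical sketch of cross-band masking, not the paper's exact BMFE.
    images: (B, C, H, W) real-valued tensor.
    """
    B, C, H, W = images.shape
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))

    # Normalized radial distance from the spectrum center, in [0, 1].
    yy = torch.linspace(-1, 1, H, device=images.device).view(H, 1)
    xx = torch.linspace(-1, 1, W, device=images.device).view(1, W)
    radius = torch.sqrt(yy ** 2 + xx ** 2) / math.sqrt(2.0)

    # Assign every frequency to one of `num_bands` concentric rings.
    band_idx = torch.clamp((radius * num_bands).long(), max=num_bands - 1)

    # Keep each band with probability 1 - drop_prob, independently per sample.
    keep = (torch.rand(B, num_bands, device=images.device) > drop_prob).float()
    mask = keep[:, band_idx].unsqueeze(1)  # (B, 1, H, W), broadcast over channels

    masked = spec * mask
    return torch.fft.ifft2(torch.fft.ifftshift(masked, dim=(-2, -1))).real
```

Training a frequency encoder on such band-masked inputs discourages it from latching onto any single generator-specific band, the kind of shortcut the paper aims to suppress.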

Abstract

AI-generated images are becoming increasingly realistic and diverse, posing significant challenges for generalizable detection. While Vision Foundation Models (VFMs) provide rich semantic representations and frequency-based methods capture complementary artifact cues, existing approaches that combine these modalities still suffer from limited generalization, with notable performance degradation on unseen generative models. We attribute this limitation to two key factors: a frequency shortcut bias toward easily distinguishable cues associated with specific generators, and a cross-domain representation conflict between high-level semantics and low-level frequency patterns. To address these issues, we propose a Frequency-aware Gated Injection Network (FGINet) to improve generalization. Specifically, we design a Band-Masked Frequency Encoder (BMFE) that applies cross-band masking in the frequency domain to reduce reliance on generator-specific patterns and encourage more diverse and generalizable representations. We further introduce a Layer-wise Gated Frequency Injection (LGFI) mechanism to progressively inject frequency cues into the VFM backbone with adaptive gating, aligning with its hierarchical abstraction and alleviating representation conflict. Moreover, we propose a Hyperspherical Compactness Learning (HCL) framework with a cosine margin objective to learn compact and well-separated representations. Extensive experiments demonstrate that FGINet achieves state-of-the-art performance and strong generalization across multiple challenging datasets.
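
The layer-wise gated injection can be pictured as a small fusion module per backbone block: projected frequency features are added to the block's tokens through a learnable gate that starts closed, letting each layer decide how much frequency information to admit. Below is a minimal sketch, assuming a ViT-style backbone with tokens of shape (B, N, dim); the module name, gate parameterization, and additive fusion are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GatedFrequencyInjection(nn.Module):
    """One layer's injection step: fuse frequency features into backbone tokens.

    Hypothetical sketch of gated injection; FGINet's actual LGFI may differ.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)           # map frequency features into token space
        self.gate = nn.Parameter(torch.zeros(1))  # scalar gate per layer, zero at init

    def forward(self, tokens: torch.Tensor, freq_feats: torch.Tensor) -> torch.Tensor:
        # tanh(0) = 0, so injection starts disabled and each layer learns how
        # much frequency signal its abstraction level should absorb.
        return tokens + torch.tanh(self.gate) * self.proj(freq_feats)

# One injector per transformer block gives the progressive, layer-wise scheme:
#   injectors = nn.ModuleList(GatedFrequencyInjection(768) for _ in range(12))
#   ... inside block l of the backbone forward: x = injectors[l](x, freq_tokens)
```

Gating per layer rather than fusing once at the output is one way to let shallow layers take in low-level frequency detail while deeper, more semantic layers keep the injection nearly shut, which matches the conflict-alleviation motivation described in the abstract.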
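
For the training objective, a cosine margin on hyperspherical (unit-normalized) embeddings is commonly realized in a CosFace-style form; the sketch below shows one such formulation for the binary real-vs-generated case. The margin and scale values are placeholders, and the paper's exact HCL objective may differ.

```python
import torch
import torch.nn.functional as F

def cosine_margin_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                       prototypes: torch.Tensor, margin: float = 0.35,
                       scale: float = 30.0) -> torch.Tensor:
    """Additive cosine-margin loss on unit-normalized embeddings.

    Hypothetical CosFace-style stand-in for the paper's cosine margin objective.
    embeddings: (B, D); labels: (B,) in {0, 1}; prototypes: (num_classes, D).
    """
    emb = F.normalize(embeddings, dim=1)     # project embeddings onto the unit sphere
    protos = F.normalize(prototypes, dim=1)  # class prototypes on the same sphere
    cos = emb @ protos.t()                   # (B, num_classes) cosine similarities

    # Subtract the margin from the target-class logit only, pushing each sample
    # to sit at least `margin` closer (in cosine) to its own class prototype.
    onehot = F.one_hot(labels, num_classes=protos.size(0)).float()
    logits = scale * (cos - margin * onehot)
    return F.cross_entropy(logits, labels)
```

Because all embeddings live on the unit sphere, minimizing such a loss both pulls same-class samples toward their prototype (compactness) and enforces an angular gap between classes (separation), in line with the compact, well-separated representations HCL targets.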