Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection
arXiv cs.CV / 5/1/2026
Key Points
- The paper tackles the poor generalization of AI-generated image detectors to unseen generators, a setting where existing methods that fuse semantic features with frequency artifacts degrade significantly.
- It identifies two main causes of poor generalization: a “frequency shortcut bias” toward easily learned, generator-specific cues, and a cross-domain conflict between high-level semantic features and low-level frequency patterns.
- The proposed Frequency-aware Gated Injection Network (FGINet) improves robustness by using a Band-Masked Frequency Encoder (BMFE) to mask frequency bands and reduce dependence on generator-specific artifacts.
- FGINet also introduces Layer-wise Gated Frequency Injection (LGFI) to inject frequency cues progressively into a vision foundation model via adaptive gating, easing representation conflicts at different abstraction levels.
- A Hyperspherical Compactness Learning (HCL) training objective encourages compact, well-separated embeddings, and experiments report state-of-the-art results with strong generalization across multiple datasets.
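The three components above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the band choice, the per-layer sigmoid gate, and the cosine-to-center compactness term are all simplified, illustrative assumptions standing in for BMFE, LGFI, and HCL respectively.

```python
import numpy as np

def band_masked_fft(img, mask_band=(0.25, 0.5)):
    """Zero out one radial frequency band of a grayscale image
    (rough analogue of a Band-Masked Frequency Encoder; the band
    limits here are illustrative, not the paper's)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r = np.hypot(yy - cy, xx - cx) / np.hypot(cy, cx)  # radius in [0, 1]
    lo, hi = mask_band
    F[(r >= lo) & (r < hi)] = 0  # suppress generator-specific band cues
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

def gated_injection(semantic, freq, w_gate, b_gate):
    """Inject a frequency feature into a semantic feature through an
    element-wise sigmoid gate (stand-in for one LGFI layer; in the
    paper the gate parameters would be learned per layer)."""
    gate = 1.0 / (1.0 + np.exp(-(w_gate * freq + b_gate)))
    return semantic + gate * freq

def hyperspherical_compactness(embeddings, center):
    """Mean cosine distance of unit-normalized embeddings to a class
    center on the hypersphere (sketch of an HCL-style objective:
    0 when all embeddings align with the center)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = center / np.linalg.norm(center)
    return float(np.mean(1.0 - z @ c))
```

Driving the gate toward 0 recovers the pure semantic pathway, which is how adaptive gating can ease the representation conflict at layers where frequency cues would hurt.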