Bridging the Micro–Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization
arXiv cs.CV / 4/15/2026
Key Points
- The paper introduces FASA (Frequency-Aware Semantic Alignment), a unified framework to localize both traditional image manipulations and diffusion-generated edits that look locally realistic.
- It bridges the “micro–macro gap” by combining manipulation-sensitive frequency cues (via an adaptive dual-band DCT module) with manipulation-aware semantic priors (learned through patch-level contrastive alignment on frozen CLIP features).
- FASA injects semantic priors into a hierarchical frequency pathway using a semantic-frequency side adapter to enable multi-scale feature interactions.
- A prototype-guided, frequency-gated mask decoder integrates semantic consistency with boundary-aware localization to predict tampered regions more accurately.
- Experiments on OpenSDI and several traditional manipulation benchmarks show state-of-the-art results, strong cross-generator/cross-dataset generalization, and robustness under common image degradations.
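To make the frequency-cue idea more concrete, here is a minimal sketch of splitting an image into a low band and a high band in the DCT domain. It is an illustration only, not the paper's module: the summary does not describe how FASA's adaptive dual-band DCT module chooses or learns its bands, so the fixed `cutoff` parameter, the function names `dct_matrix` and `dual_band_split`, and the restriction to square grayscale arrays are all assumptions made for this example.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m


def dual_band_split(image, cutoff=0.25):
    """Split a square grayscale image into low- and high-frequency parts.

    `cutoff` is a hypothetical fixed threshold (assumption): coefficients
    whose index sum u + v falls below cutoff * 2n go to the low band.
    An adaptive module would presumably learn this split instead.
    """
    n = image.shape[0]
    d = dct_matrix(n)
    coeffs = d @ image @ d.T  # forward 2D DCT

    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    low_mask = (u + v) < cutoff * 2 * n

    # Inverse 2D DCT of each masked coefficient set.
    low = d.T @ (coeffs * low_mask) @ d
    high = d.T @ (coeffs * ~low_mask) @ d
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    low, high = dual_band_split(img)
    # The two bands partition the coefficients, so they sum to the input.
    print(np.allclose(low + high, img))
```

Because the two masks partition the coefficient grid, the bands always sum back to the input. In a localization setting the high band is the part that tends to carry fine-grained traces such as resampling or compression inconsistencies, which is why frequency cues are paired with semantic features rather than used alone.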