GazeCLIP: Gaze-Guided CLIP with Adaptive-Enhanced Fine-Grained Language Prompt for Deepfake Attribution and Detection
arXiv cs.CV / 4/1/2026
Key Points
- The paper introduces GazeCLIP, a gaze-guided CLIP framework with adaptive-enhanced fine-grained language prompting, aimed at improving the generalization of deepfake attribution and detection (DFAD) to novel generative methods.
- It proposes a new fine-grained benchmark to evaluate DFAD performance against unseen advanced generators, including diffusion and flow-based models.
- GazeCLIP exploits observed distribution differences between the gaze vectors of pristine and forged faces: a gaze-aware image encoder (GIE) mines global forgery embeddings across both the appearance and gaze domains, yielding a more stable shared feature space (see the first sketch after this list).
- A language refinement encoder (LRE) adaptively enhances the language embeddings via a fine-grained word selector, enabling more precise vision-language matching (see the second sketch after this list).
- On the new benchmark, experiments report average gains over the state of the art of 6.56% in accuracy and 5.32% in AUC for attribution and detection; the authors plan to release their code on GitHub.
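To make the GIE idea concrete, here is a minimal PyTorch sketch of one way a gaze-aware image encoder could fuse appearance features with gaze-vector features into a shared embedding. All module names, dimensions, and the cross-attention fusion are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a gaze-aware image encoder (GIE).
# Names, dimensions, and fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn

class GazeAwareImageEncoder(nn.Module):
    """Fuses CLIP-style appearance features with gaze-vector features
    into a single shared forgery embedding space."""

    def __init__(self, appearance_dim=512, gaze_dim=6, embed_dim=512):
        super().__init__()
        # Projects features from a (frozen) CLIP-like visual backbone;
        # the backbone itself is outside this sketch.
        self.appearance_proj = nn.Linear(appearance_dim, embed_dim)
        # Small MLP lifting per-face gaze vectors (e.g. per-eye yaw/pitch)
        # into the same embedding space.
        self.gaze_proj = nn.Sequential(
            nn.Linear(gaze_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Cross-domain fusion: appearance tokens attend to gaze tokens.
        self.fuse = nn.MultiheadAttention(embed_dim, num_heads=8,
                                          batch_first=True)

    def forward(self, appearance_feat, gaze_vec):
        a = self.appearance_proj(appearance_feat).unsqueeze(1)  # (B, 1, D)
        g = self.gaze_proj(gaze_vec).unsqueeze(1)               # (B, 1, D)
        fused, _ = self.fuse(query=a, key=g, value=g)           # gaze-guided
        out = (a + fused).squeeze(1)                            # residual mix
        return out / out.norm(dim=-1, keepdim=True)             # unit norm

# Usage with random stand-in features:
enc = GazeAwareImageEncoder()
img_feat = torch.randn(4, 512)  # e.g. from a frozen CLIP image encoder
gaze = torch.randn(4, 6)        # e.g. estimated 3D gaze vectors per eye
emb = enc(img_feat, gaze)       # (4, 512) shared forgery embeddings
```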
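Similarly, the fine-grained word selector inside the LRE can be pictured as picking the prompt tokens most aligned with the image embedding and re-aggregating them into a refined text embedding. The class name, scoring rule, and top-k aggregation below are assumptions under that reading of the summary.

```python
# Hypothetical sketch of the fine-grained word selector in the LRE.
# Scoring and aggregation are assumptions, not the paper's method.
import torch
import torch.nn as nn

class FineGrainedWordSelector(nn.Module):
    def __init__(self, embed_dim=512, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.score = nn.Linear(embed_dim, 1)  # learned per-word saliency

    def forward(self, word_embs, image_emb):
        # word_embs: (B, L, D) per-token text embeddings of a prompt
        # image_emb: (B, D) gaze-aware image embedding (e.g. from the GIE)
        sim = (word_embs @ image_emb.unsqueeze(-1)).squeeze(-1)  # (B, L)
        sal = self.score(word_embs).squeeze(-1)                  # (B, L)
        idx = (sim + sal).topk(self.top_k, dim=1).indices        # (B, k)
        picked = torch.gather(
            word_embs, 1,
            idx.unsqueeze(-1).expand(-1, -1, word_embs.size(-1)),
        )                                                        # (B, k, D)
        refined = picked.mean(dim=1)                             # (B, D)
        return refined / refined.norm(dim=-1, keepdim=True)

# Usage with random stand-in embeddings:
sel = FineGrainedWordSelector()
words = torch.randn(4, 16, 512)   # token embeddings of a text prompt
image_emb = torch.randn(4, 512)   # gaze-aware image embedding
refined = sel(words, image_emb)   # (4, 512) refined prompt embedding
```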
Related Articles

Black Hat Asia
AI Business

Knowledge Governance For The Agentic Economy.
Dev.to

AI server farms heat up the neighborhood for miles around, paper finds
The Register

Paperclip: A Free Tool That Turns AI Into a Software Development Team
Dev.to
Does the Claude “leak” actually change anything in practice?
Reddit r/LocalLLaMA