The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models
arXiv cs.CV / 3/31/2026
Key Points
- The paper argues that robust fine-tuning of vision-language models fails to balance ID accuracy, OOD generalization, and adversarial robustness because of two geometric issues: sharp, anisotropic minima in the loss landscape and perturbation-sensitive feature representations.
- It introduces GRACE (Gram-aligned Robustness via Adaptive Curvature Estimation), a unified fine-tuning framework that regularizes parameter-space curvature to encourage flatter minima while enforcing feature-space invariance across clean, adversarial, and OOD inputs.
- GRACE uses adaptive weight perturbations scaled by locally estimated curvature and combines this with a feature alignment loss, motivated by Robust PAC-Bayes theory.
- Experiments on ImageNet fine-tuning of CLIP show simultaneous gains: +10.8% ID accuracy and +13.5% adversarial accuracy, with OOD accuracy staying essentially unchanged (57.0% vs 57.4% zero-shot baseline).
- Additional geometric analysis claims GRACE converges to flatter minima and avoids feature distortion under distribution shifts, aiming for generalized robustness in foundation VLMs.
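The two ingredients summarized above, a weight perturbation scaled down where local curvature is high and a Gram-matrix alignment loss between clean and perturbed features, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual implementation: the function names (`gram_alignment_loss`, `curvature_scaled_perturbation`), the finite-difference curvature estimate, and the `rho / (1 + curvature)` scaling rule are all assumptions made for the sketch.

```python
import numpy as np

def gram(F):
    """Gram matrix of a feature batch F (n_samples x d) with L2-normalized rows."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    return Fn @ Fn.T

def gram_alignment_loss(F_clean, F_pert):
    """Mean squared Frobenius distance between Gram matrices of clean
    and perturbed (adversarial/OOD) features; zero when feature
    geometry is identical."""
    return float(np.mean((gram(F_clean) - gram(F_pert)) ** 2))

def curvature_scaled_perturbation(w, grad_fn, rho=0.05, eps=1e-3):
    """SAM-style ascent direction whose radius shrinks with locally
    estimated curvature (hypothetical scaling rule, not from the paper).

    w       : flat parameter vector
    grad_fn : function returning the loss gradient at a parameter vector
    rho     : base perturbation radius
    eps     : finite-difference step for the curvature estimate
    """
    g = grad_fn(w)
    u = g / (np.linalg.norm(g) + 1e-12)          # unit ascent direction
    # Directional second derivative along u via finite differences.
    curv = abs((grad_fn(w + eps * u) - g) @ u / eps)
    rho_adaptive = rho / (1.0 + curv)            # smaller step where loss is sharp
    return rho_adaptive * u
```

On a toy quadratic loss `0.5 * w @ (c * I) @ w`, the curvature estimate recovers `c`, so the perturbation applied to a sharper loss (larger `c`) is smaller, matching the flat-minima intuition the key points describe.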