Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training
arXiv cs.CV / April 30, 2026
Key Points
- The paper addresses a core limitation of adversarial training: the trade-off between clean accuracy and adversarial robustness in deep neural networks.
- It reports a new observation that changing perturbation intensities for training samples near decision boundaries has minimal effect on robustness, pointing to a mismatch between input and latent spaces as the key cause.
- To reduce this mismatch, the authors introduce “Robust Alignment,” a training objective that encourages the model’s internal perception to shift consistently with input perturbations while the final predicted label remains unchanged.
- They propose two techniques to realize Robust Alignment: using reduced and fixed perturbation intensity for boundary samples, and DICAR (Domain Interpolation Consistency Adversarial Regularization) to enforce semantic alignment between input and latent representations.
- The resulting RAAT method improves the accuracy–robustness trade-off on CIFAR-10, CIFAR-100, and Tiny-ImageNet across multiple ResNet variants, outperforming four common baselines and matching or surpassing many prior SOTA approaches.
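The core recipe the key points describe, adversarial training combined with an alignment-style consistency term between clean and perturbed representations, can be sketched on a toy linear model. This is a minimal illustration only: the one-step FGSM attack, the squared-logit-difference penalty, and the hyperparameters `eps`, `lam`, and `lr` are assumptions for the sketch, not the paper's actual RAAT or DICAR formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fgsm(x, y, W, eps):
    """One-step FGSM attack on a linear model with logits = x @ W."""
    p = softmax(x @ W)
    g_logits = p.copy()
    g_logits[np.arange(len(y)), y] -= 1.0      # dCE/dlogits (up to 1/N)
    return x + eps * np.sign(g_logits @ W.T)   # step along the input-gradient sign

def train_step(x, y, W, eps=0.1, lam=1.0, lr=0.5):
    """Adversarial training step plus an alignment-style penalty that keeps
    the clean and adversarial 'latents' (here, simply the logits) close.
    Hyperparameters and penalty form are illustrative assumptions."""
    n, c = len(y), W.shape[1]
    x_adv = fgsm(x, y, W, eps)
    p_adv = softmax(x_adv @ W)
    g_ce = p_adv.copy()
    g_ce[np.arange(n), y] -= 1.0
    grad = x_adv.T @ g_ce / n                  # CE gradient on adversarial inputs
    d = x - x_adv                              # gradient of mean((d @ W) ** 2)
    grad += lam * (2.0 / (n * c)) * d.T @ (d @ W)
    return W - lr * grad

# Toy 2-class problem: the label is the sign of the first coordinate.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
y = (x[:, 0] > 0).astype(int)
W = rng.normal(scale=0.1, size=(2, 2))
for _ in range(50):
    W = train_step(x, y, W)
acc = (softmax(x @ W).argmax(axis=1) == y).mean()
print(f"clean accuracy after adversarial training: {acc:.2f}")
```

In a real deep network the penalty would be applied to an intermediate feature layer rather than the logits, and the attack would be multi-step PGD; the structure of the update, adversarial loss plus a clean-vs-adversarial consistency term, is the same.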