GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees

arXiv cs.LG · April 15, 2026


Key Points

  • The paper introduces GF-Score, a framework that produces certified, class-conditional robustness profiles instead of a single aggregate score to expose how robustness varies across classes.
  • It defines four metrics grounded in welfare economics (RDI, NRGC, WCR, and FP-GREAT) to quantify class-level disparity and worst-case class performance under certified robustness guarantees.
  • GF-Score also removes reliance on adversarial attacks via a self-calibration procedure that tunes a temperature parameter using only clean accuracy correlations.
  • Experiments on 22 RobustBench models for CIFAR-10 and ImageNet show that the per-class decomposition recovers the original aggregate GREAT Score exactly and reveal consistent vulnerability patterns, such as "cat" being the weakest class in 76% of CIFAR-10 models.
  • The authors provide an attack-free auditing pipeline for diagnosing where certified robustness fails to protect classes evenly and release code on GitHub.
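To make the four metrics concrete, the sketch below computes plausible instantiations from a vector of per-class robustness scores. The exact formulas (and the example score vector) are illustrative assumptions, not the paper's definitions; the metric names are the only things taken from the source.

```python
import numpy as np

def fairness_metrics(per_class_scores):
    """Illustrative fairness summaries over per-class certified-robustness
    scores. These are plausible instantiations of the paper's four
    metrics, not its exact definitions."""
    s = np.asarray(per_class_scores, dtype=float)
    mean = s.mean()

    # Robustness Disparity Index (assumed form): spread between the best
    # and worst class, normalized by the mean score.
    rdi = (s.max() - s.min()) / mean

    # Normalized Robustness Gini Coefficient: Gini index of the per-class
    # scores; 0 = perfectly even robustness, higher = more unequal.
    diffs = np.abs(s[:, None] - s[None, :])
    nrgc = diffs.sum() / (2 * len(s) ** 2 * mean)

    # Worst-Case Class Robustness: the minimum per-class score.
    wcr = s.min()

    # Fairness-Penalized GREAT Score (assumed form): the mean score
    # discounted by the measured inequality.
    fp_great = mean * (1 - nrgc)

    return {"RDI": rdi, "NRGC": nrgc, "WCR": wcr, "FP-GREAT": fp_great}

# Hypothetical per-class scores for a CIFAR-10-like model, with the
# "cat" class (index 3) weakest, echoing the pattern the paper reports.
scores = [0.72, 0.80, 0.61, 0.38, 0.58, 0.55, 0.66, 0.70, 0.78, 0.74]
print(fairness_metrics(scores))
```

Under any such instantiation, WCR and NRGC answer different questions: WCR bounds how badly the single worst class fares, while NRGC measures how unevenly robustness is spread across all classes.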

Abstract

Adversarial robustness is essential for deploying neural networks in safety-critical applications, yet standard evaluation methods either require expensive adversarial attacks or report only a single aggregate score that obscures how robustness is distributed across classes. We introduce the *GF-Score* (GREAT-Fairness Score), a framework that decomposes the certified GREAT Score into per-class robustness profiles and quantifies their disparity through four metrics grounded in welfare economics: the Robustness Disparity Index (RDI), the Normalized Robustness Gini Coefficient (NRGC), Worst-Case Class Robustness (WCR), and a Fairness-Penalized GREAT Score (FP-GREAT). The framework further eliminates the original method's dependence on adversarial attacks through a self-calibration procedure that tunes the temperature parameter using only clean accuracy correlations. Evaluating 22 models from RobustBench across CIFAR-10 and ImageNet, we find that the decomposition is exact, that per-class scores reveal consistent vulnerability patterns (e.g., "cat" is the weakest class in 76% of CIFAR-10 models), and that more robust models tend to exhibit greater class-level disparity. These results establish a practical, attack-free auditing pipeline for diagnosing where certified robustness guarantees fail to protect all classes equally. We release our code on [GitHub](https://github.com/aryashah2k/gf-score).
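The attack-free self-calibration can be pictured as a simple grid search: pick the temperature whose scaled robustness estimates correlate best with clean accuracy across a pool of models. Everything below is a hypothetical sketch; the scaling function, temperature grid, and use of Pearson correlation are assumptions, not the paper's actual procedure.

```python
import numpy as np

def calibrate_temperature(raw_scores, clean_accs,
                          temps=np.linspace(0.1, 5.0, 50)):
    """Hypothetical sketch of attack-free self-calibration: choose the
    temperature whose scaled scores best correlate (Pearson) with clean
    accuracy across models. No adversarial attacks are involved."""
    raw = np.asarray(raw_scores, dtype=float)
    acc = np.asarray(clean_accs, dtype=float)
    best_t, best_corr = None, -np.inf
    for t in temps:
        scaled = np.tanh(raw / t)  # assumed monotone temperature scaling
        corr = np.corrcoef(scaled, acc)[0, 1]
        if corr > best_corr:
            best_t, best_corr = t, corr
    return best_t, best_corr

# Synthetic example: four models with raw scores and clean accuracies.
t_star, corr = calibrate_temperature([1.0, 2.0, 3.0, 4.0],
                                     [0.60, 0.70, 0.80, 0.90])
print(t_star, corr)
```

Because only clean accuracy enters the objective, the calibration never needs to run PGD or any other attack, which is the source of the pipeline's "attack-free" property.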