Ensembles-based Feature Guided Analysis

arXiv cs.LG / 3/23/2026


Key Points

  • EFGA extends Feature Guided Analysis by ensembling multiple rules to improve coverage (recall) while aiming to preserve precision in explanations of DNN behavior.
  • The approach forms ensembles from FGA rules according to an aggregation criterion, a policy that dictates how rules are combined; the paper considers three such criteria.
  • In experiments on MNIST and LSC, EFGA achieves higher train recall (+28.51% on MNIST, +33.15% on LSC) and higher test recall (+25.76% on MNIST, +30.81% on LSC) than FGA, with only a small reduction in test precision (-0.89% on MNIST, -0.69% on LSC).
  • The framework is extensible, allowing new aggregation criteria to be added and selected to balance precision and recall for various applications.
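
To make the idea of a pluggable aggregation criterion concrete, here is a minimal Python sketch. The source does not say what the paper's three criteria are, so `any_of`, `all_of`, and `majority`, the `make_ensemble` helper, and the threshold rules below are all hypothetical names and choices made only for illustration.

```python
from typing import Callable, List, Sequence

# A rule fires (returns True) when it predicts that a feature is present.
# Here a rule reads a list of neuron activations; this encoding is an assumption.
Rule = Callable[[Sequence[float]], bool]
# A criterion turns the individual rule votes into a single ensemble vote.
Criterion = Callable[[Sequence[bool]], bool]


def any_of(votes: Sequence[bool]) -> bool:
    """Hypothetical criterion: the ensemble fires if at least one rule fires."""
    return any(votes)


def all_of(votes: Sequence[bool]) -> bool:
    """Hypothetical criterion: the ensemble fires only if every rule fires."""
    return all(votes)


def majority(votes: Sequence[bool]) -> bool:
    """Hypothetical criterion: the ensemble fires if more than half the rules fire."""
    return sum(votes) * 2 > len(votes)


def make_ensemble(rules: List[Rule], criterion: Criterion) -> Rule:
    """Combine rules into one ensemble rule under the given criterion."""
    def ensemble(activations: Sequence[float]) -> bool:
        return criterion([rule(activations) for rule in rules])
    return ensemble


# Three illustrative threshold rules, each looking at one neuron.
rules: List[Rule] = [
    lambda a: a[0] > 0.5,
    lambda a: a[1] > 0.5,
    lambda a: a[2] > 0.5,
]

sample = [0.9, 0.1, 0.7]  # rules 0 and 2 fire, rule 1 does not

fires_any = make_ensemble(rules, any_of)(sample)        # True
fires_all = make_ensemble(rules, all_of)(sample)        # False
fires_majority = make_ensemble(rules, majority)(sample)  # True
```

Because a criterion is just a function over the rule votes, adding a new one requires no change to `make_ensemble`. In this sketch a permissive criterion such as `any_of` makes the ensemble fire more often, while a strict one such as `all_of` makes it fire less often, which is the kind of precision and recall trade-off the key points describe.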

Abstract

Recent Deep Neural Network (DNN) applications call for techniques that can explain their behavior. Existing solutions, such as Feature Guided Analysis (FGA), extract rules about the internal behavior of a DNN, e.g., by providing explanations related to neuron activations. Results from the literature show that these rules have considerable precision (i.e., they correctly predict certain classes of features), but their recall (i.e., the number of situations in which these rules apply) is more limited. To mitigate this problem, this paper presents Ensembles-based Feature Guided Analysis (EFGA). EFGA combines rules extracted by FGA into ensembles. Ensembles aggregate different rules to increase their applicability depending on an aggregation criterion, a policy that dictates how to combine rules into ensembles. Although our solution is extensible, and different aggregation criteria can be developed by users, in this work we considered three different aggregation criteria. We evaluated how the choice of the criterion influences the effectiveness of EFGA on two benchmarks (i.e., the MNIST and LSC datasets), and found that different aggregation criteria offer alternative trade-offs between precision and recall. We then compared EFGA with FGA. For this experiment, we selected an aggregation criterion that provides a reasonable trade-off between precision and recall. Our results show that EFGA has higher train recall (+28.51% on MNIST, +33.15% on LSC) and test recall (+25.76% on MNIST, +30.81% on LSC) than FGA, with a negligible reduction in test precision (-0.89% on MNIST, -0.69% on LSC).
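
The precision and recall trade-off mentioned in the abstract can be illustrated with a small Python sketch. It assumes the standard definitions (precision is the fraction of firings that are correct, recall is the fraction of positive samples on which the rule fires), which may differ in detail from the paper's. The labels and rule outputs are invented toy data, and combining rules with a logical OR is only one conceivable way to aggregate them, not necessarily one of the paper's criteria.

```python
from typing import List, Tuple


def precision_recall(fires: List[bool], labels: List[bool]) -> Tuple[float, float]:
    """Return (precision, recall) of a rule, using the standard definitions.

    fires[i]  -- True if the rule fires on sample i (predicts the feature)
    labels[i] -- True if sample i actually has the feature
    """
    true_positives = sum(1 for f, y in zip(fires, labels) if f and y)
    fired = sum(fires)
    positives = sum(labels)
    precision = true_positives / fired if fired else 0.0
    recall = true_positives / positives if positives else 0.0
    return precision, recall


# Toy data (invented): six samples, four of which have the feature.
labels = [True, True, True, True, False, False]

# Two hypothetical rules: rule_a is precise but narrow, rule_b is noisier.
rule_a = [True, True, False, False, False, False]
rule_b = [False, False, True, False, True, False]

# Assumed aggregation: the ensemble fires when either rule fires.
either = [a or b for a, b in zip(rule_a, rule_b)]

single = precision_recall(rule_a, labels)    # (1.0, 0.5)
ensemble = precision_recall(either, labels)  # (0.75, 0.75)
```

In this toy case the ensemble covers more positive samples than `rule_a` alone (recall rises from 0.5 to 0.75) but inherits a false positive from `rule_b` (precision falls from 1.0 to 0.75). The figures reported in the abstract suggest that on MNIST and LSC the precision cost was far smaller than in this deliberately exaggerated example.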