AGRI-Fidelity: Evaluating the Reliability of Listenable Explanations for Poultry Disease Detection

arXiv cs.LG / 3/20/2026



Key Points

The paper identifies a key limitation of existing XAI metrics: they measure faithfulness for a single model and ignore model multiplicity, where near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, this can produce explanations that are faithful yet unreliable.
It introduces AGRI-Fidelity, a reliability-oriented framework for evaluating listenable explanations in poultry disease detection that does not require spatial ground truth.
The method combines cross-model consensus with cyclic temporal permutation to build null distributions and compute a false discovery rate, aimed at suppressing stationary artifacts while preserving time-localized bioacoustic markers.
Empirical results on real and controlled datasets show that AGRI-Fidelity provides reliability-aware discrimination across all data points, compared with masking-based metrics.

Abstract

Existing XAI metrics measure faithfulness for a single model, ignoring model multiplicity where near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can produce explanations that are faithful yet unreliable, as masking-based metrics fail to penalize redundant shortcuts. We propose AGRI-Fidelity, a reliability-oriented evaluation framework for listenable explanations in poultry disease detection without spatial ground truth. The method combines cross-model consensus with cyclic temporal permutation to construct null distributions and compute a False Discovery Rate (FDR), suppressing stationary artifacts while preserving time-localized bioacoustic markers. Across real and controlled datasets, AGRI-Fidelity effectively provides reliability-aware discrimination for all data points versus masking-based metrics.
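The abstract describes the mechanism only at a high level. Below is a minimal Python sketch of the general idea, not the authors' implementation. It assumes that each of several near-optimal models yields a non-negative attribution score per time frame, that cross-model consensus is the per-frame mean (an assumption), that the null distribution comes from cyclically shifting each model's attribution by an independent random offset, and that FDR is controlled with the Benjamini-Hochberg procedure (our choice; the paper may compute FDR differently). All function names are hypothetical.

```python
import random
from typing import List, Sequence


def consensus(attributions: Sequence[Sequence[float]]) -> List[float]:
    """Cross-model consensus as the mean attribution per time frame (assumed aggregation)."""
    n_models = len(attributions)
    n_frames = len(attributions[0])
    return [sum(a[t] for a in attributions) / n_models for t in range(n_frames)]


def cyclic_shift(values: Sequence[float], offset: int) -> List[float]:
    """Rotate a time series by `offset` frames, wrapping around the end."""
    n = len(values)
    return [values[(t - offset) % n] for t in range(n)]


def permutation_pvalues(attributions: Sequence[Sequence[float]],
                        n_perm: int = 999, seed: int = 0) -> List[float]:
    """Per-frame p-values against a cyclic-shift null (hypothetical helper).

    Each model is shifted by its own random offset, which breaks cross-model
    temporal alignment but leaves time-invariant (stationary) patterns unchanged.
    """
    rng = random.Random(seed)
    observed = consensus(attributions)
    n_frames = len(observed)
    exceed = [0] * n_frames
    for _ in range(n_perm):
        shifted = [cyclic_shift(a, rng.randrange(n_frames)) for a in attributions]
        null = consensus(shifted)
        for t in range(n_frames):
            if null[t] >= observed[t]:
                exceed[t] += 1
    # Add-one correction keeps p-values strictly positive.
    return [(1 + e) / (1 + n_perm) for e in exceed]


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return indices of frames declared significant at FDR level `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank  # largest rank satisfying the BH condition
    return sorted(order[:cutoff])
```

A stationary artifact such as constant ventilation hum looks the same at every offset, so shifting never lowers its consensus and its p-value stays at 1. A marker that all models localize at the same frames loses that alignment under independent shifts, so its observed consensus is rarely matched by the null. For example, four models sharing a constant 0.5 background plus a 1.0 bump at frames 20 to 22 out of 40 should yield `benjamini_hochberg(permutation_pvalues(models))` equal to `[20, 21, 22]`.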
