Product-of-Experts Training Reduces Dataset Artifacts in Natural Language Inference

arXiv cs.CL · April 22, 2026


Key Points

  • Neural natural language inference (NLI) models can overfit superficial dataset artifacts rather than performing genuine reasoning, as evidenced by a strong "hypothesis-only" baseline (57.7%) on SNLI.
  • The paper estimates that 38.6% of baseline errors come from these artifacts, indicating substantial spurious correlations in common NLI benchmarks.
  • It proposes Product-of-Experts (PoE) training, which reduces the influence of examples where biased models become overconfident.
  • PoE maintains nearly the same overall accuracy (89.10% vs. 89.30%) while lowering bias reliance by 4.71%, with an ablation study identifying lambda = 1.5 as the best trade-off between debiasing and accuracy.
  • Even with debiasing, behavioral evaluations show remaining weaknesses in negation handling and numerical reasoning.

Abstract

Neural NLI models overfit dataset artifacts instead of truly reasoning. A hypothesis-only model achieves 57.7% accuracy on SNLI, indicating strong spurious correlations, and 38.6% of the baseline's errors are attributable to these artifacts. We propose Product-of-Experts (PoE) training, which downweights examples where biased models are overconfident. PoE nearly preserves accuracy (89.10% vs. 89.30%) while cutting bias reliance by 4.71% (bias agreement drops from 49.85% to 45%). An ablation finds that lambda = 1.5 best balances debiasing and accuracy. Behavioral tests still reveal weaknesses in negation handling and numerical reasoning.
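
To make the mechanism concrete, here is a minimal NumPy sketch of a PoE-style debiasing loss: the main model's and the biased (e.g. hypothesis-only) model's log-probabilities are summed, and cross-entropy is taken over the combined distribution, so examples the bias expert already solves confidently contribute smaller gradients to the main model. The function names, and the exact placement of the `lam` weight on the bias expert, are our illustration under stated assumptions, not the paper's released code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def poe_loss(main_logits, bias_logits, labels, lam=1.5):
    """Product-of-Experts cross-entropy.

    Combines the two experts in log space (a product of their
    probability distributions), with `lam` scaling the bias expert's
    contribution; only the main model would receive gradients in
    practice (the bias expert is frozen).
    """
    combined = log_softmax(main_logits) + lam * log_softmax(bias_logits)
    log_probs = log_softmax(combined)
    # Mean negative log-likelihood of the gold labels under the product.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Note that with a uniform (uninformative) bias expert the combined distribution reduces to the main model's own softmax, so the loss falls back to plain cross-entropy; the debiasing effect appears only on examples where the bias expert is confident.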