From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories

arXiv cs.CL / 4/1/2026

📰 NewsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper studies how off-the-shelf sentiment (polarity) classifiers degrade under domain shift in long-form, heterogeneous Holocaust oral history narratives.
  • It evaluates three pretrained transformer-based polarity classifiers on a large corpus of 107,305 utterances and 579,013 sentences, quantifying agreement and disagreement patterns.
  • The authors introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output stability and use agreement metrics plus confusion matrices to pinpoint systematic divergences.
  • A T5-based emotion classifier is applied to stratified samples to compare emotion distributions across agreement strata, using label triangulation as an auxiliary descriptive signal.
  • Overall inter-model agreement is low to moderate, with disagreements concentrated mainly in boundary decisions around neutrality, motivating a cautious framework for sensitive historical text analysis.

Abstract

Polarity detection becomes substantially more challenging under domain shift, particularly in heterogeneous, long-form narratives with complex discourse structure, such as Holocaust oral histories. This paper presents a corpus-scale diagnostic study of off-the-shelf sentiment classifiers on long-form Holocaust oral histories, using three pretrained transformer-based polarity classifiers on a corpus of 107,305 utterances and 579,013 sentences. After assembling model outputs, we introduce an agreement-based stability taxonomy (ABC) to stratify inter-model output stability. We report pairwise percent agreement, Cohen kappa, Fleiss kappa, and row-normalized confusion matrices to localize systematic disagreement. As an auxiliary descriptive signal, a T5-based emotion classifier is applied to stratified samples from each agreement stratum to compare emotion distributions across strata. The combination of multi-model label triangulation and the ABC taxonomy provides a cautious, operational framework for characterizing where and how sentiment models diverge in sensitive historical narratives. Inter-model agreement is low to moderate overall and is driven primarily by boundary decisions around neutrality.

From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories | AI Navigate