Sentiment Classification of Gaza War Headlines: A Comparative Analysis of Large Language Models and Arabic Fine-Tuned BERT Models

arXiv cs.CL / 4/13/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper analyzes how different AI architectures interpret sentiment in conflict-related Arabic news headlines using the 2023 Gaza War as a case study.
It compares three large language models with six fine-tuned Arabic BERT variants on a corpus of 10,990 headlines, using information-theoretic and distributional metrics rather than a single accuracy benchmark.
Fine-tuned BERT models (especially MARBERT) show a strong tendency to classify sentiment as neutral, while LLMs consistently amplify negative sentiment.
The study finds that LLaMA-3.1-8B exhibits a near-total collapse toward negativity in the evaluated setting.
Frame-conditioned tests indicate GPT-4.1 can adjust sentiment judgments based on narrative frames (humanitarian/legal/security), whereas other LLMs show limited contextual modulation, implying model choice acts as an “interpretive lens.”

Abstract

This study examines how different artificial intelligence architectures interpret sentiment in conflict-related media discourse, using the 2023 Gaza War as a case study. Drawing on a corpus of 10,990 Arabic news headlines (Eleraqi 2026), the research conducts a comparative analysis between three large language models and six fine-tuned Arabic BERT models. Rather than evaluating accuracy against a single human-annotated gold standard, the study adopts an epistemological approach that treats sentiment classification as an interpretive act produced by model architectures. To quantify systematic differences across models, the analysis employs information-theoretic and distributional metrics, including Shannon Entropy, Jensen-Shannon Distance, and a Variance Score measuring deviation from aggregate model behavior. The results reveal pronounced and non-random divergence in sentiment distributions. Fine-tuned BERT models, particularly MARBERT, exhibit a strong bias toward neutral classifications, while LLMs consistently amplify negative sentiment, with LLaMA-3.1-8B showing near-total collapse into negativity. Frame-conditioned analysis further demonstrates that GPT-4.1 adjusts sentiment judgments in line with narrative frames (e.g., humanitarian, legal, security), whereas other LLMs display limited contextual modulation. These findings suggest that the choice of model constitutes a choice of interpretive lens, shaping how conflict narratives are algorithmically framed and emotionally evaluated. The study contributes to media studies and computational social science by foregrounding algorithmic discrepancy as an object of analysis and by highlighting the risks of treating automated sentiment outputs as neutral or interchangeable measures of media tone in contexts of war and crisis.

ChatGPT for Nurses: Prompts That Help You Document, Communicate, and Study

Dev.to

I Added a Stopwatch to My AI in 1 LOC Using the Livingrimoire While Corporations Need a Year

Dev.to

Built tasuki — an AI CLI Orchestrator that Seamlessly Hands Off Between Tools

Dev.to

I built a GNOME extension for Codex with local/remote history, live filters, Markdown export, and a read-only MCP server

Reddit r/artificial

I Built an Open‑ Source OS for AI Agents – And It’s Ready for You

Dev.to

Sentiment Classification of Gaza War Headlines: A Comparative Analysis of Large Language Models and Arabic Fine-Tuned BERT Models

Key Points

Abstract

Related Articles

ChatGPT for Nurses: Prompts That Help You Document, Communicate, and Study

I Added a Stopwatch to My AI in 1 LOC Using the Livingrimoire While Corporations Need a Year

Built tasuki — an AI CLI Orchestrator that Seamlessly Hands Off Between Tools

I built a GNOME extension for Codex with local/remote history, live filters, Markdown export, and a read-only MCP server

I Built an Open‑ Source OS for AI Agents – And It’s Ready for You

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer