Duluth at SemEval-2026 Task 6: DeBERTa with LLM-Augmented Data for Unmasking Political Question Evasions

arXiv cs.CL / 4/23/2026


Key Points

  • The paper describes the “Duluth” system submitted to SemEval-2026 Task 6 (CLARITY) for identifying and classifying political question evasions using a two-level taxonomy of response clarity.
  • The approach is built on DeBERTa-V3-base, enhanced with focal loss, layer-wise learning rate decay, and boolean discourse features to improve clarity and evasion classification of question–answer pairs.
  • To handle class imbalance, the authors augment the minority classes with synthetic training examples generated by Gemini 3 and Claude Sonnet 4.5.
  • On the Task 1 evaluation set, Duluth’s best configuration reaches a Macro F1 of 0.76, placing 8th of 40 teams (top-ranked system TeleAI: 0.89; participant mean: 0.70). The authors report that LLM-based augmentation improves minority-class recall, while the dominant error is confusion between Ambivalent and Clear Reply responses.
  • The error analysis finds that the Ambivalent vs. Clear Reply confusion mirrors disagreements among human annotators, which suggests that annotation ambiguity remains a major challenge in this task.
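
The two training techniques named in the key points, focal loss and layer-wise learning rate decay, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, gamma = 2.0, the base learning rate of 2e-5, and the decay factor of 0.9 are all assumptions chosen for illustration. The only model-specific fact used is that DeBERTa-V3-base has 12 encoder layers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. With gamma = 0
    this reduces to ordinary cross-entropy. gamma = 2.0 is an assumed
    default, not a value taken from the paper.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def layerwise_lrs(base_lr, num_layers, decay):
    """Per-layer learning rates, index 0 = bottom encoder layer.

    The top layer keeps base_lr; each layer below it is scaled by one
    more factor of `decay`, so lower layers change more slowly.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Hypothetical settings: 12 layers (DeBERTa-V3-base), assumed lr and decay.
example_lrs = layerwise_lrs(base_lr=2e-5, num_layers=12, decay=0.9)
```

The focal term (1 - p_t)^gamma shrinks the loss on examples the model already classifies confidently, so harder examples, which often belong to minority classes, contribute relatively more to the gradient. In a real training setup the per-layer rates would typically be passed to the optimizer as separate parameter groups.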

Abstract

This paper presents the Duluth approach to SemEval-2026 Task 6 on CLARITY: Unmasking Political Question Evasions. We address Task 1 (clarity-level classification) and Task 2 (evasion-level classification), both of which involve classifying question–answer pairs from U.S. presidential interviews using a two-level taxonomy of response clarity. Our system is based on DeBERTa-V3-base, extended with focal loss, layer-wise learning rate decay, and boolean discourse features. To address class imbalance in the training data, we augment minority classes using synthetic examples generated by Gemini 3 and Claude Sonnet 4.5. Our best configuration achieved a Macro F1 of 0.76 on the Task 1 evaluation set, placing 8th out of 40 teams. The top-ranked system (TeleAI) achieved 0.89, while the mean score across participants was 0.70. Error analysis reveals that the dominant source of misclassification is confusion between Ambivalent and Clear Reply responses, a pattern that mirrors disagreements among human annotators. Our findings demonstrate that LLM-based data augmentation can meaningfully improve minority-class recall on nuanced political discourse tasks.
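
Because the reported metric is Macro F1, every class counts equally regardless of how many examples it has, which is why minority-class recall matters so much for the final score. The sketch below computes Macro F1 on a toy set of predictions. The data are invented for illustration and are not results from the paper; "Third Class" is a placeholder label, since only Ambivalent and Clear Reply are named in the text above.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (invented): the only errors are Ambivalent / Clear Reply swaps.
gold = ["Clear Reply"] * 4 + ["Ambivalent"] * 4 + ["Third Class"] * 2
pred = (
    ["Clear Reply", "Clear Reply", "Clear Reply", "Ambivalent"]
    + ["Ambivalent", "Ambivalent", "Clear Reply", "Clear Reply"]
    + ["Third Class", "Third Class"]
)
toy_score = macro_f1(gold, pred)
```

In this toy case the smallest class is predicted perfectly, and all of the lost score comes from the two confusable classes, which is the same shape of error the abstract describes.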