Duluth at SemEval-2026 Task 6: DeBERTa with LLM-Augmented Data for Unmasking Political Question Evasions

arXiv cs.CL / 4/23/2026

Key Points

The paper describes the “Duluth” system submitted to SemEval-2026 Task 6 (CLARITY) for identifying and classifying political question evasions using a two-level taxonomy of response clarity.
The approach is built on DeBERTa-V3-base, enhanced with focal loss, layer-wise learning rate decay, and boolean discourse features to improve clarity and evasion classification of question–answer pairs.
To handle class imbalance, the authors augment the minority classes with synthetic training examples generated by Gemini 3 and Claude Sonnet 4.5.
On the Task 1 evaluation set, Duluth's best configuration reaches a Macro F1 of 0.76, placing 8th of 40 teams (the top-ranked system scored 0.89 and the participant mean was 0.70). LLM-based augmentation improves minority-class recall, while most remaining errors come from confusion between Ambivalent and Clear Reply.
The error analysis finds that the model's misclassifications mirror disagreements among human annotators, suggesting that annotation ambiguity remains a major challenge in this task.
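
Two of the training techniques named in the key points, focal loss and layer-wise learning rate decay, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the focusing parameter `gamma`, the optional class weights `alpha`, the base learning rate, and the decay factor are all assumptions chosen for illustration, since the summary does not report the values used.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single example (illustrative, not the paper's code).

    probs  : predicted class probabilities, summing to 1
    target : index of the true class
    gamma  : focusing parameter (assumed value); gamma=0 gives cross-entropy
    alpha  : optional per-class weights (assumed), e.g. larger for rare classes
    """
    p_t = probs[target]
    weight = 1.0 if alpha is None else alpha[target]
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples,
    # so hard examples contribute relatively more to the total loss.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def layerwise_lrs(base_lr, num_layers, decay):
    """Per-layer learning rates, index 0 being the lowest encoder layer.

    The top layer trains at base_lr; each layer below it is scaled by `decay`,
    so lower (more general) layers change more slowly during fine-tuning.
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


if __name__ == "__main__":
    easy = [0.9, 0.05, 0.05]   # confident, correct prediction for class 0
    hard = [0.3, 0.4, 0.3]     # uncertain prediction, true class is 0
    print("easy:", focal_loss(easy, 0), "hard:", focal_loss(hard, 0))
    # 12 layers matches the DeBERTa-V3-base encoder; lr and decay are assumed.
    print(layerwise_lrs(base_lr=2e-5, num_layers=12, decay=0.9))
```

In an actual fine-tuning run, the per-layer rates would typically be passed to the optimizer as separate parameter groups, one per encoder layer.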

Abstract

This paper presents the Duluth approach to SemEval-2026 Task 6 on CLARITY: Unmasking Political Question Evasions. We address Task 1 (clarity-level classification) and Task 2 (evasion-level classification), both of which involve classifying question-answer pairs from U.S. presidential interviews using a two-level taxonomy of response clarity. Our system is based on DeBERTa-V3-base, extended with focal loss, layer-wise learning rate decay, and boolean discourse features. To address class imbalance in the training data, we augment minority classes using synthetic examples generated by Gemini 3 and Claude Sonnet 4.5. Our best configuration achieved a Macro F1 of 0.76 on the Task 1 evaluation set, placing 8th out of 40 teams. The top-ranked system (TeleAI) achieved 0.89, while the mean score across participants was 0.70. Error analysis reveals that the dominant source of misclassification is confusion between Ambivalent and Clear Reply responses, a pattern that mirrors disagreements among human annotators. Our findings demonstrate that LLM-based data augmentation can meaningfully improve minority-class recall on nuanced political discourse tasks.
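
Macro F1, the metric reported in the abstract, averages per-class F1 scores with equal weight, which is why minority-class recall matters so much for the final number. The sketch below computes it from scratch on toy data. The label `"Other"` is a placeholder rather than a class name taken from the task definition, and the label distribution is invented for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        # in either the gold labels or the predictions.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = ["Ambivalent", "Clear Reply", "Other"]  # "Other" is a placeholder
    # Toy, invented distribution: one frequent class and two rarer ones.
    y_true = ["Ambivalent"] * 6 + ["Clear Reply"] * 3 + ["Other"]
    # A degenerate classifier that always predicts the most frequent class.
    y_pred = ["Ambivalent"] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("macro F1:", macro_f1(y_true, y_pred, labels))
```

On this toy data the always-majority classifier is right on 6 of 10 examples, yet its Macro F1 is only 0.25, because two of the three classes score an F1 of zero. Raising recall on rare classes, as the augmentation in the paper aims to do, moves this metric far more than it moves accuracy.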
