CLaC at SemEval-2026 Task 6: Response Clarity Detection in Political Discourse

arXiv cs.CL / 5/5/2026

Key Points

  • The paper presents a system for SemEval-2026 Task 6 (CLARITY) focused on detecting response clarity and evasion in question–answer pairs from U.S. presidential interviews.
  • Results show an LLM ensemble reaching a macro-F1 of 80 on the 3-class Task 1 (ranking 9th of 41) and 59 on the 9-class Task 2 (3rd of 33), strong results across both label granularities.
  • For transformer encoders, a four-stage training pipeline with partial encoder layer unfreezing outperforms full fine-tuning by a wide margin (see the sketch after this list), and ensembling English and multilingual encoders improves on either family alone, even though the multilingual models are individually weaker.
  • Surprisingly, prompt-based LLMs without task-specific parameter updates outperform fine-tuned encoders, especially on minority classes, and for open-weight LLMs parameter count alone does not predict effectiveness.
  • Enriching the input by concatenating the full interviewer turn improves LLM performance but not encoder performance; the dominant remaining failure mode is the Clear Reply/Ambivalent boundary, mirroring disagreement among human annotators.

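The four-stage pipeline is only summarized here, so the following is a minimal sketch of partial encoder layer unfreezing with a Hugging Face sequence classifier, assuming a RoBERTa-style encoder in which only the top few encoder layers and the classification head are trained; the model name, layer budget, and label count are illustrative, not the paper's exact configuration.

```python
from transformers import AutoModelForSequenceClassification

# Illustrative choices; the paper's exact encoder, layer budget, and schedule are not given here.
MODEL_NAME = "roberta-base"   # assumption: one English encoder from the 8-model pool
NUM_LABELS = 3                # Task 1 distinguishes 3 clarity classes
TOP_LAYERS_TO_TRAIN = 4       # assumption: unfreeze only the top 4 encoder layers

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Freeze every parameter, then re-enable the top encoder layers and the classifier head.
for param in model.parameters():
    param.requires_grad = False

for layer in model.roberta.encoder.layer[-TOP_LAYERS_TO_TRAIN:]:
    for param in layer.parameters():
        param.requires_grad = True

for param in model.classifier.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

One plausible reading of a staged pipeline is to repeat this setup with progressively more layers unfrozen (typically at a lower learning rate each stage), but that schedule is an assumption rather than something stated in the abstract.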
Abstract

In this paper, we present our system for SemEval-2026 Task 6 (CLARITY) on response clarity and evasion detection in question-answer pairs from U.S. presidential interviews, comparing fine-tuned encoders with prompt-based LLMs. Our LLM ensemble achieves 80 macro-F1 on the 3-class Task 1 (9th/41) and 59 on the 9-class Task 2 (3rd/33). Across 8 transformer encoders optimized through a four-stage pipeline, partial encoder layer unfreezing outperforms full fine-tuning by a wide margin. Combining English and multilingual encoders further improves ensemble performance over either family alone, despite multilingual models being individually weaker. Prompt-based LLMs, without any task-specific parameter updates, outperform fine-tuned encoders, particularly on minority classes; among open-weight LLMs, parameter count does not predict performance. Enriched input, concatenating the full interviewer turn, improves LLM performance but not that of encoders, an effect that persists with Longformer's extended context window, suggesting the divergence is not attributable to sequence-length capacity alone in our settings. The Clear Reply/Ambivalent boundary remains the dominant failure mode, mirroring the disagreement among human annotators. Our code, prompts, model configurations, and results are publicly available.
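The "enriched input" concatenates the full interviewer turn with the question–answer pair. A minimal sketch of how such an input and a zero-shot classification prompt could be assembled for a prompt-based LLM is shown below; the field labels, prompt wording, and the third class name are illustrative assumptions (only "Clear Reply" and "Ambivalent" are named in the text), not the paper's actual prompt.

```python
# Illustrative formatting; the paper's exact separators and prompt wording are not given here.
def build_enriched_input(interviewer_turn: str, question: str, answer: str) -> str:
    """Prepend the full interviewer turn to the question-answer pair."""
    return (
        f"Interviewer turn: {interviewer_turn}\n"
        f"Question: {question}\n"
        f"Answer: {answer}"
    )

# "Clear Reply" and "Ambivalent" appear in the abstract; "Evasive" is a placeholder for the third class.
TASK1_LABELS = ["Clear Reply", "Ambivalent", "Evasive"]

def build_prompt(enriched_input: str) -> str:
    """Zero-shot prompt asking an instruction-tuned LLM to choose one clarity label."""
    return (
        "Classify how clearly the answer responds to the interviewer's question.\n"
        f"Answer with exactly one of: {', '.join(TASK1_LABELS)}.\n\n"
        f"{enriched_input}\n\nLabel:"
    )
```

For the encoders, the same enriched string would presumably be tokenized and truncated to the model's maximum length, which is the capacity concern the abstract rules out by testing Longformer's extended context window.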
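The combination rule behind the LLM ensemble is not described in this summary, so the sketch below assumes plain majority voting over per-model label predictions, with ties broken in favor of the earliest-listed model; both choices are assumptions, not the paper's method.

```python
from collections import Counter

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Combine label predictions from several models by majority vote.

    predictions[m][i] is model m's label for example i. Ties go to the
    earliest-listed model (an assumption, not the paper's rule).
    """
    ensembled = []
    for labels in zip(*predictions):  # one example's labels across all models
        counts = Counter(labels)
        best = max(counts.values())
        winner = next(label for label in labels if counts[label] == best)
        ensembled.append(winner)
    return ensembled

# Example: three models voting on two question-answer pairs.
preds = [
    ["Clear Reply", "Ambivalent"],
    ["Clear Reply", "Clear Reply"],
    ["Ambivalent",  "Ambivalent"],
]
print(majority_vote(preds))  # ['Clear Reply', 'Ambivalent']
```

The same voting scheme could also be applied to the encoder ensembles, where the abstract reports that mixing English and multilingual models beats either family alone.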