ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis

arXiv cs.AI · April 21, 2026


Key Points

  • ContraPrompt is a new prompt optimization approach based on dyadic reasoning trace analysis: it extracts optimization signals by comparing a model’s failed trace with its subsequent successful retry (produced after error feedback) on the same input.
  • Instead of contrasting prompts or single execution failures in isolation, it compares complete intermediate reasoning processes where shared elements (model, input, base prompt) make remaining differences reflect reasoning strategy and appended error feedback.
  • The method uses an instrumented, multi-attempt agentic retry loop to automatically generate contrastive training data without human annotation, then organizes extracted rules into an input-aware decision tree for routing.
  • On four reasoning and compliance benchmarks, ContraPrompt outperforms GEPA across all tasks, with reported absolute gains including +8.29 pp on HotPotQA and +2.21 pp on GDPR-Bench; ablations show that removing dyadic trace contrastivity causes a 16% relative average drop in performance.
  • On 53 additional black-box optimization problems, it beats GEPA on 11, ties on 41, and loses on 1 under an equal budget; on FiNER-139 financial NER, it improves compliance-aligned extraction by +7.77 pp over an unoptimized baseline and +1.94 pp over GEPA.
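The instrumented retry loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `model` and `grade` are hypothetical interfaces standing in for the LLM call and the task scorer, and the feedback-appending format is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class TracePair:
    """A dyad: failed and successful reasoning traces for the same input."""
    input_text: str
    failed_trace: str
    success_trace: str
    feedback: str

def collect_trace_pairs(model, grade, inputs, base_prompt, max_retries=3):
    """Instrumented agentic retry loop (sketch). When the model fails and
    then succeeds after appended error feedback, record the contrastive
    (failed, successful) trace dyad -- no human annotation involved.

    Hypothetical interfaces:
      model(prompt, x) -> (reasoning_trace, answer)
      grade(x, answer) -> (is_correct, feedback)
    """
    pairs = []
    for x in inputs:
        trace, answer = model(base_prompt, x)
        ok, feedback = grade(x, answer)
        if ok:
            continue  # a first-try success yields no contrastive signal
        failed_trace = trace
        prompt = base_prompt
        for _ in range(max_retries):
            # Same model, same input, same base prompt: only the appended
            # feedback (and the resulting reasoning strategy) differs.
            prompt = prompt + "\nPrevious attempt failed: " + feedback
            trace, answer = model(prompt, x)
            ok, new_feedback = grade(x, answer)
            if ok:
                pairs.append(TracePair(x, failed_trace, trace, feedback))
                break
            feedback = new_feedback
    return pairs
```

Because both traces in a pair share model, input, and base prompt, any downstream contrastive analysis can attribute their differences to reasoning strategy and feedback alone.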

Abstract

Prompt optimization methods either analyze individual failures in isolation or compare prompt variants across examples, operating on single execution traces with no access to the reasoning process distinguishing success from failure on the same input. We introduce ContraPrompt, built on the observation that when a model fails but succeeds on a retry with feedback, the difference between its two chain-of-thought traces constitutes an optimization signal not captured by prior methods. Unlike prior contrastive methods, we compare complete intermediate reasoning processes: the two traces share model, input, and base prompt, so remaining differences reflect reasoning strategy and appended error feedback -- we call this dyadic reasoning trace analysis. The multi-attempt solving phase is an instrumented agentic retry loop that generates contrastive data automatically without human annotation. Extracted rules are organized into an input-aware decision tree routing instructions by observable input characteristics. On four reasoning and compliance benchmarks, ContraPrompt outperforms GEPA (Agrawal et al., 2026) on all four, with absolute gains of +8.29 pp on HotPotQA (+20.8% rel.), +2.21 pp on GDPR-Bench (+18.2% rel.), +7.14 pp on GPQA Diamond (+10.6% rel.), and +0.74 pp on BBH (+0.85% rel.). Ablations confirm dyadic trace contrastivity is the critical component, with a -16% relative average drop upon its removal. On 53 EvalSet black-box optimization problems, ContraPrompt beats GEPA on 11, ties on 41, and loses on 1 at equal budget. On FiNER-139 financial named entity recognition (Loukas et al., 2022), ContraPrompt achieves +7.77 pp over the unoptimized baseline (+11.6% rel.) and +1.94 pp over GEPA (+2.66% rel.), with branch conditions aligning with standard US GAAP financial-instrument categories.
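The "input-aware decision tree" the abstract mentions routes extracted instructions by observable input characteristics. A minimal sketch of such routing is shown below; the tree structure, predicates, and instruction texts are invented for illustration and are not taken from the paper.

```python
# Sketch of input-aware instruction routing (hypothetical structure).
# Internal nodes hold a predicate over the raw input; leaves hold the
# prompt instruction selected for inputs that reach them.

def route_instruction(x, node):
    """Walk the decision tree and return the instruction at the leaf."""
    while "instruction" not in node:
        node = node["yes"] if node["test"](x) else node["no"]
    return node["instruction"]

# Illustrative tree: features and instructions are invented examples.
tree = {
    "test": lambda x: x.count("?") > 1,  # looks multi-hop?
    "yes": {"instruction": "Decompose into sub-questions before answering."},
    "no": {
        "test": lambda x: any(c.isdigit() for c in x),  # numeric input?
        "yes": {"instruction": "Show each arithmetic step explicitly."},
        "no": {"instruction": "Answer directly and cite the source span."},
    },
}
```

The design choice matters for the FiNER-139 result: because branch conditions are predicates over observable input features, they can be inspected directly, which is how the authors report alignment with US GAAP financial-instrument categories.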