ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis

arXiv cs.AI · April 21, 2026


Key Points

  • ContraPrompt is a new prompt optimization approach based on dyadic reasoning trace analysis: it extracts optimization signals by comparing a model’s failed trace with its subsequent successful retry (produced after error feedback) on the same input.
  • Instead of contrasting prompts or single execution failures in isolation, it compares complete intermediate reasoning processes where shared elements (model, input, base prompt) make remaining differences reflect reasoning strategy and appended error feedback.
  • The method uses an instrumented, multi-attempt agentic retry loop to automatically generate contrastive training data without human annotation, then organizes extracted rules into an input-aware decision tree for routing.
  • On four reasoning and compliance benchmarks, ContraPrompt outperforms GEPA across all tasks, with reported absolute gains including +8.29 pp on HotPotQA and +2.21 pp on GDPR-Bench; ablations show that removing dyadic trace contrastivity causes a 16% relative average drop in performance.
  • On 53 additional black-box optimization problems, it beats GEPA on 11, ties on 41, and loses on 1 under an equal budget; on FiNER-139 financial NER, it improves compliance-aligned extraction by +7.77 pp over an unoptimized baseline and +1.94 pp over GEPA.
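The instrumented retry loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `model` and `grade` are hypothetical interfaces standing in for the LLM call and the task scorer, and the feedback-appending format is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class TracePair:
    """A dyad: failed and successful reasoning traces for the same input."""
    input_text: str
    failed_trace: str
    success_trace: str
    feedback: str

def collect_trace_pairs(model, grade, inputs, base_prompt, max_retries=3):
    """Instrumented agentic retry loop (sketch). When the model fails and
    then succeeds after appended error feedback, record the contrastive
    (failed, successful) trace dyad -- no human annotation involved.

    Hypothetical interfaces:
      model(prompt, x) -> (reasoning_trace, answer)
      grade(x, answer) -> (is_correct, feedback)
    """
    pairs = []
    for x in inputs:
        trace, answer = model(base_prompt, x)
        ok, feedback = grade(x, answer)
        if ok:
            continue  # a first-try success yields no contrastive signal
        failed_trace = trace
        prompt = base_prompt
        for _ in range(max_retries):
            # Same model, same input, same base prompt: only the appended
            # feedback (and the resulting reasoning strategy) differs.
            prompt = prompt + "\nPrevious attempt failed: " + feedback
            trace, answer = model(prompt, x)
            ok, new_feedback = grade(x, answer)
            if ok:
                pairs.append(TracePair(x, failed_trace, trace, feedback))
                break
            feedback = new_feedback
    return pairs
```

Because both traces in a pair share model, input, and base prompt, any downstream contrastive analysis can attribute their differences to reasoning strategy and feedback alone.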

Abstract

Prompt optimization methods either analyze individual failures in isolation or compare prompt variants across examples, operating on single execution traces with no access to the reasoning process distinguishing success from failure on the same input. We introduce ContraPrompt, built on the observation that when a model fails but succeeds on a retry with feedback, the difference between its two chain-of-thought traces constitutes an optimization signal not captured by prior methods. Unlike prior contrastive methods, we compare complete intermediate reasoning processes: the two traces share model, input, and base prompt, so remaining differences reflect reasoning strategy and appended error feedback -- we call this dyadic reasoning trace analysis. The multi-attempt solving phase is an instrumented agentic retry loop that generates contrastive data automatically without human annotation. Extracted rules are organized into an input-aware decision tree routing instructions by observable input characteristics. On four reasoning and compliance benchmarks, ContraPrompt outperforms GEPA (Agrawal et al., 2026) on all four, with absolute gains of +8.29 pp on HotPotQA (+20.8% rel.), +2.21 pp on GDPR-Bench (+18.2% rel.), +7.14 pp on GPQA Diamond (+10.6% rel.), and +0.74 pp on BBH (+0.85% rel.). Ablations confirm dyadic trace contrastivity is the critical component, with a -16% relative average drop upon its removal. On 53 EvalSet black-box optimization problems, ContraPrompt beats GEPA on 11, ties on 41, and loses on 1 at equal budget. On FiNER-139 financial named entity recognition (Loukas et al., 2022), ContraPrompt achieves +7.77 pp over the unoptimized baseline (+11.6% rel.) and +1.94 pp over GEPA (+2.66% rel.), with branch conditions aligning with standard US GAAP financial-instrument categories.
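The "input-aware decision tree" the abstract mentions routes extracted instructions by observable input characteristics. A minimal sketch of such routing is shown below; the tree structure, predicates, and instruction texts are invented for illustration and are not taken from the paper.

```python
# Sketch of input-aware instruction routing (hypothetical structure).
# Internal nodes hold a predicate over the raw input; leaves hold the
# prompt instruction selected for inputs that reach them.

def route_instruction(x, node):
    """Walk the decision tree and return the instruction at the leaf."""
    while "instruction" not in node:
        node = node["yes"] if node["test"](x) else node["no"]
    return node["instruction"]

# Illustrative tree: features and instructions are invented examples.
tree = {
    "test": lambda x: x.count("?") > 1,  # looks multi-hop?
    "yes": {"instruction": "Decompose into sub-questions before answering."},
    "no": {
        "test": lambda x: any(c.isdigit() for c in x),  # numeric input?
        "yes": {"instruction": "Show each arithmetic step explicitly."},
        "no": {"instruction": "Answer directly and cite the source span."},
    },
}
```

The design choice matters for the FiNER-139 result: because branch conditions are predicates over observable input features, they can be inspected directly, which is how the authors report alignment with US GAAP financial-instrument categories.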