RASPRef: Retrieval-Augmented Self-Supervised Prompt Refinement for Large Reasoning Models

arXiv cs.CL / March 31, 2026


Key Points

  • The paper introduces RASPRef, a framework for Retrieval-Augmented Self-Supervised Prompt Refinement that optimizes prompts directly for large reasoning models rather than only improving outputs.
  • RASPRef iteratively refines prompts by retrieving relevant examples and previously generated reasoning trajectories, then applying self-supervised signals such as multi-sample consistency, verifier feedback, and model-generated critiques.
  • Experiments on GSM8K-style mathematical reasoning tasks indicate that retrieval-guided prompting can outperform a static prompting baseline.
  • The authors analyze how factors like retrieval quality, trajectory selection, and the choice of self-supervised feedback signals affect the effectiveness of prompt refinement.
  • The work argues that prompt engineering remains a key performance lever for reasoning-focused LLMs and proposes a scalable, annotation-free method for improving prompts across tasks and domains.
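The refinement loop described in the points above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function names (`refine_prompt`, `retrieve`, `propose`, `score`) and the toy hint-based instantiation are our own, and the scoring callback merely stands in for whichever self-supervised signal (consistency, verifier, critique) is used.

```python
from typing import Callable, List

def refine_prompt(
    prompt: str,
    retrieve: Callable[[str], List[str]],          # relevant exemplars / trajectories
    propose: Callable[[str, List[str]], List[str]], # candidate refined prompts
    score: Callable[[str], float],                  # self-supervised signal
    iterations: int = 3,
) -> str:
    """Hypothetical RASPRef-style loop: the prompt itself is the optimization
    target. Each round retrieves exemplars, proposes refined candidates, and
    keeps the candidate with the highest self-supervised score."""
    best, best_score = prompt, score(prompt)
    for _ in range(iterations):
        exemplars = retrieve(best)
        for cand in propose(best, exemplars):
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best

# Toy instantiation (illustrative only): retrieval returns fixed hints,
# proposals append one missing hint each, and the "signal" counts hints
# present in the prompt, standing in for consistency or verifier feedback.
HINTS = ["show your work", "check units", "verify the final answer"]

def toy_retrieve(p: str) -> List[str]:
    return HINTS

def toy_propose(p: str, exemplars: List[str]) -> List[str]:
    return [p + " " + h + "." for h in exemplars if h not in p]

def toy_score(p: str) -> float:
    return sum(h in p for h in HINTS)

refined = refine_prompt("Solve the problem step by step.",
                        toy_retrieve, toy_propose, toy_score)
```

With three iterations and three hints, the loop greedily absorbs one hint per round, so the refined prompt ends up containing all three.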

Abstract

Recent reasoning-focused language models such as DeepSeek R1 and OpenAI o1 have demonstrated strong performance on structured reasoning benchmarks including GSM8K, MATH, and multi-hop question answering tasks. However, their performance remains highly sensitive to prompt formulation, and designing effective prompts is typically a manual and iterative process that does not scale well across tasks or domains. To address this limitation, we introduce Retrieval-Augmented Self-Supervised Prompt Refinement (RASPRef), a framework that improves prompts without requiring human annotations or task-specific supervision. The approach retrieves relevant examples and previously generated reasoning trajectories, and leverages signals such as multi-sample consistency, verifier feedback, and model-generated critiques to iteratively refine the prompt. Unlike prior approaches that focus primarily on improving model outputs, RASPRef directly treats the prompt as the optimization target and improves it through an iterative retrieval-guided refinement process. Experiments on GSM8K-style mathematical reasoning tasks show that retrieval-guided prompting improves performance compared with a static prompting baseline. We further discuss how retrieval quality, trajectory selection, and self-supervised feedback signals may influence the effectiveness of prompt refinement. These findings suggest that prompt design remains a critical factor for reasoning-oriented language models, and that self-improving prompts offer a practical and scalable strategy for improving reasoning performance.
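The abstract lists multi-sample consistency among the annotation-free feedback signals. A minimal sketch of how such a signal can be computed (the function name and interface are our own, not taken from the paper): sample several answers under a candidate prompt and use the agreement rate with the majority answer as the score.

```python
from collections import Counter
from typing import List

def consistency_signal(answers: List[str]) -> float:
    """Multi-sample consistency as an annotation-free reward:
    the fraction of sampled answers that agree with the majority answer.
    A prompt that elicits the same final answer across samples scores
    higher than one producing scattered answers."""
    if not answers:
        return 0.0
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)

# Example: four samples, three of which agree on "42".
sig = consistency_signal(["42", "42", "41", "42"])  # → 0.75
```

No ground-truth labels are needed, which is what makes the signal suitable for the scalable, annotation-free refinement the paper advocates.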