PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space

arXiv cs.LG / 4/8/2026

Key Points

  • The paper introduces PromptEvolver, a prompt-inversion method for text-to-image systems that recovers a textual prompt matching a target image.
  • It uses a genetic algorithm to evolve natural-language prompts, guided by a vision-language model to improve reconstruction fidelity.
  • PromptEvolver is designed to work with black-box image generators by relying only on image outputs rather than access to internal model details.
  • The authors report evaluations on multiple prompt-inversion benchmarks, claiming consistent performance gains over existing methods.
  • A key motivation is improving both prompt quality (more natural, interpretable prompts) and the accuracy of the resulting image reconstruction.

Abstract

Text-to-image generation has progressed rapidly, but faithfully generating complex scenes requires extensive trial-and-error to find the exact prompt. In the prompt inversion task, the goal is to recover a textual prompt that can faithfully reconstruct a given target image. Existing methods frequently yield suboptimal reconstructions and produce unnatural, hard-to-interpret prompts that hinder transparency and controllability. In this work, we present PromptEvolver, a prompt inversion approach that generates natural-language prompts while achieving high-fidelity reconstructions of the target image. Our method uses a genetic algorithm to optimize the prompt, leveraging a strong vision-language model to guide the evolution process. Importantly, it works on black-box generation models by requiring only image outputs. Finally, we evaluate PromptEvolver across multiple prompt inversion benchmarks and show that it consistently outperforms competing methods.
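
The abstract does not spell out the evolutionary operators, so the following is only a minimal sketch of the general idea (a genetic algorithm over natural-language prompts that queries a black-box generator and scores its outputs), not the authors' implementation. Everything in it is a hypothetical stand-in: `generate_image` plays the role of the text-to-image model and simply returns a bag of words, `similarity` plays the role of the vision-language model and is just Jaccard overlap, and `mutate`, `crossover`, and `VOCAB` are simple word-level operators chosen for illustration.

```python
import random

# Hypothetical vocabulary used by the toy mutation operator.
VOCAB = ["a", "red", "blue", "cat", "dog", "on", "sofa",
         "beach", "sleeping", "running"]


def generate_image(prompt):
    """Stand-in for a black-box text-to-image model (assumption).

    Only the output is visible to the optimizer; here the 'image'
    is just the set of words in the prompt.
    """
    return frozenset(prompt.split())


def similarity(image_a, image_b):
    """Stand-in for a vision-language model score (assumption): Jaccard overlap."""
    union = image_a | image_b
    return len(image_a & image_b) / len(union) if union else 1.0


def mutate(prompt, rng):
    """Randomly delete, insert, or replace one word; never empties the prompt."""
    words = prompt.split()
    op = rng.choice(["replace", "insert", "delete"])
    if op == "delete" and len(words) > 1:
        del words[rng.randrange(len(words))]
    elif op == "insert":
        words.insert(rng.randrange(len(words) + 1), rng.choice(VOCAB))
    else:
        words[rng.randrange(len(words))] = rng.choice(VOCAB)
    return " ".join(words)


def crossover(a, b, rng):
    """Join a non-empty prefix of one parent with a suffix of the other."""
    wa, wb = a.split(), b.split()
    child = wa[: rng.randint(1, len(wa))] + wb[rng.randrange(len(wb) + 1):]
    return " ".join(child)


def evolve(target_image, population, generations=30, elite=4, seed=0):
    """Evolve prompts toward the target using only generator outputs.

    Requires 2 <= elite <= len(population).
    """
    rng = random.Random(seed)

    def fitness(prompt):
        return similarity(generate_image(prompt), target_image)

    population = list(population)
    size = len(population)
    for _ in range(generations):
        # Keep the best prompts unchanged (elitism), refill with offspring.
        parents = sorted(population, key=fitness, reverse=True)[:elite]
        children = []
        while len(children) < size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


target = generate_image("a red cat sleeping on sofa")
start = ["a dog", "blue beach", "a cat running",
         "red sofa", "dog on beach", "a blue dog"]
best = evolve(target, start)
print(best, similarity(generate_image(best), target))
```

Because the elite prompts are carried over unchanged each generation, the best score in this sketch can never decrease. In a realistic setting the toy scorer would be replaced by an image-to-image or image-to-text comparison from a vision-language model, and the word-level operators would presumably be replaced by model-guided rewrites that keep prompts fluent.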