ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback

arXiv cs.AI / 4/8/2026


Key Points

  • The paper introduces ReVEL, a hybrid framework that uses an LLM as an interactive, multi-turn reasoner inside an evolutionary algorithm to evolve heuristics for NP-hard combinatorial optimization problems.
  • ReVEL improves on prior one-shot LLM code synthesis by adding two core mechanisms: performance-profile grouping (clustering heuristics into behaviorally coherent groups for compact feedback) and structured multi-turn reflection (using group-level behavior analysis to propose targeted refinements).
  • Proposed heuristic refinements are selectively applied and checked by an EA-based meta-controller that adaptively balances exploration versus exploitation.
  • Experiments on standard combinatorial optimization benchmarks indicate that ReVEL generates heuristics that are more robust and diverse than those of strong baselines, with statistically significant gains.
  • The authors position multi-turn reasoning combined with structured grouping as a principled paradigm for automated heuristic design in optimization settings.
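The performance-profile grouping mentioned in the bullets above can be illustrated with a small sketch. Everything here is an illustrative assumption rather than the paper's actual procedure: each heuristic is summarized by a vector of per-instance objective gaps, and heuristics whose vectors lie close together are greedily merged into one behavioral group, so the LLM can receive one compact summary per group instead of raw per-heuristic logs.

```python
# Hypothetical sketch of performance-profile grouping. The distance
# threshold and greedy merging rule are assumptions; the paper's exact
# clustering method may differ.
import math

def profile_distance(p, q):
    """Euclidean distance between two performance profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def group_by_profile(profiles, threshold):
    """Greedily assign each heuristic to the first group whose
    representative profile is within `threshold`; otherwise open a
    new group. Returns a list of index lists."""
    groups, reps = [], []
    for i, p in enumerate(profiles):
        for g, rep in enumerate(reps):
            if profile_distance(p, rep) <= threshold:
                groups[g].append(i)
                break
        else:
            groups.append([i])
            reps.append(p)
    return groups

# Each row: objective gaps (%) of one heuristic on three benchmark instances.
profiles = [
    [1.0, 2.0, 1.5],   # heuristic 0
    [1.1, 2.1, 1.4],   # heuristic 1: behaves like heuristic 0
    [8.0, 9.5, 7.0],   # heuristic 2: distinctly worse profile
]
print(group_by_profile(profiles, threshold=1.0))  # → [[0, 1], [2]]
```

Heuristics 0 and 1 collapse into one group because their profiles differ by less than the threshold, while heuristic 2 opens its own group; a real system would feed one aggregated description per group back to the LLM.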

Abstract

Designing effective heuristics for NP-hard combinatorial optimization problems remains a challenging and expertise-intensive task. Existing applications of large language models (LLMs) primarily rely on one-shot code synthesis, yielding brittle heuristics that underutilize the models' capacity for iterative reasoning. We propose ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback, a hybrid framework that embeds LLMs as interactive, multi-turn reasoners within an evolutionary algorithm (EA). The core of ReVEL lies in two mechanisms: (i) performance-profile grouping, which clusters candidate heuristics into behaviorally coherent groups to provide compact and informative feedback to the LLM; and (ii) multi-turn, feedback-driven reflection, through which the LLM analyzes group-level behaviors and generates targeted heuristic refinements. These refinements are selectively integrated and validated by an EA-based meta-controller that adaptively balances exploration and exploitation. Experiments on standard combinatorial optimization benchmarks show that ReVEL consistently produces heuristics that are more robust and diverse, achieving statistically significant improvements over strong baselines. Our results highlight multi-turn reasoning with structured grouping as a principled paradigm for automated heuristic design.
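The accept/reject role of the EA-based meta-controller described in the abstract can be sketched in miniature. This is an assumed toy loop, not ReVEL's algorithm: the fitness function, the geometric cooling of the exploration rate, and the `refine` callback (a stand-in for an LLM-proposed refinement) are all placeholders. The controller keeps a refinement only when it improves validation fitness, but with a shrinking probability it also accepts non-improving candidates, which loosely mirrors the adaptive exploration-exploitation balance the paper describes.

```python
import random

def meta_controller(init, refine, fitness, steps=200, seed=0):
    """Toy exploration/exploitation controller (an assumed sketch).
    Improving candidates are always accepted and cool the exploration
    rate; non-improving ones are accepted with probability `explore`.
    The best candidate seen so far is tracked and returned."""
    rng = random.Random(seed)
    cur, cur_fit = init, fitness(init)
    best, best_fit = cur, cur_fit
    explore = 0.5  # initial exploration rate
    for _ in range(steps):
        cand = refine(cur, rng)
        cand_fit = fitness(cand)
        if cand_fit > cur_fit:                   # exploitation: keep improvement
            cur, cur_fit = cand, cand_fit
            explore = max(0.05, explore * 0.9)   # cool down exploration
        elif rng.random() < explore:             # exploration: accept a detour
            cur, cur_fit = cand, cand_fit
        if cur_fit > best_fit:                   # remember the global best
            best, best_fit = cur, cur_fit
    return best, best_fit

# Stand-in for an LLM-proposed refinement: perturb a numeric "heuristic
# parameter". The toy fitness peaks at 3.0.
def refine(x, rng):
    return x + rng.uniform(-0.5, 0.5)

def fitness(x):
    return -(x - 3.0) ** 2

best, fit = meta_controller(0.0, refine, fitness)
print(round(best, 2), round(fit, 4))
```

In ReVEL the candidates would be code-level heuristic refinements evaluated on validation instances rather than a scalar parameter, but the accept-if-better-plus-occasional-detour structure is the part this sketch is meant to convey.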