CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
arXiv cs.AI / 3/13/2026
Key Points
- CreativeBench provides a benchmark to evaluate machine creativity in code generation with two subsets (CreativeBench-Combo and CreativeBench-Explore) and an automated pipeline using reverse engineering and self-play.
- It defines a unified creativity metric as the product of quality and novelty, using executable code so that genuine creativity can be distinguished from hallucination.
- Key findings: scaling improves combinatorial creativity but yields diminishing returns for exploratory creativity; larger models converge as they scale, becoming more correct but less divergent; and reasoning helps more for constrained exploration than for combination.
- It introduces EvoRePE, an inference-time steering strategy that internalizes evolutionary search patterns to boost machine creativity.
- The work offers a framework for objective benchmarking and guidance for future research in AI creativity.
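The unified metric above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes quality is approximated by unit-test pass rate and novelty by textual distance from reference solutions (both hypothetical proxies chosen for the example).

```python
from difflib import SequenceMatcher

def quality(passed: int, total: int) -> float:
    """Fraction of unit tests the generated program passes
    (a hypothetical proxy for the paper's quality score)."""
    return passed / total if total else 0.0

def novelty(candidate: str, references: list[str]) -> float:
    """1 minus the highest textual similarity to any reference solution
    (a simple stand-in for the paper's novelty measure)."""
    if not references:
        return 1.0
    best = max(SequenceMatcher(None, candidate, r).ratio() for r in references)
    return 1.0 - best

def creativity(candidate: str, references: list[str],
               passed: int, total: int) -> float:
    """Unified metric: product of quality and novelty. A novel but
    broken program (hallucination) scores near zero, as does a
    correct but derivative one."""
    return quality(passed, total) * novelty(candidate, references)
```

Because the two factors are multiplied, a score is high only when a solution is simultaneously correct and different from what came before, which is the property the benchmark uses to separate creativity from hallucination.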