CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
arXiv cs.AI / 3/13/2026
Key Points
- CreativeBench provides a benchmark to evaluate machine creativity in code generation with two subsets (CreativeBench-Combo and CreativeBench-Explore) and an automated pipeline using reverse engineering and self-play.
- It defines a unified metric as the product of quality and novelty to distinguish creativity from hallucination when using executable code.
- Key findings: scaling improves combinatorial creativity but yields diminishing returns for exploratory creativity; larger models' outputs converge with scale, becoming more correct but less divergent; and reasoning helps more for constrained exploration than for combination.
- It introduces EvoRePE, an inference-time steering strategy that internalizes evolutionary search patterns to boost machine creativity.
- The work offers a framework for objective benchmarking and guidance for future research in AI creativity.
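The unified metric in the second point can be sketched as a simple product of the two scores. This is a minimal illustration assuming both scores are normalized to [0, 1]; the actual quality and novelty scorers (e.g., based on executable code) belong to the paper and are not reproduced here.

```python
# Hedged sketch of a product-style creativity metric.
# The scoring functions feeding this are hypothetical placeholders,
# not CreativeBench's actual implementation.

def creativity_score(quality: float, novelty: float) -> float:
    """Combine quality and novelty into one creativity score.

    Both inputs are assumed to lie in [0, 1]. Taking the product
    penalizes outputs that are novel but incorrect (hallucination)
    as well as outputs that are correct but derivative.
    """
    if not (0.0 <= quality <= 1.0 and 0.0 <= novelty <= 1.0):
        raise ValueError("scores must be normalized to [0, 1]")
    return quality * novelty

# Correct-but-derivative and novel-but-broken outputs both score low;
# only outputs strong on both axes score high.
print(creativity_score(0.9, 0.1))  # high quality, low novelty
print(creativity_score(0.1, 0.9))  # low quality, high novelty
print(creativity_score(0.8, 0.8))  # balanced: highest of the three
```

The multiplicative form means either score near zero drives creativity near zero, which is the paper's stated mechanism for distinguishing creativity from hallucination.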