CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
arXiv cs.CL / 4/7/2026
Key Points
- The paper introduces CresOWLve, a new benchmark designed to measure creative problem-solving by using puzzles grounded in real-world knowledge rather than contrived brainteasers.
- CresOWLve aims to better reflect real creative workflows by requiring multiple cognitive strategies, cross-domain knowledge retrieval, and the creative recombination of facts.
- Experiments on several frontier “thinking” and “non-thinking” LLMs show that the benchmark remains highly challenging overall.
- Results indicate a consistent performance gap: models answer factual questions substantially better than creative ones, with drops of up to about 17%.
- The analysis suggests that while models can often retrieve relevant information, they struggle to make the non-obvious connections needed to integrate knowledge and produce correct creative solutions.