Can Large Language Models Reinvent Foundational Algorithms?

arXiv cs.AI / 4/8/2026


Key Points

  • The paper investigates whether large language models can “reinvent” established foundational computer science algorithms after they are intentionally unlearned from the model’s pretrained knowledge.
  • It introduces an Unlearn-and-Reinvent pipeline that uses a GRPO-based, on-policy unlearning method to remove specific algorithms (e.g., Dijkstra’s, Euclid’s) and then evaluates reinvention in a controlled setting.
  • Experiments across 10 target algorithms, three open-weight models, and multiple hint levels show that the best-performing model (Qwen3-4B-Thinking-2507) reinvents 50% of algorithms with no hints, rising to 70% with hint level 1 and 90% with hint level 2.
  • The study finds that hints can significantly help for simpler algorithms, but even step-by-step hints may not work for more complex ones, while test-time reinforcement learning enables successful reinvention for the Strassen algorithm at higher hint levels.
  • Analysis and ablations suggest that a generative verifier during the reinvention phase is crucial for maintaining reasoning quality and avoiding “thought collapse,” revealing both potential and limitations of LLM-based algorithmic innovation.
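To ground what a "foundational algorithm" target looks like, here is the classical Euclidean GCD algorithm, one of the paper's unlearn-and-reinvent targets. (This is the standard textbook algorithm, not code from the paper.)

```python
def euclid_gcd(a: int, b: int) -> int:
    """Euclid's algorithm: greatest common divisor of two integers.

    Repeatedly replace (a, b) with (b, a mod b); when the remainder
    reaches zero, the last nonzero value is the GCD.
    """
    while b:
        a, b = b, a % b
    return abs(a)


# Example: gcd(48, 18) -> 6
```

After unlearning, the model is asked to rediscover exactly this kind of procedure from a problem specification, with zero to two levels of hints.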

Abstract

LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our *Unlearn-and-Reinvent* pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that the generative verifier in the reinvention phase plays a critical role in sustaining models' reasoning strength, helping to avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and current limits of LLMs' innovative thinking.
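The unlearning stage builds on GRPO, whose distinguishing component is a critic-free, group-relative advantage: rewards for a group of sampled completions are normalized against the group's own mean and standard deviation. A minimal sketch of that computation follows; the paper's specific reward design for unlearning is not detailed here, so this only illustrates the underlying GRPO machinery.

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's
    reward by the group's mean and standard deviation, avoiding the
    need for a learned value critic as in PPO."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# Example: two successes and two failures in a group of four
# yields advantages of +1 for successes and -1 for failures.
```

For on-policy unlearning, the reward would be constructed so that reproducing the target algorithm is penalized, pushing the policy away from the memorized solution before the reinvention phase begins.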
