Can Large Language Models Reinvent Foundational Algorithms?

arXiv cs.AI / 4/8/2026


Key Points

  • The paper investigates whether large language models can “reinvent” established foundational computer science algorithms after they are intentionally unlearned from the model’s pretrained knowledge.
  • It introduces an Unlearn-and-Reinvent pipeline that uses a GRPO-based, on-policy unlearning method to remove specific algorithms (e.g., Dijkstra’s, Euclid’s) and then evaluates reinvention in a controlled setting.
  • Experiments across 10 target algorithms, three open-weight models, and multiple hint levels show that the best-performing model (Qwen3-4B-Thinking-2507) reinvents 50% of algorithms with no hints, rising to 70% with hint level 1 and 90% with hint level 2.
  • The study finds that hints can significantly help for simpler algorithms, but even step-by-step hints may not work for more complex ones, while test-time reinforcement learning enables successful reinvention for the Strassen algorithm at higher hint levels.
  • Analysis and ablations suggest that a generative verifier during the reinvention phase is crucial for maintaining reasoning quality and avoiding “thought collapse,” revealing both potential and limitations of LLM-based algorithmic innovation.
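To ground what a "foundational algorithm" target looks like, here is the classical Euclidean GCD algorithm, one of the paper's unlearn-and-reinvent targets. (This is the standard textbook algorithm, not code from the paper.)

```python
def euclid_gcd(a: int, b: int) -> int:
    """Euclid's algorithm: greatest common divisor of two integers.

    Repeatedly replace (a, b) with (b, a mod b); when the remainder
    reaches zero, the last nonzero value is the GCD.
    """
    while b:
        a, b = b, a % b
    return abs(a)


# Example: gcd(48, 18) -> 6
```

After unlearning, the model is asked to rediscover exactly this kind of procedure from a problem specification, with zero to two levels of hints.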

Abstract

LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our *Unlearn-and-Reinvent* pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that the generative verifier in the reinvention phase plays a critical role in sustaining models' reasoning strength, helping to avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and current limits of LLMs' innovative thinking.
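The unlearning stage builds on GRPO, whose distinguishing component is a critic-free, group-relative advantage: rewards for a group of sampled completions are normalized against the group's own mean and standard deviation. A minimal sketch of that computation follows; the paper's specific reward design for unlearning is not detailed here, so this only illustrates the underlying GRPO machinery.

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's
    reward by the group's mean and standard deviation, avoiding the
    need for a learned value critic as in PPO."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]


# Example: two successes and two failures in a group of four
# yields advantages of +1 for successes and -1 for failures.
```

For on-policy unlearning, the reward would be constructed so that reproducing the target algorithm is penalized, pushing the policy away from the memorized solution before the reinvention phase begins.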
