Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting

arXiv cs.CL / 4/8/2026


Key Points

  • The paper introduces CoT2Edit, a framework that teaches LLMs to reason over edited knowledge, improving generalization to practical problem solving.
  • It addresses two limitations of prior knowledge-editing methods: rigid fact injection that doesn’t reliably translate to real-world problem solving, and a narrow focus on structured fact triples that ignores unstructured sources such as news and articles.
  • CoT2Edit generates high-quality instruction data using language model agents to produce chain-of-thought (CoT) reasoning over both structured and unstructured edited knowledge.
  • The approach trains the model using supervised fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO), then adds Retrieval-Augmented Generation (RAG) at inference to fetch relevant edited facts in real time.
  • Experiments report strong generalization across six knowledge-editing scenarios using a single round of training on three open-source language models, with code released on GitHub.
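The inference-time step above (RAG over edited facts) can be sketched as follows. This is a minimal, hypothetical stand-in: the token-overlap retriever, the fact store, and the prompt template are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch of RAG-style retrieval over an edit memory.
# The retriever below scores facts by simple token overlap with the
# query -- a hypothetical stand-in for the paper's retriever.

def retrieve_edited_facts(query: str, edit_memory: list[str], k: int = 2) -> list[str]:
    """Rank stored edited facts by token overlap with the query."""
    q_tokens = set(query.lower().split())
    scored = sorted(
        edit_memory,
        key=lambda fact: len(q_tokens & set(fact.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, edit_memory: list[str]) -> str:
    """Prepend retrieved edits so the model can reason (CoT) over them."""
    facts = retrieve_edited_facts(query, edit_memory)
    context = "\n".join(f"- {f}" for f in facts)
    return f"Edited knowledge:\n{context}\n\nQuestion: {query}\nReason step by step."

edits = [
    "The CEO of ExampleCorp is now Jane Doe.",
    "The capital of Atlantis is Poseidonia.",
]
prompt = build_prompt("Who is the CEO of ExampleCorp?", edits)
```

In the full system, the constructed prompt would be passed to the SFT+GRPO-trained model, which produces a chain-of-thought answer grounded in the retrieved edits rather than its stale parametric knowledge.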

Abstract

Large language models (LLMs) can effectively handle outdated information through knowledge editing. However, current approaches face two key limitations: (I) Poor generalization: Most approaches rigidly inject new knowledge without ensuring that the model can use it effectively to solve practical problems. (II) Narrow scope: Current methods focus primarily on structured fact triples, overlooking the diverse unstructured forms of factual information (e.g., news, articles) prevalent in real-world contexts. To address these challenges, we propose a new paradigm: teaching LLMs to edit knowledge via Chain of Thoughts (CoTs) reasoning (CoT2Edit). We first leverage language model agents for both structured and unstructured edited data to generate CoTs, building high-quality instruction data. The model is then trained to reason over edited knowledge through supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO). At inference time, we integrate Retrieval-Augmented Generation (RAG) to dynamically retrieve relevant edited facts for real-time knowledge editing. Experimental results demonstrate that our method achieves strong generalization across six diverse knowledge editing scenarios with just a single round of training on three open-source language models. The code is available at https://github.com/FredJDean/CoT2Edit.
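The GRPO stage mentioned in the abstract scores each sampled response relative to its group of samples for the same prompt. A minimal sketch of that group-relative advantage computation is below; the rewards are illustrative placeholders, not the paper's reward function.

```python
# Sketch of GRPO's group-relative advantage: for a group of responses
# sampled for one prompt, each response's advantage is its reward minus
# the group mean, normalized by the group standard deviation.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mu) / sigma for r in rewards]

# Example: two correct (reward 1.0) and two incorrect (reward 0.0) samples.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct samples receive positive advantage, incorrect ones negative.
```

Because the baseline is the group mean rather than a learned value function, this normalization is what lets GRPO dispense with a separate critic model.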