AI Navigate

MetaKE: Meta-learning Aligned Knowledge Editing via Bi-level Optimization

arXiv cs.AI / 3/16/2026


Key Points

  • MetaKE reframes knowledge editing as a bi-level optimization in which the edit target is a learnable meta-parameter: the upper level searches for a feasible target that maximizes post-edit performance, while the lower level executes the edit.
  • It identifies a "Semantic-Execution Disconnect": edit targets are derived without feedback from the downstream solver's feasible region, so semantically valid targets can land in prohibited space, causing gradient truncation and failed edits.
  • To differentiate through complex solvers, it introduces a Structural Gradient Proxy that backpropagates editability constraints into the target learning phase.
  • Theoretical analysis shows the method automatically aligns the edit direction with the model's feasible manifold, and experiments demonstrate significant improvements over strong baselines.
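To make the bi-level structure concrete, here is a minimal toy sketch of the idea in the first bullet: the edit target `t` is treated as a learnable meta-parameter, a closed-form lower-level solver executes the edit, and the upper level differentiates through that solver to improve `t`. This is an illustration only, not the paper's actual algorithm; the linear model, ridge-style solver, and all variable names (`A`, `w0`, `y_desired`, `lam`) are assumptions chosen so the inner problem has an exact, differentiable solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 4
A = rng.normal(size=(m, n))       # toy "model" (linear map)
w0 = rng.normal(size=n)           # pre-edit weights
y_desired = rng.normal(size=m)    # desired post-edit behaviour
lam = 0.5                         # locality penalty: stay near w0

# Lower-level solver has a closed form here:
#   w*(t) = argmin_w ||A w - t||^2 + lam ||w - w0||^2
M = np.linalg.inv(A.T @ A + lam * np.eye(n))

def inner_solver(t):
    return M @ (A.T @ t + lam * w0)

def outer_loss(t):
    # Upper-level objective: post-edit performance of the edited weights
    w = inner_solver(t)
    return np.sum((A @ w - y_desired) ** 2)

# Upper level: the edit target t is a learnable meta-parameter
t = np.zeros(m)
lr = 0.05
losses = []
for _ in range(200):
    w = inner_solver(t)
    grad_w = 2 * A.T @ (A @ w - y_desired)   # d(outer)/dw at w*(t)
    grad_t = (M @ A.T).T @ grad_w            # chain rule: dw*/dt = M A^T
    t -= lr * grad_t
    losses.append(outer_loss(t))
```

Because the inner solver is differentiable in closed form, the outer gradient is exact here; the paper's Structural Gradient Proxy is what stands in for this chain-rule step when the real solver is too complex to differentiate through.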

Abstract

Knowledge editing (KE) aims to precisely rectify specific knowledge in Large Language Models (LLMs) without disrupting general capabilities. State-of-the-art methods suffer from an open-loop control mismatch. We identify a critical "Semantic-Execution Disconnect": the semantic target is derived independently, without feedback from the downstream solver's feasible region. This misalignment often causes valid semantic targets to fall within the prohibited space, resulting in gradient truncation and editing failure. To bridge this gap, we propose MetaKE (Meta-learning Aligned Knowledge Editing), a new framework that reframes KE as a bi-level optimization problem. Departing from static target calculation, MetaKE treats the edit target as a learnable meta-parameter: the upper-level optimizer seeks a feasible target that maximizes post-edit performance, while the lower-level solver executes the edit. To address the challenge of differentiating through complex solvers, we derive a Structural Gradient Proxy, which explicitly backpropagates editability constraints to the target-learning phase. Theoretical analysis demonstrates that MetaKE automatically aligns the edit direction with the model's feasible manifold. Extensive experiments confirm that MetaKE significantly outperforms strong baselines, offering a new perspective on knowledge editing.
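The gradient-truncation failure mode, and why a proxy gradient fixes it, can be shown in one dimension. Below, the feasible region is a box and "execution" is projection onto it; the true Jacobian of the projection is zero wherever clipping is active, so a target outside the region receives no learning signal and is stuck. A straight-through-style proxy (passing the gradient through the projection) restores the signal. This is a hedged analogy for the paper's Structural Gradient Proxy, not its actual construction; the box constraint, loss, and step size are all illustrative assumptions.

```python
import numpy as np

c = 1.0                                  # feasible region: the box [-c, c]
project = lambda t: np.clip(t, -c, c)    # "execution": project target to feasible set

def optimize(use_proxy, steps=50, lr=0.1):
    t = 3.0                              # initial target, outside the feasible region
    for _ in range(steps):
        t_exec = project(t)
        g_exec = 2 * (t_exec - 0.5)      # d(post-edit loss)/d(t_exec), optimum at 0.5
        # True Jacobian d(t_exec)/dt is 0 where the clip is active -> truncation.
        # The proxy pretends it is 1, letting feasibility feedback reach the target.
        jac = 1.0 if (use_proxy or -c <= t <= c) else 0.0
        t -= lr * g_exec * jac
    return t

t_truncated = optimize(use_proxy=False)  # stays stuck at 3.0: no gradient signal
t_proxy = optimize(use_proxy=True)       # converges into the feasible region, near 0.5
```

The truncated run never moves because every gradient is zeroed by the inactive Jacobian, which is exactly the "valid target falls in the prohibited space" failure described above; the proxy run walks the target back inside the feasible region and on to the optimum.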