The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

arXiv cs.LG / 4/9/2026


Key Points

  • The paper proposes the Master Key Hypothesis, claiming that specific post-trained capabilities correspond to directions within a low-dimensional latent subspace that can be transferred across model scales via linear alignment without retraining.
  • It introduces UNLOCK, a training-free, label-free method that extracts a capability direction by contrasting activations from capability-present vs. capability-absent source variants, then aligns and applies that direction to a target model at inference time.
  • Experiments on reasoning tasks (including Chain-of-Thought and mathematical reasoning) show substantial cross-model improvements even when transferring between different model sizes.
  • Reported results include a 12.1% MATH accuracy gain when transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B, and an AGIEval Math improvement from 61.1% to 71.3% when transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base, surpassing the 67.8% achieved by the post-trained 14B model.
  • The authors argue transfer success depends on capabilities present from pre-training and suggest the intervention works by sharpening the output distribution toward successful reasoning trajectories.
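The contrastive extraction step in the points above can be sketched roughly as a difference-of-means over hidden states. This is a hypothetical illustration, not the paper's implementation: the function name, the choice of a single layer, and the difference-of-means estimator are all assumptions about what "contrasting activations" could look like in practice.

```python
import numpy as np

def extract_capability_direction(acts_present: np.ndarray,
                                 acts_absent: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: derive a capability direction by contrasting
    activations from capability-present vs. capability-absent variants.

    acts_present, acts_absent: (n_prompts, d_model) hidden states
    collected at one layer of the two source-model variants on the
    same prompts.
    """
    mu_present = acts_present.mean(axis=0)   # mean activation, capability present
    mu_absent = acts_absent.mean(axis=0)     # mean activation, capability absent
    direction = mu_present - mu_absent       # contrastive (difference-of-means) direction
    return direction / np.linalg.norm(direction)  # unit-norm direction
```

A difference-of-means is the simplest contrastive estimator; the paper may well use something more elaborate, but the shape of the computation (paired activations in, one direction vector out) is the same.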

Abstract

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.
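The alignment and inference-time steps of the abstract can be sketched in a few lines. This is a toy sketch under stated assumptions, not the authors' method: the factor shapes, how the low-rank transform is fitted (here it is simply taken as given), and the additive-steering form with a strength `alpha` are all hypothetical stand-ins for the paper's actual procedure.

```python
import numpy as np

def align_direction(v_src: np.ndarray,
                    W_down: np.ndarray,
                    W_up: np.ndarray) -> np.ndarray:
    """Map a source-space direction into the target model's space via a
    low-rank linear transform T = W_up @ W_down.

    v_src:  (d_src,)    capability direction in source hidden space
    W_down: (r, d_src)  down-projection, rank r << min(d_src, d_tgt)
    W_up:   (d_tgt, r)  up-projection into target hidden space
    (How T is fitted is not shown here; the paper presumably learns it
    from source/target activation statistics.)
    """
    v_tgt = W_up @ (W_down @ v_src)
    n = np.linalg.norm(v_tgt)
    return v_tgt / n if n > 0 else v_tgt

def apply_steering(hidden: np.ndarray,
                   v_tgt: np.ndarray,
                   alpha: float = 4.0) -> np.ndarray:
    """Add the aligned direction to the target model's hidden states at
    inference time (hypothetical additive intervention; alpha is an
    assumed steering strength, broadcast over the token dimension)."""
    return hidden + alpha * v_tgt
```

Because no gradients are involved, an intervention of this shape is training-free and label-free by construction, which matches the framing in the abstract even if the exact functional form differs.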