Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients
arXiv cs.AI / 3/17/2026
Key Points
- Gradient Atoms is an unsupervised method that decomposes per-document training gradients into sparse components ("atoms") using dictionary learning in a preconditioned eigenspace.
- Among 500 discovered atoms, the highest-coherence ones recover interpretable task-type behaviors (refusal, arithmetic, yes/no classification, trivia QA) without any behavioral labels.
- These atoms also function as steering vectors: applying them as weight-space perturbations yields large, controllable shifts in model behavior (e.g., bulleted-list generation rising from 33% to 94%, systematic refusal dropping from 50% to 0%).
- The method requires no query-document scoring stage and scales independently of the number of query behaviors. Code is available at https://github.com/jrosseruk/gradient_atoms.
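
To make the pipeline in the key points concrete, here is a minimal NumPy sketch on synthetic data. It is not taken from the linked repository, and several choices are assumptions made for illustration: the preconditioner is built from the eigenvectors of the gradient second-moment matrix, dictionary learning is a generic alternation of ISTA sparse coding with a least-squares dictionary update, and an atom is mapped back to parameter space by reapplying the same scaling. All function names (`whiten_in_eigenspace`, `sparse_codes`, `learn_atoms`, `steer`) and parameters (`damping`, `l1`, `alpha`) are hypothetical.

```python
import numpy as np

def whiten_in_eigenspace(G, k, damping=1e-3):
    """Project gradients onto the top-k eigenvectors of their second-moment
    matrix and rescale each coordinate by 1/sqrt(eigenvalue + damping).
    Assumption: this stands in for the paper's preconditioned eigenspace."""
    C = G.T @ G / G.shape[0]                    # (p, p) second-moment matrix
    evals, evecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]           # indices of the k largest
    V = evecs[:, top]
    scale = 1.0 / np.sqrt(evals[top] + damping)
    return (G @ V) * scale, V, scale

def sparse_codes(Z, D, A, l1, n_steps):
    """ISTA steps on 0.5 * ||A D - Z||^2 + l1 * ||A||_1 with D held fixed."""
    L = np.linalg.norm(D @ D.T, 2) + 1e-12      # Lipschitz constant of the smooth part
    for _ in range(n_steps):
        A = A - (A @ D - Z) @ D.T / L           # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - l1 / L, 0.0)  # soft threshold
    return A

def learn_atoms(Z, n_atoms, l1=0.1, n_outer=30, n_steps=20, seed=0):
    """Alternate sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_atoms, Z.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    A = np.zeros((Z.shape[0], n_atoms))
    for _ in range(n_outer):
        A = sparse_codes(Z, D, A, l1, n_steps)
        D_new = np.linalg.lstsq(A, Z, rcond=None)[0]          # solves A @ D ~ Z
        norms = np.linalg.norm(D_new, axis=1, keepdims=True)
        # Renormalize atoms to unit length; keep the old atom if it went unused.
        D = np.where(norms > 1e-8, D_new / np.maximum(norms, 1e-8), D)
    A = sparse_codes(Z, D, A, l1, n_steps)      # final codes for the final dictionary
    return D, A

def steer(theta, direction, alpha):
    """Weight-space perturbation; the sign convention of alpha is an assumption."""
    return theta + alpha * direction

# Synthetic "per-document gradients": sparse mixtures of a few hidden directions.
rng = np.random.default_rng(0)
n_docs, n_params, n_true = 200, 30, 4
true_dirs = rng.standard_normal((n_true, n_params))
usage = rng.standard_normal((n_docs, n_true)) * (rng.random((n_docs, n_true)) < 0.3)
G = usage @ true_dirs + 0.01 * rng.standard_normal((n_docs, n_params))

Z, V, scale = whiten_in_eigenspace(G, k=8)
D, A = learn_atoms(Z, n_atoms=6)

# Map one atom back to parameter space (assumed mapping) and perturb toy weights.
direction = V @ (D[0] * scale)
theta = rng.standard_normal(n_params)
theta_steered = steer(theta, direction, alpha=0.5)
```

In this sketch the only per-behavior cost is choosing an atom and a step size `alpha`, which mirrors the claim that no query-document scoring stage is needed. How the paper ranks atoms by coherence is not shown here.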