Domain-Filtered Knowledge Graphs from Sparse Autoencoder Features
arXiv cs.AI / April 28, 2026
Key Points
- The paper proposes a method to turn large sparse autoencoder (SAE) feature inventories into domain-specific, structured knowledge by filtering out weakly grounded and generic features.
- It builds a strict concept universe for a target domain using contrastive activations followed by a multi-stage filtering pipeline to reduce concept mixing.
- From the filtered features, it creates two aligned graph views: a corpus-level co-occurrence graph at multiple granularities and a transcoder-based mechanism graph connecting source- and target-layer features via sparse latent pathways.
- Automated edge labeling converts these graph structures into readable knowledge graphs, demonstrated with a biology textbook case study that recovers chapter/subchapter organization and reveals bridging concepts.
- The approach reframes SAE interpretability from isolated feature lists into a global internal map of model knowledge that can support audits of reasoning faithfulness.
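The contrastive-activation filtering step in the second bullet can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `filter_domain_features`, the activation-rate thresholds, and the ratio test are all assumptions about what a contrastive domain filter might look like, given per-token SAE activation matrices for a domain corpus and a generic reference corpus.

```python
import numpy as np

def filter_domain_features(domain_acts, generic_acts, ratio=3.0, min_rate=0.01):
    """Keep SAE features that fire notably more often on the target-domain
    corpus than on a generic reference corpus (hypothetical sketch).

    domain_acts, generic_acts: (n_tokens, n_features) activation matrices.
    Returns the indices of features passing the contrastive filter.
    """
    # Fraction of tokens on which each feature is active (nonzero).
    dom_rate = (domain_acts > 0).mean(axis=0)
    gen_rate = (generic_acts > 0).mean(axis=0)
    eps = 1e-8  # avoid division by zero for features silent on the generic corpus
    # A feature survives if it is both clearly domain-skewed and not vanishingly rare.
    keep = (dom_rate / (gen_rate + eps) >= ratio) & (dom_rate >= min_rate)
    return np.nonzero(keep)[0]
```

In practice the paper follows this contrastive step with a multi-stage filtering pipeline; the sketch above covers only the first, activation-ratio pass.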
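The corpus-level co-occurrence graph from the third bullet can likewise be sketched in a few lines. Again a hedged illustration rather than the authors' code: `cooccurrence_edges`, the set-per-document input format, and the `min_count` threshold are assumed; the choice of document unit (chapter, subchapter, paragraph) corresponds to the multiple granularities the paper mentions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(doc_features, min_count=2):
    """Build weighted co-occurrence edges between filtered SAE features.

    doc_features: list of sets, the features active in each document unit
    (chapter, subchapter, or paragraph, depending on granularity).
    Returns {(feat_a, feat_b): count} for pairs co-occurring at least
    min_count times, with feat_a < feat_b in each key.
    """
    counts = Counter()
    for feats in doc_features:
        # Count every unordered pair of features active in the same unit.
        for a, b in combinations(sorted(feats), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

Automated edge labeling, as described in the fourth bullet, would then attach a readable relation to each surviving `(feat_a, feat_b)` edge to yield the final knowledge graph.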