CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs
arXiv cs.LG · 2026-03-24
Key Points
- The paper introduces CLT-Forge, an open-source library aimed at scaling mechanistic interpretability for Large Language Models using Cross-Layer Transcoders (CLTs) and feature attribution graphs.
- It addresses a key bottleneck of existing per-layer approaches, whose feature attribution graphs grow large and redundant, by leveraging CLTs that share features across layers while keeping decoding layer-specific.
- CLT-Forge provides an end-to-end framework that combines scalable distributed training (with model sharding and compressed activation caching) with an automated interpretability pipeline for feature analysis and explanation.
- The library computes attribution graphs via Circuit-Tracer and includes a flexible visualization interface to make CLT-based interpretability more practical to use.
- The authors make the code publicly available on GitHub to enable researchers to train, analyze, and visualize CLTs at scale.
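To make the core idea concrete, here is a minimal sketch of a cross-layer transcoder in the sense described above: features read from one layer's residual stream and write, via layer-specific decoders, into the reconstructed MLP outputs of that layer and all later ones. This is an illustrative toy in NumPy, not CLT-Forge's actual API; all class and variable names (`CrossLayerTranscoder`, `W_enc`, `W_dec`) are hypothetical.

```python
import numpy as np


class CrossLayerTranscoder:
    """Toy cross-layer transcoder (hypothetical, not CLT-Forge's API).

    Each layer l has an encoder mapping the residual stream to sparse
    feature activations; features from layer l contribute, through a
    layer-specific decoder, to the reconstructed MLP output of every
    layer l' >= l. Features are thus shared across layers while
    decoding stays layer-specific.
    """

    def __init__(self, n_layers: int, d_model: int, d_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.n_layers = n_layers
        # One encoder per layer: residual stream -> feature pre-activations.
        self.W_enc = [rng.standard_normal((d_model, d_features)) * 0.02
                      for _ in range(n_layers)]
        # Layer-specific decoders: features born at layer l write to layer l' >= l.
        self.W_dec = {(l, lp): rng.standard_normal((d_features, d_model)) * 0.02
                      for l in range(n_layers) for lp in range(l, n_layers)}

    def encode(self, l: int, resid: np.ndarray) -> np.ndarray:
        # ReLU yields sparse, non-negative feature activations.
        return np.maximum(resid @ self.W_enc[l], 0.0)

    def reconstruct(self, resids: list[np.ndarray]) -> list[np.ndarray]:
        """Reconstruct each layer's MLP output as the sum of decoded
        contributions from features at that layer and all earlier layers."""
        feats = [self.encode(l, resids[l]) for l in range(self.n_layers)]
        outs = []
        for lp in range(self.n_layers):
            out = sum(feats[l] @ self.W_dec[(l, lp)] for l in range(lp + 1))
            outs.append(out)
        return outs


# Usage: 3 layers, model width 8, 32 features per layer.
clt = CrossLayerTranscoder(n_layers=3, d_model=8, d_features=32)
resids = [np.ones(8) for _ in range(3)]
outs = clt.reconstruct(resids)
print(len(outs), outs[0].shape)
```

Note that later layers receive contributions from more decoder blocks (layer 2's output sums features from layers 0, 1, and 2), which is what lets a single shared feature explain computation spanning several layers and keeps the resulting attribution graph compact.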
