CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs

arXiv cs.LG · March 24, 2026


Key Points

  • The paper introduces CLT-Forge, an open-source library aimed at scaling mechanistic interpretability for Large Language Models using Cross-Layer Transcoders (CLTs) and feature attribution graphs.
  • It addresses a key bottleneck in existing approaches—feature attribution graphs being large and redundant—by leveraging CLTs that share features across layers while keeping layer-specific decoding.
  • CLT-Forge provides an end-to-end framework that combines distributed scalable training (with model sharding and compressed activation caching) and an automated interpretability pipeline for feature analysis and explanation.
  • The library computes attribution graphs via Circuit-Tracer and includes a flexible visualization interface to make CLT-based interpretability more practical to use.
  • The authors make the code publicly available on GitHub to enable researchers to train, analyze, and visualize CLTs at scale.
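To make the cross-layer idea concrete, the sketch below shows a toy forward pass in the spirit of a CLT: each layer has its own encoder over the residual stream, but features produced at layer l can be decoded into the reconstructions of layer l and every later layer via layer-specific decoder matrices. All names, shapes, and the ReLU activation here are illustrative assumptions, not CLT-Forge's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, d_feat = 4, 16, 64  # toy sizes, chosen for illustration

# Per-layer encoders: read the residual-stream input at each layer.
W_enc = rng.standard_normal((n_layers, d_model, d_feat)) * 0.1
# Layer-specific decoders: W_dec[src, tgt] lets features encoded at layer
# `src` contribute to the reconstruction at layer `tgt` (tgt >= src).
W_dec = rng.standard_normal((n_layers, n_layers, d_feat, d_model)) * 0.1

def clt_forward(x):
    """x: (n_layers, d_model) residual inputs; returns (features, reconstructions)."""
    # Sparse-ish feature activations per layer (ReLU as a stand-in nonlinearity).
    feats = np.maximum(np.einsum("ld,ldf->lf", x, W_enc), 0.0)
    recon = np.zeros((n_layers, d_model))
    for tgt in range(n_layers):
        for src in range(tgt + 1):  # features are shared with downstream layers
            recon[tgt] += feats[src] @ W_dec[src, tgt]
    return feats, recon

feats, recon = clt_forward(rng.standard_normal((n_layers, d_model)))
print(feats.shape, recon.shape)
```

Because a single feature bank serves several layers' reconstructions, the same feature node can appear once in an attribution graph instead of being duplicated per layer, which is the compactness the paper leverages.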

Abstract

Mechanistic interpretability seeks to understand how Large Language Models (LLMs) represent and process information. Recent approaches based on dictionary learning and transcoders enable representing model computation in terms of sparse, interpretable features and their interactions, giving rise to feature attribution graphs. However, these graphs are often large and redundant, limiting their interpretability in practice. Cross-Layer Transcoders (CLTs) address this issue by sharing features across layers while preserving layer-specific decoding, yielding more compact representations, but remain difficult to train and analyze at scale. We introduce an open-source library for end-to-end training and interpretability of CLTs. Our framework integrates scalable distributed training with model sharding and compressed activation caching, a unified automated interpretability pipeline for feature analysis and explanation, attribution graph computation using Circuit-Tracer, and a flexible visualization interface. This provides a practical and unified solution for scaling CLT-based mechanistic interpretability. Our code is available at: https://github.com/LLM-Interp/CLT-Forge.