CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs

arXiv cs.LG · March 24, 2026


Key Points

  • The paper introduces CLT-Forge, an open-source library aimed at scaling mechanistic interpretability for Large Language Models using Cross-Layer Transcoders (CLTs) and feature attribution graphs.
  • It addresses a key bottleneck in existing approaches—feature attribution graphs being large and redundant—by leveraging CLTs that share features across layers while keeping layer-specific decoding.
  • CLT-Forge provides an end-to-end framework that combines distributed scalable training (with model sharding and compressed activation caching) and an automated interpretability pipeline for feature analysis and explanation.
  • The library computes attribution graphs via Circuit-Tracer and includes a flexible visualization interface to make CLT-based interpretability more practical to use.
  • The authors make the code publicly available on GitHub to enable researchers to train, analyze, and visualize CLTs at scale.
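To make the cross-layer idea concrete, the sketch below shows a toy forward pass in the spirit of a CLT: each layer has its own encoder over the residual stream, but features produced at layer l can be decoded into the reconstructions of layer l and every later layer via layer-specific decoder matrices. All names, shapes, and the ReLU activation here are illustrative assumptions, not CLT-Forge's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, d_feat = 4, 16, 64  # toy sizes, chosen for illustration

# Per-layer encoders: read the residual-stream input at each layer.
W_enc = rng.standard_normal((n_layers, d_model, d_feat)) * 0.1
# Layer-specific decoders: W_dec[src, tgt] lets features encoded at layer
# `src` contribute to the reconstruction at layer `tgt` (tgt >= src).
W_dec = rng.standard_normal((n_layers, n_layers, d_feat, d_model)) * 0.1

def clt_forward(x):
    """x: (n_layers, d_model) residual inputs; returns (features, reconstructions)."""
    # Sparse-ish feature activations per layer (ReLU as a stand-in nonlinearity).
    feats = np.maximum(np.einsum("ld,ldf->lf", x, W_enc), 0.0)
    recon = np.zeros((n_layers, d_model))
    for tgt in range(n_layers):
        for src in range(tgt + 1):  # features are shared with downstream layers
            recon[tgt] += feats[src] @ W_dec[src, tgt]
    return feats, recon

feats, recon = clt_forward(rng.standard_normal((n_layers, d_model)))
print(feats.shape, recon.shape)
```

Because a single feature bank serves several layers' reconstructions, the same feature node can appear once in an attribution graph instead of being duplicated per layer, which is the compactness the paper leverages.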

Abstract

Mechanistic interpretability seeks to understand how Large Language Models (LLMs) represent and process information. Recent approaches based on dictionary learning and transcoders enable representing model computation in terms of sparse, interpretable features and their interactions, giving rise to feature attribution graphs. However, these graphs are often large and redundant, limiting their interpretability in practice. Cross-Layer Transcoders (CLTs) address this issue by sharing features across layers while preserving layer-specific decoding, yielding more compact representations, but remain difficult to train and analyze at scale. We introduce an open-source library for end-to-end training and interpretability of CLTs. Our framework integrates scalable distributed training with model sharding and compressed activation caching, a unified automated interpretability pipeline for feature analysis and explanation, attribution graph computation using Circuit-Tracer, and a flexible visualization interface. This provides a practical and unified solution for scaling CLT-based mechanistic interpretability. Our code is available at: https://github.com/LLM-Interp/CLT-Forge.