CLT-Forge: A Scalable Library for Cross-Layer Transcoders and Attribution Graphs
arXiv cs.LG · 2026-03-24
Key Points
- The paper introduces CLT-Forge, an open-source library aimed at scaling mechanistic interpretability for Large Language Models using Cross-Layer Transcoders (CLTs) and feature attribution graphs.
- It addresses a key bottleneck of existing per-layer approaches, whose feature attribution graphs grow large and redundant, by leveraging CLTs that share features across layers while keeping decoding layer-specific.
- CLT-Forge provides an end-to-end framework that combines scalable distributed training (with model sharding and compressed activation caching) with an automated interpretability pipeline for feature analysis and explanation.
- The library computes attribution graphs via Circuit-Tracer and includes a flexible visualization interface to make CLT-based interpretability more practical to use.
- The authors make the code publicly available on GitHub to enable researchers to train, analyze, and visualize CLTs at scale.
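To make the core idea concrete, here is a minimal sketch of a cross-layer transcoder in the sense described above: features read from one layer's residual stream and write, via layer-specific decoders, into the reconstructed MLP outputs of that layer and all later ones. This is an illustrative toy in NumPy, not CLT-Forge's actual API; all class and variable names (`CrossLayerTranscoder`, `W_enc`, `W_dec`) are hypothetical.

```python
import numpy as np


class CrossLayerTranscoder:
    """Toy cross-layer transcoder (hypothetical, not CLT-Forge's API).

    Each layer l has an encoder mapping the residual stream to sparse
    feature activations; features from layer l contribute, through a
    layer-specific decoder, to the reconstructed MLP output of every
    layer l' >= l. Features are thus shared across layers while
    decoding stays layer-specific.
    """

    def __init__(self, n_layers: int, d_model: int, d_features: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.n_layers = n_layers
        # One encoder per layer: residual stream -> feature pre-activations.
        self.W_enc = [rng.standard_normal((d_model, d_features)) * 0.02
                      for _ in range(n_layers)]
        # Layer-specific decoders: features born at layer l write to layer l' >= l.
        self.W_dec = {(l, lp): rng.standard_normal((d_features, d_model)) * 0.02
                      for l in range(n_layers) for lp in range(l, n_layers)}

    def encode(self, l: int, resid: np.ndarray) -> np.ndarray:
        # ReLU yields sparse, non-negative feature activations.
        return np.maximum(resid @ self.W_enc[l], 0.0)

    def reconstruct(self, resids: list[np.ndarray]) -> list[np.ndarray]:
        """Reconstruct each layer's MLP output as the sum of decoded
        contributions from features at that layer and all earlier layers."""
        feats = [self.encode(l, resids[l]) for l in range(self.n_layers)]
        outs = []
        for lp in range(self.n_layers):
            out = sum(feats[l] @ self.W_dec[(l, lp)] for l in range(lp + 1))
            outs.append(out)
        return outs


# Usage: 3 layers, model width 8, 32 features per layer.
clt = CrossLayerTranscoder(n_layers=3, d_model=8, d_features=32)
resids = [np.ones(8) for _ in range(3)]
outs = clt.reconstruct(resids)
print(len(outs), outs[0].shape)
```

Note that later layers receive contributions from more decoder blocks (layer 2's output sums features from layers 0, 1, and 2), which is what lets a single shared feature explain computation spanning several layers and keeps the resulting attribution graph compact.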
