Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control

arXiv cs.LG / 3/13/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

Exhaustive circuit tracing of all 4065 active sparse autoencoder features at layer 5 in Geneformer reveals 1,393,850 significant downstream edges, a 27-fold expansion over prior selective sampling.
The results show a heavy-tailed hub distribution, with 1.8% of features accounting for the majority of connectivity and 40% of the top 20 hubs lacking biological annotation, indicating bias in previous analyses.
Three-way combinatorial ablation across eight feature triplets shows redundancy deepening with interaction order (three-way ratio 0.59 vs. pairwise 0.74) and zero synergy, indicating subadditivity.
Trajectory-guided feature steering establishes a causal link between layer position and differentiation direction, with late-layer features at L17 pushing toward maturity (fraction positive = 1.0) while early/mid-layer features at L0 and L11 push away from maturity (0.00–0.58).
Taken together, these findings provide causal evidence for layer-dependent control of cell state in a single-cell foundation model, moving beyond previous correlational analyses.

Abstract

Mechanistic interpretability of biological foundation models has relied on selective feature sampling, pairwise interaction testing, and observational trajectory analysis. Each of these can introduce systematic bias. Here we present three experiments that address these limitations through exhaustive circuit tracing, higher order combinatorial ablation, and causal trajectory steering in Geneformer, a transformer based single cell foundation model. First, exhaustive tracing of all 4065 active sparse autoencoder features at layer 5 yields 1393850 significant downstream edges, a 27 fold expansion over selective sampling. This reveals a heavy tailed hub distribution in which 1.8 percent of features account for disproportionate connectivity and 40 percent of the top 20 hubs lack biological annotation. These results indicate systematic annotation bias in prior selective analyses. Second, three way combinatorial ablation across 8 feature triplets shows that redundancy deepens monotonically with interaction order, with a three way ratio of 0.59 versus a pairwise ratio of 0.74, and with zero synergy. This confirms that the model architecture is subadditive at all tested orders. Third, trajectory guided feature steering establishes a causal link between layer position and differentiation directionality. Late layer features at L17 consistently push cell states toward maturity, with fraction positive equal to 1.0. Early and mid layer features at L0 and L11 mostly push away from maturity, with fraction positive ranging from 0.00 to 0.58. Together these results move from correlation toward causal evidence for layer dependent control of cell state.

Is AI becoming a bubble, and could it end like the dot-com crash?

Reddit r/artificial

Externalizing State

Dev.to

I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game.

Dev.to

My AI Does Not Have a Clock

Dev.to

How to settle on a coding LLM ? What parameters to watch out for ?

Reddit r/LocalLLaMA

Exhaustive Circuit Mapping of a Single-Cell Foundation Model Reveals Massive Redundancy, Heavy-Tailed Hub Architecture, and Layer-Dependent Differentiation Control

Key Points

Abstract

Related Articles

Is AI becoming a bubble, and could it end like the dot-com crash?

Externalizing State

I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game.

My AI Does Not Have a Clock

How to settle on a coding LLM ? What parameters to watch out for ?

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer