I gave an AI a CT Scan While It Listened to an Emotional Conversation [R]

Reddit r/artificial / 4/24/2026


Key Points

  • The author built Activation Lab (llmct), a tool that “scans” a language model by capturing snapshots of internal states across all layers while it processes text, aiming to improve interpretability.
  • In a first experiment using Qwen 2.5 (3B) with a 20-turn conversation that rapidly shifts emotions, the residual stream stayed highly similar (0.83–0.88 cosine similarity) to “emotional fingerprints,” suggesting the model continuously tracks emotional tone.
  • Emotional features concentrated most strongly in layers 29–33, where deeper layers (especially layer 31) were most discriminative at distinguishing emotions such as joy vs. sadness.
  • The model also appeared to act like an “emotional shock absorber,” moving toward the user’s intensity without fully matching it, and instruction tuning was reported to shift the model’s internal structure toward positivity.
  • “Emotional memory” weakened over the dialogue: cosine similarity to the initial matching emotion dropped from about 0.90 in the first message to roughly 0.67–0.73 by message 19, indicating that longer context dilutes the signal.

I created an [Activation Lab](https://github.com/cstefanache/llmct) tool that can be seen as an MRI machine for AI. It captures snapshots of every single layer inside a language model while it processes a conversation.

It lets you see what is happening inside a neural network during generation by capturing the internal state of every layer of an LLM and saving the snapshots for interpretability analysis.
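The snapshot idea can be sketched in a few lines. This is a hypothetical toy, not the actual llmct implementation: a small stack of random linear layers stands in for the transformer, and each layer's output is copied into a list as it flows through.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer: a stack of random linear layers.
# (Illustrative sketch only -- llmct hooks the layers of a real LLM.)
n_layers, d_model = 4, 8
layers = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
          for _ in range(n_layers)]

def forward_with_snapshots(x):
    """Run the stack, copying each layer's output as a snapshot."""
    snapshots = []
    for weights in layers:
        x = np.tanh(x @ weights)    # layer transform
        snapshots.append(x.copy())  # capture the internal state
    return x, snapshots

final, snaps = forward_with_snapshots(rng.normal(size=d_model))
print(len(snaps))  # one snapshot per layer
```

On a real model the same effect is usually achieved with forward hooks or by requesting all hidden states, but the principle is identical: one state vector recorded per layer per turn.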

First experiment: I fed Qwen 2.5 (3B) a 20-turn conversation where the user swings wildly between joy, fear, anger, sadness, apathy, and peace. At every turn, I scanned the AI's internal state and compared it against emotional fingerprints.
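A minimal version of that comparison might look like the sketch below. The "fingerprints" here are toy orthonormal vectors; in practice they would presumably be derived from activations on reference texts for each emotion (the post doesn't say how llmct builds them).

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emotions = ["joy", "fear", "anger", "sadness", "apathy", "peace"]

# Toy orthonormal fingerprints -- stand-ins for real reference activations.
d = 8
fingerprints = {e: np.eye(d)[i] for i, e in enumerate(emotions)}

# Snapshot of the model's state on a joyful turn, simulated here as the
# joy fingerprint plus a little noise.
rng = np.random.default_rng(1)
state = fingerprints["joy"] + 0.1 * rng.normal(size=d)

# Score the snapshot against every fingerprint; the top match labels the turn.
scores = {e: cosine(state, f) for e, f in fingerprints.items()}
print(max(scores, key=scores.get))
```

Repeating this per layer and per turn yields the similarity curves the findings below describe.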

Here's what I found:

  1. The AI has an emotional backbone. The residual stream (the main information highway) maintains 0.83–0.88 cosine similarity to emotional references at all times. It always knows the emotional temperature of the conversation.
  2. Emotions are sharpest at layers 29–33. Early layers detect that emotion exists. Middle layers sort positive from negative. But it's the deep layers where the network actually decides "this is joy, not sadness." Layer 31 is the single most discriminative layer in the entire network.
  3. The AI has a built-in shock absorber. When the user is emotionally intense, the assistant's internal state shifts toward that emotion, but never all the way. The gap is consistent: ~0.03 on the backbone, ~0.13 on the deeper processing centers. It acknowledges your feelings while staying calm. Nobody trained it to do this explicitly. It learned it.
  4. Joy is the default setting. Even during angry and sad turns, the joy reference scored highest. Instruction tuning didn't just make the model helpful, it shifted its entire internal geometry toward positivity.
  5. Emotional memory fades. First message: 0.90 cosine with its matching emotion. By message 19: only 0.67–0.73. Longer conversations dilute the signal.
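The "most discriminative layer" finding (point 2) can be illustrated with a toy depth sweep. In this constructed example the joy signal is injected more strongly at deeper layers by assumption, so the matching-minus-nonmatching cosine margin peaks at the last layer; the real result, that layer 31 of Qwen 2.5 separates emotions best, was measured, not assumed.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

d, n_layers = 8, 6
joy_ref, sad_ref = np.eye(d)[0], np.eye(d)[1]  # toy fingerprints

# Toy per-layer states for a joyful turn: the emotion signal grows with
# depth (an assumption built in to mirror the layers 29-33 finding).
background = 0.1 * np.ones(d)
states = [i * joy_ref + background for i in range(n_layers)]

# Margin between the matching and a non-matching emotion at each layer:
# the layer with the largest margin is the most discriminative one.
margins = [cosine(s, joy_ref) - cosine(s, sad_ref) for s in states]
print(int(np.argmax(margins)))
```

Running this per layer across many labelled turns is one plausible way to rank layers by how cleanly they separate, say, joy from sadness.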
submitted by /u/cstefanache