In standard residual connections, each layer simply adds its output to the sum of all previous layers' outputs with equal weight, with no selectivity at all. Attention Residuals replaces this with a softmax attention mechanism: each layer gets a single learned query vector that attends over all previous layer outputs, producing input-dependent weights that let the layer selectively retrieve what it actually needs. In scaling law experiments, Block AttnRes matches the loss of a baseline trained with 1.25x more compute. Integrated into a 48B-parameter (3B activated) Kimi Linear model trained on 1.4T tokens, it improves across all evaluated benchmarks: GPQA-Diamond +7.5, Math +3.6, and HumanEval +3.1. The overhead is minimal: less than 4% additional training cost under pipeline parallelism and under 2% added inference latency. Karpathy also joined the discussion: "Attention is all you need!" Source of the visualization image: https://x.com/eliebakouch/status/2033488233854620007?s=20
Residual connections haven't changed for 10 years and Kimi just replaced them with attention
Reddit r/LocalLLaMA / 3/16/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The article explains how Attention Residuals replaces traditional residual connections by using a per-layer learned query to attend over previous layer outputs, yielding input-dependent routing.
- In scaling experiments, Block AttnRes matches the loss of a baseline trained with 1.25x more compute, and when integrated into a 48B-parameter Kimi Linear model trained on 1.4T tokens, it achieves notable gains on GPQA-Diamond (+7.5), Math (+3.6), and HumanEval (+3.1).
- The approach adds modest overhead, with under 4% extra training cost under pipeline parallelism and under 2% additional inference latency.
- Karpathy participated in the discussion, quipping "Attention is all you need!"; the article includes a visualization image sourced from a linked X post.
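The mechanism described above can be sketched in a few lines. The following is a minimal, illustrative NumPy version for a single token position, not Kimi's actual implementation: function and variable names are assumptions, and details such as the score scaling are hedged guesses at a standard attention formulation.

```python
import numpy as np

def attention_residual(history: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Illustrative attention-based residual connection.

    history: (L, d) stack of all previous layer outputs for one token
    query:   (d,)   learned per-layer query vector

    Instead of summing the L previous outputs with equal weight (a plain
    residual stream), a softmax over query-key scores yields
    input-dependent weights for mixing them.
    """
    d = history.shape[-1]
    scores = history @ query / np.sqrt(d)   # (L,) one score per previous layer
    weights = np.exp(scores - scores.max()) # numerically stable softmax
    weights /= weights.sum()
    return weights @ history                # (d,) mixed residual stream
```

With a zero query the weights are uniform and this reduces to the ordinary averaged residual sum; a query aligned with one layer's output routes almost all weight to that layer.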