End-to-end Feature Alignment: A Simple CNN with Intrinsic Class Attribution

arXiv cs.CV / 3/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper introduces Feature-Align CNN (FA-CNN), a prototype convolutional neural network designed for end-to-end feature alignment that yields intrinsic class attribution in its feature maps.
  • It argues that typical unordered operations (e.g., Linear and Conv2D) can shuffle semantic concepts, and proposes order-preserving mechanisms—dampened skip connections and a global average pooling classifier head—to maintain alignment from input pixels to class logits.
  • The authors provide theoretical results showing the FA-CNN penultimate feature maps are identical to Grad-CAM saliency maps, strengthening the model’s interpretability link to established attribution methods.
  • They also show analytically that features “morph” gradually across network depth toward penultimate class activations, describing how representations evolve layer by layer.
  • Experiments report strong benchmark image classification performance and improved interpretability versus Grad-CAM and permutation-based baselines on a percent-pixels-removed evaluation task.
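The two order-preserving mechanisms above can be illustrated with a minimal numpy sketch. The paper's exact formulations are not given here, so the damping form (`x + alpha * f(x)` with a small `alpha`) and the one-feature-map-per-class GAP head are assumptions for illustration:

```python
import numpy as np

def dampened_skip(x, layer_out, alpha=0.1):
    """Dampened skip connection (hypothetical form): the new layer's
    contribution is scaled down so each channel stays dominated by its
    incoming, already-aligned feature map."""
    return x + alpha * layer_out

def gap_classifier_head(feature_maps):
    """Global average pooling classifier head: one penultimate feature
    map per class; each logit is the spatial mean of its map, so the
    map itself doubles as the class-attribution heatmap."""
    # feature_maps: (num_classes, H, W)
    return feature_maps.mean(axis=(1, 2))

# Toy example: class 1's map carries the only activation,
# so class 1 gets the largest logit.
maps = np.zeros((3, 4, 4))
maps[1, 1:3, 1:3] = 4.0
logits = gap_classifier_head(maps)
print(logits.argmax())  # prints 1
```

Because the head contains no learned mixing weights, nothing reorders channels between the penultimate maps and the logits, which is what makes the raw maps directly attributable.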

Abstract

We present Feature-Align CNN (FA-CNN), a prototype CNN architecture with intrinsic class attribution through end-to-end feature alignment. Our intuition is that unordered operations such as Linear and Conv2D layers cause unnecessary shuffling and mixing of semantic concepts, making raw feature maps difficult to interpret. We introduce two new order-preserving layers: the dampened skip connection and the global average pooling classifier head. These layers force the model to maintain end-to-end feature alignment from the raw input pixels all the way to the final class logits. This alignment enhances the interpretability of the model by allowing the raw feature maps to intrinsically exhibit class attribution. We prove theoretically that FA-CNN penultimate feature maps are identical to Grad-CAM saliency maps. Moreover, we prove that these feature maps morph gradually layer by layer, revealing the evolution of features through network depth toward the penultimate class activations. FA-CNN performs well on benchmark image classification datasets, and we compare its averaged raw feature maps against Grad-CAM and permutation methods on a percent-pixels-removed interpretability task. We conclude with a discussion of limitations, future work, and extensions toward hybrid models.
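The claimed Grad-CAM identity is easy to verify numerically under one assumption (not stated in the summary, but implied by the GAP head): each class logit is the spatial mean of exactly one penultimate channel. Then the gradient of logit `c` with respect to channel `k` is `1/(H*W)` when `k == c` and zero otherwise, so the Grad-CAM channel weights select channel `c` alone and the CAM reduces to `ReLU` of that channel, up to a positive constant:

```python
import numpy as np

def grad_cam_from_gap_head(feature_maps, target_class):
    """Grad-CAM for a GAP classifier head, with the gradient written
    out by hand: d logit_c / d A_k[i,j] = 1/(H*W) if k == c, else 0."""
    C, H, W = feature_maps.shape
    grads = np.zeros_like(feature_maps)
    grads[target_class] = 1.0 / (H * W)
    # Grad-CAM channel weights: spatial average of the gradients.
    weights = grads.mean(axis=(1, 2))
    # Weighted sum of channels, then ReLU.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0)
    return cam

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 7, 7))  # penultimate maps, one per class
cam = grad_cam_from_gap_head(A, target_class=2)
# Up to the ReLU and the 1/(H*W) constant, the CAM *is* feature map 2.
print(np.allclose(cam * 49, np.maximum(A[2], 0)))  # prints True
```

This is only a sketch of why the equivalence holds for this head design; the paper's formal proof presumably covers the general trained-model case.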