From Weights to Concepts: Data-Free Interpretability of CLIP via Singular Vector Decomposition

arXiv cs.CV / 3/27/2026


Key Points

  • The paper introduces SITH (Semantic Inspection of Transformer Heads), a training-free and data-free interpretability framework for CLIP that operates directly in weight space rather than relying on activations and datasets.
  • For each attention head in CLIP’s vision transformer, it decomposes the value-output matrix into singular vectors and interprets each one via the new COMP (Coherent Orthogonal Matching Pursuit) algorithm as a sparse, semantically coherent combination of human-interpretable concepts.
  • Experiments reportedly validate that SITH produces coherent and faithful explanations, via both reconstruction-fidelity and interpretability-focused tests.
  • The method enables precise weight-space model edits that amplify or suppress specific concepts without retraining, improving downstream performance while retaining interpretability.
  • The authors also use SITH to analyze fine-tuning, claiming that adaptation mainly reweights an existing stable semantic basis rather than creating entirely new features.
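The per-head decomposition and the weight-space edits in the bullets above can be sketched in a few lines of numpy. This is an illustrative sketch only: the dimensions, initialization, and the choice of which component to edit are assumptions, not the paper's code.

```python
import numpy as np

# Illustrative dimensions for one ViT attention head (assumed, not from the paper).
d_model, d_head = 768, 64

rng = np.random.default_rng(0)
W_V = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)  # value projection
W_O = rng.standard_normal((d_head, d_model)) / np.sqrt(d_head)   # output projection

# The head's value-output matrix; its rank is at most d_head.
W_VO = W_V @ W_O

# Decompose into singular vectors: each (U[:, i], S[i], Vt[i]) triple is one
# candidate "semantic component" of the head.
U, S, Vt = np.linalg.svd(W_VO, full_matrices=False)
effective_rank = int(np.sum(S > 1e-8))  # equals d_head for generic weights

# A weight-space edit in the spirit of the fourth bullet: rescale one singular
# component to amplify (>1) or suppress (=0) the concept it encodes, without
# retraining. Component index 3 is chosen arbitrarily for illustration.
scale = np.ones_like(S)
scale[3] = 0.0
W_edited = (U * (S * scale)) @ Vt
```

Because the edit touches only one singular component, the rest of the head's behavior is left numerically intact.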

Abstract

As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability methods predominantly rely on activations, making them dataset-dependent, vulnerable to data bias, and often restricted to coarse head-level explanations. We introduce SITH (Semantic Inspection of Transformer Heads), a fully data-free, training-free framework that directly analyzes CLIP's vision transformer in weight space. For each attention head, we decompose its value-output matrix into singular vectors and interpret each one via COMP (Coherent Orthogonal Matching Pursuit), a new algorithm that explains them as sparse, semantically coherent combinations of human-interpretable concepts. We show that SITH yields coherent, faithful intra-head explanations, validated through reconstruction fidelity and interpretability experiments. This allows us to use SITH for precise, interpretable weight-space model edits that amplify or suppress specific concepts, improving downstream performance without retraining. Furthermore, we use SITH to study model adaptation, showing how fine-tuning primarily reweights a stable semantic basis rather than learning entirely new features.
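COMP is described as a coherence-aware variant of orthogonal matching pursuit; its exact coherence constraint is not spelled out here, so the following is a minimal sketch of plain OMP decomposing a direction over a hypothetical concept dictionary (e.g., unit-norm text embeddings of concept words). The dictionary, dimensions, and planted-concept setup are all assumptions for illustration.

```python
import numpy as np

def omp(target, dictionary, k):
    """Plain orthogonal matching pursuit: approximate `target` as a sparse
    combination of k dictionary columns. (SITH's COMP additionally enforces
    semantic coherence among the selected atoms, which is not modeled here.)"""
    residual = target.copy()
    selected = []
    for _ in range(k):
        # Greedily pick the atom most correlated with the current residual.
        scores = np.abs(dictionary.T @ residual)
        scores[selected] = -np.inf  # never re-pick an atom
        selected.append(int(np.argmax(scores)))
        # Re-fit coefficients jointly on all selected atoms (least squares).
        A = dictionary[:, selected]
        coefs, *_ = np.linalg.lstsq(A, target, rcond=None)
        residual = target - A @ coefs
    return selected, coefs

rng = np.random.default_rng(1)
d, n_concepts = 768, 500
# Hypothetical concept dictionary with unit-norm columns.
D = rng.standard_normal((d, n_concepts))
D /= np.linalg.norm(D, axis=0)

# A "singular vector" built from three known concepts plus small noise.
truth = [10, 42, 99]
v = D[:, truth] @ np.array([1.0, 0.9, 0.8]) + 0.01 * rng.standard_normal(d)
v /= np.linalg.norm(v)

atoms, coefs = omp(v, D, k=3)
print(sorted(atoms))  # the three planted concept indices are recovered
```

In the SITH setting, `v` would be a singular vector of a head's value-output matrix and the recovered atoms would name the concepts that component encodes.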