
WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior

arXiv cs.CL / March 20, 2026

📰 News · Models & Research

Key Points

  • WASD (unWeaving Actionable Sufficient Directives) is a new framework that explains LLM behavior by identifying neuron-activation predicates that are sufficient to determine an output, enabling finer-grained control over model behavior.
  • It represents candidate conditions as neuron-activation predicates and iteratively searches for a minimal subset that guarantees the current output under input perturbations.
  • The approach outperforms conventional attribution graphs on stability, accuracy, and conciseness in SST-2 and CounterFact experiments with the Gemma-2-2B model.
  • A case study on cross-lingual output generation demonstrates WASD's practical effectiveness in controlling model behavior for multilingual tasks.

Abstract

Precise behavioral control of large language models (LLMs) is critical for complex applications. However, existing methods often incur high training costs, lack natural language controllability, or compromise semantic coherence. To bridge this gap, we propose WASD (unWeaving Actionable Sufficient Directives), a novel framework that explains model behavior by identifying sufficient neural conditions for token generation. Our method represents candidate conditions as neuron-activation predicates and iteratively searches for a minimal set that guarantees the current output under input perturbations. Experiments on SST-2 and CounterFact with the Gemma-2-2B model demonstrate that our approach produces explanations that are more stable, accurate, and concise than conventional attribution graphs. Moreover, a case study on controlling cross-lingual output generation validates the practical effectiveness of WASD.
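The summary does not spell out WASD's actual search procedure, but the core idea it describes — representing candidate conditions as neuron-activation predicates and greedily shrinking them to a minimal set that still guarantees the output under input perturbations — can be illustrated with a toy sketch. Everything here is hypothetical: `toy_output` stands in for the LLM's next-token choice, predicates are simple `(neuron, threshold)` pairs, and the greedy deletion loop is one plausible reading of "iteratively searches for a minimal set", not the paper's algorithm.

```python
import random

# Toy stand-in for the model: the "output token" is 1 iff neurons 0 and 2
# both fire above 0.5. Neurons 1 and 3 are redundant for this output.
def toy_output(acts):
    return 1 if acts[0] > 0.5 and acts[2] > 0.5 else 0

def perturbed_activations(base, n=200, noise=0.6, seed=0):
    """Sample activation vectors around a base point (stand-in for
    the activations induced by input perturbations)."""
    rng = random.Random(seed)
    return [[a + rng.uniform(-noise, noise) for a in base] for _ in range(n)]

def holds(predicates, acts):
    """A predicate set holds when every (neuron, threshold) pair is satisfied."""
    return all(acts[i] > t for (i, t) in predicates)

def sufficient(predicates, samples, target):
    """Predicates are sufficient if every perturbed sample that satisfies
    them still yields the target output."""
    relevant = [s for s in samples if holds(predicates, s)]
    return bool(relevant) and all(toy_output(s) == target for s in relevant)

def greedy_minimize(predicates, samples, target):
    """Drop each predicate in turn, keeping the removal only if the
    remaining set is still sufficient for the target output."""
    current = list(predicates)
    for p in list(predicates):
        trial = [q for q in current if q != p]
        if trial and sufficient(trial, samples, target):
            current = trial
    return current

base = [0.9, 0.9, 0.9, 0.9]            # activations on the original input
samples = perturbed_activations(base)
candidates = [(i, 0.5) for i in range(4)]
minimal = greedy_minimize(candidates, samples, target=1)
print(minimal)  # predicates on neurons 0 and 2 survive; 1 and 3 are pruned
```

In this toy setup the search recovers exactly the two predicates that causally determine the output, mirroring the claim that sufficiency-based explanations are more concise than attribution over all contributing neurons.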