Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs

arXiv cs.CL / April 16, 2026


Key Points

  • The paper uses causal interventions in Transformer language models to study how syntactic “islands” block extraction, showing that models reproduce human acceptability judgments that vary by lexical content.
  • It finds that extraction from coordination islands uses the same filler-gap mechanisms as canonical wh-dependencies, but those mechanisms are selectively inhibited to different degrees.
  • By isolating functionally relevant subspaces across Transformer blocks, attention modules, and MLPs, the authors provide mechanistic evidence linking representational structure to syntactic constraints (a minimal sketch of this kind of subspace intervention follows this list).
  • Projecting a large corpus of unrelated text onto the causally identified subspaces yields a new hypothesis: the conjunction “and” is encoded differently in extractable versus non-extractable constructions, corresponding to relational versus purely conjunctive uses.
  • Overall, the work demonstrates how mechanistic interpretability techniques can generate testable linguistic hypotheses about representation and processing in Transformer LMs.
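To make the method concrete, here is a minimal sketch, not the authors' code, of the kind of intervention the paper describes: an interchange intervention that swaps activations along a low-rank subspace of one Transformer block's residual stream, then measures how the model's gap preference changes. The layer index, the subspace basis `Q` (a random stand-in where the paper would learn one), the probe sentences, and the gap/no-gap probe tokens are all illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6                                  # hypothetical intervention site
d_model = model.config.n_embd
k = 16                                     # hypothetical subspace dimension
Q, _ = torch.linalg.qr(torch.randn(d_model, k))   # random stand-in basis

def get_hidden(text, layer):
    """Residual-stream activations at the output of block `layer`."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"), output_hidden_states=True)
    return out.hidden_states[layer + 1]    # (1, seq_len, d_model)

base = "I know what he looked down and saw"   # acceptable extraction
source = "I know what he hates art and"       # degraded-extraction context
src_h = get_hidden(source, LAYER)

def patch_subspace(module, inputs, output):
    """Replace the span(Q) component of the base run's activations with the
    corresponding component from the source run, leaving the rest intact.
    Positions are aligned naively up to the shorter sequence length."""
    h = output[0]
    n = min(h.shape[1], src_h.shape[1])
    h[:, :n] += (src_h[:, :n] - h[:, :n]) @ Q @ Q.T
    return (h,) + output[1:]

# Clean run, then patched run with the hook active.
ids = tok(base, return_tensors="pt")
with torch.no_grad():
    clean = model(**ids).logits[0, -1]
handle = model.transformer.h[LAYER].register_forward_hook(patch_subspace)
with torch.no_grad():
    patched = model(**ids).logits[0, -1]
handle.remove()

# Gap preference after "saw": log P(".") - log P(" the").
gap, nogap = tok(".")["input_ids"][0], tok(" the")["input_ids"][0]
def gap_pref(logits):
    logp = logits.log_softmax(-1)
    return (logp[gap] - logp[nogap]).item()

print(f"gap preference (clean):   {gap_pref(clean):+.3f}")
print(f"gap preference (patched): {gap_pref(patched):+.3f}")
```

The gap-preference score follows the standard filler-gap probing setup: if the patched subspace carries the dependency, overwriting it from a degraded-extraction context should reduce the model's preference for the gap continuation.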

Abstract

We show how causal interventions in Transformer models provide insights into English syntax by focusing on a long-standing challenge for syntactic theory: syntactic islands. Extraction from coordinated verb phrases is often degraded, yet acceptability varies gradiently with lexical content (e.g., "I know what he hates art and loves" vs. "I know what he looked down and saw"). We show that modern Transformer language models replicate human judgments across this gradient. Using causal interventions that isolate functionally relevant subspaces in Transformer blocks, attention modules, and MLPs, we demonstrate that extraction from coordination islands engages the same filler-gap mechanisms as canonical wh-dependencies, but that these mechanisms are selectively blocked to varying degrees. By projecting a large corpus of unrelated text onto these causally identified subspaces, we derive a novel linguistic hypothesis: the conjunction "and" is represented differently in extractable versus non-extractable constructions, corresponding to expressions encoding relational dependencies versus purely conjunctive uses. These results illustrate how mechanistic interpretability can inform syntax, generating new hypotheses about linguistic representation and processing.
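The corpus-projection step described in the abstract can be sketched the same way, under the same assumptions: run unrelated sentences through the model, project each “and” token's activation onto the causally identified subspace, and compare coordinates across uses. This reuses `tok`, `model`, `get_hidden`, `Q`, and `LAYER` from the sketch above; the two example sentences are invented, and `Q` is a stand-in for the learned basis.

```python
corpus = [
    "He stood up and left the room.",        # plausibly relational "and"
    "She bought apples and oranges.",        # plausibly pure conjunction
]
and_id = tok(" and")["input_ids"][0]          # GPT-2 token for " and"

for sent in corpus:
    ids = tok(sent, return_tensors="pt")["input_ids"][0]
    h = get_hidden(sent, LAYER)[0]            # (seq_len, d_model)
    for pos in (ids == and_id).nonzero(as_tuple=True)[0]:
        coords = h[pos] @ Q                   # coordinates in span(Q)
        print(f"{sent!r}: |projection of 'and'| = {coords.norm():.3f}")
```

In the paper's framing, systematic differences in these subspace coordinates between relational and purely conjunctive uses of “and” would be the kind of corpus-level signal that motivates the new hypothesis.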