The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

arXiv stat.ML / 30 April 2026


Key Points

  • The paper proposes that “geometric stability” (the consistency of a representation’s pairwise distance structure) provides a shared geometric basis for two key LLM needs: predicting steerability under targeted control and detecting internal drift over time.
  • Supervised variants of the Shesha method, which measure task-aligned geometric stability, predict linear steerability with high fidelity (ρ = 0.89–0.97) across dozens of embedding models and several NLP tasks, outperforming signals that rely only on class separability.
  • A key finding is that unsupervised geometric stability does not predict steerability for real-world tasks, indicating that task alignment is essential for controllability prediction.
  • For drift detection, however, unsupervised geometric stability is highly effective: it registers nearly 2× greater geometric change than CKA during post-training alignment, delivers earlier warnings in most models, and produces far fewer false alarms than Procrustes.
  • The authors argue that combining supervised (pre-deployment controllability checks) and unsupervised (post-deployment monitoring) geometric stability yields complementary diagnostics for safer LLM deployment.

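The paper does not spell out the Shesha computation here, but the core idea in the first bullet, scoring how well a representation's pairwise distance structure is preserved between two snapshots, can be sketched in a generic unsupervised form. Everything below (function names, the choice of Euclidean distance, and Pearson correlation over the upper triangle) is an illustrative assumption, not the paper's actual method:

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (n_samples, dim)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def geometric_stability(X_before, X_after):
    """Correlate the upper-triangular pairwise distances of two snapshots.

    Returns a value in [-1, 1]; values near 1 mean the distance
    structure (the representation's geometry) is preserved.
    """
    iu = np.triu_indices(X_before.shape[0], k=1)
    d1 = pairwise_distances(X_before)[iu]
    d2 = pairwise_distances(X_after)[iu]
    return float(np.corrcoef(d1, d2)[0, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 16))
# A rotation plus uniform scaling preserves distance *structure*,
# so the stability score stays near 1 ...
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
stable = geometric_stability(X, 2.0 * X @ Q)
# ... while replacing the embedding with independent noise destroys
# the structure, driving the score toward 0.
drifted = geometric_stability(X, rng.normal(size=(50, 16)))
```

A score like this needs no labels, which matches the paper's framing of unsupervised stability as a post-deployment drift monitor; the supervised Shesha variants would additionally condition this comparison on task structure.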
Abstract

Reliable deployment of language models requires two capabilities that appear distinct but share a common geometric foundation: predicting whether a model will accept targeted behavioral control, and detecting when its internal structure degrades. We show that geometric stability, the consistency of a representation's pairwise distance structure, addresses both. Supervised Shesha variants that measure task-aligned geometric stability predict linear steerability with near-perfect accuracy (ρ = 0.89–0.97) across 35–69 embedding models and three NLP tasks, capturing unique variance beyond class separability (partial ρ = 0.62–0.76). A critical dissociation emerges: unsupervised stability fails entirely for steering on real-world tasks (ρ ≈ 0.10), revealing that task alignment is essential for controllability prediction. However, unsupervised stability excels at drift detection, measuring nearly 2× greater geometric change than CKA during post-training alignment (up to 5.23× in Llama) while providing earlier warning in 73% of models and maintaining a 6× lower false alarm rate than Procrustes. Together, supervised and unsupervised stability form complementary diagnostics for the LLM deployment lifecycle: one for pre-deployment controllability assessment, the other for post-deployment monitoring.
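The CKA baseline the abstract compares against is a standard representation-similarity measure. As a reference point for why distance-based stability can register larger change, here is a minimal linear-CKA sketch (the paper may use a kernelized or debiased variant; this specific implementation is an assumption):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representations of the same inputs
    (rows aligned). Near 1 = matching similarity structure, near 0 = unrelated."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(hsic / norm)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
cka_same = linear_cka(X, 2.0 * X @ Q)          # rotation + scaling: CKA stays ~1
cka_diff = linear_cka(X, rng.normal(size=(40, 8)))  # unrelated reps: CKA drops
```

Because CKA is invariant to rotations and isotropic scaling of either representation, it can stay flat under transformations that a monitoring signal might still want to flag, one plausible reason a pairwise-distance stability score can show larger sensitivity to post-training geometric change.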