PRISM: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning

arXiv cs.AI / 4/6/2026

Key Points

  • PRISM is an RL framework that creates a discrete set of causally validated “concepts” by clustering encoder features, and then uses these concepts as an interpretable transfer interface between agents trained with different algorithms.
  • The authors use causal interventions to show that forcing or overriding concept assignments changes chosen actions in 69.4% of tested cases (p=8.6×10^-86), supporting the claim that concepts drive behavior rather than just correlate with it.
  • Concept roles are shown to be uneven: the most frequently used concept causes only a small win-rate drop when ablated, while a less frequent concept can substantially collapse performance, indicating strategy-critical but low-usage concepts.
  • By aligning concepts across agents via optimal bipartite matching, PRISM enables zero-shot strategy transfer: on Go 7×7, the successful transfer pairs reach ~69.5%±3.2% and 76.4%±3.4% win rates against a standard engine, far above random and misaligned baselines.
  • The approach appears to depend on domains where strategic state is naturally discrete: on Atari Breakout, the same pipeline yields bottleneck policies around random-agent performance, suggesting structural limits on when transfer will work.
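The alignment step in the fourth point can be made concrete. The paper matches each source agent's concepts to a target agent's concepts via optimal bipartite matching; a natural cost is the distance between concept centroids in feature space. The sketch below brute-forces the matching over permutations, which is only feasible for small K (a real pipeline with K≈64 concepts would use the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`). The centroid values are invented toy data, not from the paper.

```python
import itertools

def align_concepts(src_centroids, tgt_centroids):
    """Match each source concept to a target concept so that the total
    squared centroid distance is minimized (optimal bipartite matching,
    brute-forced over permutations; fine only for small K)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    k = len(src_centroids)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(k)):
        cost = sum(dist2(src_centroids[i], tgt_centroids[j])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    # best_perm[i] is the target concept matched to source concept i
    return best_perm

# Toy example: 3 concepts in a 2-D feature space. The target agent has
# learned the same three concepts, but numbered in a different order.
src = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
tgt = [(5.1, 4.9), (0.1, 5.2), (0.2, -0.1)]
print(align_concepts(src, tgt))  # → (2, 0, 1)
```

Once this mapping is known, a target agent can interpret "the source is in concept i" as "I am in my concept best_perm[i]", which is what makes zero-shot transfer of strategy labels possible.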

Abstract

We present PRISM (Policy Reuse via Interpretable Strategy Mapping), a framework that grounds reinforcement learning agents' decisions in discrete, causally validated concepts and uses those concepts as a zero-shot transfer interface between agents trained with different algorithms. PRISM clusters each agent's encoder features into K concepts via K-means. Causal intervention establishes that these concepts directly drive, not merely correlate with, agent behavior: overriding concept assignments changes the selected action in 69.4% of interventions (p = 8.6×10^-86, 2500 interventions). Concept importance and usage frequency are dissociated: the most-used concept (C47, 33.0% frequency) causes only a 9.4% win-rate drop when ablated, while ablating C16 (15.4% frequency) collapses win rate from 100% to 51.8%. Because concepts causally encode strategy, aligning them via optimal bipartite matching transfers strategic knowledge zero-shot. On Go 7×7 with three independently trained agents, concept transfer achieves 69.5%±3.2% and 76.4%±3.4% win rate against a standard engine across the two successful transfer pairs (10 seeds), compared to 3.5% for a random agent and 9.2% without alignment. Transfer succeeds when the source policy is strong; geometric alignment quality predicts nothing (R^2 ≈ 0). The framework is scoped to domains where strategic state is naturally discrete: the identical pipeline on Atari Breakout yields bottleneck policies at random-agent performance, confirming that the Go results reflect a structural property of the domain.
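The abstract's causal-intervention claim can be illustrated with a toy harness: assign each state to its nearest concept centroid (the K-means bottleneck), record the policy's action, force a different concept assignment, and count how often the action changes. Everything below (the nearest-centroid assignment, the toy policy, the centroids) is a hypothetical sketch of the measurement, not the paper's implementation; the paper reports a 69.4% change rate over 2500 interventions.

```python
import random

def concept_of(features, centroids):
    # Nearest-centroid assignment: the discrete K-means bottleneck.
    return min(range(len(centroids)),
               key=lambda k: sum((f - c) ** 2
                                 for f, c in zip(features, centroids[k])))

def policy(features, concept):
    # Toy policy over 4 actions: mostly concept-driven, with a small
    # feature-dependent component, so interventions change the action
    # often but not always.
    return (concept * 2 + (1 if features[0] > 0.5 else 0)) % 4

def intervention_change_rate(states, centroids, seed=0):
    """Fraction of states whose action changes when the concept
    assignment is overridden with a random different concept."""
    rng = random.Random(seed)
    changed = 0
    for s in states:
        c = concept_of(s, centroids)
        baseline = policy(s, c)
        forced = rng.choice([k for k in range(len(centroids)) if k != c])
        if policy(s, forced) != baseline:
            changed += 1
    return changed / len(states)

centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rng = random.Random(42)
states = [(rng.random(), rng.random()) for _ in range(500)]
rate = intervention_change_rate(states, centroids)
print(f"action changed in {rate:.1%} of interventions")
```

A rate well above 0% supports "concepts drive behavior"; a rate near 0% would mean the policy ignores the bottleneck and concepts are merely correlational.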
