CREG: Compass Relational Evidence for Interpreting Spatial Reasoning in Vision-Language Models
arXiv cs.CV / 3/24/2026
Key Points
- The paper introduces CREG (Compass Relational Evidence Graph), a training-free interpretability method that maps multi-layer contrastive Grad×Act attributions into a reference-centered polar (compass-sector) coordinate system to identify inferred directional relations in vision-language models.
- It evaluates directional explanations using three new metrics—Direction Alignment Error (DAE), Edge Accuracy (EA), and Causal Occlusion Score (COS)—to measure how well directional evidence matches the intended geometry and whether it is causally faithful.
- Experiments on Qwen2-VL-7B show consistent improvements over standard attribution baselines, including a 16.1° reduction in angular error versus attention rollout and a +0.120 improvement in EA on COCO-Pairs.
- The causal occlusion tests on 540 samples yield COS values ≥ +0.42, supporting the faithfulness of the directional explanations.
- Results are weaker on Qwen2-VL-2B, suggesting CREG depends on structured spatial representations that emerge more clearly at larger model scales.
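The compass-sector geometry behind these metrics can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's implementation: it bins a target's position relative to a reference center into eight compass sectors, and computes a DAE-style score as the angular gap between the attribution-weighted circular-mean direction and the intended relation direction. The sector layout, 45° bin width, and weighting scheme are all assumptions.

```python
import math

# Eight compass sectors, counterclockwise from east (an assumed layout).
SECTORS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]

def compass_sector(ref_center, target_center):
    """Assign the target's position to one of 8 sectors around the reference."""
    dx = target_center[0] - ref_center[0]
    dy = ref_center[1] - target_center[1]  # image y grows downward
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return SECTORS[int(((angle + 22.5) % 360.0) // 45.0)]

def direction_alignment_error(attr_map, ref_center, intended_deg):
    """DAE-style score: angular gap between the attribution-weighted
    circular-mean direction and the intended relation direction."""
    sin_sum = cos_sum = 0.0
    for y, row in enumerate(attr_map):
        for x, w in enumerate(row):
            if w <= 0:
                continue
            ang = math.atan2(ref_center[1] - y, x - ref_center[0])
            sin_sum += w * math.sin(ang)
            cos_sum += w * math.cos(ang)
    mean_dir = math.degrees(math.atan2(sin_sum, cos_sum))
    return abs((mean_dir - intended_deg + 180.0) % 360.0 - 180.0)

# Usage: attribution mass placed to the right ("east") of the reference
# point should score a small error against an intended direction of 0°.
attr = [[0.0] * 32 for _ in range(32)]
for y in range(14, 18):
    for x in range(24, 30):
        attr[y][x] = 1.0
err = direction_alignment_error(attr, ref_center=(8, 16), intended_deg=0.0)
print(compass_sector((8, 16), (27, 16)), round(err, 1))
```

In this toy setup the attribution blob sits due east of the reference center, so the angular error stays near zero; an evidence map concentrated in the wrong sector would push the score toward 180°.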