Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction
arXiv cs.CL · April 14, 2026
Key Points
- The study tests whether BERT token embeddings encode fictional narrative semantics—time, space, causality, and character—using a token-level probing setup with LLM-assisted annotation.
- A linear probe on BERT embeddings reaches 94% accuracy and a macro-average recall of 0.83 (with balanced class weighting), outperforming a variance-matched random-embedding baseline (47%).
- Performance is weaker for rarer narrative dimensions, especially space (recall = 0.66) and causality (recall = 0.75), indicating uneven representation strength across dimensions.
- The analysis finds "Boundary Leakage," where rare dimensions are often misclassified as "others," and unsupervised clustering aligns near-randomly with the predefined categories (ARI = 0.081), suggesting the dimensions are not cleanly separable as discrete categories in embedding space.
- The authors propose future work such as POS-only baselines, expanded datasets, and layer-wise probing to separate syntactic effects from narrative encoding.
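The probing setup summarized above can be sketched as follows. This is a minimal illustration, not the authors' code: real BERT token embeddings are replaced here by synthetic 768-dimensional vectors drawn from per-class Gaussians, and the class names, counts, and hyperparameters are assumptions for demonstration only. The sketch shows the two measurements the key points describe: a linear probe with balanced class weighting evaluated by macro-average recall, and an unsupervised clustering check scored with the Adjusted Rand Index (ARI).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, adjusted_rand_score
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical label set and (imbalanced) token counts; "space" and
# "causality" are kept rare, mirroring the imbalance noted in the summary.
CLASSES = ["time", "space", "causality", "character", "others"]
COUNTS = [400, 80, 120, 300, 1500]

# Synthetic stand-ins for BERT token embeddings: 768-d vectors,
# one Gaussian centroid per narrative dimension.
centroids = rng.normal(size=(len(CLASSES), 768))
X = np.vstack([rng.normal(c, 1.0, size=(n, 768))
               for c, n in zip(centroids, COUNTS)])
y = np.concatenate([[i] * n for i, n in enumerate(COUNTS)])

# Linear probe with balanced class weighting, so rare dimensions
# contribute equally to the loss despite their low frequency.
probe = LogisticRegression(max_iter=1000, class_weight="balanced")
probe.fit(X, y)
macro_recall = recall_score(y, probe.predict(X), average="macro")

# Unsupervised check: do k-means clusters recover the label structure?
# A near-zero ARI would indicate the categories are not discretely separable.
clusters = KMeans(n_clusters=len(CLASSES), n_init=10,
                  random_state=0).fit_predict(X)
ari = adjusted_rand_score(y, clusters)

print(f"macro recall = {macro_recall:.2f}, ARI = {ari:.3f}")
```

On this cleanly separated synthetic data both scores come out high; the paper's contrast between strong probe recall and a near-zero ARI (0.081) on real embeddings is what motivates its conclusion that the dimensions are linearly decodable but not cluster-separable.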