Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
arXiv cs.CL / 4/1/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper investigates whether and where large language models encode structured semantic relations—synonymy, antonymy, hypernymy, and hyponymy—across models of increasing scale (Pythia-70M, GPT-2, and Llama 3.1 8B).
- Using linear probing together with mechanistic interpretability methods (sparse autoencoders and activation patching), the authors map which layers and pathways carry these relations and which specific features contribute to representing them (a layer-wise probing sketch follows this list).
- Results show a directional asymmetry in hierarchical relations: hypernymy is redundantly represented and difficult to suppress, whereas hyponymy depends on a compact set of features that is more vulnerable to ablation (see the ablation sketch below).
- The relation signals are described as diffuse yet stable, typically peaking in mid-layers and appearing stronger in post-residual/MLP pathways than in attention.
- Probe-level causal effects vary with model capacity: SAE-guided patching produces reliable shifts on Llama 3.1 but weaker or unstable effects on smaller models, with antonymy the easiest relation to elicit causally and synonymy the hardest (see the patching sketch below).
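
The layer-wise probing finding can be made concrete with a minimal sketch. Everything below is synthetic and illustrative, not the paper's code: `n_pairs`, `d_model`, `n_layers`, and the injected class signal are placeholder assumptions standing in for hidden states actually extracted from a model for word pairs.

```python
# Minimal layer-wise probing sketch (illustrative; not the paper's pipeline).
# All data is synthetic: in a real run, X would be layer-l hidden states for
# word pairs, and y would label whether the pair instantiates the relation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pairs, d_model, n_layers = 2000, 256, 12   # placeholder sizes

y = rng.integers(0, 2, size=n_pairs)         # 1 = pair has the target relation
layer_accuracy = []
for layer in range(n_layers):
    # Placeholder activations with a weak injected class signal.
    X = rng.normal(size=(n_pairs, d_model)) + 0.1 * y[:, None]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    layer_accuracy.append(probe.score(X_te, y_te))

# With real activations, plotting layer_accuracy against layer index is how
# a mid-layer peak like the one the paper reports would show up.
print([round(a, 3) for a in layer_accuracy])
```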
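The hypernymy/hyponymy asymmetry is an ablation claim: zeroing a handful of sparse-autoencoder features should damage a compactly encoded relation more than a redundantly encoded one. The toy sketch below shows the mechanics only; the SAE is untrained with random weights, and the dimensions and feature count are placeholders.

```python
# Toy SAE feature-ablation sketch (a simplified stand-in; weights, dimensions,
# and the ablated feature indices are all placeholders, not trained values).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_sae):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, x, ablate_idx=None):
        f = torch.relu(self.enc(x))          # sparse feature activations
        if ablate_idx is not None:
            f[..., ablate_idx] = 0.0         # zero out the selected features
        return self.dec(f), f

torch.manual_seed(0)
sae = SparseAutoencoder(d_model=256, d_sae=2048)
h = torch.randn(1, 256)                      # stand-in residual-stream state

# Ablate the 8 most active features. For a relation carried by a compact
# feature set (as hyponymy reportedly is), this small ablation should shift
# the reconstruction far more than for a redundantly encoded relation.
recon_full, feats = sae(h)
top_feats = feats.topk(8, dim=-1).indices.squeeze(0)
recon_ablated, _ = sae(h, ablate_idx=top_feats)
print((recon_full - recon_ablated).norm().item())
```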
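Activation patching itself follows a standard cache-and-replace pattern built on PyTorch forward hooks. The sketch below uses a toy two-layer model and, for brevity, patches the whole layer output; the paper's SAE-guided variant would instead adjust the activation along specific SAE feature directions.

```python
# Generic activation-patching sketch with forward hooks (not the paper's
# code; the toy model and random inputs are placeholders for an LLM).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 2))
target_layer = model[0]
clean_x, corrupt_x = torch.randn(1, 16), torch.randn(1, 16)

# 1) Cache the target layer's activation from the "clean" run.
cache = {}
def save_hook(module, inputs, output):
    cache["clean"] = output.detach()

handle = target_layer.register_forward_hook(save_hook)
clean_logits = model(clean_x)
handle.remove()

# 2) Re-run on the "corrupted" input, patching in the cached activation.
#    Returning a tensor from a forward hook replaces the layer's output.
def patch_hook(module, inputs, output):
    return cache["clean"]

handle = target_layer.register_forward_hook(patch_hook)
patched_logits = model(corrupt_x)
handle.remove()

# If patching moves the output toward the clean run, the patched site
# carries causally relevant signal for the behavior being measured.
print((patched_logits - model(corrupt_x)).abs().max().item())
```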