Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives
arXiv cs.AI / 3/12/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The authors evaluate adjective-noun compositionality in LLMs using two complementary methods: prompt-based functional tests and analysis of internal representations.
- They find a striking discrepancy: LLMs reliably form compositional representations internally, but this does not consistently translate into success on functional tasks across models.
- The results suggest that task performance can diverge from the properties of a model's internal representations, highlighting the need for contrastive evaluation to better understand model capabilities.
- The study implies caution when equating high task performance with true compositional understanding and encourages broader evaluation strategies in LLM research.
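
To make the contrast between the two perspectives concrete, here is a minimal Python sketch. It is not the authors' method: the summary does not describe their probes, prompts, or metrics. The additive composition rule, the toy vectors, the answer strings, and the helper names `representational_score` and `functional_accuracy` are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def representational_score(adj_vec, noun_vec, phrase_vec):
    """Hypothetical representational check: how close is the phrase vector
    to a simple additive composition of the adjective and noun vectors?"""
    composed = [a + n for a, n in zip(adj_vec, noun_vec)]
    return cosine(composed, phrase_vec)


def functional_accuracy(answers, gold):
    """Hypothetical functional check: fraction of prompt answers that
    match the expected labels (case-insensitive)."""
    correct = sum(
        a.strip().lower() == g.strip().lower() for a, g in zip(answers, gold)
    )
    return correct / len(gold)


# Toy data (made up): stand-ins for hidden states and prompt outputs.
adj = [1.0, 0.0, 0.5]
noun = [0.0, 1.0, 0.5]
phrase = [1.0, 1.0, 1.0]

rep = representational_score(adj, noun, phrase)

answers = ["yes", "no", "yes", "yes"]
gold = ["yes", "yes", "no", "yes"]
func = functional_accuracy(answers, gold)

# A large gap is the kind of divergence the key points describe.
print(f"representational={rep:.2f} functional={func:.2f} gap={rep - func:.2f}")
```

In this toy setup the phrase vector lines up almost perfectly with the composed vector while only half of the prompt answers are correct, so the two scores disagree. Reporting both numbers side by side, rather than either one alone, is the point of the comparison.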




