Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

arXiv cs.AI / 3/12/2026

Key Points

The authors evaluate adjective-noun compositionality in LLMs using two complementary methods: prompt-based functional tests and analysis of internal representations.
They find a striking discrepancy: LLMs reliably build compositional representations internally, but these representations do not consistently translate into functional task success across model variants.
The results show that task performance can diverge from the properties of a model's internal states, which highlights the need for contrastive evaluation to understand model capabilities more completely.
The study implies caution when equating high task performance with true compositional understanding and encourages broader evaluation strategies in LLM research.

Abstract

Compositionality is considered central to language abilities. As performant language systems, how do large language models (LLMs) do on compositional tasks? We evaluate adjective-noun compositionality in LLMs using two complementary setups: prompt-based functional assessment and a representational analysis of internal model states. Our results reveal a striking divergence between task performance and internal states. While LLMs reliably develop compositional representations, they fail to translate consistently into functional task success across model variants. Consequently, we highlight the importance of contrastive evaluation for obtaining a more complete understanding of model capabilities.
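
To make the difference between the two setups concrete, here is a minimal Python sketch. It is not the authors' method: the additive composition check, the function names, and the toy vectors and answers are all illustrative assumptions. A real study would extract vectors from a model's hidden states and score actual prompt responses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def additive_composition_score(adj_vec, noun_vec, phrase_vec):
    """Representational view (assumed, simplified): how close is the
    phrase vector to the sum of its adjective and noun vectors?"""
    composed = [a + n for a, n in zip(adj_vec, noun_vec)]
    return cosine(composed, phrase_vec)


def functional_accuracy(predictions, gold):
    """Functional view: fraction of prompt answers that match the gold labels."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical toy data, chosen by hand to illustrate a divergence.
adj = [1.0, 0.0, 0.0]      # stand-in for an adjective representation
noun = [0.0, 1.0, 0.0]     # stand-in for a noun representation
phrase = [1.0, 1.0, 0.0]   # stand-in for the adjective-noun phrase representation

rep_score = additive_composition_score(adj, noun, phrase)

preds = ["yes", "no", "no", "yes"]   # hypothetical model answers to prompts
gold = ["yes", "yes", "no", "no"]    # hypothetical correct answers
func_score = functional_accuracy(preds, gold)

print(f"representational score: {rep_score:.2f}")
print(f"functional accuracy:    {func_score:.2f}")
```

In this toy case the representational score is close to 1 while functional accuracy sits at chance, which is the kind of gap the abstract describes. Reporting both numbers side by side for each model is one simple way to read "contrastive evaluation" here.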
