AI Navigate

Modeling the human lexicon under temperature variations: linguistic factors, diversity and typicality in LLM word associations

arXiv cs.CL / 3/20/2026


Key Points

  • The study compares human and LLM-generated word associations using the SWOW dataset and three LLMs (Mistral-7B, Llama-3.1-8B, Qwen-2.5-32B) across multiple temperature settings.
  • It examines how lexical factors such as word frequency and concreteness influence cue-response pairs in both humans and models.
  • Results show all models mirror human trends for frequency and concreteness but differ in response variability and typicality, with larger models emitting highly typical but less variable responses.
  • Temperature settings modulate this trade-off by increasing variability while reducing typicality, highlighting how sampling temperature shapes lexical representations.
  • The work underscores the importance of considering model size and temperature when probing LLM lexical representations and comparing them with human data.
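
The temperature effect described above comes from standard temperature-scaled softmax sampling: logits are divided by the temperature before normalization, so higher values flatten the output distribution (more variable responses) and lower values sharpen it toward the most probable, i.e. most typical, token. A minimal sketch of this mechanism (the logit values are illustrative, not from the paper):

```python
import math


def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically
    stable softmax. Higher temperature flattens the distribution;
    lower temperature concentrates mass on the top candidate."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for four candidate association responses.
logits = [3.0, 1.5, 1.0, 0.2]

low_t = softmax_with_temperature(logits, 0.5)   # sharper: typical response dominates
high_t = softmax_with_temperature(logits, 2.0)  # flatter: probability spread out

# The top candidate gets more mass at low temperature than at high,
# mirroring the typicality/variability trade-off reported in the study.
assert low_t[0] > high_t[0]
```

Sampling word associations from the flatter high-temperature distribution yields more varied but less prototypical responses, which is the trade-off the paper measures across models.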

Abstract

Large language models (LLMs) achieve impressive results in terms of fluency in text generation, yet the nature of their linguistic knowledge, in particular the human-likeness of their internal lexicon, remains uncertain. This study compares human and LLM-generated word associations to evaluate how accurately models capture human lexical patterns. Using English cue-response pairs from the SWOW dataset and newly generated associations from three LLMs (Mistral-7B, Llama-3.1-8B, and Qwen-2.5-32B) across multiple temperature settings, we examine (i) the influence of lexical factors such as word frequency and concreteness on cue-response pairs, and (ii) the variability and typicality of LLM responses compared to human responses. Results show that all models mirror human trends for frequency and concreteness but differ in response variability and typicality. Larger models such as Qwen tend to emulate a single "prototypical" human participant, generating highly typical but minimally variable responses, while smaller models such as Mistral and Llama produce more variable yet less typical responses. Temperature settings further influence this trade-off, with higher values increasing variability but decreasing typicality. These findings highlight both the similarities and differences between human and LLM lexicons, emphasizing the need to account for model size and temperature when probing LLM lexical representations.