Embedding Enhancement via Fine-Tuned Language Models for Learner-Item Cognitive Modeling

arXiv cs.CL / 4/7/2026


Key Points

  • The paper addresses how learner-item cognitive modeling for online intelligent education can be improved by using language models to enhance embedding representations for cognitive diagnosis (CD).
  • It identifies two core problems in prior work: objective mismatch between LM training and CD modeling (creating a feature-space distribution gap) and the lack of a unified framework to integrate textual embeddings across different CD tasks.
  • It proposes EduEmbed, a two-stage framework that fine-tunes LMs using role-specific representations and an interaction diagnoser, then uses a textual adapter to extract task-relevant semantics and combine them with existing cognitive modeling approaches.
  • Experiments on four cognitive diagnosis tasks plus a computerized adaptive testing (CAT) task show robust performance gains, with additional analysis clarifying how semantic information affects generalization across tasks.

Abstract

Learner-item cognitive modeling plays a central role in web-based online intelligent education systems by enabling cognitive diagnosis (CD) across diverse online educational scenarios. Although ID embedding remains the mainstream approach to cognitive modeling due to its effectiveness and flexibility, recent advances in language models (LMs) open new possibilities for incorporating rich semantic representations to improve CD performance. This highlights the need for a comprehensive analysis of how LMs can enhance embeddings through semantic integration across mainstream CD tasks. This paper identifies two key challenges in fully leveraging LMs in existing work: (1) misalignment between the training objectives of LMs and CD models creates a distribution gap between their feature spaces; and (2) prior work lacks a unified framework for integrating textual embeddings across varied CD tasks while preserving the strengths of existing cognitive modeling paradigms, which is essential for robust embedding enhancement. To address these challenges, this paper introduces EduEmbed, a unified embedding enhancement framework that leverages fine-tuned LMs to enrich learner-item cognitive modeling across diverse CD tasks. EduEmbed operates in two stages. In the first stage, we fine-tune LMs using role-specific representations and an interaction diagnoser to bridge the semantic gap with CD models. In the second stage, we employ a textual adapter to extract task-relevant semantics and integrate them with existing modeling paradigms to improve generalization. We evaluate the proposed framework on four CD tasks and a computerized adaptive testing (CAT) task, achieving robust performance gains. Further analysis reveals the impact of semantic information across diverse tasks, offering key insights for future research on applying LMs to CD in online intelligent education systems.
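To make the second-stage idea concrete, the sketch below shows one *generic* way a "textual adapter" could project fine-tuned LM embeddings into the same space as ID embeddings and fuse the two. This is an illustrative toy, not the paper's actual implementation: all names, dimensions (`TEXT_DIM`, `ID_DIM`), the two-layer MLP architecture, and the additive fusion with weight `alpha` are assumptions for exposition.

```python
import numpy as np

# Illustrative sketch only (not EduEmbed's actual code): a minimal "textual
# adapter" that maps an LM sentence embedding into the CD model's ID-embedding
# space, then blends it with the learned ID embedding.
rng = np.random.default_rng(0)
TEXT_DIM, ID_DIM = 768, 64  # hypothetical LM and ID embedding sizes

class TextualAdapter:
    """Two-layer ReLU projection from the LM text space to the ID space."""
    def __init__(self, text_dim, id_dim, hidden=128):
        # Randomly initialized here; in practice these weights would be trained.
        self.W1 = rng.normal(0.0, 0.02, (text_dim, hidden))
        self.W2 = rng.normal(0.0, 0.02, (hidden, id_dim))

    def __call__(self, text_emb):
        h = np.maximum(self.W1.T @ text_emb, 0.0)  # hidden ReLU layer
        return self.W2.T @ h                       # project to ID space

def fuse(id_emb, text_emb, adapter, alpha=0.5):
    """Blend an ID embedding with adapted text semantics (alpha weights text)."""
    return (1.0 - alpha) * id_emb + alpha * adapter(text_emb)

# Example: enhance one item's ID embedding with its question-text semantics.
adapter = TextualAdapter(TEXT_DIM, ID_DIM)
id_emb = rng.normal(size=ID_DIM)      # stand-in for a trained ID embedding
text_emb = rng.normal(size=TEXT_DIM)  # stand-in for a fine-tuned LM embedding
enhanced = fuse(id_emb, text_emb, adapter)
print(enhanced.shape)  # (64,)
```

The enhanced vector lives in the ID-embedding space, so it can drop into an existing CD model unchanged; how the fusion is parameterized per task is exactly the design question the paper's adapter addresses.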