Prediction of Item Difficulty for Reading Comprehension Items by Creation of Annotated Item Repository
arXiv cs.CL / 4/1/2026
Key Points
- The paper proposes predicting Item Response Theory (IRT) difficulty for reading comprehension items from text content and reported percent-correct (p-value) data.
- It builds an annotated repository using U.S. standardized test reading passages and student response data across grades 3–8 (2018–2023), enriched with linguistic, passage/test, and context metadata.
- A penalized regression model using these features achieves RMSE 0.59 versus a baseline RMSE of 0.92, with a 0.77 correlation between true and predicted difficulty.
- Adding embeddings from pretrained transformer language models (ModernBERT, BERT, and LLaMA) yields only marginal improvements; linguistic features alone or embeddings alone perform comparably to the combined approach.
- The authors suggest the difficulty prediction model can be used to filter and categorize reading items and plan to release the model publicly for broader stakeholder use.
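The modeling setup summarized above (penalized regression over item-level features, evaluated by RMSE against a mean-prediction baseline and by true-vs-predicted correlation) can be sketched as follows. This is an illustrative reconstruction only: the feature names, synthetic data, and `Ridge` penalty are assumptions, not the authors' actual repository or pipeline.

```python
# Hedged sketch: penalized (ridge) regression predicting IRT difficulty
# from item-level features. All features and data below are synthetic and
# illustrative; the paper's repository and feature set are not reproduced.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_items = 500

# Illustrative item features: passage length, mean sentence length,
# type-token ratio, grade level, and reported percent-correct (p-value).
X = np.column_stack([
    rng.normal(600, 150, n_items),   # passage length (words)
    rng.normal(15, 4, n_items),      # mean sentence length
    rng.uniform(0.3, 0.7, n_items),  # type-token ratio
    rng.integers(3, 9, n_items),     # grade (3-8)
    rng.uniform(0.2, 0.9, n_items),  # item p-value (proportion correct)
])

# Synthetic IRT difficulty: harder items tend to have lower p-values.
y = 2.0 - 3.5 * X[:, 4] + 0.001 * X[:, 0] + rng.normal(0, 0.3, n_items)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, y_tr)

# Evaluate as in the paper: RMSE vs. a mean-prediction baseline,
# plus correlation between true and predicted difficulty.
pred = model.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5
baseline = mean_squared_error(y_te, np.full_like(y_te, y_tr.mean())) ** 0.5
corr = np.corrcoef(y_te, pred)[0, 1]
print(f"model RMSE={rmse:.2f}  baseline RMSE={baseline:.2f}  r={corr:.2f}")
```

On this toy data the penalized model beats the mean-prediction baseline by a wide margin, mirroring the shape (though not the exact numbers) of the paper's reported RMSE 0.59 vs. 0.92 and correlation 0.77.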
Related Articles

Knowledge Governance for the Agentic Economy
Dev.to

AI server farms heat up the neighborhood for miles around, paper finds
The Register

Does the Claude “leak” actually change anything in practice?
Reddit r/LocalLLaMA

87.4% of My Agent's Decisions Run on a 0.8B Model
Dev.to

"Paperclip", a free tool that turns AI agents into a software team
Dev.to