Decoupling Scores and Text: The Politeness Principle in Peer Review
arXiv cs.LG / 4/17/2026
Key Points
- The paper studies why authors misread peer review by comparing predictive power from numerical scores versus free-text reviews across 30,000+ ICLR submissions (2021–2025).
- It finds a clear accuracy gap: models trained on numerical scores reach about 91% accept/reject accuracy, while text-based models plateau around 81% even when backed by large language models, indicating that free-text reviews carry a noisier outcome signal.
- For cases where score-based models fail, the authors observe score distributions with high kurtosis and negative skewness, suggesting that extreme low scores (not the mean) heavily drive rejection.
- From a sentiment perspective, the authors attribute the weaker text signal to the Politeness Principle: even reviews of rejected papers tend to contain more positive than negative sentiment terms, which masks the rejection cue in the text.
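The shape statistics in the third point are easy to illustrate. The sketch below uses made-up score lists (not the paper's data) to show how a single harsh review pulls a submission's score distribution toward negative skewness, the pattern the authors associate with rejections that score-mean models miss:

```python
from statistics import mean

def moments(scores):
    """Return (mean, skewness, excess kurtosis) of a list of review scores.

    Uses population (biased) moment estimators for simplicity.
    """
    m = mean(scores)
    n = len(scores)
    var = sum((s - m) ** 2 for s in scores) / n
    sd = var ** 0.5
    skew = sum(((s - m) / sd) ** 3 for s in scores) / n
    kurt = sum(((s - m) / sd) ** 4 for s in scores) / n - 3
    return m, skew, kurt

# Hypothetical score sets for two submissions with similar means:
balanced = [5, 6, 6, 7]      # symmetric spread -> skewness ~ 0
one_outlier = [3, 6, 6, 7]   # one harsh review drags the left tail out

for name, scores in [("balanced", balanced), ("one_outlier", one_outlier)]:
    m, skew, kurt = moments(scores)
    print(f"{name}: mean={m:.2f} skew={skew:.2f} excess_kurtosis={kurt:.2f}")
```

Under these toy numbers the outlier case has a clearly negative skewness despite a mean only 0.5 points lower, which is the paper's point: the extreme low score, not the average, carries the rejection signal.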


