Pair2Score: Pairwise-to-Absolute Transfer for LLM-Based Essay Scoring

arXiv cs.CL / 5/5/2026

Key Points

  • Pair2Score is a two-stage framework that transfers what a model learns from pairwise comparisons into absolute LLM-based essay scoring, using parameter-efficient LLaMA adaptation.
  • In Stage 1, it trains a directional Siamese ranker on pairwise data generated from absolute trait labels, and in Stage 2 it learns an absolute predictor with transfer strategies such as warm-start and embedding-fusion.
  • Experiments on rubric-aligned Automated Essay Scoring (AES) traits—grammar, vocabulary, and syntax—show that the best transfer variant improves quadratic weighted kappa (QWK) versus an absolute-only baseline for all three traits.
  • Not every transfer configuration helps: a one-epoch pairwise stage transfers more reliably than extended pairwise training, and the transfer configuration, not merely the inclusion of a pairwise stage, determines whether downstream scoring benefits.
  • These results suggest that a carefully configured pairwise-to-absolute transfer can yield more accurate absolute scores than training on absolute labels alone.
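
To make the Stage 1 data concrete, here is a minimal sketch of how pairwise comparisons might be derived from absolute trait labels. It is an illustration under stated assumptions, not the authors' procedure: the function name `make_pairs` is hypothetical, "directional" is read here as "ordered pairs, emitted in both directions", and tied scores are skipped because the summary does not say how ties are handled.

```python
from itertools import combinations


def make_pairs(essays, labels):
    """Build ordered pairwise examples from absolute trait scores.

    Each example is (essay_a, essay_b, target), where target is 1 when
    essay_a has the higher score and 0 when essay_b does.
    Assumption: ties are dropped and each pair is emitted in both orders.
    """
    pairs = []
    for i, j in combinations(range(len(essays)), 2):
        if labels[i] == labels[j]:
            continue  # assumption: a tie carries no ranking signal
        target = 1 if labels[i] > labels[j] else 0
        pairs.append((essays[i], essays[j], target))
        # Mirrored order, so a ranker sees both directions of the comparison.
        pairs.append((essays[j], essays[i], 1 - target))
    return pairs


# Toy grammar scores for three essays (made-up values for illustration).
pairs = make_pairs(["essay A", "essay B", "essay C"], [3, 1, 3])
for a, b, target in pairs:
    print(a, "vs", b, "->", target)
```

The number of candidate pairs grows quadratically with the number of essays, so a real pipeline would likely sample pairs rather than enumerate all of them; that choice is also not specified in the summary.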

Abstract

Many scoring applications require absolute predictions, while pairwise comparisons can provide a simpler learning objective. We present Pair2Score, a two-stage learning framework that transfers pairwise comparisons into absolute scoring with parameter-efficient LLaMA adaptation. Stage 1 trains a directional Siamese ranker on pairwise comparisons derived from absolute trait labels; Stage 2 trains an absolute predictor using configurable transfer strategies (warm-start and embedding-fusion variants). We evaluate on rubric-aligned Automated Essay Scoring (AES) traits (grammar, vocabulary, syntax) under a five-fold protocol that co-rotates held-out fold and random seed. At the trait level, the best-performing transfer variant improves quadratic weighted kappa (QWK) over an absolute-only baseline for all three traits. However, not all transfer configurations help: a one-epoch pairwise stage transfers more reliably than extended pairwise training, and transfer configuration -- not just the inclusion of a pairwise stage -- determines whether downstream scoring benefits.
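
For readers unfamiliar with the metric, quadratic weighted kappa can be computed as in the sketch below. This is the standard definition of QWK written in plain Python, not code from the paper; the function name and the `min_score`/`max_score` parameters are illustrative choices, and the score range in the example is made up.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level
    agreement, and negative values for worse-than-chance agreement.
    Assumes max_score > min_score and that the ratings are not all
    concentrated in one identical category for both raters.
    """
    n = max_score - min_score + 1
    # Observed confusion matrix: rows are true scores, columns predictions.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_score][p - min_score] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Disagreement penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Counts expected if the two score distributions were independent.
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Toy example on a 1-3 scale (illustrative values only).
print(quadratic_weighted_kappa([1, 2, 3], [1, 2, 3], 1, 3))
print(quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3))
```

In practice, scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity; the hand-written version is shown only to expose the weighting.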