Decision-Level Ordinal Modeling for Multimodal Essay Scoring with Large Language Models

arXiv cs.CL / 3/17/2026

Key Points

The paper proposes Decision-Level Ordinal Modeling (DLOM), which treats automated essay scoring (AES) as an explicit ordinal decision: it reuses the language model head to extract score-wise logits for predefined score tokens, addressing limitations of autoregressive generation that are particularly acute in multimodal AES.
For multimodal AES, DLOM-GF adds a gated fusion module that adaptively combines textual and multimodal score logits; for text-only AES, DLOM-DA adds a distance-aware regularization term to better reflect ordinal distances.
Experiments on the multimodal EssayJudge dataset show DLOM improves over a generation-based SFT baseline across traits, with DLOM-GF providing further gains when modality relevance is heterogeneous; on ASAP/ASAP++ benchmarks, DLOM remains effective without visuals, and DLOM-DA further improves performance and outperforms strong baselines.
The work enables direct optimization and analysis in score space, which is presented as a more interpretable and robust basis for ordinal rubric scoring in LLM-based AES, in both multimodal and text-only settings.
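
The idea of reading score-wise logits off the language model head can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the token ids in `score_token_ids`, and the helper names `score_distribution` and `cross_entropy` are all hypothetical, and a real system would take the logits from an LLM's final position and map score tokens through its tokenizer.

```python
import math

def score_distribution(vocab_logits, score_token_ids):
    """Keep only the logits of the predefined score tokens and softmax over them."""
    score_logits = [vocab_logits[t] for t in score_token_ids]
    m = max(score_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in score_logits]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold score under the score distribution."""
    return -math.log(probs[gold_index])

# Toy next-token logits over an 8-token vocabulary (made-up numbers).
vocab_logits = [0.1, -2.0, 0.5, 1.0, 2.0, 3.0, 0.0, -1.0]
# Assumption: token ids 3..6 are the tokens for scores 0..3.
score_token_ids = [3, 4, 5, 6]

probs = score_distribution(vocab_logits, score_token_ids)
predicted_score = max(range(len(probs)), key=probs.__getitem__)
loss = cross_entropy(probs, gold_index=2)
```

In this sketch the decision is an argmax over a fixed set of score classes rather than a parsed string, so the loss is defined directly on the score distribution.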

Abstract

Automated essay scoring (AES) predicts multiple rubric-defined trait scores for each essay, where each trait follows an ordered discrete rating scale. Most LLM-based AES methods cast scoring as autoregressive token generation and obtain the final score via decoding and parsing, making the decision implicit. This formulation is particularly sensitive in multimodal AES, where the usefulness of visual inputs varies across essays and traits. To address these limitations, we propose Decision-Level Ordinal Modeling (DLOM), which makes scoring an explicit ordinal decision by reusing the language model head to extract score-wise logits on predefined score tokens, enabling direct optimization and analysis in the score space. For multimodal AES, DLOM-GF introduces a gated fusion module that adaptively combines textual and multimodal score logits. For text-only AES, DLOM-DA adds a distance-aware regularization term to better reflect ordinal distances. Experiments on the multimodal EssayJudge dataset show that DLOM improves over a generation-based SFT baseline across scoring traits, and DLOM-GF yields further gains when modality relevance is heterogeneous. On the text-only ASAP/ASAP++ benchmarks, DLOM remains effective without visual inputs, and DLOM-DA further improves performance and outperforms strong representative baselines.
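
The abstract does not give the exact form of the gate or of the distance-aware term, so the following is only a sketch under stated assumptions. It assumes a scalar sigmoid gate computed from the concatenated textual and multimodal score logits, and a regularizer equal to the expected absolute distance between the predicted and gold scores, weighted by a hypothetical coefficient `lam`. The function names and all numbers are illustrative, not taken from the paper.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_logits, mm_logits, gate_weights, gate_bias):
    """Blend two score-logit vectors with a scalar gate g in (0, 1).

    Assumption: g is a sigmoid of a linear function of the concatenated logits.
    """
    features = text_logits + mm_logits  # list concatenation
    g = sigmoid(sum(w * f for w, f in zip(gate_weights, features)) + gate_bias)
    fused = [g * t + (1.0 - g) * m for t, m in zip(text_logits, mm_logits)]
    return fused, g

def distance_aware_loss(score_logits, gold, lam=0.5):
    """Cross-entropy plus lam times the expected |predicted score - gold score|."""
    probs = softmax(score_logits)
    ce = -math.log(probs[gold])
    expected_distance = sum(p * abs(k - gold) for k, p in enumerate(probs))
    return ce + lam * expected_distance

# Gated fusion with an untrained gate (zero weights and bias give g = 0.5).
text_logits = [0.2, 1.5, 0.3, -1.0]
mm_logits = [0.0, 0.4, 2.0, -0.5]
fused, g = gated_fusion(text_logits, mm_logits, [0.0] * 8, 0.0)

# Same probability on the gold score 0, but the leftover mass sits near or far.
near = [2.0, 1.0, 0.0, -1.0]
far = [2.0, -1.0, 0.0, 1.0]
loss_near = distance_aware_loss(near, gold=0)
loss_far = distance_aware_loss(far, gold=0)
```

Plain cross-entropy cannot tell `near` from `far`, because both assign the same probability to the gold score. The added term penalizes `far` more, which is one simple way to encode ordinal distance.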

