The COTe score: A decomposable framework for evaluating Document Layout Analysis models

arXiv cs.CV / 3/16/2026

Key Points

The paper introduces the Structural Semantic Unit (SSU), a relational labelling approach, and the Coverage, Overlap, Trespass, and Excess (COTe) score, a decomposable metric for evaluating document layout analysis (DLA) beyond traditional IoU, F1, and mAP.
It shows that COTe captures semantic structure, reveals distinct failure modes such as semantic boundary breaches or repeated parsing of the same region, and is more informative than traditional metrics.
The authors report that COTe reduces the interpretation-performance gap by up to 76% relative to F1 on three DLA datasets.
Notably, COTe's granularity robustness largely holds even without explicit SSU labelling, lowering barriers to adoption.
They also release an SSU-labeled dataset and a Python library to apply COTe in DLA projects.

Abstract

Document Layout Analysis (DLA) is the process by which a page is parsed into meaningful elements, often using machine learning models. Typically, the quality of a model is judged using general object detection metrics such as IoU, F1, or mAP. However, these metrics are designed for images that are 2D projections of 3D space, not for the natively 2D imagery of printed media. This discrepancy can result in misleading or uninformative interpretation of model performance by the metrics. To encourage more robust, comparable, and nuanced DLA, we introduce the Structural Semantic Unit (SSU), a relational labelling approach that shifts the focus from the physical to the semantic structure of the content; and the Coverage, Overlap, Trespass, and Excess (COTe) score, a decomposable metric for measuring page parsing quality. We demonstrate the value of these methods through case studies and by evaluating 5 common DLA models on 3 DLA datasets. We show that the COTe score is more informative than traditional metrics and reveals distinct failure modes across models, such as breaching semantic boundaries or repeatedly parsing the same region. In addition, the COTe score reduces the interpretation-performance gap by up to 76% relative to F1. Notably, we find that COTe's granularity robustness largely holds even without explicit SSU labelling, lowering the barriers to entry for using the system. Finally, we release an SSU-labelled dataset and a Python library for applying COTe in DLA projects.
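The abstract names the four components but does not define them, so the following is only a toy sketch of how a decomposed, area-based score could separate the failure modes it mentions. Everything here is an assumption made for illustration: the function name `toy_cote_components`, the pixel-grid representation, the majority-overlap rule for assigning a prediction to an SSU, and the choice to normalise every component by total SSU area. It is not the paper's formulation and not the API of the released Python library.

```python
import numpy as np

def toy_cote_components(ssu_map, pred_boxes):
    """Illustrative area-based decomposition (assumed definitions, not the paper's).

    ssu_map:    2D int array, 0 = background, k > 0 = pixel belongs to SSU k.
    pred_boxes: list of (x0, y0, x1, y1) boxes with exclusive upper bounds.
    """
    on_ssu = ssu_map > 0
    ssu_area = np.count_nonzero(on_ssu)
    if ssu_area == 0:
        raise ValueError("ssu_map contains no SSU pixels")

    hits = np.zeros(ssu_map.shape, dtype=int)         # predictions covering each pixel
    trespassed = np.zeros(ssu_map.shape, dtype=bool)  # SSU pixels hit by a "foreign" box

    for x0, y0, x1, y1 in pred_boxes:
        box = np.zeros(ssu_map.shape, dtype=bool)
        box[y0:y1, x0:x1] = True
        hits += box

        ids = ssu_map[box]
        ids = ids[ids > 0]
        if ids.size == 0:
            continue  # box lies entirely on background
        # Assumption: a box "belongs" to the SSU it shares the most pixels with.
        owner = np.bincount(ids).argmax()
        trespassed |= box & on_ssu & (ssu_map != owner)

    return {
        "coverage": np.count_nonzero((hits >= 1) & on_ssu) / ssu_area,
        "overlap": np.count_nonzero((hits >= 2) & on_ssu) / ssu_area,
        "trespass": np.count_nonzero(trespassed) / ssu_area,
        "excess": np.count_nonzero((hits >= 1) & ~on_ssu) / ssu_area,
    }

# A 10x10 toy page: SSU 1 occupies rows 0-3, SSU 2 occupies rows 6-9.
page = np.zeros((10, 10), dtype=int)
page[0:4, :] = 1
page[6:10, :] = 2

# Model A parses SSU 1 twice and spills one row above SSU 2.
duplicated = toy_cote_components(page, [(0, 0, 10, 4), (0, 0, 10, 4), (0, 5, 10, 10)])
# Model B draws one box that merges part of SSU 1 into SSU 2.
merged = toy_cote_components(page, [(0, 2, 10, 10)])

print(duplicated)
print(merged)
```

In this toy setup the first model covers every SSU pixel but shows non-zero overlap and excess, while the second shows reduced coverage plus non-zero trespass and excess. A single aggregate number would hide that difference, which is the kind of distinction a decomposable score is meant to expose.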
