Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

arXiv cs.CL / 4/23/2026

Key Points

The paper addresses the high inference cost of span-based Named-Entity Recognition (NER) systems, which often enumerate many candidate spans and run marker-augmented processing for each.
It introduces SpanDec, an efficient span-based NER framework that computes interactions between span representations mainly at the final transformer layer, using a lightweight decoder dedicated to spans so that earlier layers do no redundant per-span work.
SpanDec also adds a span filtering mechanism during candidate enumeration to prune unlikely spans before costly processing steps.
On multiple benchmarks, SpanDec matches competitive span-based baselines in accuracy while improving throughput and lowering computational cost, giving a better accuracy-efficiency trade-off for large-scale serving and on-device use.
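
To make the cost argument in the points above concrete, here is a minimal Python sketch of candidate span enumeration followed by a pruning step. It is not SpanDec's filter, whose details are not given here: `enumerate_spans`, `filter_spans`, and the capitalization-based `toy_score` are hypothetical names and a stand-in heuristic chosen only for illustration.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def enumerate_spans(num_tokens: int, max_width: int) -> List[Span]:
    """List every contiguous span of at most max_width tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_width, num_tokens))
    ]


def filter_spans(
    spans: List[Span], score_fn: Callable[[Span], float], threshold: float
) -> List[Span]:
    """Keep only spans whose cheap score reaches the threshold."""
    return [span for span in spans if score_fn(span) >= threshold]


def toy_score(tokens: List[str]) -> Callable[[Span], float]:
    """Stand-in scorer (an assumption): fraction of capitalized tokens."""
    def score(span: Span) -> float:
        start, end = span
        width = end - start + 1
        return sum(tokens[i][0].isupper() for i in range(start, end + 1)) / width
    return score


tokens = "Ada Lovelace visited London in 1842".split()
candidates = enumerate_spans(len(tokens), max_width=3)
kept = filter_spans(candidates, toy_score(tokens), threshold=1.0)

print(len(candidates))  # 15 candidates for 6 tokens and max width 3
print(kept)             # [(0, 0), (0, 1), (1, 1), (3, 3)]
```

For n tokens and maximum width w (with w at most n), enumeration yields n·w − w(w − 1)/2 candidates, so any per-candidate processing grows with both sentence length and width limit. Pruning before the expensive step shrinks that count, here from 15 candidates to 4.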

Abstract

Named Entity Recognition (NER) is a key component in industrial information extraction pipelines, where systems must satisfy strict latency and throughput constraints in addition to strong accuracy. State-of-the-art NER accuracy is often achieved by span-based frameworks, which construct span representations from token encodings and classify candidate spans. However, many span-based methods enumerate large numbers of candidates and process each candidate with marker-augmented inputs, substantially increasing inference cost and limiting scalability in large-scale deployments. In this work, we propose SpanDec, an efficient span-based NER framework that targets this bottleneck. Our main insight is that span representation interactions can be computed effectively at the final transformer stage, avoiding redundant computation in earlier layers via a lightweight decoder dedicated to span representations. We further introduce a span filtering mechanism during enumeration to prune unlikely candidates before expensive processing. Across multiple benchmarks, SpanDec matches competitive span-based baselines while improving throughput and reducing computational cost, yielding a better accuracy-efficiency trade-off suitable for high-volume serving and on-device applications.
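
The general span-based recipe the abstract describes, building span representations from token encodings and classifying each candidate, can be sketched as follows. This is an illustration under stated assumptions, not SpanDec's decoder: the start/end concatenation, the linear scorer, and the names `span_representation` and `classify_span` are all hypothetical, and the toy encodings and weights are hand-picked. The point is that the token encodings are computed once and reused by every candidate span.

```python
from typing import List, Sequence

Vector = List[float]


def span_representation(
    token_encodings: Sequence[Vector], start: int, end: int
) -> Vector:
    """Concatenate the start and end token encodings (assumed pooling choice)."""
    return list(token_encodings[start]) + list(token_encodings[end])


def classify_span(
    rep: Vector, weights: Sequence[Vector], labels: Sequence[str]
) -> str:
    """Score each label with a dot product and return the best label."""
    scores = [sum(w * x for w, x in zip(row, rep)) for row in weights]
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


# Toy final-layer encodings for three tokens, computed once per sentence.
encodings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
labels = ["O", "PER"]
# Hand-picked weights, one row per label (illustrative only).
weights = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]

# Every candidate reuses the same encodings; no per-span encoder pass.
print(classify_span(span_representation(encodings, 0, 1), weights, labels))  # PER
print(classify_span(span_representation(encodings, 1, 2), weights, labels))  # O
```

A marker-augmented approach would instead re-encode the input for each candidate, which is the per-span cost the abstract identifies as the bottleneck.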