CADEL: A Corpus of Administrative Web Documents for Japanese Entity Linking
arXiv cs.CL / 4/1/2026
Key Points
- The paper introduces CADEL, an annotated corpus of Japanese administrative web documents built to support entity linking, that is, mapping Japanese expressions to knowledge base entities relevant to Japan.
- It addresses a key gap in the field, noting that most entity-linking resources and evaluation materials have historically focused on English, leaving Japanese benchmarking limited.
- The authors lay out a corpus design policy, and the corpus covers diverse linguistic expressions that refer to Japan-specific entities and concepts.
- Annotation quality is validated through high inter-annotator agreement, indicating reliable labeling for training and evaluation.
- A preliminary disambiguation experiment using string matching suggests the dataset includes many non-trivial cases, positioning CADEL as a useful benchmark for more advanced entity linking systems.
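
To make the string-matching point concrete, here is a minimal sketch of an exact-match linking baseline. It is not the authors' experimental setup: the toy knowledge base, the entity IDs, and the function names (`normalize`, `build_index`, `link_by_string_match`) are all hypothetical, and the NFKC normalization step is an assumption about how such a baseline might treat full-width and half-width variants. The sketch links a mention only when its normalized surface form matches exactly one entity, which shows why ambiguous or unlisted mentions are non-trivial for this kind of approach.

```python
import unicodedata

# Hypothetical toy knowledge base: entity ID -> known surface forms.
# The IDs are made up for illustration and do not refer to a real KB.
TOY_KB = {
    "E_tokyo_metropolis": ["東京都", "東京"],
    "E_tokyo_station": ["東京駅", "東京"],
    "E_mhlw": ["厚生労働省", "厚労省"],
    "E_nhk": ["NHK", "日本放送協会"],
}


def normalize(text):
    """Fold width variants (NFKC), trim whitespace, and lowercase Latin letters."""
    return unicodedata.normalize("NFKC", text).strip().lower()


def build_index(kb):
    """Map each normalized surface form to the set of entity IDs that use it."""
    index = {}
    for entity_id, names in kb.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(entity_id)
    return index


def link_by_string_match(mention, index):
    """Return the entity ID if the mention matches exactly one entity, else None.

    None covers both failure modes of plain string matching:
    no candidate at all, or several candidates sharing the same surface form.
    """
    candidates = index.get(normalize(mention), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


INDEX = build_index(TOY_KB)

if __name__ == "__main__":
    for mention in ["厚労省", "ＮＨＫ", "東京", "都庁"]:
        print(mention, "->", link_by_string_match(mention, INDEX))
```

In this toy setup, 「東京」 is left unlinked because two entities share that surface form, and 「都庁」 is left unlinked because no listed name matches it. Cases like these are the kind that need context-aware disambiguation rather than string lookup.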