Rewrite the News: Tracing Editorial Reuse Across News Agencies
arXiv cs.CL / 4/1/2026
Key Points
- The paper studies sentence-level cross-lingual text reuse in multilingual journalism by detecting reused sentences without requiring full translations.
- Using weak supervision and publication timestamps, it traces the earliest likely foreign source for each reused English sentence across 15 foreign agencies in seven languages.
- Analysis of 1,037 STA articles and 237,551 foreign-agency (FA) articles finds substantial reuse: 52% of STA articles contain reused sentences, compared with 1.6% of FA articles.
- The study shows that editorial reuse is mostly non-literal, often involving paraphrase and compositional reuse, and that reused material is more common in the middle and end of articles than in leads.
- The authors release a dataset and code for automated pre-selection to reduce information overload in journalistic workflows.
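
The timestamp-based tracing described in the key points can be illustrated with a minimal sketch. This is not the authors' released code: the `Candidate` record, the `earliest_source` function, the similarity score, and the `min_similarity` threshold of 0.8 are all hypothetical names and values chosen for illustration. The sketch assumes that some upstream matcher has already proposed foreign sentences similar to a given English sentence, and it only shows the final step of picking the earliest plausible source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A foreign sentence proposed as a match (hypothetical structure)."""
    agency: str          # illustrative agency label
    published: datetime  # publication time of the foreign article
    similarity: float    # assumed cross-lingual similarity score in [0, 1]


def earliest_source(
    candidates: List[Candidate],
    target_published: datetime,
    min_similarity: float = 0.8,  # assumed threshold, not taken from the paper
) -> Optional[Candidate]:
    """Return the earliest sufficiently similar candidate published
    before the English article, or None if no candidate qualifies."""
    eligible = [
        c for c in candidates
        if c.similarity >= min_similarity and c.published < target_published
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.published)


# Usage with made-up data: one English sentence, four candidate matches.
target_time = datetime(2025, 1, 10, 12, 0)
candidates = [
    Candidate("agency_a", datetime(2025, 1, 10, 9, 30), 0.91),
    Candidate("agency_b", datetime(2025, 1, 10, 8, 15), 0.86),
    Candidate("agency_c", datetime(2025, 1, 10, 7, 0), 0.42),   # below threshold
    Candidate("agency_d", datetime(2025, 1, 10, 13, 0), 0.95),  # published later
]
source = earliest_source(candidates, target_time)
```

Here the selected source is `agency_b`: it is the earliest candidate that both clears the similarity threshold and predates the English article. Filtering on publication time before taking the minimum keeps later copies of the same wire story from being mistaken for the origin.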