DeGenTWeb: A First Look at LLM-dominant Websites
arXiv cs.AI / 5/4/2026
Key Points
- The paper argues that earlier claims about LLM-generated content “taking over” the web lack representative sampling and clear, transparent methodology.
- It introduces DeGenTWeb, a system for systematically identifying LLM-dominant websites: sites whose content is largely produced by LLMs with minimal human input.
- The authors adapt LLM-text detectors for use on web pages and aggregate results across multiple pages to categorize websites more accurately.
- Using DeGenTWeb, they find LLM-dominant sites are highly prevalent in Common Crawl data and in Bing search results, and their share increases over time.
- They conclude that accurately identifying such sites will likely become increasingly difficult as newer LLMs improve at producing text that evades detectors.
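
The page-to-site aggregation described in the key points can be illustrated with a minimal sketch. It is not the paper's implementation: the `PageScore` type, the `classify_site` function, the majority-share rule, and all three thresholds are hypothetical choices made for illustration, and the per-page probability stands in for the output of whatever LLM-text detector is used.

```python
from dataclasses import dataclass


@dataclass
class PageScore:
    """One page's detector result (hypothetical structure)."""
    url: str
    llm_probability: float  # assumed detector output in [0, 1]


def classify_site(
    pages: list[PageScore],
    page_threshold: float = 0.5,   # assumption: page counts as LLM-written at or above this
    site_threshold: float = 0.8,   # assumption: share of flagged pages needed for the label
    min_pages: int = 5,            # assumption: refuse to label sites with too few pages
) -> str:
    """Aggregate page-level scores into a single site-level label."""
    if len(pages) < min_pages:
        return "insufficient-data"
    flagged = sum(1 for p in pages if p.llm_probability >= page_threshold)
    share = flagged / len(pages)
    return "llm-dominant" if share >= site_threshold else "not-llm-dominant"


# Example: six pages from one site, all scored high by the detector.
site_pages = [PageScore(f"https://example.com/post/{i}", 0.9) for i in range(6)]
print(classify_site(site_pages))
```

Judging a site by many pages rather than one makes the label less sensitive to a single misclassified page, which is the motivation the summary gives for aggregating results.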