Cloudflare Lets Site Owners Block AI Scrapers from Dashboard

AI Navigate Editorial·2026.06.25·6 min read

Consent infrastructure for AI training data is now hardening at the network layer. Cloudflare launched a dashboard toggle that lets site owners allow or block AI scrapers site-wide, and on the same day OpenAI signed a content licensing deal with Getty Images.

Context

Blocking AI Scrapers Used to Require Technical Staff

Site owners who did not want their content used for AI training had very few tools at their disposal. Disallow directives in robots.txt work only if AI companies actually respect them. Blocking IP ranges or user agents with WAF rules was effective, but required a dedicated engineer to set up.

As a result, small media outlets and blog operators had no practical way to deny AI scrapers access to their content. Copyright and data consent debates played out in courtrooms while the infrastructure tools to enforce intent remained out of reach for most publishers.
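The robots.txt limitation described above is easy to see in code. The following is a minimal sketch, using Python's standard urllib.robotparser, of how a cooperative crawler checks robots.txt before fetching a page. The robots.txt contents, the example.com URL, and the is_allowed helper are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to opt out of AI crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url.

    This check runs on the crawler's side. Nothing on the server
    enforces the answer, so a crawler that skips it is never stopped.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print(is_allowed("GPTBot", page))          # disallowed by the rules above
    print(is_allowed("ExampleBrowser", page))  # falls through to the "*" entry
```

The point of the sketch is where the decision lives: the publisher only publishes the rules, and the crawler decides whether to call something like is_allowed at all.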

How It Works

Control in 3 Steps from the Dashboard

Flip the Toggle in the Dashboard

An "AI Scrapers" section has been added to the Cloudflare dashboard. Site owners can allow or block AI crawlers site-wide with a single toggle. The biggest change is that this is a no-code operation: no engineer is needed.

Cloudflare Filters at the Network Edge

Once the setting is applied, Cloudflare's edge network automatically blocks or allows known AI crawlers (GPTBot, ClaudeBot, CCBot, and others) based on user-agent identifiers and IP ranges. Because enforcement happens at the edge rather than through robots.txt, it does not depend on the crawler's cooperation, and the IP-range matching means it also works against crawlers that do not declare themselves in the user agent.
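The general idea of filtering on user-agent identifiers and IP ranges can be sketched in a few lines. This is a simplified illustration, not Cloudflare's implementation: the should_block function, the substring matching, and the IP ranges (reserved documentation blocks, not real crawler addresses) are all assumptions made for the example.

```python
import ipaddress

# Crawler names are the ones mentioned in the article; matching them as
# lowercase substrings of the User-Agent header is an assumption.
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "ccbot")

# Placeholder networks from the reserved documentation ranges (RFC 5737).
# These are NOT real crawler IP ranges.
AI_CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def should_block(user_agent: str, client_ip: str, block_ai: bool) -> bool:
    """Decide whether a request should be rejected.

    block_ai models the site-wide toggle: when it is off, nothing is blocked.
    """
    if not block_ai:
        return False

    # Signal 1: the crawler declares itself in the User-Agent header.
    ua = user_agent.lower()
    if any(token in ua for token in AI_CRAWLER_TOKENS):
        return True

    # Signal 2: the request comes from a listed range, which also catches
    # a crawler that sends a generic User-Agent.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in AI_CRAWLER_NETWORKS)


if __name__ == "__main__":
    print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)", "203.0.113.7", True))
    print(should_block("Mozilla/5.0", "192.0.2.10", True))
    print(should_block("Mozilla/5.0", "203.0.113.7", True))
```

Unlike a robots.txt check, this decision is made on the serving side, so the crawler has no opportunity to ignore it.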

A Shift Toward Licensing Deals Like Getty x OpenAI

On the same day, OpenAI and Getty Images signed a content licensing agreement, which is a telling development. As infrastructure-level blocking becomes easier, AI companies gain a stronger incentive to negotiate and license content directly from publishers rather than scrape it.

Impact

Consent Infrastructure Is Hardening at the Network Layer

The direction this signals is clear: consent infrastructure for AI training data is standardizing at the network layer. With network operators like Cloudflare acting as gatekeepers for AI training data flows, content owners gain a more reliable way to enforce their preferences.

For AI companies, this creates an environment in which licensed, high-quality data becomes easier to procure. Formal licensing agreements carry less legal risk than gray-zone scraping, and the Getty deal could well become a template for what follows.

Open questions remain: how content that has already been used for training will be handled, and whether other CDN and hosting providers will follow Cloudflare's lead. Still, the social norm that AI crawling without consent is unacceptable is now being reinforced from the infrastructure layer upward.