## The moment we noticed
Yesterday, while running our routine nginx log analysis, we spotted something unusual:
```
51.107.70.192 - [30/Mar/2026:19:42:35 +0000] "property.nwc-advisory.com"
"GET /prices/sk10-1ae HTTP/2.0" 200 4623 "-"
"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible;
ChatGPT-User/1.0; +https://openai.com/bot"
```
A ChatGPT-User bot fetched one of our UK property landing pages. Not GPTBot (the training crawler). Not OAI-SearchBot (the indexer). The ChatGPT-User agent -- the one that fetches pages in real time to answer a user's question.
Someone asked ChatGPT about house prices near Macclesfield (SK10 postcode), and ChatGPT pulled the answer from our page.
## What's the difference between OpenAI's bots?
OpenAI operates three distinct crawlers. In the same 2-hour window, we saw all three:
| Bot | User-Agent | Purpose | Our logs |
|---|---|---|---|
| GPTBot | `GPTBot/1.3` | Training data collection | Crawling sitemaps, homepages |
| OAI-SearchBot | `OAI-SearchBot/1.3` | Search index building | Crawling UK landing pages (SK postcodes), robots.txt across all 11 domains |
| ChatGPT-User | `ChatGPT-User/1.0` | Live query answering | Fetched `/prices/sk10-1ae` for a real user |
The ChatGPT-User hit is the interesting one. It means our data is being served to end users in real time through ChatGPT's search feature. The user never visits our site -- they get the answer inside their chat.
## The numbers

In a 2-hour window on March 30, 2026:

- OpenAI bots: 22 requests (GPTBot + OAI-SearchBot + ChatGPT-User)
- Googlebot: 4 requests
OpenAI is crawling our pages 5x more frequently than Google.
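A count like this is easy to reproduce. Here's a minimal Python sketch that tallies requests per crawler family from an access log -- the log path and the assumption that the user-agent is the last quoted field (nginx combined log format) are ours; adjust for your setup:

```python
import re
from collections import Counter

# Hypothetical log path -- adjust for your nginx setup.
LOG_PATH = "/var/log/nginx/access.log"

# In the combined log format, the user-agent is the last double-quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

FAMILIES = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Googlebot"]

def count_bot_hits(lines):
    """Tally requests per crawler family from raw access-log lines."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if not match:
            continue
        ua = match.group(1).lower()
        for family in FAMILIES:
            if family.lower() in ua:
                counts[family] += 1
                break
    return counts

if __name__ == "__main__":
    with open(LOG_PATH) as f:
        counts = count_bot_hits(f)
    openai = sum(counts[f] for f in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"))
    print(f"OpenAI: {openai} requests, Google: {counts['Googlebot']} requests")
```

Filter the input to a time window first (e.g. with `grep '30/Mar/2026:19'`) if you want the same 2-hour comparison.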
Here's what OAI-SearchBot was systematically crawling:
```
property.nwc-advisory.com        GET /prices/sk1-1aa
property.nwc-advisory.com        GET /prices/sk10-1ae
property.nwc-advisory.com        GET /prices/sk11-0aa
property.nwc-advisory.com        GET /prices/sk12-1aa
property.nwc-advisory.com        GET /prices/sk13-0aa
property.nwc-advisory.com        GET /prices/sk14-1aa
property.nwc-advisory.com        GET /sitemap.xml
property.nwc-advisory.com        GET /robots.txt
property-chi.nwc-advisory.com    GET /robots.txt
property-dxb.nwc-advisory.com    GET /robots.txt
property-ie.nwc-advisory.com     GET /robots.txt
property-miami.nwc-advisory.com  GET /robots.txt
property-phl.nwc-advisory.com    GET /robots.txt
property-sg.nwc-advisory.com     GET /robots.txt
property-tw.nwc-advisory.com     GET /robots.txt
```
It's reading the sitemaps, then crawling individual landing pages. Systematically.
## What we built (and why AI can read it)
We operate property comparable sales apps across 11 markets (UK, France, Singapore, NYC, Chicago, Miami, Philadelphia, Connecticut, Dubai, Ireland, Taiwan), backed by 35M+ government-source transactions.
For SEO, we generated 9,100+ static landing pages -- one per postcode/ZIP/area:
```
/prices/sw1a-1aa  → London Westminster
/prices/sk10-1ae  → Macclesfield
/prix/75001       → Paris 1er
/comps/10001      → Midtown Manhattan
```
Each page contains:
- Median price, average price, price range
- Price per sqft/m2 statistics
- Recent comparable sales with addresses
- Stamp duty / notary fee calculators
- FAQ schema (JSON-LD)
- Nearby area links
The key design decision: no login walls, no JavaScript-rendered content, no gated data. Every landing page is server-rendered HTML with structured data that any crawler can parse.
## The page structure that AI loves
```html
<!-- Clean semantic HTML -->
<h1>Property Prices in SK10 1AE, Macclesfield</h1>
<div class="stats-grid">
  <div class="stat">
    <span class="label">Median Price</span>
    <span class="value">£285,000</span>
  </div>
  <div class="stat">
    <span class="label">Price per sqft</span>
    <span class="value">£198</span>
  </div>
</div>

<!-- JSON-LD structured data -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebApplication",
  "name": "Property Comparable Sales - SK10",
  "description": "Recent property sales near SK10 1AE, Macclesfield",
  "url": "https://property.nwc-advisory.com/prices/sk10-1ae"
}
</script>

<!-- FAQ schema for rich snippets -->
<script type="application/ld+json">
{
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the average house price in SK10?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The average house price in SK10 is £312,450..."
      }
    }
  ]
}
</script>
```
## The SEO stack
```
nginx (SSL, bot-block with search engine whitelist)
│
├── /prices/{postcode}   → Static HTML (2,308 UK pages)
├── /prix/{code_postal}  → Static HTML (5,851 FR pages)
├── /comps/{zip}         → Static HTML (per US market)
│
├── sitemap.xml          → All page URLs
├── robots.txt           → Allow everything except /v1/ and /api/
└── IndexNow key         → Instant Bing/Yandex notification
```
Key technical decisions:
1. robots.txt allows AI bots
```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /v1/
Disallow: /api/
```
Many sites block GPTBot. We deliberately allow it. The API endpoints are protected (disallowed), but the landing pages are open.
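A policy like this is easy to get subtly wrong, and Python's stdlib `urllib.robotparser` can sanity-check it before deploy. A small sketch, feeding in the rules above verbatim:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above, pasted verbatim.
RULES = """\
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /v1/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

# The named AI bots can reach the landing pages...
print(rp.can_fetch("GPTBot", "/prices/sk10-1ae"))        # True
print(rp.can_fetch("ChatGPT-User", "/prices/sk10-1ae"))  # True
# ...while the API paths stay closed to everyone else.
print(rp.can_fetch("SomeOtherBot", "/api/v1/query"))     # False
```

One caveat: `robotparser` doesn't support wildcard patterns inside `Allow`/`Disallow` paths, so keep the rules to plain prefixes (as these are) if you want the check to be faithful.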
2. Server-rendered, not SPA
Each landing page is a complete HTML document. No client-side rendering, no JavaScript required to see the data. ChatGPT-User fetches the HTML and can immediately extract the statistics.
3. Canonical URLs and sitemaps
Every page has `<link rel="canonical">` and is listed in the sitemap. This tells crawlers exactly which pages exist and which URL is authoritative.
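When you generate thousands of pages, it's worth auditing that each one actually carries the tag. A minimal stdlib sketch (`find_canonical` is our hypothetical helper, not a library function):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attr = dict(attrs)
            if attr.get("rel") == "canonical":
                self.canonical = attr.get("href")

def find_canonical(html: str):
    """Return the canonical URL declared in a page, or None if missing."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical

page = ('<html><head><link rel="canonical" '
        'href="https://property.nwc-advisory.com/prices/sk10-1ae"></head></html>')
print(find_canonical(page))  # https://property.nwc-advisory.com/prices/sk10-1ae
```

Run it over every generated file and flag any page where the result is `None` or doesn't match the sitemap entry.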
4. Structured data (JSON-LD)
FAQPage schema means the questions and answers are machine-readable. When ChatGPT needs "What is the average house price in SK10?", it can extract the answer directly from the structured data.
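To see how trivially extractable that is, here's a rough sketch of what a consumer could do with nothing but Python's stdlib: pull every `application/ld+json` block out of a page and map FAQ questions to answers. (`faq_answers` is a hypothetical helper; it assumes each block is a single JSON object.)

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data

def faq_answers(html: str) -> dict:
    """Map each FAQPage question to its answer text."""
    extractor = JSONLDExtractor()
    extractor.feed(html)
    answers = {}
    for block in extractor.blocks:
        if not block.strip():
            continue
        data = json.loads(block)
        if isinstance(data, dict) and data.get("@type") == "FAQPage":
            for q in data.get("mainEntity", []):
                answers[q["name"]] = q["acceptedAnswer"]["text"]
    return answers
```

No HTML scraping heuristics, no selectors: the question/answer pairs come out of the structured data directly, which is presumably why structured pages get quoted.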
## How we detect AI bot traffic
Our visitor analysis pipeline (Python, runs against nginx logs) classifies every request:
```python
# Bot classification
AI_BOTS = {
    'ChatGPT-User': 'Live query - someone asked ChatGPT',
    'GPTBot': 'Training data collection',
    'OAI-SearchBot': 'Search index building',
    'ClaudeBot': 'Anthropic training',
    'PerplexityBot': 'Perplexity search',
    'Bytespider': 'TikTok/ByteDance',
}

SEARCH_BOTS = {
    'Googlebot': 'Google Search indexing',
    'bingbot': 'Bing Search indexing',
    'Applebot': 'Apple/Siri search',
    'DuckDuckBot': 'DuckDuckGo indexing',
}

def classify_request(user_agent: str) -> str:
    ua_lower = user_agent.lower()
    for bot, purpose in AI_BOTS.items():
        if bot.lower() in ua_lower:
            return f"AI Bot: {purpose}"
    for bot, purpose in SEARCH_BOTS.items():
        if bot.lower() in ua_lower:
            return f"Search: {purpose}"
    return "Human visitor"
```
We run this daily against our nginx logs. The trend over the past few weeks: AI bot traffic is growing faster than search bot traffic.
## What this means for developers
If you're building a data-heavy application, the old playbook was:
```
Build app → SEO → Rank on Google → Users find you → They visit
```
The new playbook is:
```
Build app → Structured data → AI crawls you → Users get your data via ChatGPT
            (they may never visit your site)
```
This is both exciting and unsettling. Exciting because your data reaches users through a completely new channel. Unsettling because those users never see your UI, your brand, or your conversion funnel.
If you want AI to use your data:
- Don't block AI bots in robots.txt (unless you have a reason to)
- Server-render your pages -- ChatGPT-User can't execute JavaScript
- Use structured data (JSON-LD, schema.org) -- makes extraction trivial
- Generate landing pages for long-tail queries -- AI searches the same way humans do
- Keep data ungated -- if it's behind a login, AI can't reach it
- Maintain sitemaps -- AI bots follow them just like Googlebot
If you want to track it:
- Parse your nginx/access logs for `ChatGPT-User`, `GPTBot`, `OAI-SearchBot`
- `ChatGPT-User` = your data is being served to real users right now
- `OAI-SearchBot` = your pages are being indexed for future queries
- `GPTBot` = your content may be used for model training
## The broader picture
Google is still the primary search engine. But in our 2-hour sample, OpenAI's crawlers outnumbered Googlebot 5:1. And while Google has been crawling our 12 domains for weeks without indexing most pages (domain authority problem), ChatGPT is already serving our data to users.
For a small startup with no domain authority, no backlinks, and a brand-new domain, this is significant. The traditional SEO path (build authority -> get indexed -> rank -> get traffic) takes months to years. The AI search path (structured data -> get crawled -> get cited) can happen in weeks.
We're not saying Google doesn't matter. We're saying there's a new, parallel distribution channel, and it favors clean data over domain authority.
Our stack: Python (FastAPI) + SQLite + nginx + static HTML generators. 11 property markets, 35M+ transactions from government open data sources. Free to search at property.nwc-advisory.com.
The API is also available on RapidAPI and as an MCP server for AI agents.
What AI bot traffic are you seeing in your logs? Have you caught ChatGPT-User fetching your pages? Drop a comment -- I'm curious whether this is a broader trend or we're just early.




