GPTBot

OpenAI's training crawler. Allowing it lets your content be used to train future GPT models.

Operator	OpenAI
Powers	Training data for OpenAI's GPT models
Purpose	Model training
User-agent token	`GPTBot`
Respects robots.txt	Yes

GPTBot is the crawler OpenAI uses to gather publicly available web content for training its foundation models. It is distinct from the bots OpenAI uses to answer live questions — GPTBot's job is bulk collection, not real-time retrieval.

GPTBot obeys robots.txt. If you disallow it, OpenAI states the pages will be excluded from future training sets, though content already collected or available through third-party datasets (such as Common Crawl) is unaffected.

Full user-agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
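
As a rough illustration of how a server-side script might recognize this string, the sketch below searches a User-Agent header for the `GPTBot` token and extracts the version if one is present. The helper name `gptbot_version` and the regular expression are assumptions made for this example, not anything OpenAI publishes. A User-Agent header can be spoofed, so a match is only a hint that the request came from GPTBot.

```python
import re

# Hypothetical pattern for this example: the `GPTBot` token, optionally
# followed by "/<version>" (e.g. "GPTBot/1.2").
GPTBOT_PATTERN = re.compile(r"\bGPTBot(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)


def gptbot_version(user_agent: str):
    """Return the GPTBot version, "" if the token has no version, or None if absent."""
    match = GPTBOT_PATTERN.search(user_agent)
    if match is None:
        return None
    return match.group(1) or ""


# The full user-agent string quoted above.
ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.2; +https://openai.com/gptbot)"
)

if __name__ == "__main__":
    print(gptbot_version(ua))
    print(gptbot_version("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Matching on the token rather than the whole string keeps the check working if the version number changes.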

Allow GPTBot

Allowing GPTBot means your content can influence how GPT models describe your brand, products, and domain, which is useful if you want to be represented accurately in ChatGPT's underlying knowledge.

User-agent: GPTBot
Allow: /

Block GPTBot

Block GPTBot if you don't want your content used as training data, for example for licensing, copyright, or competitive reasons.

User-agent: GPTBot
Disallow: /
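
One way to sanity-check either rule set before deploying it is to feed it to Python's standard-library `urllib.robotparser` and ask whether the `GPTBot` token may fetch a given URL. This is a minimal sketch: the helper `gptbot_allowed` and the `https://example.com/` URL are placeholders invented for the example, and the standard-library parser only approximates how a given crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets the GPTBot token fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


# The two rule sets shown in this section.
ALLOW_RULES = "User-agent: GPTBot\nAllow: /\n"
BLOCK_RULES = "User-agent: GPTBot\nDisallow: /\n"

# Placeholder URL (an assumption for illustration only).
PAGE = "https://example.com/pricing"

if __name__ == "__main__":
    print("allow rules:", gptbot_allowed(ALLOW_RULES, PAGE))
    print("block rules:", gptbot_allowed(BLOCK_RULES, PAGE))
```

Because the rules are grouped under `User-agent: GPTBot`, they apply only to that token; other crawlers need their own groups.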

Related crawlers

ClaudeBot

CCBot

Bytespider