GPTBot
OpenAI's training crawler. Allowing it lets your content be used to train future GPT models.
| Operator | OpenAI |
|---|---|
| Powers | Training data for OpenAI's GPT models |
| Purpose | Model training |
| User-agent token | GPTBot |
| Respects robots.txt | Yes |
GPTBot is the crawler OpenAI uses to gather publicly available web content for training its foundation models. It is distinct from the bots OpenAI uses to answer live questions — GPTBot's job is bulk collection, not real-time retrieval.
GPTBot obeys robots.txt. If you disallow it, OpenAI states the pages will be excluded from future training sets, though content already collected or available through third-party datasets (such as Common Crawl) is unaffected.
Full user-agent string
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
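If you want to spot GPTBot requests in your own logs or middleware, one option is to look for the `GPTBot` token in the User-Agent header. The sketch below is a minimal illustration, not an official detection method: the helper names `is_gptbot` and `gptbot_version` are hypothetical, and because any client can send this header, a match should be treated as a hint rather than proof of origin.

```python
import re
from typing import Optional

# Matches the "GPTBot" product token, optionally followed by "/<version>".
# The pattern is an assumption based on the user-agent string shown above.
_GPTBOT_TOKEN = re.compile(r"\bGPTBot(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the header contains the GPTBot token (hypothetical helper)."""
    return _GPTBOT_TOKEN.search(user_agent) is not None


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the version after 'GPTBot/', or None if absent (hypothetical helper)."""
    match = _GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.2; +https://openai.com/gptbot)"
)

if __name__ == "__main__":
    print(is_gptbot(SAMPLE_UA))       # True
    print(gptbot_version(SAMPLE_UA))  # 1.2
```

Matching on the token rather than the full string keeps the check working if the version number changes.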
Allow GPTBot
Your content can influence how GPT models describe your brand, products, and domain — useful if you want to be represented accurately in ChatGPT's underlying knowledge.
User-agent: GPTBot
Allow: /
Block GPTBot
Block GPTBot if you don't want your content used as training data, for example for licensing, copyright, or competitive reasons.
User-agent: GPTBot
Disallow: /
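Before deploying either rule set, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate the allow and block rules for the `GPTBot` token. The helper name `can_fetch` and the `example.com` URL are placeholders, and this only shows how Python's parser interprets the rules, which is assumed (not guaranteed) to mirror how GPTBot itself reads them.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this page, one directive per line.
ALLOW_RULES = "User-agent: GPTBot\nAllow: /\n"
BLOCK_RULES = "User-agent: GPTBot\nDisallow: /\n"


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/page"  # placeholder URL
    print(can_fetch(ALLOW_RULES, "GPTBot", url))     # True
    print(can_fetch(BLOCK_RULES, "GPTBot", url))     # False
    # The block rule names only GPTBot, so other crawlers are unaffected.
    print(can_fetch(BLOCK_RULES, "Googlebot", url))  # True
```

Because the rules are parsed from a string, the same check works on a draft robots.txt before it is published.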