Sona

GPTBot

OpenAI's training crawler. Allowing it lets your content be used to train future GPT models.

Operator: OpenAI
Powers: Training data for OpenAI's GPT models
Purpose: Model training
User-agent token: GPTBot
Respects robots.txt: Yes

GPTBot is the crawler OpenAI uses to gather publicly available web content for training its foundation models. It is separate from the agents OpenAI uses to answer live questions (OAI-SearchBot for ChatGPT search results and ChatGPT-User for fetches triggered by a user). GPTBot's job is bulk collection, not real-time retrieval.

GPTBot obeys robots.txt. If you disallow it, OpenAI states that your pages will be excluded from future training sets. Content OpenAI has already collected, or that is available through third-party datasets such as Common Crawl, is unaffected.

Full user-agent string

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
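If you want to spot GPTBot in your server logs, you can look for the `GPTBot/<version>` product token rather than matching the whole string, since the version number may change over time. The sketch below is a minimal illustration; the function name `gptbot_version` is hypothetical, and a matching header alone does not prove a request came from OpenAI, because any client can send this user-agent string.

```python
import re
from typing import Optional

# Matches the "GPTBot/<version>" product token anywhere in a user-agent header
# and captures the dotted version number (e.g. "1.2").
GPTBOT_RE = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version claimed by `user_agent`, or None if absent.

    Hypothetical helper: it only inspects the header text and performs no
    verification that the request really originated from OpenAI.
    """
    match = GPTBOT_RE.search(user_agent)
    return match.group(1) if match else None


UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.2; +https://openai.com/gptbot)")
print(gptbot_version(UA))  # 1.2
print(gptbot_version("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # None
```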

Allow GPTBot

Choose this if you want your content to inform how GPT models describe your brand, products, and domain. This may help ChatGPT's underlying knowledge represent you accurately.

User-agent: GPTBot
Allow: /
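An explicit `Allow` group mainly matters when a broader rule would otherwise apply to GPTBot, for example a `User-agent: *` group that disallows everything: crawlers follow the group that names their token rather than the catch-all group. The sketch below checks this with Python's standard-library `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, the `SomeOtherBot` token, and the `can_crawl` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the catch-all group blocks every crawler,
# while a GPTBot-specific group re-allows the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""


def can_crawl(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the crawler identified by `token` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the user-agent *token* ("GPTBot"), not the full UA string:
    # robotparser only compares the part before the first "/", which for
    # the full GPTBot string would be "Mozilla".
    return parser.can_fetch(token, url)


print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post"))        # True
print(can_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.com/blog/post"))  # False
```

Python's parser only approximates how production crawlers read robots.txt (for instance, it applies the first matching rule in file order rather than the longest match), so treat it as a quick sanity check rather than a guarantee of GPTBot's behavior.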

Block GPTBot

Choose this if you don't want your content used as training data, for example for licensing, copyright, or competitive reasons.

User-agent: GPTBot
Disallow: /
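A group that names only GPTBot leaves other crawlers untouched: a crawler with no matching group, and no `User-agent: *` group to fall back on, is allowed by default. The sketch below confirms this with `urllib.robotparser`; the `example.com` URLs and the choice of `Googlebot` as the comparison crawler are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The "Block GPTBot" rules; no other groups are defined.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is refused on every path, because "Disallow: /" prefixes them all...
print(parser.can_fetch("GPTBot", "https://example.com/"))            # False
print(parser.can_fetch("GPTBot", "https://example.com/pricing"))     # False
# ...while a crawler with no matching group falls through to "allowed".
print(parser.can_fetch("Googlebot", "https://example.com/pricing"))  # True
```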
