Create a properly formatted robots.txt file for your website in seconds. Control which search engine crawlers can access your pages, set crawl delays, and include your sitemap reference.
A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells search engine crawlers which parts of your site they are allowed to crawl. It follows the Robots Exclusion Protocol (REP), a standard that has been in use since 1994.
Every major search engine — Google, Bing, Yahoo, Yandex, and Baidu — reads and respects the robots.txt file (though they treat it as a directive for some bots and a suggestion for others).
When a search engine crawler visits your website, the very first thing it does is check for a robots.txt file at /robots.txt. Based on the directives found, the crawler decides which URLs it is allowed to fetch. Here's how the process works:
1. The crawler requests https://yourdomain.com/robots.txt.
2. If a URL matches a Disallow rule, the crawler skips it.
3. If no Disallow rule exists, the crawler fetches the URL.
4. It follows any Sitemap directives to discover additional URLs.

User-agent
Specifies which crawler the following rules apply to. User-agent: * means "all crawlers." You can also target specific bots:
Googlebot — Google's main web crawler
Bingbot — Microsoft Bing's crawler
Googlebot-Image — Google's image search crawler
Yandex — Russian search engine crawler
Baiduspider — Chinese search engine crawler
GPTBot — OpenAI's web crawler for training data
ClaudeBot — Anthropic's web crawler

Disallow
Blocks access to a specific URL path or directory. Examples:
Disallow: /admin/ — Block the entire admin directory
Disallow: /search — Block all URLs starting with /search
Disallow: /*.pdf$ — Block all PDF files (Google supports wildcards)
Disallow: / — Block the entire site
Disallow: — Allow everything (an empty value means no restrictions)

Allow
Explicitly permits access to a path, overriding a broader Disallow rule. Useful for allowing specific pages within a blocked directory:
Disallow: /private/ + Allow: /private/public-page — Block the directory but allow one specific page.

Sitemap
Points crawlers to your XML sitemap file. This is one of the most important directives because it helps search engines discover all your pages — especially new ones that may not have inbound links yet. Always include your sitemap URL.
Crawl-delay
Requests that crawlers wait a specified number of seconds between requests. This can help reduce server load on small hosting plans. Note: Google ignores Crawl-delay (use Google Search Console's crawl rate settings instead), but Bing and Yandex respect it.
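How a well-behaved crawler evaluates these directives can be sketched with Python's standard urllib.robotparser module. The paths and delay below are illustrative assumptions, and note one quirk: Python's parser applies rules in file order, so the Allow line is placed before the broader Disallow here.

```python
import urllib.robotparser

# An illustrative robots.txt: Allow listed first so the more specific
# rule wins under Python's first-match evaluation order.
rules = """\
User-agent: *
Allow: /private/public-page
Disallow: /private/
Crawl-delay: 10
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/private/report"))       # False: blocked directory
print(rp.can_fetch("*", "https://example.com/private/public-page"))  # True: Allow overrides
print(rp.can_fetch("*", "https://example.com/blog/post"))            # True: no rule matches
print(rp.crawl_delay("*"))                                           # 10
```

Google evaluates competing Allow/Disallow rules by longest path match rather than file order, so real-world behavior can differ slightly from this sketch.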
Common paths that should typically be blocked:
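As an illustrative sketch only (these paths are assumptions about a typical site, not a universal list), a robots.txt blocking such low-value areas might look like:

```
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /search
Disallow: /cart/
Disallow: /checkout/

Sitemap: https://yourdomain.com/sitemap.xml
```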
The difference between blocking a page with robots.txt and keeping it out of the search index is one of the most commonly misunderstood concepts in SEO:
If you want a page completely removed from Google, use the noindex meta tag and do not block it in robots.txt. Use our Meta Tag Generator to create proper noindex tags.
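For reference, a noindex directive is a single meta tag placed in the page's <head> section:

```
<meta name="robots" content="noindex">
```

Crawlers must be able to fetch the page to see this tag, which is exactly why the page must not also be blocked in robots.txt.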
A robots.txt file tells search engine crawlers which parts of your website they can and cannot access. Every website benefits from having one — even if it simply includes a sitemap reference. Without robots.txt, crawlers assume they can access everything, which wastes crawl budget on low-value pages like admin panels and search result pages.
Not entirely. Robots.txt prevents crawling, not indexing. If other websites link to a blocked page, Google can still index the URL and show it in search results with limited information. To fully prevent indexing, use the noindex meta tag instead.
Place robots.txt in the root directory of your website, accessible at https://yourdomain.com/robots.txt. Crawlers only check this exact location — placing it in a subdirectory will not work.
Yes. You can block AI company crawlers by adding specific user-agent rules. For example: User-agent: GPTBot + Disallow: / blocks OpenAI's crawler. User-agent: ClaudeBot + Disallow: / blocks Anthropic's crawler. Whether these bots fully respect robots.txt depends on the company.
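Written out as separate rule groups, those directives look like this:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```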
Google caches robots.txt and refreshes it approximately once per day, though the frequency can vary. After making changes, you can request a recrawl in Google Search Console to speed up the process. Changes typically take effect within 24–48 hours.
Using Disallow: / blocks all crawlers from all pages. Your site will gradually be removed from search results as Google can no longer verify the content. This is appropriate during development but should be removed before launching a public website.