Robots.txt Generator

Create a properly formatted robots.txt file for your website in seconds. Control which search engine crawlers can access your pages, set crawl delays, and include your sitemap reference.

What is a robots.txt File?

A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that instructs search engine crawlers which parts of your site they are allowed to access and index. It follows the Robots Exclusion Protocol (REP), a standard that has been in use since 1994.

Every major search engine — Google, Bing, Yahoo, Yandex, and Baidu — reads the robots.txt file. Keep in mind that compliance is voluntary: reputable crawlers honor the rules, but robots.txt is not an enforcement mechanism, and poorly behaved bots may ignore it entirely.

How robots.txt Works

When a search engine crawler visits your website, the very first thing it does is check for a robots.txt file at /robots.txt. Based on the directives found, the crawler decides which URLs it is allowed to fetch. Here's how the process works:

  • Step 1: Crawler requests https://yourdomain.com/robots.txt
  • Step 2: If the file exists, the crawler reads the rules for its specific user-agent.
  • Step 3: If a URL matches a Disallow rule, the crawler skips it.
  • Step 4: If no matching Disallow rule exists, the crawler fetches the URL.
  • Step 5: The crawler follows Sitemap directives to discover additional URLs.
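The steps above can be sketched with Python's standard-library robots.txt parser. The rules, domain, and URLs below are hypothetical examples, and in a real crawler you would fetch the live file with `set_url()` and `read()` rather than parsing an inline string:

```python
# Sketch of the crawl-decision process using urllib.robotparser.
# The rules below are a made-up example file for illustration.
from urllib import robotparser

rules = """
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)  # in production: rp.set_url("https://example.com/robots.txt"); rp.read()

# Step 3: a URL matching a Disallow rule is skipped.
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
# Step 4: a URL with no matching Disallow rule may be fetched.
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True
# Step 5: Sitemap directives point the crawler at more URLs.
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

Note that `site_maps()` requires Python 3.8 or newer.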

Complete Guide to robots.txt Directives

User-agent

Specifies which crawler the following rules apply to. User-agent: * means "all crawlers." You can also target specific bots:

  • Googlebot — Google's main web crawler
  • Bingbot — Microsoft Bing's crawler
  • Googlebot-Image — Google's image search crawler
  • Yandex — Russian search engine crawler
  • Baiduspider — Chinese search engine crawler
  • GPTBot — OpenAI's web crawler for training data
  • ClaudeBot — Anthropic's web crawler
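Crawler-specific rules are written as separate groups, each starting with its own User-agent line. A hypothetical file mixing targeted and catch-all groups might look like this:

```
# Keep one directory out of Google Image Search
User-agent: Googlebot-Image
Disallow: /private-photos/

# Block OpenAI's training-data crawler entirely
User-agent: GPTBot
Disallow: /

# Default rules for every other crawler
User-agent: *
Disallow: /admin/
```

A crawler uses the most specific group that matches its user-agent and ignores the rest, so GPTBot here follows only its own `Disallow: /` rule.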

Disallow

Blocks access to a specific URL path or directory. Examples:

  • Disallow: /admin/ — Block the entire admin directory
  • Disallow: /search — Block all URLs starting with /search
  • Disallow: /*.pdf$ — Block all PDF files (Google supports wildcards)
  • Disallow: / — Block the entire site
  • Disallow: — Allow everything (empty value = no restrictions)

Allow

Explicitly permits access to a path, overriding a broader Disallow rule. Useful for allowing specific pages within a blocked directory:

  • Disallow: /private/ + Allow: /private/public-page — Block the directory but allow one specific page.
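This precedence can be checked with the standard-library parser. One caveat: Python's `urllib.robotparser` applies rules in file order (first match wins), whereas Google picks the longest matching path, so in this sketch the more specific Allow is listed first to get the same result in both interpretations. The paths are hypothetical:

```python
from urllib import robotparser

# Hypothetical rules: block /private/ but allow one page inside it.
# The Allow line comes first because Python's parser is order-sensitive.
rules = """
User-agent: *
Allow: /private/public-page
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/private/public-page"))  # True
print(rp.can_fetch("*", "https://example.com/private/secret"))       # False
```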

Sitemap

Points crawlers to your XML sitemap file. This is one of the most important directives because it helps search engines discover all your pages — especially new ones that may not have inbound links yet. Always include your sitemap as an absolute URL (e.g., https://example.com/sitemap.xml); the directive can appear anywhere in the file.

Crawl-delay

Requests that crawlers wait a specified number of seconds between requests, which can help reduce server load on small hosting plans. Note: Google ignores Crawl-delay entirely, while Bing and Yandex respect it.
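A polite custom crawler can read this value with the standard-library parser; the file contents here are a hypothetical example:

```python
from urllib import robotparser

# Hypothetical robots.txt asking crawlers to wait 10 seconds between requests.
rules = """
User-agent: *
Crawl-delay: 10
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# crawl_delay() returns the delay in seconds, or None if none is set.
print(rp.crawl_delay("*"))  # 10
```

A crawler would then `time.sleep()` for this many seconds between fetches.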

What Should You Block in robots.txt?

Common paths that should typically be blocked:

  • /admin/ or /wp-admin/ — Admin panels and dashboards.
  • /cgi-bin/ — Server-side scripts directory.
  • /tmp/ or /cache/ — Temporary and cache files.
  • /search — Internal search result pages (thin, duplicate content).
  • /tag/ or /author/ — Taxonomy pages that create duplicate content.
  • /cart/ or /checkout/ — E-commerce transaction pages.
  • /api/ — API endpoints not meant for search engines.
  • Login/registration pages — No SEO value, waste of crawl budget.
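Putting these recommendations together, a hypothetical starting-point file might look like the following. Adjust the paths to match your site's actual structure, and replace the sitemap URL with your own:

```
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /cache/
Disallow: /search
Disallow: /cart/
Disallow: /checkout/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
```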

What Should You NOT Block?

  • CSS and JavaScript files — Google needs to render your pages. Blocking CSS/JS prevents proper rendering and can hurt rankings.
  • Image directories — Unless you specifically don't want images in Google Image Search, don't block image folders.
  • Main content pages — Never block pages you want to appear in search results.

robots.txt vs. noindex: Key Difference

This is one of the most commonly misunderstood concepts in SEO:

  • robots.txt Disallow — Tells crawlers "don't crawl this page." But if other sites link to it, Google can still index the URL (showing it in search results with "no information is available for this page").
  • noindex meta tag — Tells search engines "don't include this page in search results." Google must crawl the page to see the noindex tag, so don't block the URL in robots.txt if you also have noindex — the crawler won't see the noindex directive.

If you want a page completely removed from Google, use the noindex meta tag and do not block it in robots.txt. Use our Meta Tag Generator to create proper noindex tags.

How to Install Your robots.txt File

  • Save the generated content as a plain text file named robots.txt.
  • Upload it to the root directory of your website (same level as your index.html).
  • Verify it's accessible at https://yourdomain.com/robots.txt.
  • Test it with the robots.txt report in Google Search Console to ensure no important pages are accidentally blocked.

Related SEO Tools

Meta Tag Generator — HTML meta tags | Open Graph Generator — Social tags | Keyword Density Checker — Content analysis

Frequently Asked Questions

What is a robots.txt file and do I need one?

A robots.txt file tells search engine crawlers which parts of your website they can and cannot access. Every website benefits from having one — even if it simply includes a sitemap reference. Without robots.txt, crawlers assume they can access everything, which wastes crawl budget on low-value pages like admin panels and search result pages.

Does robots.txt prevent pages from appearing in Google?

Not entirely. Robots.txt prevents crawling, not indexing. If other websites link to a blocked page, Google can still index the URL and show it in search results with limited information. To fully prevent indexing, use the noindex meta tag instead.

Where should I put my robots.txt file?

Place robots.txt in the root directory of your website, accessible at https://yourdomain.com/robots.txt. Crawlers only check this exact location — placing it in a subdirectory will not work.

Can robots.txt block AI crawlers?

Yes. You can block AI company crawlers by adding specific user-agent rules. For example: User-agent: GPTBot + Disallow: / blocks OpenAI's crawler. User-agent: ClaudeBot + Disallow: / blocks Anthropic's crawler. Whether these bots fully respect robots.txt depends on the company.
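For example, a file blocking both of these AI crawlers while leaving the rest of the site open looks like this:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow:
```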

How often do search engines check robots.txt?

Google caches robots.txt and refreshes it approximately once per day, though the frequency can vary. After making changes, you can request a recrawl in Google Search Console to speed up the process. Changes typically take effect within 24–48 hours.

What happens if I block everything in robots.txt?

Using Disallow: / blocks all compliant crawlers from every page. Your site will gradually drop out of search results because Google can no longer crawl the content. This is appropriate during development, but remove it before launching a public website.