Robots.txt tester

Enter a domain and we fetch its robots.txt, parse every rule group, and let you test whether a specific path is allowed for a chosen crawler. Our matcher follows Google's longest-match precedence rather than the first-match shortcut that some older libraries use.

Enter a domain such as example.com. We always fetch https://example.com/robots.txt.

How the robots.txt tester works

robots.txt is a plain text file at the root of a site that tells crawlers which paths they may or may not request. Each group starts with one or more User-agent lines followed by Allow and Disallow rules. This tool downloads the file, parses every group and evaluates your path against the group that applies to the crawler you chose.
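The group structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual parser: the names Group, parse_robots and rules_for are hypothetical, and crawler selection is simplified to an exact User-agent match with a fallback to the * group.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One robots.txt group: the crawlers it names and its rules."""
    user_agents: list = field(default_factory=list)  # lowercased User-agent values
    rules: list = field(default_factory=list)        # ("allow" | "disallow", pattern)


def parse_robots(text):
    """Split robots.txt text into groups (hypothetical helper)."""
    groups = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue                          # blank or malformed line
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share a group; a User-agent line
            # that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.user_agents.append(value.lower())
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
    return groups


def rules_for(groups, crawler):
    """Rules from groups naming the crawler exactly, else from '*' groups.

    Assumption: exact token matching only; real crawlers may also match
    on a shorter product token.
    """
    token = crawler.lower()
    for wanted in (token, "*"):
        matching = [g for g in groups if wanted in g.user_agents]
        if matching:
            return [rule for g in matching for rule in g.rules]
    return []
```

Other lines, such as Sitemap, are skipped here because they do not affect the allow or block verdict.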

The key detail most testers get wrong is precedence. Google does not use the first matching rule; it uses the most specific one, meaning the longest matching path pattern wins, and Allow wins when an Allow and a Disallow are the same length. Our matcher implements this, plus the * wildcard and the $ end-of-URL anchor, so the verdict matches Googlebot.
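The precedence rule can be sketched as follows. This is an illustrative Python version under stated assumptions, not the tool's own code: pattern_to_regex and is_allowed are hypothetical names, specificity is measured as the character length of the pattern, and percent-encoding normalization is left out.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt pattern: * matches any run of characters,
    and a trailing $ anchors the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of ("allow" | "disallow", pattern) tuples."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Allow or Disallow value matches nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            is_allow = kind == "allow"
            # The longer pattern wins; on equal length, Allow beats Disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed
```

With the rules Disallow: /private/ and Allow: /private/public/, the path /private/public/page.html is allowed because the Allow pattern is longer, while /private/secret is blocked. A first-match evaluator would block both.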

Blocking a URL in robots.txt only stops crawling, not indexing. A blocked page can still appear in search results without a snippet if other pages link to it. To keep a page out of the index, allow crawling and use a noindex robots meta tag or an X-Robots-Tag: noindex HTTP header instead; the crawler has to fetch the page to see either one.
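To check whether a page actually carries a noindex signal, something like the sketch below could be used. It is not part of this tool: RobotsMetaParser and has_noindex are hypothetical names, the HTML and headers are assumed to be fetched already, and crawler-specific forms (such as a meta tag named after one crawler, or a header value prefixed with a user agent) are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def has_noindex(html, headers):
    """True if the meta tag or the X-Robots-Tag header contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives
```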

Common robots.txt mistakes

Frequently asked questions

Is this robots.txt tester free?

Yes, completely free and no account needed. Enter a domain and an optional path and crawler, and you get the parsed rules plus an allowed or blocked verdict instantly.

Does it match how Googlebot reads robots.txt?

Yes. We implement Google's longest-match precedence, in which the most specific rule wins and Allow breaks ties, along with the * wildcard and the $ anchor. Libraries that stop at the first matching rule can give a different answer.

Does blocking a URL in robots.txt remove it from Google?

No. robots.txt only controls crawling. A disallowed URL can still be indexed without a snippet if it is linked from elsewhere. To remove a page from the index, use a noindex meta tag or header and leave the page crawlable so the crawler can see it.

Where must robots.txt be located?

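At the root of the host, at /robots.txt. This tool fetches it over HTTPS. A robots.txt in a subfolder is ignored, and the file applies only to the protocol, host and port it is served from, so each subdomain needs its own file.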
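As a small illustration of the location rule, the robots.txt URL that governs a page can be derived from the page URL alone. The function name robots_url is hypothetical; the sketch uses only Python's standard urllib.parse module and assumes an absolute URL with a scheme.

```python
from urllib.parse import urlsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # netloc keeps the host and any explicit port (and userinfo, if present);
    # the path, query and fragment are dropped.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"
```

For example, a page on shop.example.com is governed by the file at shop.example.com/robots.txt, not by the one on example.com.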

Monitor more than just robots.txt

robots.txt is one piece of a healthy site. ePulz.io watches uptime, SSL, DNS and domain expiry around the clock and alerts you within seconds when something breaks.


About this tool

The robots.txt tester is one of several free network and SEO tools from ePulz.io. It fetches and parses any site's robots.txt and evaluates crawl permission using Google's real matching rules.