Robots.txt Tester

Paste your robots.txt and test whether specific URLs are allowed or blocked for any crawler.

Robots.txt: Control Which Pages Google Crawls

The robots.txt file is the first thing search engine crawlers read when they visit your site. It tells them which pages or directories they are allowed to crawl. A misconfigured robots.txt can accidentally block your entire website from Google (a critical SEO error), or it can leave crawlable the pages you wanted crawlers to stay out of (admin panels, staging areas, duplicate content pages). Keep in mind that robots.txt is publicly readable and only asks well-behaved crawlers to stay away, so it is not a way to keep content private. Our tester lets you validate rules before deploying.

Frequently Asked Questions

How does robots.txt affect SEO?
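robots.txt controls crawl access, but it does NOT directly control indexing. Blocking a URL in robots.txt prevents Googlebot from crawling it, yet Google can still index that URL if it finds links pointing to it from other pages. To keep a page out of the index, use the noindex meta tag or the X-Robots-Tag HTTP header instead of robots.txt, and leave the page crawlable: if robots.txt blocks the URL, Googlebot never fetches the page and therefore never sees the noindex directive.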
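The trap described above can be flagged automatically. This is a minimal sketch, not a complete audit: the audit and has_noindex functions are hypothetical, the sketch inspects only an X-Robots-Tag header value (not the HTML meta tag), and, as a simplification, it treats bot-scoped values such as "googlebot: noindex" as applying to every crawler.

```python
import re
from typing import Optional
from urllib.robotparser import RobotFileParser


def has_noindex(x_robots_tag: Optional[str]) -> bool:
    """True if an X-Robots-Tag value contains noindex (or "none").

    Simplification: a bot prefix such as "googlebot:" is split off and ignored.
    """
    if not x_robots_tag:
        return False
    tokens = {t.strip().lower() for t in re.split(r"[,:]", x_robots_tag)}
    return "noindex" in tokens or "none" in tokens  # "none" means noindex, nofollow


def audit(robots_txt: str, url: str, x_robots_tag: Optional[str],
          agent: str = "Googlebot") -> str:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(agent, url)
    noindex = has_noindex(x_robots_tag)
    if noindex and not crawlable:
        return "conflict"   # the page is never fetched, so noindex is never seen
    if noindex:
        return "deindexed"  # crawled, then dropped from the index
    if not crawlable:
        return "blocked"    # not crawled, but may still be indexed from links
    return "indexable"


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(audit(robots, "https://example.com/private/offer", "noindex"))  # conflict
```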
What is the correct robots.txt syntax to block a directory?
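To block a directory for all bots: User-agent: * → Disallow: /admin/ (the trailing slash blocks the directory and everything under it; without the slash, Disallow: /admin would also block /admin.html and /administrator/, because rules match by URL-path prefix). To block a specific file: Disallow: /private/config.php. To allow everything: Disallow: (an empty value allows all). Order does not matter to Google: when several rules match a URL, the most specific one (the longest matching path) wins, and if an Allow rule and a Disallow rule are equally specific, Allow wins.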
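The specificity rule fits in a few lines of Python. This is a minimal sketch that assumes the rules of one user-agent group are already parsed into lowercase (directive, pattern) pairs; is_allowed and _matches are hypothetical names, * and a trailing $ are treated as wildcards the way Google documents them, and percent-encoding normalization is omitted.

```python
import re


def _matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without '$' a rule is a prefix match, so re.match (anchored at the start only) suffices.
    return (re.fullmatch(regex, path) if anchored else re.match(regex, path)) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins regardless of order; on a tie, Allow wins."""
    best_len, allowed = -1, True  # no matching rule: the path is allowed
    for directive, pattern in rules:
        if not pattern:  # an empty value such as "Disallow:" matches nothing
            continue
        if not _matches(pattern, path):
            continue
        if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
            best_len, allowed = len(pattern), directive == "allow"
    return allowed


if __name__ == "__main__":
    group = [("disallow", "/admin/"), ("allow", "/admin/public/")]
    print(is_allowed(group, "/admin/settings"))          # False
    print(is_allowed(group, "/admin/public/faq"))        # True (longer Allow wins)
    print(is_allowed(group[::-1], "/admin/public/faq"))  # True (order is irrelevant)
    print(is_allowed(group, "/admin"))                   # True ("/admin/" needs the slash)
```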
Can I block Googlebot but allow other crawlers?
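Yes. Give each crawler its own group with a specific User-agent value: User-agent: Googlebot → Disallow: /private/ (blocks /private/ for Google only), followed by User-agent: * → Allow: / (allows all other crawlers everywhere). A crawler obeys only the most specific group that matches its name, so Googlebot follows its own group and ignores the * group. Common bot names: Googlebot (Google Search), Bingbot (Bing), Twitterbot (Twitter/X cards), facebookexternalhit (Facebook Open Graph previews), AhrefsBot (Ahrefs), SemrushBot (Semrush).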
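To see group selection in action, this sketch runs the example above through Python's urllib.robotparser, which also checks a crawler's named group before falling back to User-agent: *. The verdicts helper is hypothetical, and the stdlib's agent matching is a simple case-insensitive substring test, so treat it as an approximation of how individual crawlers match their names.

```python
from urllib.robotparser import RobotFileParser

# One group for Googlebot, one catch-all group for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /
"""


def verdicts(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return {crawler name: allowed?} for a single URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    url = "https://example.com/private/report"
    for agent, allowed in verdicts(ROBOTS_TXT, url, ["Googlebot", "Bingbot", "AhrefsBot"]).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Only Googlebot is blocked from /private/; it can still crawl every other path because its own group has no rule matching them.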
Should I block crawling of /wp-admin/ in robots.txt?
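Yes. Block /wp-admin/ so bots don't spend crawl budget on admin pages. The standard WordPress robots.txt is: User-agent: * → Disallow: /wp-admin/ → Allow: /wp-admin/admin-ajax.php (keeps the AJAX endpoint that themes and plugins use on the front end accessible). Many sites also disallow /?s= (internal search results), /feed/, and /xmlrpc.php. Avoid disallowing /wp-includes/, though: it serves JavaScript and CSS files that Google needs to render your pages properly.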
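A quick way to sanity-check the WordPress rules is a sketch like the one below (check_wordpress and the sample paths are hypothetical). It lists the Allow line before the Disallow line on purpose: Google ignores rule order, but Python's urllib.robotparser has historically used the first matching rule, so this order keeps both in agreement. /wp-includes/ is deliberately left crawlable.

```python
from urllib.robotparser import RobotFileParser

# Allow comes before Disallow on purpose: Google picks the longest matching rule
# regardless of order, while urllib.robotparser takes the first match.
WP_ROBOTS = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /feed/
Disallow: /xmlrpc.php
"""

PATHS = [
    "/wp-admin/admin-ajax.php",              # AJAX endpoint: should stay crawlable
    "/wp-admin/options.php",                 # admin page: blocked
    "/?s=seo",                               # internal search: blocked
    "/feed/",                                # feed: blocked
    "/xmlrpc.php",                           # XML-RPC endpoint: blocked
    "/wp-includes/js/jquery/jquery.min.js",  # rendering resource: left crawlable
    "/hello-world/",                         # ordinary post: crawlable
]


def check_wordpress(robots_txt: str, agent: str = "Googlebot") -> dict[str, bool]:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in PATHS}


if __name__ == "__main__":
    for path, allowed in check_wordpress(WP_ROBOTS).items():
        print(f"{'ALLOWED' if allowed else 'BLOCKED'}  {path}")
```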
What happens if robots.txt is missing?
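If robots.txt returns a 404 (or any other 4xx error except 429), search engines assume there are no crawl restrictions and every page may be crawled. This is usually fine, but it means you have no control over which pages consume your crawl budget. A 5xx server error is different: Google temporarily treats the whole site as disallowed. For large sites, a proper robots.txt with a sitemap reference (for example, Sitemap: https://www.example.com/sitemap.xml) helps crawlers work through the site efficiently.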
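The status-code handling can be summarized as a small lookup. This is a sketch of Google's documented behavior rather than a rule every crawler follows; note that Google treats 429 (Too Many Requests) like a server error even though it is a 4xx code. policy_for_status and CrawlPolicy are hypothetical names.

```python
from enum import Enum


class CrawlPolicy(Enum):
    PARSE_FILE = "parse the returned robots.txt"
    FOLLOW_REDIRECT = "follow the redirect to the final robots.txt"
    ALLOW_ALL = "no restrictions, as if robots.txt did not exist"
    DISALLOW_ALL = "temporarily treat the whole site as blocked"


def policy_for_status(status: int) -> CrawlPolicy:
    """Map the HTTP status of a robots.txt fetch to crawler behavior."""
    if 200 <= status < 300:
        return CrawlPolicy.PARSE_FILE
    if 300 <= status < 400:
        return CrawlPolicy.FOLLOW_REDIRECT
    if status == 429 or 500 <= status < 600:  # checked before the generic 4xx branch
        return CrawlPolicy.DISALLOW_ALL
    if 400 <= status < 500:
        return CrawlPolicy.ALLOW_ALL
    raise ValueError(f"unexpected HTTP status: {status}")


if __name__ == "__main__":
    for code in (200, 301, 404, 429, 503):
        print(code, policy_for_status(code).value)
```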