The robots.txt file is primarily known as an SEO tool—it tells search engine crawlers like Googlebot which parts of your site they should or shouldn’t crawl.
However, from a security perspective, robots.txt is a double-edged sword. While it can reduce server load from well-behaved but aggressive crawlers, it is frequently misused in an attempt to “hide” sensitive areas of a website.
Here is the golden rule of robots.txt security: Since robots.txt is a public file that anyone can read, listing a sensitive directory there doesn’t protect it—it advertises it to attackers.
In this guide, we’ll cover how to configure your WordPress robots.txt for maximum security without hurting your SEO, and how to avoid common information disclosure pitfalls.
What is robots.txt and Where Does it Live?
The robots.txt file sits in the root directory of your website.
https://yoursite.com/robots.txt
It uses a simple syntax to instruct “User-agents” (bots) on where they are allowed to go.
Basic Syntax:
Plaintext
User-agent: *
Disallow: /private-folder/
This tells all bots (*) not to crawl the /private-folder/.
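You can check programmatically how a rule set will be interpreted using Python’s standard-library urllib.robotparser module. Here is a minimal sketch matching the example above (the domain is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The example rules from above
rules = """\
User-agent: *
Disallow: /private-folder/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Polite bots may not crawl the disallowed folder...
print(rp.can_fetch("*", "https://yoursite.com/private-folder/page.html"))  # False
# ...but everything else is fair game
print(rp.can_fetch("*", "https://yoursite.com/blog/post.html"))  # True
```

Remember: this only tells you what a compliant crawler will do. Nothing forces a malicious bot to honor these rules.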
The “Security Through Obscurity” Trap
The biggest security mistake website owners make is adding private paths to robots.txt thinking it keeps them safe.
The Risk:
Attackers and vulnerability scanners routinely check robots.txt first. If you have a line like Disallow: /backup-2024/ or Disallow: /admin-staging/, you have just handed the attacker a map to your most sensitive files.
FunSentry’s Perspective:
When you run a security scan with FunSentry, our scanner analyzes your robots.txt file specifically looking for “Information Disclosure.” We check if you are accidentally revealing paths to:
- Backup directories
- Config files
- Staging environments
- Old versions of the site
- Private admin portals
Rule #1: Never put a path in robots.txt that you wouldn’t want a hacker to know exists. If a folder is private, password-protect it or restrict access via .htaccess—do not just ask bots nicely to ignore it.
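For reference, here is what real access control looks like at the server level—a sketch assuming Apache 2.4+ (the folder is hypothetical). Unlike a robots.txt entry, this returns a 403 to everyone, bots and humans alike, without advertising the path:

```
# .htaccess placed inside the private folder (e.g., /backups/)
# Apache 2.4 syntax: deny all HTTP access to this directory
Require all denied
```

On Nginx or other servers, the equivalent is a `deny all;` location block or HTTP authentication—the principle is the same: enforce access at the server, don’t request it from bots.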
Part 1: Essential WordPress Robots.txt Settings
For a standard WordPress site, your goal is to block access to backend files while allowing access to frontend assets (CSS/JS/Images) so Google can render your site correctly.
1. The Standard WordPress Setup
At a minimum, you should block the core WordPress directories that contain no public content.
Plaintext
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Why the Allow line?
WordPress uses admin-ajax.php for many frontend dynamic features (like “Load More” buttons or contact forms). If you block all of /wp-admin/, you might break these features for crawlers, hurting your SEO.
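You can sanity-check this Allow/Disallow combination with Python’s standard-library parser. One caveat: Google resolves conflicts by picking the most specific (longest) matching rule regardless of order, while Python’s urllib.robotparser applies rules in file order (first match wins)—so in this sketch the Allow line is placed first:

```python
from urllib.robotparser import RobotFileParser

# Note: Google picks the most specific matching rule regardless of order,
# but Python's parser applies rules in file order, so Allow comes first here.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://yoursite.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "https://yoursite.com/wp-admin/options.php"))     # False
```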
2. Blocking Core Files
You can add rules to prevent crawlers from wasting resources on core files that shouldn’t be indexed.
Plaintext
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /readme.html
Disallow: /license.txt
Note: Be careful blocking /wp-includes/. If your theme loads jQuery or other scripts from there, blocking it might prevent Google from seeing your site correctly. Always test using the Google Search Console “URL Inspection” tool.
Part 2: Preventing Bot Abuse
While robots.txt cannot stop a malicious bot (hackers ignore these rules), it can stop “polite” scrapers and aggressive SEO tools that waste your server resources.
3. Blocking Aggressive Bots
If you notice specific bots spiking your CPU usage, you can block them specifically.
Plaintext
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: MJ12bot
Disallow: /
Note: This only stops them from crawling; it doesn’t block them from the server. For true blocking, use a firewall (WAF) or .htaccess.
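You can confirm that per-bot rules behave as intended—the named bot is shut out while everyone else keeps full access. A sketch using Python’s standard-library parser:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# AhrefsBot matches its dedicated entry and is blocked site-wide
print(rp.can_fetch("AhrefsBot", "https://yoursite.com/"))  # False
# Other bots fall through to the wildcard entry (empty Disallow = allow all)
print(rp.can_fetch("Googlebot", "https://yoursite.com/"))  # True
```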
4. Protecting Internal Search Results
Attackers sometimes use your internal site search to generate thousands of spam URLs (search result spam). You can tell Google not to crawl these pages.
Plaintext
Disallow: /?s=
Disallow: /search/
Part 3: What NOT to Do (Common Mistakes)
1. Blocking CSS and JS
In the past, it was common to block /wp-content/. Do not do this. Google needs to access your CSS and JavaScript files to understand if your site is mobile-friendly.
Bad Configuration:
Plaintext
# ❌ DANGEROUS FOR SEO
Disallow: /wp-content/
Disallow: /wp-includes/
2. Leaking Plugin Vulnerabilities
Avoid listing specific plugin paths unless necessary.
Bad Configuration:
Plaintext
# ❌ TELLS HACKERS WHAT PLUGINS YOU USE
Disallow: /wp-content/plugins/vulnerable-slider-plugin/
If an attacker sees this, they know exactly which exploit to try against your site.
The “Perfect” Secure WordPress Robots.txt
Here is a balanced configuration that protects backend paths, prevents information leakage, and maintains SEO visibility.
Plaintext
User-agent: *
# Block backend admin access
Disallow: /wp-admin/
# Allow the AJAX handler for frontend functionality
Allow: /wp-admin/admin-ajax.php
# Block specific file types often targeted by bots
Disallow: /xmlrpc.php
Disallow: /readme.html
Disallow: /license.txt
# Prevent indexing of internal search results
Disallow: /?s=
Disallow: /search/
# Prevent indexing of trackbacks
Disallow: /trackback/
# ─── SECURITY NOTICE ───
# Do NOT list private backup folders here (e.g., /backups/).
# Protect them via server config, not robots.txt.
# Location of Sitemap (Helps good bots)
Sitemap: https://yoursite.com/sitemap_index.xml
How to Verify Your Robots.txt
After updating your file, you must verify it to ensure you haven’t accidentally de-indexed your whole site or leaked data.
1. Use FunSentry for a Security Audit
FunSentry performs a specialized Robots.txt Analysis as part of its security scan. It checks for:
- Sensitive Path Disclosure: Does the file reveal hidden directories?
- Installation Files: Are readme.html or license.txt accessible?
- Logic Errors: Are you accidentally blocking legitimate traffic?
2. Google Search Console
Use the robots.txt report inside Google Search Console (under Settings) to see exactly how Google fetched and interprets your rules.
3. Manual Inspection
Simply visit yoursite.com/robots.txt in Incognito mode. If you see paths like /secret/ or /backup/, remove them immediately and protect those folders on the server level instead.
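That manual check can be automated with a small script that flags suspicious-looking Disallow paths. This is a rough heuristic sketch—the keyword list and function name are illustrative inventions, not FunSentry’s actual detection logic:

```python
import re

# Keywords that often signal information disclosure (illustrative, not exhaustive)
SENSITIVE_HINTS = ("backup", "bak", "old", "staging", "secret",
                   "private", "config", "dump", "sql")

def flag_sensitive_disallows(robots_txt: str) -> list[str]:
    """Return Disallow paths that look like they reveal private areas."""
    flagged = []
    for line in robots_txt.splitlines():
        match = re.match(r"(?i)\s*Disallow:\s*(\S+)", line)
        if match:
            path = match.group(1)
            if any(hint in path.lower() for hint in SENSITIVE_HINTS):
                flagged.append(path)
    return flagged

sample = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /backup-2024/
Disallow: /staging/
"""
print(flag_sensitive_disallows(sample))  # ['/backup-2024/', '/staging/']
```

Standard paths like /wp-admin/ are expected on every WordPress site and reveal nothing new, so they pass; anything the script does flag should be removed from robots.txt and locked down at the server level instead.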
Summary
| Do This ✅ | Don’t Do This ❌ |
|---|---|
| Block /wp-admin/ | Block /wp-content/ (blocks CSS/Images) |
| Allow /wp-admin/admin-ajax.php | List private folders like /backup_v2/ |
| Block /xmlrpc.php | Use robots.txt to hide sensitive data |
| Include your Sitemap URL | Assume Disallow stops hackers |
Your robots.txt is a public billboard. Use it to guide helpful search engines, not to hide secrets from bad actors.
Is your robots.txt revealing too much?
Don’t guess. Run a free scan at FunSentry today to check for Information Disclosure risks and ensure your WordPress site is properly hardened against 2025’s most common threats.
