SEO technical guide
Robots.txt block all except one page
robots.txt controls crawling. It is the wrong tool if your real goal is to remove pages from search results.
Why this robots.txt setup is risky
A Disallow rule tells a crawler not to fetch a URL. It does not erase that URL from the index. Worse, if the crawler cannot fetch the page, it also cannot see a noindex tag or canonical link you placed on that page.
Example
User-agent: *
Disallow: /
Allow: /public-page/
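Under the longest-match rule that Google follows (and that RFC 9309 specifies), the more specific Allow wins for /public-page/ and anything beneath it, while every other path is disallowed. /public-page without the trailing slash is not covered. Blocked URLs can still appear in search results, usually without a description, if search engines discover them through links.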
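To make the precedence concrete, here is a minimal sketch of longest-match evaluation for a single group of rules. It assumes a simplified matcher: plain path prefixes only, with no * or $ wildcards and no percent-encoding normalization. The is_allowed function and RULES list are illustrative, not a real crawler's code.

```python
# Minimal sketch of RFC 9309 "longest match" evaluation for one User-agent group.
# Simplification (assumption): prefix rules only, no * or $ wildcards,
# no percent-encoding normalization.

def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", path_prefix) pairs from one group."""
    best_len = -1
    best_allow = True  # no matching rule means the path may be crawled
    for kind, prefix in rules:
        if not prefix:  # an empty "Disallow:" matches nothing
            continue
        if path.startswith(prefix):
            allow = kind == "allow"
            # The longer (more specific) prefix wins; on a tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len = len(prefix)
                best_allow = allow
    return best_allow


# The example file from this section, as (kind, prefix) pairs.
RULES = [("disallow", "/"), ("allow", "/public-page/")]

for path in ["/public-page/", "/public-page/photo.jpg", "/public-page", "/about/"]:
    print(path, is_allowed(RULES, path))
```

Under longest-match, the order of the two lines does not matter. A parser that applies the first matching rule in file order would read this file differently; listing the Allow line before the Disallow line gives the same result under both approaches.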
Safer ways to control crawling and indexing
- Use robots.txt for crawl control, such as keeping crawlers out of generated folders or low-value paths.
- Use noindex when a page is allowed to be crawled but should not appear in search results.
- Leave a page crawlable if Google needs to read its noindex or canonical signal.
- After changing rules, test the robots.txt file and inspect the specific URLs that matter, for example with the robots.txt report and the URL Inspection tool in Google Search Console.
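Part of that testing can be automated before a change ships. The sketch below assumes you keep a hypothetical inventory of pages and the on-page signals each one relies on (noindex, canonical). PAGES, find_conflicts, and the simplified prefix matcher are illustrative names, not a real tool. The script flags pages that are blocked yet depend on a signal crawlers would never get to read.

```python
# Hypothetical pre-launch check: flag URLs that rely on noindex or a canonical
# link but are blocked by robots.txt, so crawlers could never see that signal.
# Simplified matcher (assumption): prefix rules only, longest match wins,
# Allow wins ties, no wildcards.

def is_allowed(rules, path):
    best_len, best_allow = -1, True
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):
            allow = kind == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len, best_allow = len(prefix), allow
    return best_allow


def find_conflicts(rules, pages):
    """pages maps a URL path to the set of on-page signals it depends on."""
    conflicts = []
    for path, signals in pages.items():
        if signals and not is_allowed(rules, path):
            conflicts.append((path, sorted(signals)))
    return conflicts


# Hypothetical rules and page inventory.
RULES = [("disallow", "/old/"), ("disallow", "/tmp/")]
PAGES = {
    "/old/landing": {"noindex"},   # meant to drop out of search, but blocked
    "/tmp/report": set(),          # no signal needed, so blocking is fine
    "/products/a": {"canonical"},  # crawlable, so the signal can be read
}

for path, signals in find_conflicts(RULES, PAGES):
    print(f"{path}: blocked, so {', '.join(signals)} will not be seen")
```

A check like this only catches conflicts in your own inventory; it does not replace testing the live file with the search engine's own tools.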
Crawl control vs index control
Crawling is permission to fetch a URL. Indexing is whether the URL can appear in results. A blocked URL can still be known from links, sitemaps, or old crawl history.
When a page needs to leave search, make it crawlable long enough for noindex to be seen. For urgent cleanup, use the search engine's URL removal tools as well.
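noindex does not have to live in the HTML. The X-Robots-Tag HTTP response header carries the same directive, which also works for non-HTML files. Below is a minimal sketch using Python's standard http.server. The /leaving-search/ path and the robots_headers helper are hypothetical. As with the meta tag, the header is only seen if robots.txt lets the crawler fetch the URL.

```python
# Minimal sketch: send "X-Robots-Tag: noindex" for selected paths.
# The path below and robots_headers() are hypothetical examples.
# The URL must stay crawlable (not disallowed in robots.txt) for the header to be seen.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

NOINDEX_PATHS = {"/leaving-search/"}


def robots_headers(path):
    """Return the extra indexing headers for a request path (query string ignored)."""
    if urlsplit(path).path in NOINDEX_PATHS:
        # Same directive as <meta name="robots" content="noindex"> in the page head.
        return [("X-Robots-Tag", "noindex")]
    return []


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>Page content</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        for name, value in robots_headers(self.path):
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start a local test server; call serve() manually to try it."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Keeping the header decision in a small function makes it easy to test without starting a server. On a real site the same header is usually set in the web server or CDN configuration.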
Allowing one page
If only one page should be public, the clean answer is usually site structure: publish that one page and keep everything else private or behind authentication.
For staging sites, use password protection. robots.txt is a request to crawlers, not access control.
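Password protection refuses the request for everyone without credentials, crawlers included, so there is nothing to crawl or index. A minimal sketch with HTTP Basic auth on Python's standard http.server follows. STAGING_USER, STAGING_PASS, and is_authorized are hypothetical. A real staging site should run over HTTPS and usually configures this in its web server or hosting platform instead.

```python
# Minimal sketch of password-protecting a staging site with HTTP Basic auth.
# STAGING_USER / STAGING_PASS are hypothetical placeholders; serve over HTTPS
# in practice, since Basic auth credentials are only base64-encoded.
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer

STAGING_USER = "preview"
STAGING_PASS = "change-me"


def is_authorized(auth_header, user=STAGING_USER, password=STAGING_PASS):
    """Check an Authorization header value against the expected credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    expected = base64.b64encode(f"{user}:{password}".encode())
    supplied = auth_header[len("Basic "):].encode()
    # Constant-time comparison of the encoded credentials.
    return hmac.compare_digest(supplied, expected)


class StagingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # Crawlers and visitors without credentials get 401, not the page.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="staging"')
            self.end_headers()
            return
        body = b"staging content"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8001):
    """Start a local test server; call serve() manually to try it."""
    HTTPServer(("127.0.0.1", port), StagingHandler).serve_forever()
```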
Common mistakes
- Expecting Disallow to remove already known URLs from search
- Blocking a page and then adding noindex where crawlers cannot see it
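- Forgetting that Allow rules are path-specific and crawler-specific: a crawler obeys only the User-agent group that names it, so rules in the * group do not carry over into a Googlebot group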
- Blocking CSS or JavaScript needed to render important pages
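The crawler-specific part is easy to get wrong. Under RFC 9309, a crawler merges every group whose User-agent line matches its name and uses only those rules, falling back to the * group only when no group names it. The sketch below shows that selection. It assumes a simplified parser (case-insensitive exact name match, comments stripped, other fields ignored), and parse_groups and select_rules are illustrative names.

```python
# Minimal sketch of how a crawler picks which User-agent group to obey.
# Simplification (assumption): case-insensitive exact name match, falling back
# to "*"; groups naming the same agent are merged; other fields are ignored.

def parse_groups(robots_txt):
    """Return {user_agent_lowercase: [(kind, path), ...]}."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a rule line ended the previous group
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def select_rules(groups, crawler):
    """Use the crawler's own group if one exists, otherwise the "*" group."""
    return groups.get(crawler.lower(), groups.get("*", []))


ROBOTS = """
User-agent: *
Disallow: /
Allow: /public-page/

User-agent: Googlebot
Disallow: /tmp/
"""

groups = parse_groups(ROBOTS)
print(select_rules(groups, "Googlebot"))  # only the Googlebot group applies
print(select_rules(groups, "OtherBot"))   # no named group, so "*" applies
```

In this file Googlebot may crawl every path except /tmp/, because the "block all except one page" rules live only in the * group. To keep them, repeat them inside the Googlebot group.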
FAQ
Can robots.txt block indexing?
Not reliably. It blocks crawling, but a known URL can still appear in search.
Should I use noindex or robots.txt?
Use noindex for index removal and robots.txt for crawl control. Do not block a page that crawlers need to fetch to see noindex.
Is robots.txt security?
No. Sensitive pages should require authentication or should not be publicly available.