SEO technical guide

Robots.txt: block all except one page

robots.txt controls crawling. It is the wrong tool if your real goal is to remove pages from search results.

Why this robots.txt setup is risky

A Disallow rule tells a crawler not to fetch a URL. It does not erase that URL from the index. Worse, if the crawler cannot fetch the page, it also cannot see a noindex tag or canonical link you placed on that page.

Example

Input

User-agent: *
Disallow: /
Allow: /public-page/

Output

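Crawlers that follow the most-specific-match rule (the behavior described in RFC 9309 and used by Google) may fetch /public-page/ and anything beneath it. Everything else is blocked, including /public-page without the trailing slash and any CSS or JavaScript files stored outside that folder. Blocked URLs can still be discovered through links and appear in results without their content.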
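To see why that result holds, here is a minimal Python sketch of the matching rule described in RFC 9309 and Google's robots.txt documentation: the rule with the longest matching path wins, and Allow wins a tie. The helpers `parse_rules`, `pattern_to_regex`, and `is_allowed` are hypothetical names, and the parser is deliberately simplified (exact user-agent names only, no fallback between groups, no percent-encoding normalization).

```python
import re

# The example file from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Allow: /public-page/
"""


def parse_rules(text, agent="*"):
    """Return (allow, pattern) pairs from groups that name `agent`.

    Simplified: matches user-agent names exactly (case-insensitive) and
    does not fall back from a specific crawler group to the `*` group.
    """
    rules, agents, in_rules = [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif key in ("allow", "disallow"):
            in_rules = True
            # An empty Disallow value means "nothing is disallowed"; skip it.
            if agent.lower() in agents and value:
                rules.append((key == "allow", value))
    return rules


def pattern_to_regex(pattern):
    """Translate `*` wildcards and a trailing `$` anchor into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching rule wins; on a tie, Allow wins. No match means allowed."""
    best = None  # (pattern length, allow flag) of the most specific match so far
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = parse_rules(ROBOTS_TXT)
for path in ["/public-page/", "/public-page/photo.jpg", "/public-page", "/assets/site.css"]:
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
# /public-page/ allowed
# /public-page/photo.jpg allowed
# /public-page blocked
# /assets/site.css blocked
```

Under this model the order of the lines does not matter. It does matter to parsers that apply the first matching line instead: Python's standard urllib.robotparser, for example, reads the file top to bottom and reports /public-page/ as blocked because Disallow: / comes first. Putting the Allow line above the Disallow line gives the same answer under both approaches.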

Safer ways to control crawling and indexing

Use robots.txt for crawl control, such as keeping crawlers out of auto-generated or low-value paths like internal search results.
Use noindex when a page is allowed to be crawled but should not appear in search results.
Leave a page crawlable if Google needs to read its noindex or canonical signal.
After changing rules, check the robots file (for example, with the robots.txt report in Google Search Console) and run the specific URLs that matter through the URL Inspection tool.
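Besides the `<meta name="robots" content="noindex">` tag in a page's HTML, noindex can be sent as an `X-Robots-Tag: noindex` HTTP response header, which also works for non-HTML files such as PDFs. The sketch below uses Python's built-in http.server to attach that header to a hypothetical set of paths (`NOINDEX_PATHS`, checked by the hypothetical helper `robots_header_for`) while leaving those pages crawlable. It is a demonstration, not a production server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical paths that should be crawlable but kept out of search results.
NOINDEX_PATHS = {"/thank-you/", "/internal-search/"}


def robots_header_for(request_target: str) -> Optional[str]:
    """Return the X-Robots-Tag value for a request, or None for indexable pages."""
    path = urlsplit(request_target).path  # ignore any query string
    return "noindex" if path in NOINDEX_PATHS else None


class NoindexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<!doctype html><title>Example</title><p>Example page</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        tag = robots_header_for(self.path)
        if tag:
            # The page is still served to crawlers; only indexing is refused.
            self.send_header("X-Robots-Tag", tag)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serves on http://127.0.0.1:8000/ until interrupted.
    HTTPServer(("127.0.0.1", 8000), NoindexHandler).serve_forever()
```

Running `curl -i http://127.0.0.1:8000/thank-you/` against this server should show the header. The header only works if robots.txt lets crawlers fetch those paths.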

Crawl control vs index control

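Crawling is fetching a URL, and robots.txt decides whether a crawler is permitted to do it. Indexing decides whether the URL can appear in results. A blocked URL can still be known from links, sitemaps, or old crawl history, and a known URL can be indexed without ever being crawled.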

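When a page needs to leave search, keep it crawlable until the noindex has been seen. For urgent cleanup, also use the search engine's URL removal tool. Google's Removals tool only hides a URL temporarily (about six months), so the noindex still has to do the permanent work.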
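The interaction between crawl control and index control can be written as a small decision function. This is a simplified mental model, not how any search engine actually decides; `index_outcome` and its three inputs are hypothetical names chosen for the illustration.

```python
def index_outcome(crawl_allowed: bool, has_noindex: bool, known_elsewhere: bool) -> str:
    """Rough outcome for one URL under the crawl/index split described above.

    known_elsewhere: the URL is discoverable from links, sitemaps, or old crawl history.
    """
    if crawl_allowed:
        if has_noindex:
            # The crawler can fetch the page, so it sees noindex and drops the URL.
            return "dropped once a crawl sees noindex"
        return "eligible for indexing"
    # Crawling is blocked: the page is never fetched, so any noindex on it is invisible.
    if known_elsewhere:
        return "may still appear, usually without a description"
    return "unlikely to appear"


for allowed in (True, False):
    for noindex in (False, True):
        print(allowed, noindex, "->", index_outcome(allowed, noindex, known_elsewhere=True))
```

The last two rows print the same outcome: once a URL is blocked, adding noindex changes nothing, because the crawler never fetches the page to read it.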

Allowing one page

If only one page should be public, the clean answer is usually site structure: publish that one page and keep everything else private or behind authentication.

For staging sites, use password protection. robots.txt is a request to crawlers, not access control.
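As a sketch of what real access control looks like, here is HTTP Basic authentication on Python's built-in http.server: every request without valid credentials gets a 401, so no crawler, compliant or not, sees the content. The credentials and the `is_authorized` helper are hypothetical placeholders. Basic auth only protects credentials when served over HTTPS, and most sites would configure this in their web server or hosting platform instead.

```python
import base64
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical credentials for illustration only; load real ones from a secret store.
EXPECTED_TOKEN = base64.b64encode(b"staging:change-me")


def is_authorized(header_value: Optional[str]) -> bool:
    """Check an Authorization header against the expected Basic credentials."""
    if not header_value:
        return False
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":  # the scheme name is case-insensitive
        return False
    supplied = token.strip().encode("ascii", "ignore")
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(supplied, EXPECTED_TOKEN)


class StagingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization")):
            # Crawlers and visitors without credentials get no content at all.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="staging"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"Staging content"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), StagingHandler).serve_forever()
```

Unlike a Disallow rule, this does not depend on the client choosing to cooperate.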

Common mistakes

Expecting Disallow to remove already known URLs from search
Blocking a page and then adding noindex where crawlers cannot see it
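Forgetting that Allow and Disallow rules match by path (Allow: /public-page/ does not cover /public-page) and apply only to their own user-agent group, so a crawler with its own group, such as User-agent: Googlebot, ignores the * group entirely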
Blocking CSS or JavaScript needed to render important pages

FAQ

Can robots.txt block indexing?

Not reliably. It blocks crawling, but a known URL can still appear in search.

Should I use noindex or robots.txt?

Use noindex for index removal and robots.txt for crawl control. Do not block a page that crawlers need to fetch to see noindex.

Is robots.txt security?

No. Sensitive pages should require authentication or should not be publicly available. robots.txt is itself a public file, so listing private paths in it can advertise them.