Robots.txt explained: a simple guide to web crawler rules
A robots.txt file tells web crawlers which parts of a site they can or can't visit. This post covers the basics, the most common directives, and best practices.
What is robots.txt?
Robots.txt is a plain text file placed at the root of a domain (for example, https://example.com/robots.txt) that tells well-behaved web crawlers which parts of the site to crawl or ignore. It is a voluntary convention; not every bot uses it, and it is not a security barrier.
How robots.txt works
Robots.txt uses a simple, group-based syntax. Each group starts with a User-agent line, followed by one or more Disallow or Allow rules. Groups are separated by blank lines. Lines such as Sitemap can appear in the file to point crawlers to your sitemap.
The basics of syntax
A group looks like this:
User-agent: <name-or-wildcard>
Disallow: /blocked-path/
Allow: /blocked-path/exception/
You can have multiple groups in a single file, and paths are relative to the site root.
Scope, and a caveat about security
Robots.txt controls crawling, not access control. A page can still be served to anyone; blocking it in robots.txt will not stop someone from fetching the URL directly if they know it.
Common directives
User-agent
Specifies which crawlers a group applies to. Use * for all bots or name a specific bot.
Disallow
Tells a crawler not to visit a path. An empty value means "no restriction" in that group.
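To make the empty value concrete, "Disallow: /" blocks a crawler from the entire site, while an empty Disallow allows it everywhere (the bot names below are hypothetical):
User-agent: BadBot
Disallow: /

User-agent: GoodBot
Disallow: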
Allow
Overrides a Disallow for a more specific sub-path. Major search engine crawlers such as Googlebot support it, but it is not universally honored.
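For instance, a group like the following (the paths are made up for illustration) blocks a folder but still permits one file inside it, for crawlers that support Allow:
User-agent: *
Disallow: /private/
Allow: /private/annual-report.html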
Crawl-delay
Requests a minimum delay, in seconds, between successive requests from a crawler. Not all bots honor this.
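For example, the following asks a hypothetical crawler named SlowBot to wait 10 seconds between requests; bots that do not support Crawl-delay (Googlebot among them) simply ignore the line:
User-agent: SlowBot
Crawl-delay: 10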
Sitemap
Points to the sitemap file. The directive is independent of any group and can appear anywhere in the file, though it is commonly placed at the end.
Examples
Example 1: Block all bots from /private/
User-agent: *
Disallow: /private/
Example 2: Block /private/ for all bots, but give Googlebot full access
User-agent: *
Disallow: /private/
User-agent: Googlebot
Disallow:
(A crawler follows only the most specific group that matches it, so Googlebot obeys this empty rule set and ignores the * group.)
Example 3: Include a sitemap and block a folder
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
Limitations and best practices
- Place robots.txt at the site root so it is discoverable by crawlers.
- Use it to guide crawling behavior, not to hide sensitive data securely.
- Do not rely on robots.txt to protect confidential information; use authentication for that.
- Keep the file up-to-date and consistent with any per-page noindex strategies (see noindex notes below).
Testing and verification
- Fetch your robots.txt directly in a browser to confirm the content; you can also check specific URLs programmatically (see the sketch after this list).
- Use search engine tools like the webmaster console’s robots.txt tester if available.
- Check server logs to verify which sections are being crawled and which are not.
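As a quick sketch, Python's standard urllib.robotparser module can fetch a robots.txt file and report whether a given user agent is allowed to crawl a given URL (the domain and paths below are placeholders):

from urllib import robotparser

# Download and parse the live robots.txt file
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether specific crawlers may fetch a given URL
print(rp.can_fetch("*", "https://example.com/private/page.html"))          # False if /private/ is disallowed for *
print(rp.can_fetch("Googlebot", "https://example.com/private/page.html"))

# Crawl-delay declared for a user agent, if any (None when absent)
print(rp.crawl_delay("SlowBot"))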
SEO and indexing considerations
- Disallowing pages prevents Google and other engines from crawling them, which often means they won’t be indexed.
- If you want a page to stay out of search results entirely, robots.txt alone won't suffice; a blocked URL can still be indexed (without its content) if other pages link to it. Use a noindex meta tag or an X-Robots-Tag header instead (see the snippet after this list), and note that noindex only works if crawlers can fetch the page, so don't block it in robots.txt at the same time.
- For assets like images, blocking them in robots.txt keeps them from being crawled and indexed, but it won't necessarily stop direct access.
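For reference, a minimal noindex signal looks like one of the following, either as a tag in the page's <head> or as an HTTP response header set by the server:
<meta name="robots" content="noindex">
X-Robots-Tag: noindex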
Security and privacy notes
- Robots.txt is public; anyone can read it. Do not put secrets in this file.
- It is not a security boundary. If a page is sensitive, require authentication or proper access controls.
- Some bots ignore robots.txt, so do not rely on it to enforce privacy.