Robots.txt File

Most sites have a simple text file called robots.txt sitting in the root directory. This file gives instructions that bots are supposed to obey when they crawl your site. Google and Bing bots obey these instructions; many private crawlers do not.


This file is used to block certain pages or directories of your site from search engines: pages that you don’t want Google to see and that you don’t want to show up in search results. Commonly, site owners block checkout pages or anything behind a login. As you can imagine, a page that you block in robots.txt will generally not rank in Google. (Technically, a blocked page can rank: if Google sees a lot of links to a page, it might rank it even though it has never visited the page.)
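To make the blocking rules concrete, here is a minimal sketch using Python’s standard urllib.robotparser module. The rules, the example.com domain, and the /checkout/ and /account/ paths are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask every bot to stay out of checkout and
# account pages, and leave the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks this question before fetching each URL.
checkout_allowed = parser.can_fetch("Googlebot", "https://www.example.com/checkout/cart")
product_allowed = parser.can_fetch("Googlebot", "https://www.example.com/products/widget")

print(checkout_allowed)  # False
print(product_allowed)   # True
```

Keep in mind that these rules are only a request. Nothing in the file stops a crawler that chooses to ignore it.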

You can also give specific instructions to specific bots. One move that more paranoid webmasters or their security teams like to make is to specifically allow Google and Bing but block every other kind of robot. Robots.txt can also be used as part of a honeypot: make a page, link to it, and tell bots in robots.txt not to visit it. Any bot that does visit that page is a naughty bot that you can then block.
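The allow-Google-and-Bing setup could look roughly like the sketch below, again checked with Python’s standard urllib.robotparser. The file contents, the is_allowed helper, the SomeOtherBot name, and the example.com URL are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot and Bingbot may crawl everything
# (an empty Disallow means "nothing is blocked"); every other bot is
# asked to stay out of the whole site.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(agent, url, robots_txt=ALLOWLIST_ROBOTS_TXT):
    """Return True if the rules in robots_txt let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(agent, is_allowed(agent, "https://www.example.com/pricing"))
```

Only the first two agents are allowed here. A crawler that lies about its user agent, or that never reads robots.txt, is unaffected, which is why the honeypot approach relies on blocking at the server instead.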

In General, Don’t Worry About It

As a general rule of thumb, most webmasters do not need to block anything in their robots.txt file. That said, if you are blocking things, just be certain that you don’t write an overly broad rule and end up blocking Google from the entire site. This happens far more often than you might think.

It’s worth checking your robots.txt file to make sure you don’t have a Disallow: / rule under User-agent: * (which blocks everything), but odds are that you’re fine, and that you won’t need to worry about your robots.txt ever again.
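If you would rather check this with a script than by eye, a rough heuristic is to ask whether Googlebot may fetch the home page. The sketch below uses Python’s standard urllib.robotparser; the blocks_home_page helper and the two sample files are hypothetical, and Python’s rule matching may differ from Google’s in edge cases.

```python
from urllib.robotparser import RobotFileParser


def blocks_home_page(robots_txt, agent="Googlebot"):
    """Return True if `agent` is not allowed to fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


# Hypothetical file contents for comparison.
BLANKET_BLOCK = "User-agent: *\nDisallow: /\n"        # blocks the whole site
NARROW_BLOCK = "User-agent: *\nDisallow: /checkout/\n"  # blocks one directory

print(blocks_home_page(BLANKET_BLOCK))  # True
print(blocks_home_page(NARROW_BLOCK))   # False
```

A True result for your own file is a strong hint that a sweeping rule has slipped in.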

If you’re curious about how other sites set up their robots.txt file, you can just go look. After all, it’s a public file in a standard place on every site (it has to be, for the bots to find it). Just go to www.domain.com/robots.txt and you’ll see their file.

You can see Amazon’s here, for example: www.amazon.com/robots.txt
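Because the file always lives at the same path, its address can be worked out from any page on a site. The sketch below uses only Python’s standard library; robots_url and fetch_robots_txt are hypothetical helper names, and the input is assumed to be a full URL including http:// or https://.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Build the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url, timeout=10):
    """Download a site's robots.txt and return it as text."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("https://www.amazon.com/gp/help"))
# https://www.amazon.com/robots.txt
```

Calling fetch_robots_txt needs network access, and some sites may refuse requests from scripts, so viewing the file in a browser is often the simpler route.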
