The final element that we’re going to talk about in the crawlability world is the noindex tag. This tag lives in the <head> code of your site’s pages and looks like this:
<meta name="robots" content="noindex">
This tag instructs search engines not to index that page, which means it will not be included in any search results. The noindex tag is similar to blocking a page via robots.txt, but with a key difference: a noindexed page can still be crawled, it just won't be indexed, while a blocked page shouldn't be crawled at all.
A noindex tag is the only way to be certain that Google won't ever show the page in search results; however, note that the noindex tag only works if Google can crawl the page! If you have a page blocked in robots.txt, Google won't crawl the page, and thus won't see the noindex tag. Then if Google sees a lot of links to that page, it might decide to serve it as a search result, since it never saw your instruction not to.
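To make the conflict concrete, here's a minimal sketch of a robots.txt that would cause exactly this problem (the `/special-offer/` path is hypothetical, purely for illustration):

```text
# robots.txt — blocks crawling of a hypothetical page
User-agent: *
Disallow: /special-offer/
```

If `/special-offer/` also has a noindex tag in its `<head>`, Google will never see that tag, because this robots.txt rule stops it from fetching the page in the first place. If you truly want a page kept out of search results, remove the robots.txt block and rely on the noindex tag alone.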
You can also tell Google whether or not to crawl through any links it finds on your noindexed page. For example, you might not want Google to serve page 2+ on your paginated list of products, or blog posts, as a search result, but you definitely still want Google to crawl through the links on those paginated pages. You can choose to let Google follow links or not in your noindex tag like this:
<meta name="robots" content="noindex,follow">
<meta name="robots" content="noindex,nofollow">
By default, if you don’t say follow or nofollow, Google will follow the links.
Just like with robots.txt, there probably aren't many pages on your site that you need to noindex. This tag is commonly used in the same places where pages might be blocked in robots.txt. In addition, the noindex tag is often used on certain kinds of duplicate content and on paginated pages. For example, if you have a list of products where you can move from page 1 to page 2, then page 3, it's common to noindex everything after page 1 (because you really want your main page to rank, not a page halfway through your list).
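Putting the pagination case together, the `<head>` of a second (or later) page might look something like this. This is a hypothetical sketch; the page title and structure are invented for illustration:

```html
<!-- Hypothetical <head> of page 2 of a paginated product list -->
<head>
  <title>Products – Page 2</title>
  <!-- Keep page 2 out of search results, but let Google
       crawl through to the product links it contains -->
  <meta name="robots" content="noindex,follow">
</head>
```

Page 1 would simply omit the robots meta tag entirely, so it stays indexable.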
Happily, that's about all there is to the crawlability portion of SEO. For the majority of sites, all you really need to do is set up a good hierarchical site structure and ignore the rest (or possibly just double check to make sure you aren't accidentally blocking things).
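That double check is easy to script. Here's a short sketch using Python's standard-library `urllib.robotparser` to confirm which URLs a given robots.txt would allow Googlebot to crawl; the robots.txt content and URLs are made-up examples:

```python
from urllib import robotparser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Pages you expect Google to crawl should come back True;
# deliberately blocked areas should come back False.
print(rp.can_fetch("Googlebot", "https://example.com/products/"))       # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False
```

In practice you'd point `RobotFileParser` at your live robots.txt with `set_url(...)` and `read()`, then run your important URLs through `can_fetch` to catch accidental blocks.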
The thing to remember here is that robots.txt and noindex are about blocking search engines from your site: don’t use them, and you won’t be blocking anything.