Monthly Archives: June 2016

The Meta Noindex Tag

The final element that we’re going to talk about in the crawlability world is the noindex tag. This tag lives in the <head> code of your site’s pages and looks like this:

<meta name="robots" content="noindex">

This tag instructs search engines not to index that page, which means it will not be included in any search results. The noindex tag is similar to blocking a page via robots.txt (slightly different, since a noindexed page can still be crawled, just not indexed, while a blocked page shouldn’t even be crawled).

A noindex tag is the only way to be certain that Google won’t ever show the page in search results; however, note that the noindex tag only works if Google can crawl the page! If you have a page blocked in robots.txt, Google won’t crawl the page, and thus won’t see the noindex tag. Then if Google sees a lot of links to that page, it might decide to serve it as a search result, since it never saw your instruction not to.

You can also tell Google whether or not to crawl through any links it finds on your noindexed page. For example, you might not want Google to serve page 2+ on your paginated list of products, or blog posts, as a search result, but you definitely still want Google to crawl through the links on those paginated pages. You can choose to let Google follow links or not in your noindex tag like this:

<meta name="robots" content="noindex,follow">
<meta name="robots" content="noindex,nofollow">

By default, if you don’t say follow or nofollow, Google will follow the links.
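If you want to check what a page is actually telling search engines, here is a minimal sketch in Python using only the standard library. It reads the robots meta tag out of a page's HTML and reports whether the page is indexable and whether its links should be followed. The names RobotsMetaParser and robots_policy are made up for this example, and it only handles the noindex and nofollow values discussed above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def robots_policy(html):
    """Return (indexable, follow_links) for a page's HTML.

    With no robots meta tag at all, both default to True.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    indexable = "noindex" not in parser.directives
    follow_links = "nofollow" not in parser.directives
    return indexable, follow_links


page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(robots_policy(page))  # (False, True)
```

Note that this only looks at the HTML you hand it; it says nothing about whether robots.txt lets a crawler reach the page in the first place.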

Just like with robots.txt, there probably aren’t many pages on your site that you need to noindex. This tag is commonly used in the same places where you might block pages in robots.txt. In addition, the noindex tag is often used on certain kinds of duplicate content and on paginated pages: if you have a list of products, and at the bottom you can move to page 2, then page 3, it’s common to noindex everything after page 1 (because you really want your main page to rank, not a page halfway through your list).

Happily, that’s about all there is to the crawlability portion of SEO. For the majority of sites, all you really need to do is set up a good hierarchal site structure and ignore the rest (or possibly just double check to make sure you aren’t accidentally blocking things).

The thing to remember here is that robots.txt and noindex are about blocking search engines from your site: don’t use them, and you won’t be blocking anything.

Robots.txt File

Every site has a simple text file sitting in the main directory called robots.txt. This file gives instructions that bots are supposed to obey when they’re crawling your site (Google and Bing bots obey these instructions — many private crawlers do not).

A robots.txt file

This file is used to block certain pages or directories of your site from search engines; pages that you don’t want Google to see and that you don’t want to show up in search results. Commonly site owners will block checkout pages, or anything behind a login. As you can imagine, any page that you block in robots.txt will not rank in Google. (Technically a blocked page can rank: if Google sees a lot of links to a page, it might rank it even though it’s never visited the page).

You can also give specific instructions to specific bots. One move that more paranoid webmasters or their security teams like to do is to specifically allow Google and Bing, but block every other kind of robot. It can also be used as part of a honey trap: make a page, link to it, and tell bots in robots.txt not to visit it: any bot that does visit that page is a naughty bot that you can then block.
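To make the "allow Google and Bing, block everyone else" setup concrete, here is a rough sketch that feeds a sample robots.txt to Python's standard urllib.robotparser module and asks what each bot may fetch. The file contents, the /checkout/ path, and the SomeScraper bot name are invented for illustration. Python's parser is also simpler than Googlebot's real one, so treat the answers as approximate.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: Google and Bing may crawl everything except
# checkout pages, and every other bot is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /checkout/

User-agent: Bingbot
Disallow: /checkout/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """True if this robots.txt lets the named bot fetch the path."""
    return parser.can_fetch(agent, path)


for agent in ("Googlebot", "Bingbot", "SomeScraper"):
    print(agent, allowed(agent, "/products/widget"), allowed(agent, "/checkout/cart"))
```

Remember that this only describes what well-behaved bots are asked to do. A crawler that ignores robots.txt will ignore these rules too.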

In General, Don’t Worry About It

As a general rule of thumb, most webmasters do not need to block anything on their robots.txt file. That said, if you are blocking things, just be certain that you don’t use sweeping logic and end up blocking Google from the entire site. This happens far more often than you might think.

It’s worth checking your robots.txt file to make sure you don’t have something like Disallow: / under User-agent: * (blocking everything), but odds are that you’re fine, and that you won’t need to worry about your robots.txt ever again.
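The sitewide block usually takes the form of a Disallow: / rule under User-agent: *. As a quick sanity check, the sketch below asks Python's standard urllib.robotparser whether a given robots.txt would keep Googlebot off the home page. The function name blocks_home_page is my own. A blocked home page is a strong warning sign, not proof that every page is blocked.

```python
from urllib.robotparser import RobotFileParser


def blocks_home_page(robots_txt, agent="Googlebot"):
    """True if the given robots.txt text stops the bot from fetching '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


sweeping = "User-agent: *\nDisallow: /\n"
targeted = "User-agent: *\nDisallow: /checkout/\n"

print(blocks_home_page(sweeping))  # True: everything is blocked
print(blocks_home_page(targeted))  # False: only checkout pages are blocked
```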

If you’re curious about how other sites set up their robots.txt file, you can just go look. After all, it’s a public file in a standard place on every site (it has to be, for the bots to find it). Just add /robots.txt to the end of a site’s root domain and you’ll see their file.

You can see Amazon’s here, for example:

XML Sitemaps

A sitemap, or XML sitemap, is a text file that lists every page of your site and is designed to help search engines crawl it. It’s published somewhere on your site (usually at the root, as sitemap.xml), and then you can submit it to Google and Bing via their webmaster tools. An XML sitemap looks like this:

Example of an XML sitemap

In theory, XML sitemaps help Google find and crawl all the pages of your site. In reality, modern search engines do not need sitemaps to understand or index your site. A sitemap does not in any way influence the ranking of your site. It will not convince Google to index a page that Google decided not to index. Though a sitemap lets you tell search engines how often to index pages, search engines generally ignore those instructions and figure out their crawl schedule on their own.

Basically I’m telling you that an XML sitemap is not useful for the vast majority of sites out there. About the only time an XML sitemap will do anything for you is when you have orphaned pages (pages that are not linked to from any other page). And the solution for those pages is to make sure they’re part of the site hierarchy.

As SEOs we continue to build sitemaps mostly because clients expect us to — or because competing agencies use the lack of a sitemap as a way in to steal clients. But the fact is you almost certainly don’t need one, and having one will not help your SEO.

How to Make an XML Sitemap If You Really Want One

You can make a sitemap by manually creating it in Excel or in a text file. There are also a lot of free automatic sitemap generators out there, and they do a fine job. If you are creating a sitemap yourself, manually or programmatically, the formatting details can be found here.

Once you’ve created the sitemap, just upload the file to somewhere on your site.
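If you would rather generate the file programmatically, here is a bare-bones sketch in Python. It writes only the required loc element for each URL, using the namespace from the sitemaps.org protocol. The function name build_sitemap and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/products/custom-laptop",
])
print(sitemap_xml)
```

Save the output as a file and upload it like any other sitemap.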

Submitting Your Sitemap to Google

To submit your sitemap, just log into the Google Search Console for your site and from the left menu select Crawl > Sitemaps. Then click the big Add/Test Sitemap button in the upper right corner. Just give Google the URL where you have uploaded your sitemap and click Submit Sitemap.

Submitting an XML sitemap to Google

Once you do, Google will eventually get around to checking it out, and by the next day will report to you how many pages of your sitemap it has indexed. Large sites will quickly note that Google gleefully reports to you how it’s ignoring tons of pages on your sitemap — again, a sitemap does not improve the chances that Google will index your pages.

It’s important to note here that Google is not telling you how many pages it has indexed, but instead is only telling you how many pages on your sitemap it has indexed. For example: you might have 100 pages on your sitemap, and Google tells you it’s indexed 98 out of the 100. However, Google may well have thousands of pages of your site indexed, including URLs you never even knew you had!

One Useful Thing About Sitemaps

The one nice thing about sitemaps is you can use them to get Google to tell you how much of your site it’s crawling.

The way to take advantage of this reporting is to split your sitemap into multiple sitemaps, each covering a different selection of URLs. For an ecommerce site you might have your product pages on one sitemap, category pages on another, and list pages on a third. A large service-based site might put all the About pages on one sitemap, pages describing services on another, and blog posts on a third.

This then gives you slightly better insight into what Google is indexing. If you find that only half your pages in one category are indexed, you can then start investigating to find out which ones are being left out and why.
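One simple way to do the split is by the first folder in each URL. The sketch below groups URLs that way and computes the indexed share for a group. It assumes your URL structure mirrors your page types, which may not be true of your site. The helper names and the counts in the last line are invented, standing in for numbers you would copy from your own Search Console report.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_section(urls):
    """Group URLs by their first path segment (e.g. /products/...)."""
    groups = defaultdict(list)
    for url in urls:
        segments = [part for part in urlparse(url).path.split("/") if part]
        section = segments[0] if segments else "root"
        groups[section].append(url)
    return dict(groups)


def coverage(submitted, indexed):
    """Share of submitted URLs that are reported as indexed."""
    return indexed / submitted if submitted else 0.0


urls = [
    "https://example.com/",
    "https://example.com/products/laptop-a",
    "https://example.com/products/laptop-b",
    "https://example.com/blog/how-to-pick-ram",
]
for section, members in group_by_section(urls).items():
    print(f"sitemap-{section}.xml: {len(members)} URLs")

# Hypothetical counts: 200 product URLs submitted, 100 reported as indexed.
print(f"products coverage: {coverage(200, 100):.0%}")
```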

It’s worth stressing, however, that this kind of process is only really worthwhile for large sites. If your site only has a few hundred pages you are not going to have any issues with indexation.

For the smaller sites that I run, I didn’t even bother with sitemaps. And I’ve even run sites with millions of pages without XML sitemaps (including one where we finally created a sitemap, and sure enough, it made zero impact on our indexation, rankings, or traffic).

Submitting Your Site to Search Engines

There are a lot of services out there who offer to submit your site to hundreds or thousands of search engines for a fee. They often tout SEO benefits and promise to help kickstart your fledgling site.

These services are scams!

As we already know, there are really only two search engines that matter in the English speaking world: Google and Bing. What these services really do is list your site in a bunch of spammy link directories.

The best you can hope for is that they just steal your money and do nothing. The worst-case scenario is they really do get you listed in a thousand directories, in which case your site may promptly be penalized for having nothing but spammy links.

In point of fact, most sites don’t need to submit themselves to search engines. After all, it is the job of search engines to discover and crawl every site out there, and these days they are very, very good at it. Once another site links to your new site, the search engine bots will eventually find the link and follow it to your site.

That said, if you absolutely don’t want to wait, you can easily submit your site yourself to the two search engines that matter.

Crosslinking Pages

Another good strategy for large sites is to crosslink pages very low in the hierarchy to each other. This is one of the reasons that large ecommerce retailers will have Related Products and Customers Also Bought links on their product pages (the main reason, of course, is that it’s good for sales).

Since Googlebot crawls in from links (rather than strictly crawling top-down) this kind of cross linking can provide an additional path for the pages very low in a site’s hierarchy to get crawled.

It is usually not necessary to crosslink pages higher in the site hierarchy: by the very nature of being high in the site hierarchy, there will be lots of paths for Google to crawl to them (and they will have lots of authority flowing to them). It’s also worth noting that smaller sites with only a few hundred pages usually do not need to worry about this kind of crosslinking at all.

Crosslinking Gone Wild!

I have seen plenty of sites that get out of control with crosslinking: every page has a giant list of dozens (or hundreds!) of links at the bottom. An inexperienced SEO figured that crosslinking would help flow authority around, and they wanted to get as much authority to as many pages as possible.

At first glance, it seems like a nice theory, but the problem is you are essentially removing the hierarchy of the site and creating a flat structure. Yes, you are getting more authority to all those low hierarchy pages, but at the cost of lowering the authority of your most important pages.

I’ll explain how this works in detail in the PageRank Flow section. But for now just understand that you want a hierarchal site structure. Crosslinking the bottom of your hierarchy to a few other pages on the bottom can be good for large sites, but going too far will hurt your overall ranking ability.

Hierarchal Structure

Google assumes that your site will have a hierarchal structure, and its algorithm is built on that assumption. This starts with your home page at the top of the hierarchy. Then your global navigation (the links that appear on every page of your site, in the header and footer usually) should link to the next most important pages. Those should link to the next level down, and so on.

As a general rule of thumb, every page should link to the top layer of the hierarchy (the global nav) and each page should also link to the pages above and below it in the hierarchy. For larger sites this is where breadcrumbs come in useful: go to the product page of most ecommerce sites and you’ll see a list of links showing the hierarchal path you took to get there. These breadcrumbs provide another crawl path for bots, as well as flowing link authority up to more important pages.

Example of breadcrumbs on a product page

This hierarchal structure is very important for ranking, as we’ll discuss in Authority: On Site, but it’s also important for crawlability. Googlebot generally crawls into your site from an external link: that link may point to your home page, or it may point to something at the very bottom of your hierarchy. Googlebot will continue to crawl through the links it finds on that page and subsequent pages; however, at some point it’ll stop.

A strong hierarchal structure will make sure that regardless of where Googlebot enters your site, it’s definitely going to crawl the most important pages. If anything on your site gets skipped, you want it to be the least important ones.

As you can imagine, it’s vital that every page of your site be linked to from somewhere within that hierarchy. A page is only going to get crawled if another page somewhere on the web is linking to it.
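To see how this plays out, you can model your internal links as a simple map of page to linked pages and walk it the way a crawler would. The sketch below is a plain breadth-first walk, not a description of how Googlebot really schedules its crawl. It starts from the home page for simplicity, and the tiny site map and function names are made up. It reports how many clicks each page is from the start and which pages are never reached.

```python
from collections import deque


def click_depths(links, start="/"):
    """Breadth-first walk: number of clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_pages(links, start="/"):
    """Pages that exist in the link map but cannot be reached from start."""
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    return all_pages - set(click_depths(links, start))


site = {
    "/": ["/products", "/about"],
    "/products": ["/", "/products/laptop"],
    "/products/laptop": ["/", "/products"],
    "/about": ["/"],
    "/old-landing-page": ["/"],  # links out, but nothing links to it
}
print(click_depths(site))
print(orphan_pages(site))
```

Here /old-landing-page is the orphan: no other page links to it, so a crawler following internal links never finds it.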

By Definition Hierarchy is Not Flat

It’s worth stressing here that you should not go crazy and embed hundreds of links on every page linking like crazy to every other page. This will do bad things for your authority, as discussed later in the authority sections, but it also removes the hierarchal structure of your site.

Remember, Google expects the most important pages to be linked to the most, and the least important to be linked to the least. I know that some business owners think every page of their site is the most important, but if you link to everything equally, you are creating a flat structure where no pages are important in Google’s eyes.
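A toy calculation makes the effect of flattening visible. The sketch below runs a simplified textbook PageRank over two versions of the same seven-page site: one hierarchal, one where every page links to every other page. This is not Google's actual algorithm. The damping value of 0.85 is the conventional textbook choice, and the page names are invented.

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank.

    Assumes every page is a key in links and has at least one outgoing link.
    """
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Home links to two categories; each category links to its own products;
# every page links back up to the home page and the categories (global nav).
hierarchy = {
    "/": ["/cat-a", "/cat-b"],
    "/cat-a": ["/", "/cat-b", "/a1", "/a2"],
    "/cat-b": ["/", "/cat-a", "/b1", "/b2"],
    "/a1": ["/", "/cat-a", "/cat-b"],
    "/a2": ["/", "/cat-a", "/cat-b"],
    "/b1": ["/", "/cat-a", "/cat-b"],
    "/b2": ["/", "/cat-a", "/cat-b"],
}
# Flat version: every page links to every other page.
flat = {page: [other for other in hierarchy if other != page] for page in hierarchy}

hier_rank = simple_pagerank(hierarchy)
flat_rank = simple_pagerank(flat)
for page in hierarchy:
    print(f"{page:8} hierarchy={hier_rank[page]:.3f} flat={flat_rank[page]:.3f}")
```

In the flat version every page ends up with the same score. The product pages gain a little, and the home and category pages lose a lot.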

The Dear God Don’t List of Crawlability

The crawlability of your site is vital for improving your indexation; however, it’s also the easiest part of SEO. Most sites that have a logical structure will have no problem being crawled by Google — after all, Google was built to be able to crawl sites. The vast majority of SEOs don’t have to worry about crawlability at all.

However, there are site designers out there who manage to do some truly spectacularly bone-headed things that prevent Google from seeing the site.

The Dear God Don’t List

Before we get into the stuff that you should be doing, here are the truly horrible things that you absolutely do not want to be doing. These are the spectacular SEO fails of site design. Unfortunately I have indeed seen sites that do each of these things — any one of them will prevent Google from crawling your site, which means you won’t ever show up in Google at all.

  • Don’t design your site in Flash or within a single ajax frame. If you do this Google cannot see any of your content. There are technically ways to design a single page application (SPA) that doesn’t totally destroy your SEO, but even those are proven to do worse than a real site time after time.
  • Don’t use JavaScript for your links: doing so can essentially hide your links from Google. Not only will Google not be able to use your navigation links to find other parts of your site, but it may think that you’re trying to cloak your links and penalize you for it.
  • Don’t block all bots in your robots.txt file — doing so is explicitly telling Google not to crawl any page of your site.
  • Don’t use the noindex tag on every page of your site — doing this explicitly tells Google not to index those pages.
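The JavaScript link problem in the list above is easy to spot-check. Here is a rough sketch that scans a page's HTML for anchor tags with no real destination: a missing href, a bare "#", or a javascript: pseudo-URL. The LinkAuditor class and the sample markup are invented for this example, and it cannot see links that scripts add after the page loads.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Sorts <a> tags into crawlable links and suspect ones."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.suspect = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        if not href or href == "#" or href.lower().startswith("javascript:"):
            self.suspect.append(href)
        else:
            self.crawlable.append(href)


html = """
<nav>
  <a href="/products">Products</a>
  <a href="javascript:goTo('about')">About</a>
  <a href="#" onclick="openPage('contact')">Contact</a>
</nav>
"""
auditor = LinkAuditor()
auditor.feed(html)
print("crawlable:", auditor.crawlable)
print("suspect:", auditor.suspect)
```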

Okay, got that out of the way. Now we can sit back and have a pleasant conversation about SEO and crawlability.

Competitor Research

The final step of keyword research and getting started on SEO is to do some competitor research. Here you want to know who your online competition is, how authoritative their sites and pages are, and how well optimized their sites and pages are.

This information will tell you how difficult it will be to rank for given keywords, and could even change your SEO strategy: if your competition is much stronger than you had thought, you might switch to a long tail strategy and not worry about your head terms at the beginning, or you might set some keywords aside to work on after your site has grown in authority.

There are a lot of tools out there to help you with competitor research: some of them have value, and others are just a waste of your time. Happily, the very best method of evaluating competition is completely free: just look at the SERPs yourself.

Who is your competition?

When I was working in an agency, just about every other client I had completely misunderstood who their competition was. These guys were generally brick & mortar businesses: retail or service or manufacturing. When we asked about their competition they inevitably listed the guys down the street or the big players in their industry. But most of the time these were not their online competition.

If someone is searching on Google for your products, service, or information, then those people clearly don’t already know where to go. To them, the only options are the sites that come up on the top of the search results. Those sites are your competition.

So when you’re scouting the competition, what you’re really looking for is the sites that rank in the top 5-10 results for the searches that you’re going after.

Finding Your Real Competitors

A good way of evaluating your competition is to sit down with your list of keywords that you developed in the keyword research phase, and search in Google for each of those keywords. Make sure you have personalization turned off for this (how to do that is explained on xxx).

For each keyword you’re targeting, make a note of what sites are in the top five positions (or top 10 if you’re being very thorough). Remember we’re only looking at organic results here, and not paid ads.

Once you’ve gone through all your keywords, now take a look and see what sites appear most often on your list. Those top five or ten sites are your online competition. Those are the guys that you’re trying to outrank.
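If you jot your notes down as a list of ranking URLs per keyword, the tallying step is only a few lines of Python. This is a sketch of the counting, not a SERP scraper: you still collect the results by hand. The keywords, the example domains, and the function names are all placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Hostname of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_competitors(serps, limit=5):
    """Count how many keywords each domain ranks for; return the most frequent."""
    counts = Counter()
    for urls in serps.values():
        # Use a set so a domain counts once per keyword, even with two listings.
        counts.update({domain_of(url) for url in urls})
    return counts.most_common(limit)


# Hand-collected organic results per keyword (made-up domains).
serps = {
    "custom laptops": [
        "https://www.example-a.com/custom",
        "https://example-b.com/laptops",
        "https://example-c.com/build",
    ],
    "build your own laptop": [
        "https://example-a.com/builder",
        "https://example-b.com/diy",
    ],
    "laptop customization guide": [
        "https://example-a.com/guide",
        "https://example-d.com/how-to",
    ],
}
print(top_competitors(serps, limit=2))
```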

Evaluating the Competition

Now go back to some of your keywords and go visit the sites that are ranking in the top three positions. Check out the page that’s ranking and see how well optimized the page is: does it have good title tags, URL, keyword use on the page, and how long is the article or page? If the page is fairly weak in on-page optimization, that’s great news — this indicates that you can make a far better page and easily get better topicality signals, which means you will need lower authority signals to outrank that page.

The next step is to evaluate the authority of the site as a whole, and of the ranking page specifically. To do this you have to use some tools (which you generally have to pay for with a subscription) that evaluate the links to a page or site. The simplest route is to measure the Domain Authority of the page and the site’s homepage. Better answers can come from evaluating the number of linking domains pointing to the page. We’ll talk about what tools are available in xxx.

Evaluating Page Authority

Evaluating the authority of the ranking pages is the most important part of competitor research. Unless they have very weak on-page optimization, you can generally assume that you will need authority signals at least as strong as theirs to have a chance of outranking them.

If you come across a page that’s Page Authority 50 with 200 linking domains pointing at the target page… that page is not going to be something you’re likely to outrank with a new site, not for a long time. It is going to take a lot of time to build up hundreds of linking domains to a specific interior page (there are shortcuts, of course, but these are generally against the rules and are likely to get a site penalized).

Evaluating Domain Authority

The overall authority of competing domains is also going to give you an idea of how much authority your site is going to need to do well in your niche. Take all of the top sites on your competitor list and check out the number of linking domains to their entire site. While you’re at it, spot check their linking domains to make sure most of them are legitimate links (many sites have a large number of garbage links that aren’t really helping them much — links from directories, scrapers, press releases, or article sites). In general, your SEO goal is going to be to get more linking domains than your competition has, and to get those links from sites that are at least as authoritative as the sites linking to the competition.

If your competition are all Domain Authority 20 or 30 sites with a couple hundred linking domains — good news! You should have no problem building something more authoritative over the course of a year.

If the competition are at Domain Authority 40 or 50 and have 800 or more linking domains, then you’re going to have a lot of work. It will likely take a couple years to build up to that level, and you’re going to have to invest significant time, and likely some money, into marketing and outreach.

If your competition is at Domain Authority 60+ and has thousands of linking domains, then they are a truly large site with substantial authority. You’re going to need to be an equally large site with similar reach and marketing budgets, or you’re going to have to have something go viral, and you’re going to have to be smarter than them at SEO.
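The three tiers above can be written down as a rough rule of thumb. The text only gives example ranges (Domain Authority 20 to 30, 40 to 50, and 60+), so the exact cutoffs in this sketch are my own assumption, as are the function name and labels.

```python
def competition_tier(domain_authority, linking_domains):
    """Very rough difficulty bucket for a competing site.

    The cutoffs are assumptions that fill the gaps between the example ranges.
    """
    if domain_authority >= 60:
        return "very hard: needs comparable reach and budget"
    if domain_authority >= 40 or linking_domains >= 800:
        return "hard: likely a couple of years of work"
    return "approachable: roughly a year of steady work"


print(competition_tier(25, 200))
print(competition_tier(45, 900))
print(competition_tier(65, 5000))
```

Treat the output as a starting point for a conversation, not a forecast. The quality of the links behind those numbers matters as much as the counts.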

Page Authority Can Beat Massive Domain Authority

Keep in mind, however, that you’re comparing the Page Authority or linking domains of the page that is ranking (not the entire site) against the page on your site that you want to rank. Sure, Amazon is an authority behemoth, but I’ve worked on dozens of sites that have moved up to outrank it for their searches. While Amazon has massive authority (and often pretty darned good topicality), the specific page that you’re competing against may not.

I’ve seen Domain Authority 20 sites climb up and pass much bigger sites in the rankings, but only against pages that have few or no linking domains pointing to them.

The big lesson to take home from your competitor research is that you should come out of it with a realistic expectation of how much work and time it will take for you to start ranking well in your niche.

The good news is that it’s not an all or nothing game. You don’t have to outrank Amazon to start driving SEO traffic.

As you start optimizing your site and building your authority, you’ll slowly start gaining in long tail, and then torso rankings. Long before you’re able to compete for your head terms you’ll be getting good amounts of search engine traffic from all the rest of your terms – and always remember that those long tail terms represent far more traffic than the big intimidating head terms.

Ranking the Long Tail

So we know we care about the long tail keywords because we know they make up around 75% of all searches. The other big advantage of the long tail is that it’s often easier to rank for long tail searches.

If I started a site today selling custom-built laptops, I would have no practical likelihood of ranking for the head term “laptops.” Instead a smart SEO strategy would be to include a lot of very thorough articles about how to customize laptops, and detailed reviews of different machines or components. In other words, I’m creating the kind of content that places like NewEgg don’t have, which gives me the opportunity to capture long tail searches that the big guys either aren’t ranking for, or aren’t very topically relevant for.

Optimizing for the long tail is a bit more subtle, since we often don’t even know what terms we’re specifically trying to capture. The key to long tail optimization is to have a lot of very good content (and by content, I mean word count).

More Content for More (Long Tail) Keywords

Sites targeting only head terms often have only a paragraph of text that stresses their head term keywords and, frankly, doesn’t really provide much information.

Long tail optimization stresses having much longer text that provides truly useful information that people are looking for, and answering questions that people might have. The more quality content you have, the better the odds that you struck some vital word combination that someone might search for.

Long articles that aren’t truly useful, or good, tend not to do very well with long tail, because by the very nature of bloating your word count with fluff, you aren’t saying things that people are really looking for (indeed, these types of articles aren’t really saying anything at all). No one is searching for your stream of consciousness ramble about different types of laptops. By focusing on providing real information that real users are looking for, you’ll end up hitting lots of things people are, or will be, searching for.

Rank the Head to Rank the Tail

The other part of long tail optimization is simply to rank for the head term. A site that ranks for the head terms is far more likely to rank for the long tail, as long as it’s topically relevant to the search query.

Thus doing all the things you can to build authority and rank for the head term — even if you know you have no chance of ranking for it — will increase your ability to rank for all the related long tail terms.

In my laptop example above, the new site can write some awesome articles to capture long tail traffic, but if BestBuy or NewEgg had the same articles, they would outrank the new site due to their overwhelming authority.

Happily, giant sites (or rather, giant corporations) tend to be really bad about providing lots of great content, or indeed much of it at all. My theory is that once they get really big, they get the idea that it’s not worth pursuing things that take lots of man-hours, but instead only want to do something they can scale across their massive site. As a result they leave lots of scraps around for the little guys, and a smart site owner can use those scraps to grow their brand to the point where it can compete with the big guys.

The Head, the Tail, and Everything in Between

In the world of SEO you’ll hear a lot about Head Terms and the Long Tail. Sometimes you’ll hear talk about the in between, called torso terms, and sometimes called the chubby middle (fat head, chubby middle, long tail).

The concept of head and tail terms is very important for SEOs to understand, as it has huge implications for how best to optimize a site.

  • Head Terms: the keywords that drive a hugely disproportionate amount of traffic. “iphone” is a head term with over a million searches per month, while “how to get my iphone to flush down a toilet” is a long tail term that maybe has one search a month. There is no hard definition of what makes a head term, other than they are the very top 5-10% of your keywords. How much search volume they have depends on your industry: for a very large site, head terms might have a search volume of over 100,000 per month, while a small niche site’s head terms might have a search volume of just over 500 searches per month.
  • Long tail: These are the keywords with very, very little search volume. If you looked up the search volume for these keywords Google would report < 10 or 0 searches for most of them. In most sites these are keywords that drive no more than 5 visits a month, and often only 1 or 2 visits a year.
  • Torso Terms: These are just everything in between. Torso terms are usually those keywords that would be a head term to someone smaller than you. At the larger sites I’ve worked on, there were thousands of keywords with a few hundred searches per month that weren’t worth my time to chase down, but that might be worthwhile for a smaller niche site to focus on (since they don’t have a chance at what I considered head terms).

When people do their keyword research, they are usually focusing on head terms — and this makes perfect sense. After all, why would you spend time researching terms that show up as having no search volume at all?

But here’s the thing: The long tail is larger than all of the head terms combined. Much larger.

For most sites, the long tail represents up to 75% of the traffic — all that from keywords that are driving only a few visits per month. While the head terms drive huge amounts of visits, the long tail overcomes the head by having a massive number of different keywords — many of which are slight variations.

Head Terms vs Long Tail Keywords

Head terms drive big chunks of traffic alone — but this graph is truncated at the end. If you continued it to include every keyword it would stretch to the right for page after page. The total surface area of the entire tail is around four times the surface area of the head.
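You can run the same arithmetic on your own analytics export. The sketch below sorts keywords into head, torso, and tail by monthly visits and reports each bucket's share of traffic. The tail cutoff of 5 visits a month comes from the definition above. The head cutoff of 100 is an arbitrary assumption, since what counts as a head term depends on your site. The sample numbers are invented to mirror the roughly 75% figure, not measured from a real site.

```python
def traffic_shares(visits_by_keyword, head_min_visits=100, tail_max_visits=5):
    """Share of total visits coming from head, torso, and tail keywords."""
    totals = {"head": 0, "torso": 0, "tail": 0}
    for visits in visits_by_keyword.values():
        if visits >= head_min_visits:
            totals["head"] += visits
        elif visits <= tail_max_visits:
            totals["tail"] += visits
        else:
            totals["torso"] += visits
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Invented sample: 2 head terms, 10 torso terms, 1,000 long tail terms.
sample = {"head term 1": 500, "head term 2": 300}
sample.update({f"torso term {i}": 20 for i in range(10)})
sample.update({f"tail term {i}": 3 for i in range(1000)})

for name, share in traffic_shares(sample).items():
    print(f"{name}: {share:.0%}")
```

In this sample, two keywords bring in 800 visits between them, yet a thousand keywords at 3 visits each add up to 3,000.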

The reason for this is that everyone searches differently, and more and more people are searching Google for increasingly specific things. Instead of searching for “iphone” people might be searching for “where can i buy an iphone that my kids won’t destroy in a week.” The first is a head term; the second is a long tail search query.

In fact, Google reports that a whopping 16% to 20% of all searches made each year are for a keyword phrase that has never been searched before. That means up to a fifth of all searches made will never show up in any keyword tool… because they haven’t been searched for yet.

The long tail is very important. Happily, pursuing the head is one part of a strategy to pursue that long tail.