Another way to control the authority flow within your site that is specific to duplicate content is rel=canonical. This code is used when your site has multiple pages that are nearly identical. This happens in almost every site of any size and isn’t necessarily a bad thing. Some site features that create duplicate content include:
- On ecommerce sites, anything that lets users change the order in which products are displayed usually creates duplicate content. If you let users sort by best-selling or by price (and you should), that usually creates a variant URL, but the content of the page is identical, just in a different order.
- On blogs, archive pages usually create duplicate content. For example, when you browse by category or tag, you get a list of the same blog posts that exist elsewhere.
- Ecommerce sites that use the category structure in the URL create duplicate product pages when there are different paths to a product. For example, you might reach the same product at both site.com/blue-shoes/awesome-sneakers and site.com/mens-shoes/awesome-sneakers.
Internal duplicate content is often a side effect of site design that makes navigation better for users. Google has even said that roughly 25% of the content on the web is duplicate, and it has learned how to deal with it.
The problem for SEOs is controlling which page ranks, and the greater issue of dilution of authority.
When Google sees two or more pages on your site that are mostly duplicates of each other (they can be slightly different and still be duplicate content) Google will choose just one of those pages to rank, and that might not be the page you want it to be.
Furthermore, some people may link to one URL of your content while other people link to the other. Suppose you have 20 links to a great page you’ve created, but they are split so that each version has only 10. Each version now has half the links the page should have, and it isn’t ranking nearly as highly as it could.
This is where rel=canonical comes in. This tag sits in the <head> portion of your page’s code and tells Google which version of a page is the canonical version, meaning the official version that Google should rank. Every version should have the rel=canonical tag, and they should all point to the same official URL. Links to any version of the page then count as if they pointed to the canonical version (some authority may still be lost along the way; a commonly cited estimate is that about 85% passes, though Google has not confirmed an exact figure).
Here’s what rel=canonical should look like, somewhere between the <head> and </head> tags:
<link rel="canonical" href="http://www.domain.com/canonical-url.html" />
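To make this concrete with the sneaker example from earlier, here is a rough sketch of what the two product URLs might carry. Treating the mens-shoes path as the official version, the http:// scheme, and the page title are all assumptions for illustration; either URL could serve as the canonical one, as long as both pages agree.

```html
<!-- Page served at site.com/blue-shoes/awesome-sneakers (the duplicate path) -->
<head>
  <title>Awesome Sneakers</title>
  <!-- Hypothetical choice: the mens-shoes URL is treated as the official version -->
  <link rel="canonical" href="http://site.com/mens-shoes/awesome-sneakers" />
</head>

<!-- Page served at site.com/mens-shoes/awesome-sneakers (the official version) -->
<head>
  <title>Awesome Sneakers</title>
  <!-- Self-referencing: the official page names itself as canonical -->
  <link rel="canonical" href="http://site.com/mens-shoes/awesome-sneakers" />
</head>
```

Both pages name the same URL, so links to either path are credited to one page.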
As a best practice, every page of your site that you let Google index should have the rel=canonical tag pointing to the official version. The reason is that there are a lot of ways for people to link to you that change the URL. You could get links with a refid or UTM code (parameters used for tracking purposes), and most large sites have multiple ways to render a URL, so many that even the engineers who work on the site code aren’t familiar with all of them.
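As a small illustration of the tracking-parameter case, the sketch below shows a page reached through a tagged link. The utm_source and utm_medium values are made up for the example; the point is only that the tag names the clean URL regardless of how the visitor arrived.

```html
<!-- Requested as:
     http://www.domain.com/canonical-url.html?utm_source=newsletter&utm_medium=email
     (hypothetical tracking values) -->
<head>
  <!-- The tag omits the tracking parameters and names the clean URL -->
  <link rel="canonical" href="http://www.domain.com/canonical-url.html" />
</head>
```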
What you have to be careful to avoid, however, is having duplicate pages whose canonical tags point to different URLs. This won’t hurt you or earn you a penalty, but Google will decide it needs to ignore the canonical tags and will once again make its own decision about what to rank.
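For clarity, this is roughly what the conflicting setup looks like, reusing the sneaker URLs from earlier (the http:// scheme is an assumption). Each duplicate names itself as the official version, so the two tags disagree.

```html
<!-- AVOID: two near-identical pages naming different official URLs -->

<!-- Page A, served at site.com/blue-shoes/awesome-sneakers -->
<link rel="canonical" href="http://site.com/blue-shoes/awesome-sneakers" />

<!-- Page B, served at site.com/mens-shoes/awesome-sneakers -->
<link rel="canonical" href="http://site.com/mens-shoes/awesome-sneakers" />
```

The fix is to pick one of the two URLs and use it in the tag on both pages.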
You can also point the canonical tag to an entirely different site (or subdomain). If you have two different websites, but have certain content that is identical on both sites (duplicate articles, guides, products, etc.) you can choose which site should be the canonical version.
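A cross-site canonical uses the same tag with an absolute URL on the other domain. In this sketch, www.example.org and www.example.com are placeholder domains and the guide path is hypothetical; it assumes you have decided that the .com copy is the official one.

```html
<!-- Page served at http://www.example.org/guides/shoe-care.html (the duplicate copy) -->
<head>
  <!-- Points across domains to the copy chosen as the official version -->
  <link rel="canonical" href="http://www.example.com/guides/shoe-care.html" />
</head>
```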
In most cases, canonical implementation is something that you’ll need to talk to your webmaster about. They can either code the tags to generate dynamically based on internal logic, or they can add a field to your CMS so that you can set the canonical version for each page yourself. If you have a WordPress site, much of this is handled for you: WordPress adds a canonical tag to individual posts and pages by default, and SEO plugins can cover other page types such as archives.
Rel=canonical is a very useful tool for making sure the correct page is ranking, and that it ranks as well as it can. It’s an important SEO best practice that should be put in place when your site is created, and as soon as possible if your site was built without it.