All About Duplicate Content

One of the most common questions we receive is just how problematic “duplicate content” is for SEO. This article explains our interpretation of this common SEO consideration.

First of All, What Exactly is Duplicate Content?

For SEO purposes, we define duplicate content as a situation that occurs when two or more URLs (links) on the web contain the same or very similar content.

Those two or more URLs could be on the same domain (yourcompany.com) or on different domains. For example, if the following two links loaded the exact same content, that would be full duplicate content: http://yourcompany.com/page1.htm and http://yourcompany.com/page1.html (Note that one ends in .htm and the other in .html.)

There are also scenarios where partial duplicate content exists. Duplicate content of this nature commonly occurs when other websites “syndicate” or “scrape” content, with or without permission.

If http://yourcompany.com/blog2 and http://othercompany.com/article2 contain the same article, that would be partial duplicate content (the header and footer of the two sites are different, but the core body content is the same). This isn’t necessarily bad, as we will explain later in this article.

Another common example of partial duplicate content occurs when a manufacturer’s website publishes a product description, and then multiple resellers copy that same product description onto their e-commerce sites. Even if they title the products differently, there’s an exact copy of the product description across several different websites.

Many e-commerce sites also end up with this problem within their own site. If the same product gets placed in several categories and the category is part of the URL, you can end up with the same content rendering on:

http://yourstore.com/shirts/shirt1

http://yourstore.com/t-shirts/shirt1

(Note that one URL has a category of “shirts” and the other has a category of “t-shirts”, but the product is the same.)

So, just how much of a page needs to be similar in order for search engines to consider it duplicate content?

Unfortunately, search engines don’t publish a certain percentage of uniqueness that you can work toward, and that’s not really how they evaluate pages anyway.

Search engines assess page content in sections: they look at the title separately from the body copy, separately from the footer, and so on. So if you have multiple pages with the same title but everything else on those pages is unique, that can still be problematic, since that particular section is duplicated across multiple URLs.
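For instance, here is a hypothetical sketch (the URLs and title text are invented for illustration) of two otherwise unique pages that still duplicate the title section:

<!-- Page at http://yourcompany.com/services/web-design -->
<title>Services | YourCompany</title>

<!-- Page at http://yourcompany.com/services/seo -->
<title>Services | YourCompany</title>

Each page’s body copy may be completely unique, but search engines still see two URLs competing with an identical title.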

How to Check for Duplicate Content

If you are curious to compare two URLs by percentage of similarity (a high percentage would indeed indicate that a whole section is duplicated), you can use this tool: http://www.copyscape.com/compare.php

You can also use Copyscape to check the web for copies of your content across multiple domains. Visit their homepage for that tool: http://www.copyscape.com/

It is also recommended to use a tool like Screaming Frog SEO Spider to assess individual elements like title tags and H1s for duplication in those particular sections. (If you’re a client of ours, don’t worry about this technical stuff; we’ve already done it for you.)

So How Bad of a Problem is Duplicate Content?

It depends. But first, let’s clear something up.

There is no such thing as a duplicate content “penalty” in Google. A penalty is a deliberate action taken against a site to demote or remove that site’s content from search results.

The downside to duplicate content is twofold:

  1. Search engines never want to show two copies of the same result, so they will simply ignore one copy. Often they prefer the original (oldest) copy that was published first and ignore newer copies, but other times they will default to showing the copy that is on the highest-authority website. For example, if CNN.com published a copy of an article from our site, the CNN copy would likely win over ours, even if our copy came out first.
  2. When duplicate content exists within a single site, it makes it harder for search engine crawlers to decide which copy is preferable to show in search results. But if you cannot avoid having duplicate copies of content on different URLs within your own site, there is a way to tell the search engines which copy you prefer to have shown in search results. More on that shortly…

So unless your duplicate content issues are rampant (which could be a red flag for the Panda algorithm), you need not worry about a literal penalty for a bit of duplicate content here or there. Sometimes, it can even be beneficial.

How is Duplicate Content Potentially Beneficial?

If you write an article on your site that then gets picked up (also published) by another website, and that website has a very good search engine reputation (like the CNN example above), then A) your article and brand are more likely to be discovered on that site than on your own, and B) you are very likely getting a high-authority link back to your site from the article. One instance, or a few, of that scenario can help more than it hurts. However, if your articles are being syndicated to a plethora of lower-quality sites, that most certainly does not carry the same benefit and can do more harm than good.

Overall, it is best to avoid duplicate content whenever possible. If high-authority sites are willing to publish your content, consider writing unique content just for those sites, content that won’t be published on your own site. You can even use that as an opportunity to optimize the article for (and get found for) higher-competition keywords than you would otherwise be able to target on your own site. We often recommend that our clients use LinkedIn’s publishing platform for this purpose: write unique content for LinkedIn that will not appear on your own site, and target more competitive keywords than you would otherwise use, since the LinkedIn.com domain is likely to be favored for those terms over a small business website.

What to do if You Can’t Avoid Duplicate Content Within Your Site

Rel=Canonical is the Answer

If you do have scenarios within your own site where you cannot avoid having two copies of the same content on multiple URLs, there’s a simple way of telling the search engines which copy you want them to treat as the original that gets displayed in search results: the rel=canonical tag. It’s a simple code tag added to the <head> section of the duplicate page to let the search engines know which URL is the original copy. In the product example given above, this could be done as follows:

Copy 1: http://yourstore.com/shirts/shirt1

Copy 2: http://yourstore.com/t-shirts/shirt1 ← add the rel=canonical tag to this one indicating that “http://yourstore.com/shirts/shirt1” is the original/preferred copy
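In HTML, the tag itself is a single line placed inside the <head> section of the duplicate page. Using the example URLs above, Copy 2 would contain:

<link rel="canonical" href="http://yourstore.com/shirts/shirt1" />

Search engines treat this tag as a strong hint rather than a strict directive, but in practice it resolves most within-site duplication of this kind.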

Rewriting is an option, too, but…

If you’re considering editing two sets of extremely similar content to try to make them unique from each other, that often ends up being harder and more time-consuming than writing an entirely new page or article from scratch. We’d suggest either A) going back to a clean slate and writing brand-new copy with an entirely different spin on the topic, or B) sticking with the two copies and using the rel=canonical tag to mark one as the original.

In Conclusion

We hope this information was helpful, but duplicate content concerns are often best addressed on a case-by-case basis. We’re happy to discuss any concerns our clients (or potential clients) may have about a particular duplicate content challenge. That’s what we’re here for!

Pam Aungst Cronin