Duplicate Content

Duplicate content refers to large blocks of content that are identical or very similar and appear on the internet at more than one URL. When duplicate content exists, the search engines are placed in a position where they have to determine which version is the most appropriate for a given search query. To provide the best search experience, search engines typically will not show numerous copies of the same content, so it falls to the search engine to decide which version is the original.

Search engines want to provide a degree of variety when a search is run: they want to show ten different, relevant results on a search page rather than ten different URLs that all display the same content. Their purpose is to give users the best experience by returning relevant and varied results.

A major source of duplicate content is e-commerce sites that sell products from a selection of manufacturers. Most manufacturers supply product information for inclusion on websites, but the same description is given to everyone selling their products, so identical content ends up appearing across a number of websites. There is no quick solution; a site can differentiate itself from the others by writing new product descriptions, taking new photos or adding new content such as customer reviews.

Affiliate sites can also run into problems with duplicate content and should ensure that the website is of value to users. The best way to achieve this is to introduce unique content that is appropriate to the website and offers visitors a reason to visit.

Legitimate duplicate content

In instances where an article from another source is relevant to visitors, the article can be republished without penalty by implementing a cross-domain canonical tag. The canonical tag is placed on the republished page and points to the location where the content was originally published. This informs the search engines that the article is a copy and that all link weight to the page should be attributed to the original source of the article.
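
As an illustration, the sketch below serves a page carrying a cross-domain canonical tag from a minimal Flask route; the route name and URLs are invented for the example, and the essential part is the <link rel="canonical"> element in the page's <head>:

    from flask import Flask

    app = Flask(__name__)

    # Hypothetical route that republishes an article which was
    # originally published on another domain.
    @app.route("/republished-article")
    def republished_article():
        # The cross-domain canonical tag points at the original
        # location, so link weight is attributed to the source.
        return ('<html><head>'
                '<link rel="canonical" '
                'href="https://original-source.example.com/article" />'
                '</head><body>...republished article body...</body></html>')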

Accidental duplicate content

Accidental duplicate content can occur when a web server is misconfigured and serves the same content at multiple locations. For example:

  • Protocol content duplication example:
    http://www.example.com and https://www.example.com serving the same content.
  • Subdomain content duplication example:
    http://www.example.com and http://example.com serving the same content.

In these instances it cannot be assumed that the search engines will know which location is correct; this can result in a duplicate content issue, and links to the correct URL may not be credited. A canonical tag on each duplicate page pointing to the correct URL is one way to overcome the problem, but if multiple locations have already been indexed a better course of action is a 301 redirect that permanently redirects to the correct location. This guarantees that all requests for a resource resolve to the correct location and that link credit is passed, as the sketch below illustrates.
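
Here is a minimal sketch of protocol and subdomain canonicalization in a Flask application; the canonical host is an assumption for the example, and in practice this is often configured in the web server itself rather than in application code:

    from flask import Flask, redirect, request

    app = Flask(__name__)

    CANONICAL_HOST = "www.example.com"  # assumed canonical location

    @app.before_request
    def canonicalize():
        # 301-redirect any request that arrives on the wrong
        # subdomain or over plain HTTP to the canonical URL.
        scheme = request.headers.get("X-Forwarded-Proto", request.scheme)
        if request.host != CANONICAL_HOST or scheme != "https":
            target = f"https://{CANONICAL_HOST}{request.full_path.rstrip('?')}"
            return redirect(target, code=301)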

To avoid accidental duplicate content issues, the web server and/or CMS should be configured to serve content from only one location before the site goes live. This can save a lot of duplicate content headaches later on.

Duplicate content resolution

The following steps can be taken to resolve duplicate content issues:

  • Remove the duplicate content URL and return a 404 error page. This is appropriate where the content has no real value for visitors or the search engines and there are no inbound links.
  • Implement a 301 redirect from the duplicate content URL to the proper URL. This is important if the duplicate page has inbound links: it tells bots that the page has been permanently moved to a different location, and link weight is passed accordingly.
  • Leave the duplicate content in place for human visitors but block it from search engines using a robots.txt file. This works best on content that has not yet been crawled; it is not effective at removing content that has already been indexed, and the major search engines discourage its overuse.
  • Use a canonical tag to establish the canonical version of a page. When a search engine reaches a page with a canonical tag, it assigns the page to the canonical URL regardless of the URL used to access the page. A sketch covering all four options follows this list.
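
The sketch below illustrates the four options side by side in a minimal Flask application; every route and URL is invented for the example:

    from flask import Flask, abort, redirect

    app = Flask(__name__)

    # Option 1: the duplicate URL is removed and now returns a 404.
    @app.route("/worthless-duplicate")
    def removed_duplicate():
        abort(404)

    # Option 2: a duplicate with inbound links is 301-redirected
    # to the proper URL, passing link weight along.
    @app.route("/duplicate-with-links")
    def redirected_duplicate():
        return redirect("https://www.example.com/proper-page", code=301)

    # Option 3: robots.txt leaves the duplicate readable by humans
    # but asks crawlers to stay out of it.
    @app.route("/robots.txt")
    def robots():
        body = "User-agent: *\nDisallow: /printer-friendly/\n"
        return body, 200, {"Content-Type": "text/plain"}

    # Option 4: the duplicate declares a canonical URL, so search
    # engines assign it to the proper page however it was reached.
    @app.route("/printer-friendly/page")
    def canonical_duplicate():
        return ('<html><head><link rel="canonical" '
                'href="https://www.example.com/proper-page" /></head>'
                '<body>...page content...</body></html>')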

Duplicate content conclusion

Duplicate content is one of those issues that can occur unintentionally; the good news is that it is fixable and definitely should be resolved. It can take time to put right, but it needs to be addressed as a matter of urgency and is worth doing correctly. It is also an area that needs continual monitoring, but the rewards can be worthwhile: removing duplicate content alone can produce a significant improvement in rankings.

During a website audit, Seopler uses a hash algorithm to detect any onsite duplicate pages. Sign up to one of our free or paid plans now and test your website for duplicate content issues.
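
Seopler's exact algorithm is not described here, but the general idea of hash-based duplicate detection can be sketched in a few lines of Python; the crawl data and URLs below are invented for the example:

    import hashlib

    def find_duplicate_pages(pages):
        # `pages` maps URL -> page body. Bodies that hash to the
        # same digest are reported as duplicates of one another.
        seen = {}
        duplicates = []
        for url, body in pages.items():
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
            if digest in seen:
                duplicates.append((url, seen[digest]))
            else:
                seen[digest] = url
        return duplicates

    crawl = {
        "http://example.com/page": "<html>same content</html>",
        "http://www.example.com/page": "<html>same content</html>",
        "http://www.example.com/other": "<html>different content</html>",
    }
    print(find_duplicate_pages(crawl))  # flags the first two URLs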