Is translated content duplicate content?
28th October 2019
We often hear this question come up in translation conversations, so we thought a blog post explaining content translation and content duplication would be a good place to start.
What is duplicate content?
According to Google, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include discussion forums that can generate both regular and stripped-down pages targeted at mobile devices, store items shown or linked via multiple distinct URLs and printer-only versions of web pages.”
Duplicate content can be caused by various factors including:
1. URL variations, especially when URLs use parameters or product IDs
For example, https://www.foreigntongues.co.uk/services/translation?c….. is a duplicate of https://www.foreigntongues.co.uk/services/translation?c…&page3, which is a duplicate of https://www.foreigntongues.co.uk/services/translation. Most content management systems (CMSs) can generate parameters like these, depending on your settings. The best way to tackle them is to add self-referencing canonical tags to all original pages (in our case, the master page is https://www.foreigntongues.co.uk/services/translation) and to exclude the parameters from crawling in the robots.txt file. If the duplicate pages have already been indexed, you need to add noindex meta tags to them. Bear in mind that a crawler can only read a noindex tag on a page it is allowed to crawl, so leave those pages crawlable until they have dropped out of the index.
2. Different website versions
Every website can have several versions, for example https://www.foreigntongues.co.uk/, https://foreigntongues.co.uk/ (without www), http://www.foreigntongues.co.uk/ (unsecured) and http://foreigntongues.co.uk/ (unsecured, without www).
In most cases, your developer will have redirected these versions to the default version before the website launched. If two of them are accidentally live at the same time, however, each page exists in duplicate, so a search engine would think you have two websites with the same content but different URLs. This applies to pages with http or https, with or without www. Ensuring that the correct redirects have been added to the header.php file would fix this issue. Adding self-referencing canonical tags would also help.
3. Scraped content
This is your content ‘borrowed’ and republished by other sites. This is also a common problem for ecommerce websites, with many of them selling the same products and using the same product descriptions.
4. Printer versions
Your site may have a “regular” and a “printer” version of an article with exactly the same content. As before, canonical tags, exclusion in the robots.txt file and noindex meta tags can solve this problem.
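To make the first two causes more concrete, here is a minimal Python sketch of how URL variants could be collapsed onto one preferred URL and turned into a canonical tag. It is only an illustration, not how any particular CMS does it. The function names canonical_url and canonical_tag are hypothetical, the "?ref=newsletter" parameter is made up for the example, and the sketch assumes that query parameters never change what the page shows, which will not be true for every site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Preferred scheme and host, taken from the master page named in the article.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.foreigntongues.co.uk"
# Host variants that should collapse onto the preferred host.
HOST_VARIANTS = {"foreigntongues.co.uk", "www.foreigntongues.co.uk"}


def canonical_url(url):
    """Return the preferred URL for a variant (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in HOST_VARIANTS:
        host = PREFERRED_HOST
    # Drop the query string and fragment. Assumption: parameters never
    # change the page content, so every variant maps to the same page.
    return urlunsplit((PREFERRED_SCHEME, host, parts.path or "/", "", ""))


def canonical_tag(url):
    """Build the <link rel="canonical"> element for a page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="{}" />'.format(href)


if __name__ == "__main__":
    # "?ref=newsletter" is a made-up parameter used only for illustration.
    print(canonical_tag(
        "http://foreigntongues.co.uk/services/translation?ref=newsletter"
    ))
```

The same mapping can also tell you where a redirect should point: any requested URL that differs from its canonical_url result is a candidate for a redirect to that result.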
Does duplicate content lead to penalties?
Technically, there is no penalty for duplicate content, but it can affect search engine rankings and lead to traffic losses. Having multiple pieces of almost identical content in more than one place on the internet can make it difficult for search engines to decide which one is the original version that should rank highest. Google will filter through duplicate content and pick the page it thinks should appear in search results, which may not be the page you wanted to rank.
Moreover, search engines don’t know which page should receive link metrics such as trust, authority and link equity. Should they direct them to one page, or share them across all of the duplicate pages?
It’s also worth mentioning that if Google thinks content duplication is done to manipulate search engine rankings or win more traffic, the site might be removed entirely from the Google index!
Does translation cause a duplicate content issue?
If we have one page in English and another that is translated into French, does Google consider that duplicate content? No, it does not. Google has said that the same content in different languages counts as different content.
“So if you have content in German and English, even if it’s the same content but translated, that’s not duplicate content. It’s different words. Different words on the page. These are translations, so it’s not naturally a duplicate.” John Mueller, Google
There are a few things a website owner should remember:
- Human translation prevents duplicate content; auto-generated machine translation doesn’t. So if you want to rank in search engines, hire a translation agency to create language-rich and informative content.
- Content localisation is important when targeting non-English-speaking users. This means that if you have a Spanish-language site targeted at the market in Spain, you would still need to localise it for the Mexican, Argentinian and other Latin American markets, and for the Spanish-speaking market in the USA.
- Implement hreflang markup so that Google knows that two versions of a page are actually the same (just translated) and shows the correct language to searchers. We have covered hreflang tags in another article here.
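As a rough illustration of the hreflang point, the short Python sketch below builds the link elements that would go in the head section of each language version of a page. The function name hreflang_tags and the example.com URLs are assumptions made for this example, and the optional x-default entry simply marks the version to show when no listed language matches the searcher.

```python
from html import escape


def hreflang_tags(versions, default_lang=None):
    """Build hreflang <link> elements (hypothetical helper).

    versions maps a language code such as "en" or "fr" to the URL of
    that language version of the page.
    """
    tags = []
    for lang, url in versions.items():
        tags.append(
            '<link rel="alternate" hreflang="{}" href="{}" />'.format(
                escape(lang, quote=True), escape(url, quote=True)
            )
        )
    if default_lang is not None:
        # x-default names the fallback version for unmatched languages.
        tags.append(
            '<link rel="alternate" hreflang="x-default" href="{}" />'.format(
                escape(versions[default_lang], quote=True)
            )
        )
    return tags


if __name__ == "__main__":
    # Placeholder URLs, not pages from this site.
    versions = {
        "en": "https://www.example.com/en/",
        "fr": "https://www.example.com/fr/",
    }
    for tag in hreflang_tags(versions, default_lang="en"):
        print(tag)
```

Every language version should carry the full set of tags, including one that points to itself, so that the versions reference each other consistently.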
We hope this article addresses your concerns around translation and duplicate content. Don’t hesitate to get in touch with us when you need help with content translation.