Data Blog by Lizeo

From product data to unique and SEO optimised product descriptions

Did you know that 25 to 30% of content on the web is duplicated?* 
It is increasingly common to see identical content on different websites, especially in product descriptions. Duplicate Content can lead to lower page rankings, weaker SEO and a poorer user experience, which makes it a real headache for marketing teams. The good news is that, with quality data and AI, one technology can address the problem: Natural Language Generation (NLG).

From structured data to unique, multilingual and SEO optimised product descriptions

It is essential for an e-commerce site to provide quality marketing descriptions for its products, both for user experience and for website performance. Writing product sheets only for star products is no longer enough. To optimise the performance of a website, it is essential to tick several boxes:
  • Build a complete editorial catalogue of product sheets covering most of the products on the website, in order to give users as much information as possible,
  • Provide users with accurate descriptions to give them the best experience on the website and to offer them a product that meets their needs,
  • Offer these descriptions in multiple languages to attract new visitors and convert their visits into purchases,
  • Optimise the SEO of the website to get maximum visibility,
  • Respond to the threat of Duplicate Content.


Creating and writing this content is time-consuming, especially when the product catalogue is large, so writing every description by hand is not always feasible.
AI-based Natural Language Generation technology is an ideal solution to these problems. It transforms structured data into large quantities of text in the desired formats, optimised for your SEO keywords, in several languages and in a matter of seconds.
Combined with quality data and a similarity-detection algorithm such as Simhash, which Lizeo uses, it even makes it possible to produce unique content with an almost zero rate of Duplicate Content…

A threat: Duplicate Content

What is it?

The content of a product sheet, a web page or a website is considered Duplicate Content as soon as it is reproduced identically, or almost identically, elsewhere on the web.

Search engines then treat the information as copied. This is a real problem because they can penalise the pages concerned, including the original page…

There are two types of duplicate content

  1. Duplicate content within the same website, caused by a technical or human error, is the most common type. For example, the desktop and mobile versions of the same page, when built separately, can be considered Duplicate Content. The same applies when the old and new versions of a web page do not carry the dedicated tags.
  2. The second type is duplicate content across several websites. A product description can be copied accidentally from another site, especially when that site has comparable characteristics or when the two sites share the same content provider. It can also be plagiarism, which is much more serious because the original content is deliberately copied word for word. It is in this case that search engines penalise the pages and sites concerned most severely.


The methods and systems used by search engines to calculate similarity are strictly confidential.

What impact on your SEO?

Search engines aim to display the most relevant results in relation to the query made.

If two identical pieces of content answer a query made by a user, the search engine will waste time choosing which content to offer to the user and this will degrade the user experience. The engines therefore want to waste as little time as possible in carrying out these tasks while offering the right content. 

This is why they “track” duplicate content with very powerful and increasingly precise detection tools and algorithms (kept secret, of course…), sometimes to the detriment of the site that actually authored the content.

Indeed, tests have shown that search engines display the content of the oldest and most popular site. This means that the site that originally authored a piece of content can be judged a copy by the algorithm of an engine like Google, simply because it is less popular.

If a website copies other content on the web en masse, the whole site can be penalised and this can lead to a drop in traffic of up to 95%, or even the removal of these sites from the search engine results in the most extreme cases.

Moreover, if you put yourself in the place of a human visitor rather than the Google robot, a prospect or customer who realises that your content is identical to that of a site they have already browsed will give little credibility to what you offer…

Finally, Duplicate Content also falls within the scope of Articles L111-1 and L123-1 of the Intellectual Property Code (France), because content is protected by copyright. Plagiarising content can therefore be severely punished by law.

How can it be avoided?

  • Regularly check the content you offer


This starts even before the publication of new content. It is advisable to use solutions that let you check whether your content risks being treated as Duplicate Content (as a reminder, it is impossible to know the algorithms used by the search engines, but it is possible to come close).

To do this, there are three approaches:

  1. Use a Natural Language Generation solution together with a similarity algorithm, such as Simhash, which Lizeo uses, to automatically generate content free of Duplicate Content. NLG technology generates varied content, and the algorithm measures the Duplicate Content rate between the generated texts. You therefore control the rate of Duplicate Content across the content you publish and can drastically reduce the problem.
  2. Use online tools to compare content already published on the web with your own. These tools scan the internet to identify content that is similar to yours, and potentially content that is deemed too similar or duplicated. If the content you plan to publish is too similar to existing content, it is up to you to make the necessary changes to your texts. There is a multitude of web tools that meet this need. If you have the time, you can also carry out these checks manually, simply by pasting an excerpt of your content into the search bar of an engine such as Google (don’t forget the inverted commas!).
  3. For internal duplicate content, make sure that your URLs, Title tags and Description tags are unique. Web tools are also available for this analysis. If you translate content, duplicate content can also arise if you do not use the right tags.
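
As an illustration of the first approach, here is a minimal Python sketch of a Simhash-style comparison between two descriptions. It is not Lizeo's implementation, which is not described in this article. The 3-word shingles, the MD5 hash, the 64-bit fingerprint and the function names `simhash` and `similarity` are all assumptions chosen for the example.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a Simhash-style fingerprint from overlapping 3-word shingles."""
    words = re.findall(r"\w+", text.lower())
    # 3-word shingles make the fingerprint sensitive to wording and word order.
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * bits
    for shingle in shingles:
        digest = hashlib.md5(shingle.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each shingle votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def similarity(a: str, b: str, bits: int = 64) -> float:
    """Share of fingerprint bits that two texts have in common (0.0 to 1.0)."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1 - distance / bits


if __name__ == "__main__":
    first = "This summer tyre offers short braking distances on wet roads."
    second = "Designed for winter, this tyre keeps its grip on snow and ice."
    print(similarity(first, first))
    print(similarity(first, second))
```

A generation pipeline could, for instance, regenerate any description whose similarity with an already published text exceeds a chosen threshold. The threshold itself is not given here and would have to be tuned on real data.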


  • Pay attention to your settings


When two pieces of content on your site are intentionally identical, you must add “rel=canonical” tags pointing to the original URL, or set up 301 redirects. These tell crawlers which content is the original, so that it is the only one to be indexed.

You can also use the “noindex” tag to prevent a page whose content is copied from another page or another website from being indexed.

If your website is multilingual, if you want to implement an international SEO strategy, or if you simply have identical content translated into several languages for several countries, you should use the hreflang tag. This tag makes it clear to search engines that these pages do not compete with each other but target different locales and therefore different audiences.

Finally, when you are simply quoting sources or experts, it is perfectly acceptable to display this content by placing the quoted text between <blockquote> tags. Displaying this type of content on your website is even recommended to improve the reading experience, demonstrate your expertise and improve your organic search ranking.

It is possible that the CMS tools you use allow you to make such SEO settings (plugins…).
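
To make these settings concrete, the following Python sketch builds the corresponding HTML tags for the head of a page. The helper names and the example.com URLs are hypothetical and only serve the illustration. The tags themselves (`<link rel="canonical">`, `<meta name="robots" content="noindex">` and `<link rel="alternate" hreflang="...">`) are standard HTML.

```python
from html import escape


def canonical_tag(original_url: str) -> str:
    """Tag telling crawlers which URL holds the original content."""
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


def noindex_tag() -> str:
    """Tag asking search engines not to index the current page."""
    return '<meta name="robots" content="noindex">'


def hreflang_tags(alternates: dict[str, str]) -> list[str]:
    """One tag per language or locale version of the same content."""
    return [
        '<link rel="alternate" '
        f'hreflang="{escape(code, quote=True)}" '
        f'href="{escape(url, quote=True)}">'
        for code, url in alternates.items()
    ]


if __name__ == "__main__":
    # Hypothetical URLs, used only for the demonstration.
    print(canonical_tag("https://example.com/tyres/model-a"))
    print(noindex_tag())
    for tag in hreflang_tags({
        "en-gb": "https://example.com/uk/tyres/model-a",
        "fr-fr": "https://example.com/fr/pneus/model-a",
    }):
        print(tag)
```

In practice, a CMS or an SEO plugin usually writes these tags for you; the sketch only shows what ends up in the page.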


  • Write your contracts correctly


If you use an external service provider to write your content, the contract must specify that the content provided will be used only by you and that it has not previously been supplied to third-party websites.

With NLG, the preparation and quality of the data = the quality of the product description

The main challenge in generating descriptions automatically with Natural Language Generation technology is obtaining quality data despite the variety of sources, dimensions, complexity, etc.
NLG technology uses so-called “structured” data, that is, data organised in a predefined format. This allows the tool to read the same categories of information for each product.
Take the example of the tyre descriptions produced by Lizeo: thanks to structured data, the tool can read, for each tyre, its name, its brand, its braking or comfort performance, the type of terrain on which it is mainly used, etc.
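
The idea of turning a structured record into text can be sketched with simple templates. This is a deliberately naive illustration in Python, not how Lizeo's NLG tool works: real NLG systems are far richer. The `Tyre` fields mirror the categories mentioned above, while the template sentences, the sample values and the `describe` function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tyre:
    name: str
    brand: str
    braking: str   # braking performance, e.g. "excellent"
    comfort: str   # comfort performance, e.g. "good"
    terrain: str   # terrain on which the tyre is mainly used


# Hypothetical sentence templates; each slot is filled from the record.
OPENINGS = [
    "The {name} from {brand} is designed mainly for {terrain} use.",
    "Built for {terrain} driving, the {brand} {name} is a solid choice.",
]
DETAILS = [
    "It offers {braking} braking and {comfort} comfort.",
    "Drivers can expect {braking} braking performance and {comfort} ride comfort.",
]


def describe(tyre: Tyre, variant: int = 0) -> str:
    """Fill one opening and one detail template, selected by the variant number."""
    fields = vars(tyre)
    opening = OPENINGS[variant % len(OPENINGS)]
    detail = DETAILS[(variant // len(OPENINGS)) % len(DETAILS)]
    return f"{opening.format(**fields)} {detail.format(**fields)}"


if __name__ == "__main__":
    # Sample values invented for the demonstration.
    tyre = Tyre("X1", "Acme", "excellent", "good", "road")
    for variant in range(4):
        print(describe(tyre, variant))
```

Because every sentence is filled from the same fields, a missing or inconsistent value in the record shows up directly in the generated text.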
Data that is poorly prepared, badly structured or of poor quality leads the NLG technology to misread information, or produces errors that make descriptions incomprehensible to a human reader.
Putting in place an efficient data preparation and data quality system (definition of rules, deduplication, harmonisation, deletion of obsolete data, long-term maintenance, monitoring, etc.) is a compulsory step in transforming the collected data into information that NLG technology can exploit. To ensure that your data is correctly structured and of high quality, you can be assisted by a company with expertise in these matters, such as Lizeo.
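
Three of the preparation steps listed above (rules, harmonisation and deduplication) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Lizeo's data pipeline: the required fields, the choice of brand and name as the deduplication key and the function names are all hypothetical.

```python
# Hypothetical rule: a record is usable only if these fields are filled in.
REQUIRED_FIELDS = ("name", "brand", "terrain")


def harmonise(record: dict) -> dict:
    """Drop empty values, trim whitespace and write brands in a single casing."""
    cleaned = {key: str(value).strip() for key, value in record.items() if value is not None}
    if "brand" in cleaned:
        cleaned["brand"] = cleaned["brand"].title()
    return cleaned


def prepare(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (usable, rejected) after harmonisation, rule checks and deduplication."""
    usable, rejected, seen = [], [], set()
    for raw in records:
        record = harmonise(raw)
        # Rule check: reject records with a missing or empty required field.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            rejected.append(record)
            continue
        # Deduplication: the same brand and name count as the same product.
        key = (record["brand"].lower(), record["name"].lower())
        if key in seen:
            continue
        seen.add(key)
        usable.append(record)
    return usable, rejected
```

Rejected records are returned rather than silently dropped, so that they can be reviewed and corrected at the source.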

* (2013) – Matt Cutts – Working in the Quality Research Group at Google

Thanks to Natural Language Generation technology, the marketing teams of e-commerce sites and retail companies can now generate marketing descriptions automatically for their entire product catalogue, with a structure and keywords that support SEO, and without Duplicate Content.
Are you an e-commerce site or a manufacturer with a large online distribution network, and would you like SEO optimised, unique and multilingual descriptions to boost your sales?
