This is where the expertise of data specialists comes into play.
The description of a tire sold online is one of the elements that can vary significantly from one e-commerce website to another.
The attributes that make up a tire's description, and that can vary, are as follows:
- The brand
- The product name, including its exact punctuation
- The technical markings (runflat, OE Marking)
- The tire manufacturer code
To attach each collected price to the right tire, the key elements displayed online must be deciphered so that the product can be identified, then compared with a dedicated reference database of existing tires (current or discontinued), each validated against an official source. This is the data unification step.
This reference database is the cornerstone of an efficient matching system that delivers data of unparalleled quality.
Comparing the collected data with this reference database verifies that the tire described online actually exists: does this brand make this tire? Does it come with these technical attributes?
This allows data collected online to be systematically and automatically categorized:
- Unknown product: is it new?
- Known product: attachment of the collected price, its source and the collection date
- “False” product: the combination of information is not possible - the product does not exist
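The three categories above can be illustrated with a minimal Python sketch. It is not the actual matching system: the brand "Acme", the product names, the `REFERENCE` structure and the decision rule (a known product combined with a marking the reference base does not attest is treated as "false") are all hypothetical, chosen only to show the principle.

```python
import re
from enum import Enum


class MatchStatus(Enum):
    KNOWN = "known"      # attach price, source and collection date
    UNKNOWN = "unknown"  # not in the reference base: possibly a new product
    FALSE = "false"      # impossible combination of attributes


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation so 'RoadGrip-2' matches 'roadgrip 2'."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Hypothetical reference base: brand -> product name -> markings attested for it.
REFERENCE = {
    "acme": {
        "roadgrip 2": {"runflat"},
        "snowtrek": set(),
    },
}


def categorize(brand: str, product_name: str, markings: list[str]) -> MatchStatus:
    """Classify a collected listing against the reference base."""
    products = REFERENCE.get(normalize(brand))
    name_key = normalize(product_name)
    if products is None or name_key not in products:
        return MatchStatus.UNKNOWN
    claimed = {normalize(m) for m in markings}
    # Assumed rule: every claimed marking must be attested for this product.
    if not claimed <= products[name_key]:
        return MatchStatus.FALSE
    return MatchStatus.KNOWN
```

For example, `categorize("ACME", "RoadGrip-2", ["Runflat"])` is classified as known despite the differences in case and punctuation, while a "SnowTrek" listed as runflat is flagged as false because that combination is absent from the hypothetical reference base.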
Finally, implausible prices are also filtered out (e.g., a touring tire sold for less than $20 is unlikely to be real).
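A price filter of this kind could be sketched as a simple floor per tire category. Only the $20 touring threshold comes from the example above; the function name, the listing fields and the choice to keep categories that have no rule are assumptions made for illustration.

```python
# Minimum plausible price per tire category, in dollars.
# Only the touring floor comes from the text; other categories would need their own.
MIN_PLAUSIBLE_PRICE = {"touring": 20.0}


def is_plausible(price: float, category: str, floors=MIN_PLAUSIBLE_PRICE) -> bool:
    """Return False when the price falls below the floor for its category."""
    floor = floors.get(category)
    if floor is None:
        return True  # assumption: no rule for this category, so keep the price
    return price >= floor


# Hypothetical collected listings.
listings = [
    {"source": "shop-a.example", "category": "touring", "price": 12.99},
    {"source": "shop-b.example", "category": "touring", "price": 84.50},
]

kept = [item for item in listings if is_plausible(item["price"], item["category"])]
```

Here only the second listing survives the filter. A production system would likely derive its thresholds from the observed price distribution of each product rather than from fixed values.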
The combination of a high-quality tire database, matching technologies, Machine Learning algorithms and product marketing expertise yields reliable data with a high level of completeness.
This lets you focus on your core business, the analysis, and work more efficiently. Indeed, analysts and data scientists are often said to spend between 50% and 80% of their time cleaning data before they can start working with it!