Data Blog by Lizeo

Impacts of Dirty Data in your competitor price analysis

Businesses now have access to a growing daily volume of competitor market price data, both online and offline.
This data comes in multiple formats (because prices are displayed differently across digital platforms, or because of the way they were collected), which makes it hard to provide clean, uniform and matched competitor price data for analysis. This kind of heterogeneous, multi-format data is what we call dirty competitor price data.
There is no point in building your price intelligence on this data unless you are prepared to spend a tremendous amount of time trying to make it talk.
Let’s take a look at dirty competitor price data in the context of the Tire Industry and its impacts.

What is Dirty Data?

Dirty data is a general expression defining data that is inaccurate, incorrect, inconsistent, duplicated, incomplete or that violates business rules.

Below are the 6 most common types of dirty data, with examples applied to competitor price data in the Tire Industry:

Incomplete data

Incomplete data is missing fields or values that are necessary, or mandatory, to run a pricing process. In the Tire Industry, technical attributes of a tire such as the Load Index, Speed Index or OE Marking have a strong influence on price. For example, a 205/55 R16 91V and a 205/55 R16 94H are 2 different tires (different Load Index and Speed Index) with 2 different prices. If your competitor price dataset is missing these fields, your pricing analysis will be wrong.
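A minimal sketch of a completeness check, assuming tire designations follow the "205/55 R16 91V" pattern used in the example above. The regular expression, the `missing_fields` helper and the list of mandatory fields are all hypothetical choices for illustration; OE Marking is left out for brevity.

```python
import re

# Hypothetical pattern for a designation such as "205/55 R16 91V":
# width/aspect ratio, rim diameter, then an optional load index + speed index.
TIRE_PATTERN = re.compile(
    r"^(?P<width>\d{3})/(?P<aspect>\d{2})\s*R(?P<rim>\d{2})"
    r"(?:\s+(?P<load_index>\d{2,3})(?P<speed_index>[A-Z]))?$"
)

# Fields assumed to be mandatory for this illustrative pricing process.
REQUIRED_FIELDS = ("width", "aspect", "rim", "load_index", "speed_index")


def missing_fields(designation: str) -> list[str]:
    """Return the mandatory fields that cannot be read from a tire designation."""
    match = TIRE_PATTERN.match(designation.strip())
    if match is None:
        # Unparseable designation: treat every mandatory field as missing.
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if match.group(name) is None]


print(missing_fields("205/55 R16 91V"))  # []
print(missing_fields("205/55R16 94H"))   # []
print(missing_fields("205/55 R16"))      # ['load_index', 'speed_index']
```

Rows with a non-empty result would be set aside or enriched before they reach the price comparison, rather than silently compared against fully specified tires.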

Duplicate data

Duplicate data might be the most familiar example of dirty data. Most companies deal with it as duplicate customer records in their CRM, or as duplicate products in their Master Data Management system or ERP. In the analysis of online tire sell-out prices, duplicate data slows down the pricing analyst: 2 tires look different because of a misspelling or an abbreviation but are in reality the same (Michelin Pilot Sport 4 and Mich PS4). These tires then have to be aggregated, or ‘attached’, to the same price line.
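One simple way to attach such variants to the same price line is to map every product name to a canonical form before grouping. The sketch below assumes a hand-maintained alias table (`ALIASES` is hypothetical; in practice it would come from your product master data) and uses invented prices purely for illustration.

```python
from collections import defaultdict

# Hypothetical alias table: abbreviations seen on retailer sites -> canonical tokens.
ALIASES = {
    "mich": "michelin",
    "mich.": "michelin",
    "ps4": "pilot sport 4",
}


def canonical_name(raw: str) -> str:
    """Lower-case the name, split it on whitespace and expand known aliases."""
    tokens = raw.lower().split()
    return " ".join(ALIASES.get(token, token) for token in tokens)


def group_duplicates(offers):
    """Group (product_name, price) pairs under their canonical product name."""
    groups = defaultdict(list)
    for name, price in offers:
        groups[canonical_name(name)].append(price)
    return dict(groups)


offers = [("Michelin Pilot Sport 4", 112.0), ("Mich PS4", 109.5)]
print(group_duplicates(offers))
# {'michelin pilot sport 4': [112.0, 109.5]}
```

An exact alias table only catches variants someone has already seen; fuzzy matching is a common complement, at the cost of needing a review step for false matches.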

Incorrect data

Incorrect data can be defined as field values that fall outside the valid range of values. In the Tire Industry, an example is a tire size (geobox) that does not exist: 195/25 R23.
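Because each of 195, 25 and 23 is plausible on its own, simple per-field range checks may not catch a size like 195/25 R23; checking the full combination against a reference list is more reliable. This sketch assumes a hypothetical, hard-coded `KNOWN_SIZES` catalog standing in for real manufacturer or master data.

```python
# Hypothetical reference catalog of (width, aspect ratio, rim) sizes known to exist.
# A real catalog would be loaded from master data, not hard-coded.
KNOWN_SIZES = {
    (195, 65, 15),
    (205, 55, 16),
    (225, 45, 17),
}


def is_valid_size(width: int, aspect: int, rim: int) -> bool:
    """Treat a size as valid only if the full combination is in the catalog."""
    return (width, aspect, rim) in KNOWN_SIZES


def flag_invalid(records):
    """Return the records whose size is not found in the reference catalog."""
    return [r for r in records if not is_valid_size(r["width"], r["aspect"], r["rim"])]


records = [
    {"sku": "A1", "width": 205, "aspect": 55, "rim": 16},
    {"sku": "B2", "width": 195, "aspect": 25, "rim": 23},
]
print(flag_invalid(records))
# [{'sku': 'B2', 'width': 195, 'aspect': 25, 'rim': 23}]
```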

Inaccurate data

The definition of accurate data can be summarized by the following question: does the data accurately represent the scope you defined in the first steps of your price intelligence requirements? Data can be intrinsically correct but inaccurate in a given business context. An extreme example would be performing a price analysis of a Nordic tire with data coming from Spanish websites.
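A scope check can be as simple as keeping only the observations collected in the markets the study is about. The sketch assumes each observation carries the country of the website it was collected from; the `PriceObservation` type, the product name and the `NORDIC` country set are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceObservation:
    product: str
    country: str  # assumed ISO 3166-1 alpha-2 code of the website's market
    price: float


def in_scope(observations, scope_countries):
    """Keep only observations collected in the markets defined by the study scope."""
    return [o for o in observations if o.country in scope_countries]


# Assumed scope for an illustrative Nordic tire study.
NORDIC = {"SE", "NO", "FI", "DK"}

observations = [
    PriceObservation("Nordic tire X", "FI", 140.0),
    PriceObservation("Nordic tire X", "ES", 95.0),
]
print([o.country for o in in_scope(observations, NORDIC)])  # ['FI']
```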

Business rule violations

Business rules are essential to turn ‘standard’ competitor price data into your own vision of the business and the market. These rules are specific to the industry, the business process and the context. For tires, the season is critical to an accurate market price analysis: mixing summer tire price data with winter tire price data would be a business rule violation.
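Business rules like the season rule can be enforced as explicit checks that reject a comparison panel before it is analysed. The rule, the `check_single_season` helper and the panel rows below are hypothetical (prices are invented), a sketch of the idea rather than a prescribed implementation.

```python
class BusinessRuleViolation(ValueError):
    """Raised when a comparison panel breaks a pricing business rule."""


def check_single_season(panel):
    """Assumed rule: a comparison panel must contain a single season only."""
    seasons = {row["season"] for row in panel}
    if len(seasons) > 1:
        raise BusinessRuleViolation(f"mixed seasons in panel: {sorted(seasons)}")
    return panel


panel = [
    {"product": "205/55 R16 91V", "season": "summer", "price": 80.0},
    {"product": "205/55 R16 91H", "season": "winter", "price": 95.0},
]
try:
    check_single_season(panel)
except BusinessRuleViolation as err:
    print(err)  # mixed seasons in panel: ['summer', 'winter']
```

Raising an error, rather than silently dropping rows, makes the violation visible to the analyst who built the panel.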

Inconsistent data

Data consistency can be defined as a stable definition of the data and/or the field values over time. In other words, data is produced regularly within a regulated and predictable framework. The way a tire dimension is displayed online is a good example: 205/55/R16, 205-55-R 16 or 20555R16. Collecting these variants as-is leads to inconsistent data in your database.
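A common remedy is to normalize every variant to one canonical format at collection time. This sketch assumes that, once separators are stripped, a size always reads as a 3-digit width, a 2-digit aspect ratio, "R" and a 2-digit rim diameter; the `normalize_dimension` helper and the "205/55 R16" target format are illustrative choices.

```python
import re

# Assumed compact form after stripping separators: 3-digit width,
# 2-digit aspect ratio, "R", 2-digit rim diameter (e.g. 20555R16).
_COMPACT = re.compile(r"^(\d{3})(\d{2})R(\d{2})$")


def normalize_dimension(raw: str) -> str:
    """Map variants such as '205/55/R16' or '205-55-R 16' to '205/55 R16'."""
    # Upper-case, then drop every character that is not a digit or a letter.
    compact = re.sub(r"[^0-9A-Z]", "", raw.upper())
    match = _COMPACT.match(compact)
    if match is None:
        raise ValueError(f"unrecognised tire dimension: {raw!r}")
    width, aspect, rim = match.groups()
    return f"{width}/{aspect} R{rim}"


for raw in ("205/55/R16", "205-55-R 16", "20555R16"):
    print(normalize_dimension(raw))  # prints 205/55 R16 for each variant
```

Anything that does not fit the assumed pattern raises an error instead of being guessed, so unusual formats surface for review.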

What are the impacts of dirty data on competitive price monitoring?

According to Gartner’s Data Quality Market Survey in 2017, dirty data costs companies an estimated $15M per year on average.

That cost may be underestimated, as the survey was mainly directed at Marketing departments, who are huge data consumers but not the only ones.
For pricing teams, dirty data affects not only competitor price analysis but the whole pricing journey.

Impacts of Dirty Data in your Pricing Process

Beyond the extra time spent cleaning competitor price data, dirty data has direct impacts on your pricing journey.
In the Tire Industry, inaccurate market price analysis can happen due to many issues in the tire price data:
  • A mix of tires with and without OE Marking
  • Duplicates: Michelin Pilot Sport 4 vs Mich. PS4
  • Inconsistent price level: unit price mixed with group prices (basket of 1 or 2 tires)
  • etc.
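The unit-versus-group price issue from the list above can be handled by expressing every offer as a price per tire. The sketch assumes the number of tires behind each listed price is already known (in practice, detecting it is often the hard part); the `Offer` type, its `quantity` field and the prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    product: str
    listed_price: float
    quantity: int  # assumed field: number of tires included in the listed price


def unit_price(offer: Offer) -> float:
    """Express an offer as a price per single tire."""
    if offer.quantity < 1:
        raise ValueError(f"invalid quantity for {offer.product}: {offer.quantity}")
    return offer.listed_price / offer.quantity


offers = [
    Offer("205/55 R16 91V", 90.0, 1),
    Offer("205/55 R16 91V", 170.0, 2),
]
print([unit_price(o) for o in offers])  # [90.0, 85.0]
```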
Dirty data also directly affects internal projects, delaying the deployment of a new process, tool or solution, and it can undermine trust in, and the credibility of, current analytical tools (Business Intelligence).
For pricing tools and platforms, dirty data turns data flow matching into a nightmare, and the pricing strategy implementation cannot deliver the expected value:
  • Inexact matching between competitor price data and internal data (sell-in price data, sales volumes, etc.)
  • Increased complexity in building the tire comparison panels used to set pricing rules
In the end, the major cost will be a potential loss of revenue and market share due to a bad price setup.

Impacts of Dirty Data in your Pricing Data Science Project

Data Scientist is the dirtiest job of the 21st Century

According to a survey led by CrowdFlower, data scientists spend between 60% and 80% of their time cleaning dirty data… all before doing what they are good at: statistics, modelling, etc.

To get a rough estimate of the cost of dirty data on pricing data science projects, let’s do some simple math:

Average yearly cost of a junior data scientist (estimate based on Glassdoor): $200k. If they spend 60% of their time cleaning data, dirty data costs $120k per year per data scientist.
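The same back-of-the-envelope calculation can be written as a small function, which also shows the upper end of the 60% to 80% range quoted above. Integer percentages are used to avoid floating-point rounding; the function name is illustrative and the $200k figure is the estimate from the text.

```python
def cleaning_cost(yearly_cost: int, cleaning_share_pct: int) -> int:
    """Yearly cost of the share of a data scientist's time spent cleaning data."""
    return yearly_cost * cleaning_share_pct // 100


ANNUAL_COST = 200_000  # estimated yearly cost of a junior data scientist (from the text)

print(cleaning_cost(ANNUAL_COST, 60))  # 120000, low end of the 60-80% range
print(cleaning_cost(ANNUAL_COST, 80))  # 160000, high end of the range
```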

And of course, dirty data also creates hidden and indirect costs:
  • Delays in your data science project and its expected insights
  • A drop in motivation in your data science team
  • The inability to run machine learning or AI tools

How to get rid of Dirty Data?

Without standard guidelines and processes to start and keep competitor price data clean, dirty data issues are bound to happen.
Productivity is lost when pricing analysts spend their time checking the accuracy and reliability of the price data they use to build their analyses and extract insights for management. Data scientists face the same issue: they are mostly occupied cleaning, normalizing and preparing data before they can use statistical models or machine learning tools.
Concretely, the first step of your journey to get rid of Dirty Data is data cleansing.
