Data Quality – Establish the “As Is”

Whether embarking on a data quality improvement program, implementing a new CRM system, developing BI, or attempting a full-blown MDM solution, an early step in any of these activities must be to establish the current state of data across the organisation – the “as is”. Understanding the current situation helps define the scope of the problem and enables expectations to be managed.

Identify Data Entry Points.

Identify and document the points at which data enters your business such as:-

  • Operators take details over the phone and enter these through a data capture screen.
  • Customers / Prospects enter their own details through e-commerce applications e.g. your web site.
  • Data captured from physical media e.g. application forms.
  • External data sources such as suppression lists, bought lists of prospects.
  • New products are defined.

Tip – get organised. This is the beginning of a data diagram which shows you how data flows through your business.

Identify Data Consumption Points.

Identify where data is used or retrieved such as:-

  • Operators answer customer queries and have to view the customers’ data.
  • Reports are produced such as the latest sales figures.
  • Mailing campaigns are created.
  • Sales / Orders are produced.
  • Orders are fulfilled.
  • Orders are despatched.

Tip – talk to people. Find out how they do their jobs. Don’t forget a point of consumption may also be a point of input e.g. when an order is despatched, the despatch date is updated.
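
One lightweight way to record entry and consumption points is a simple inventory in which each point carries one or both roles. The sketch below is illustrative only: the `DataPoint` class, the role labels and the example points are assumptions made for this example, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """One place where data enters or is used (hypothetical structure)."""
    name: str
    roles: set = field(default_factory=set)  # "entry", "consumption", or both


# Example inventory; the names are taken from the lists above.
inventory = [
    DataPoint("Phone data capture screen", {"entry"}),
    DataPoint("Sales report", {"consumption"}),
    DataPoint("Order despatch", {"consumption", "entry"}),
]

# Points that both consume and create data deserve extra attention.
both = [p.name for p in inventory if {"entry", "consumption"} <= p.roles]
print(both)
```

A spreadsheet with the same columns would serve equally well; the point is to capture both roles for every point you document.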

Identify Data Processing.

Data may go through a number of processing steps in between the point at which it is created and the point at which it is consumed:-

  • Data may be cleansed through third party applications.
  • Data may be copied to multiple databases.
  • Data may be merged with other data sources.
  • Data may be re-formatted.

Tip – each processing step is a potential source of data quality problems. Look for unexpected differences e.g. input count vs. output count, or changes in data formats.
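
A check of this kind can be sketched in a few lines. The example below assumes the records before and after a step are available as lists of dictionaries; the `shape` and `compare_step` helpers and the sample postcodes are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter


def shape(value):
    """Reduce a value to its format pattern: letters become A, digits become 9."""
    value = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", value)


def compare_step(before, after, field):
    """Compare record counts and one field's format patterns across a step."""
    return {
        "count_before": len(before),
        "count_after": len(after),
        "count_difference": len(after) - len(before),
        "patterns_before": Counter(shape(r[field]) for r in before),
        "patterns_after": Counter(shape(r[field]) for r in after),
    }


# Made-up sample data: one record goes missing during the step.
before = [{"postcode": "AB1 2CD"}, {"postcode": "ab12cd"}, {"postcode": "EF3 4GH"}]
after = [{"postcode": "AB1 2CD"}, {"postcode": "EF3 4GH"}]

report = compare_step(before, after, "postcode")
print(report["count_difference"])
print(report["patterns_before"])
print(report["patterns_after"])
```

A non-zero count difference or a pattern that appears on only one side is not necessarily an error, but it is a difference that someone should be able to explain.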

Identify Key Personnel.

Identify those people throughout the business who create, modify and use the data. Try to identify those who are responsible for the data or who make decisions relating to it. These people will be needed as you progress with your data project in order to establish and agree business rules and data formats. They may also go on to perform the role of Data Steward within any data governance structure you set up.

Tip – don’t forget the same person may create, modify and consume data. Make sure you explain what is happening and why – you need to get people on-side.

Data Profiling.

Analyse the data with a view to documenting what you have. Start by simply looking at the data, as this is a low cost / low tech starting point which can reveal basic issues such as:-

  • Missing data.
  • Poorly formatted data.
  • Quality of data capture.

More comprehensive data profiling is necessary to:-

  • Identify data patterns and formats.
  • Create frequency counts.
  • Identify relationships between data items e.g. product vs. age.
  • Identify invalid data – data that does not meet specified business rules.

Profile the data at various points throughout its journey from input to consumption and analyse any differences.
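
As a rough illustration of what basic profiling involves, the sketch below profiles a single column for missing values, frequency counts, format patterns and values that break a business rule. The `profile_column` helper, the sample phone numbers and the validation rule are all assumptions made for this example; a real rule would come from the business.

```python
import re
from collections import Counter


def profile_column(values, is_valid):
    """Profile one column: missing values, frequencies, patterns, invalid values."""
    present = [str(v).strip() for v in values if v is not None and str(v).strip() != ""]
    patterns = Counter(
        re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", v)) for v in present
    )
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "frequencies": Counter(present),
        "patterns": patterns,
        "invalid": [v for v in present if not is_valid(v)],
    }


# Made-up sample column; the rule (5 digits, optional space, 6 digits) is hypothetical.
phones = ["01234 567890", "", None, "01234567890", "n/a", "01234 567890"]
result = profile_column(
    phones, lambda v: re.fullmatch(r"[0-9]{5} ?[0-9]{6}", v) is not None
)

print(result["missing"])
print(result["patterns"].most_common())
print(result["invalid"])
```

Running the same profile at the point of entry and again at the point of consumption gives two results that can be compared directly.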

Tip – data profiling does not have to be an expensive exercise. Much can be done with existing tools such as Excel or by utilising open source applications such as Talend Open Studio for Data Quality.


With a comprehensive understanding of how data gets into your business, what you have, where it goes, where it is stored, who uses it and for what purpose, and what quality issues exist, you have a good starting point for any data related project. What you find at this discovery stage will set the scene for further development and will assist in managing expectations relating to the time, cost and quality of solutions.
