There are many components of a data migration project that make such projects challenging – the technology, the politics, the requirements, security, systems integration, budgets and so on. So much so that the data itself can get overlooked. Which is crazy, since the data is what the project is all about in the first place. And it's the data itself that can end up being the biggest headache, particularly Customer Data. Scarily, there is a real danger that this headache will not be felt until the migration is completed and the "new data" actually sees the light of day.
So what, from a data perspective, makes data migration projects such a challenge? In particular, what makes Customer Data migrations so arduous?
Customer Data Migrations in the simplest terms involve:-
- Moving data from one source or multiple sources into a new repository.
- Transforming data from its original format(s) into a single standardised format.
- Identifying duplicate customers along the way, and taking some action to resolve the duplicates that are found.
So if you are about to embark on a Customer Migration programme, it would be a good idea, at the planning stage, to make some sort of assessment of the complexity of the data component of the project.
There are four areas that I would focus on for each data source to be migrated.
Source System Documentation
Having accurate, well maintained and reliable documentation on the source data is invaluable. Without it you will be relying on analysing the source data itself and, in some instances, on experienced guesswork to establish the detailed content of some data elements.
Following a period of analysis and investigation including interviewing relevant personnel, I would attempt to assign a score relating to the quality of the available documentation:-
- Fully maintained documentation as part of a mature Data Governance program
- In use documentation plus local knowledge
- Historic documentation plus local knowledge
- Historic documentation – accuracy unknown
- No Documentation or local knowledge
Data Capture Method
How was the data captured in the first place? Or more specifically who captured it?
- Customer sourced – entered on-line by the customer themselves.
- Service Desk sourced – captured by an employee e.g. over the telephone.
Again I would attempt to assign a score along the lines of:-
- Fully trained and quality motivated as part of a mature Data Governance program
- Well trained staff with quality based targets
- Well trained staff with no targets or quantity based targets
- Partly trained staff with no targets or quantity based targets / Customer entered data
- Untrained / minimally trained with quantity or time based targets
Of course the actual definitions behind each score may need to vary depending on individual circumstances. And you can argue about the score given to customer entered data, but it's a good start.
Data Capture Validation
How rigorous was the data entry validation? In other words, how much effort went into the design of the data entry screens to prevent invalid data and variations from entering the dataset?
So the scoring here might be something like:-
- Extensive business rule application to ensure data accuracy and duplicate checking as part of a mature Data Governance program.
- Extensive application of business rules and use of drop down lists
- Some application of business rules
- Basic validation – Alpha-numeric, Date formats
- All elements free form with zero validation
Data Maintenance
Has the data been maintained? We all know that data quality deteriorates over time, and even though a data validation / correction step is highly likely to be part of the migration process (address correction, gone-away screening etc), knowing the history of the data and its provenance gives us some idea of the likelihood of unforeseen issues "crawling out of the woodwork" somewhere down the line.
So again, the scoring applied would be something like:-
- Fully maintained as part of a mature Data Governance program (address updates, name changes applied, gone-away screening, contact preferences updated etc)
- Regular address updates applied, some Name and preference updates applied
- Regular address cleansing applied
- Occasional / ad hoc cleansing applied
- Data is never changed once it is captured
So by assigning the four scores to each data source being migrated, it is possible to create an overall "migration complexity score" that, although somewhat rough and ready, I would argue is certainly better than sucking air through one's teeth and saying something like – "dunno Guv, could be tricky…".
The simplest project would involve a single source scoring 4 (a 1 in each of the four areas). Everything gets more complex from there upwards, and the score can be used to inform the planning stage of the project.
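To make the idea concrete, here is a minimal sketch of how the four per-source scores could be rolled up. The class and function names, the example sources and the simple sum are my own assumptions for illustration; the article only prescribes the four 1-to-5 scales and an overall score.

```python
from dataclasses import dataclass

@dataclass
class SourceAssessment:
    """One data source, scored 1 (best) to 5 (worst) in each of the four areas."""
    name: str
    documentation: int       # Source System Documentation
    capture_method: int      # Data Capture Method
    capture_validation: int  # Data Capture Validation
    maintenance: int         # Data Maintenance

    def score(self) -> int:
        # Per-source complexity: sum of the four area scores (4 = simplest, 20 = worst).
        return (self.documentation + self.capture_method
                + self.capture_validation + self.maintenance)

def migration_complexity(sources: list[SourceAssessment]) -> int:
    # Overall score: sum across every source to be migrated,
    # so extra sources always add complexity.
    return sum(s.score() for s in sources)

# Hypothetical example sources:
crm = SourceAssessment("Legacy CRM", 3, 2, 4, 3)
web = SourceAssessment("Web sign-ups", 2, 4, 2, 4)
print(migration_complexity([crm, web]))  # 12 + 12 = 24
```

A weighted sum, or flagging any area scoring 4 or worse, would be easy refinements if some areas matter more in a given programme.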