On the front page of the Sunday Times money section this weekend was an article concerning a well-known credit reference agency that had managed to merge data from two different individuals into a single report, thereby allowing each individual to view very personal and sensitive financial data about the other. Not good.
Of course I knew almost immediately what the source of the problem was given the title of the article and the fact that when the poor lady concerned rang said agency to complain, she was asked immediately if she was a twin.
It has to be due to the reliance on the Date of Birth in the match routines. I know from experience the lure of this particular piece of personal data.
You see, name and address matching is very difficult. I will qualify that. It is easy most of the time, yet it is nigh on impossible to get it correct in 100% of cases. The problem facing credit reference agencies is that they are presented with vast amounts of data from multiple sources, with little or no control over the quality or accuracy of that data. They just have to deal with what they are presented with and create the holy grail – a Single Customer View – by using sophisticated name and address matching logic. This logic has to decide whether a piece of data for person A at address B should be merged with data for person A2 at address B2. If the data in both cases is complete and accurate, this should be easy. But more often than not it isn’t. There are differences or variations in the way the data is presented. For example: Bob Smith vs Robert Smith. Andrew Jones vs Andrwe Jones. Peter Brown vs Paul Brown. Now the software utilised is invariably very clever and sophisticated, and by and large will do a very good job. However, there is always a grey area. When does a variation in data become a genuine difference? That is every developer’s nightmare question. I personally must have lost many nights’ sleep over that one over the years.
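To make that grey area concrete, here is a minimal sketch in Python. It is not how any particular agency does it. The nickname table, the two thresholds and the function names are all my own assumptions for illustration, and the similarity score comes from the standard library's difflib rather than from a dedicated name-matching product.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table for illustration; a production system would
# need a far larger, curated list.
NICKNAMES = {"bob": "robert", "rob": "robert", "andy": "andrew", "pete": "peter"}


def canonical(forename: str) -> str:
    """Lower-case the forename and expand a known nickname."""
    name = forename.strip().lower()
    return NICKNAMES.get(name, name)


def forename_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two forenames."""
    a, b = canonical(a), canonical(b)
    if a == b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def classify(a: str, b: str, match_at: float = 0.82, reject_below: float = 0.6) -> str:
    """Label a forename pair as 'match', 'different' or 'grey area'.

    Both thresholds are assumed values, not tuned ones.
    """
    score = forename_similarity(a, b)
    if score >= match_at:
        return "match"
    if score < reject_below:
        return "different"
    return "grey area"


for pair in [("Bob", "Robert"), ("Andrew", "Andrwe"), ("Peter", "Paul"), ("Maria", "Marie")]:
    print(pair, classify(*pair))
```

With these assumed thresholds the nickname and the transposed letters are treated as variations, Peter vs Paul as a genuine difference, and Maria vs Marie falls in between. Move either threshold slightly and the answers change, which is exactly the problem.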
That brings me to the allure of the Date of Birth as a match element. One would think that this piece of data could be used to overcome many of the variations presented. And one would be correct – mostly. The argument is one based on probability. Statistically it is not very likely that there will be two people at the same address with the same date of birth, so statistically it is relatively safe to use a matching Date of Birth to “over-ride” other differences in the data. “P Smith” vs “Peter Smith” with the same Date of Birth? Most likely to be the same individual. So in a business environment where agencies are judged on whether they get more matches than others, it makes sense to utilise the “power” of this piece of personal data. But there have to be limits. One has to question the validity of using this data to over-ride definite differences in the data. Matching “Janet Smith” to “Louise Smith” simply because they share the same Date of Birth? Surely a risk too far. Perhaps the person in question is “Janet Louise Smith” and she is using different names in an attempt to confuse…but perhaps not.
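The limit I have in mind can be sketched roughly as follows, again in Python. The Person record, the function names and the rule itself are assumptions of mine for illustration: a shared Date of Birth is allowed to confirm a compatible variation (an initial against a full forename), but never to over-ride a definite difference. Both records are assumed to be at the same address already.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    forename: str
    surname: str
    dob: date


def forenames_compatible(a: str, b: str) -> bool:
    """True if the forenames are equal, or one is just the initial of the other."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def same_person(x: Person, y: Person) -> bool:
    """Decide whether two records at the same address are one individual."""
    if x.surname.strip().lower() != y.surname.strip().lower():
        return False
    if x.forename.strip().lower() == y.forename.strip().lower():
        # Identical names: in this sketch the Date of Birth is not consulted.
        return True
    # The names differ: the Date of Birth may only confirm a compatible variation.
    return forenames_compatible(x.forename, y.forename) and x.dob == y.dob


dob = date(1970, 5, 17)  # made-up date for the example
print(same_person(Person("P", "Smith", dob), Person("Peter", "Smith", dob)))
print(same_person(Person("Janet", "Smith", dob), Person("Louise", "Smith", dob)))
```

Under this rule “P Smith” and “Peter Smith” merge when the dates agree, while “Janet Smith” and “Louise Smith” stay apart however well their dates line up. It will miss the odd genuine “Janet Louise Smith”, which is the price paid for not merging twins.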
I have heard people argue that the Date of Birth itself is less subject to variation or mistyping than other name and address elements. In my experience this is not so, and it was definitely not the case historically. I have seen individual data that has been generated by “splitting” joint account data, where the single Date of Birth present on the joint data has been posted onto both individual records – clearly wrong. And think about it. If your bank has your name and address details wrong you would most likely know about it. But if they held the wrong date of birth? How would you know?
When I was asked to incorporate Date of Birth into an SCV solution that I had developed, a detailed study of its impact led me to the conclusion that it could do more harm than good! It certainly wasn’t going to be the “silver bullet” that many in the business hoped it would be.
In the end it’s all about risk. The risk of false positives versus the risk of false negatives. I would like to think that the agency in question was aware that its matching algorithms could be somewhat “gung ho!”. That it had weighed up the pros and cons and decided that overall it was better to get more matches than fewer. I’ll bet they were able to point to lots of impressive correct matches that would never have been possible had the Date of Birth not been used in the way it was. But whether those benefits outweigh the negative publicity generated by blunders such as this is questionable.
Date of Birth – Use with care!!