Fuzzy Matching – Catch 22

Data will never be 100% accurate. We all know that. But as data processors we are still expected to ensure that data is cleaned and screened accurately. Imagine the consequences of sending an inappropriate mailing to the widow of a man whose details were accurately stored on the Mortality suppression file, just because of a minor typographical error in her surname. Excuses are unlikely either to compensate for the anguish caused or to be acceptable to the regulatory bodies.

In order to overcome data quality problems, data matching software can use an array of “fuzzy match” techniques. These vary in sophistication and effectiveness, but they are all intended to do one thing – identify a match between data that is, by definition, different. If the data were identical we wouldn’t need the fuzzy logic. The fact that we do means that the data is not identical. And there is the risk. By choosing to use fuzzy logic we are assuming that the data should be identical even though it is not. That assumption carries a risk of being incorrect. The more fuzzy match rules that apply, the greater the risk.

So if you don’t use fuzzy logic, you will fail to find matches that you really want to find. If you do use it, you risk false positives: records treated as a match when they are not.
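The trade-off can be made concrete with a small sketch. It is only an illustration, not how any particular matching product works: it assumes Python's standard `difflib.SequenceMatcher` as the similarity measure, and the surname pairs, the threshold values and the helper names (`similarity`, `is_match`, `evaluate`) are all invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical labelled pairs: (surname on file, surname received, same person?)
LABELLED_PAIRS = [
    ("Williams", "Willaims", True),   # a typing error in the same surname
    ("Johnson", "Johnston", False),   # two genuinely different surnames
]


def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a, b, threshold):
    """Treat two surnames as a match when their score reaches the threshold."""
    return similarity(a, b) >= threshold


def evaluate(threshold):
    """Count missed true matches and false positives at a given threshold."""
    missed = 0
    false_positives = 0
    for on_file, received, same_person in LABELLED_PAIRS:
        matched = is_match(on_file, received, threshold)
        if same_person and not matched:
            missed += 1
        if matched and not same_person:
            false_positives += 1
    return missed, false_positives


for threshold in (1.0, 0.85):
    missed, false_positives = evaluate(threshold)
    print(f"threshold={threshold}: missed={missed}, false positives={false_positives}")
# threshold=1.0: missed=1, false positives=0
# threshold=0.85: missed=0, false positives=1
```

With exact matching (a threshold of 1.0) the typing error is missed. Loosen the threshold far enough to catch it and the two different surnames are matched as well, because by this measure they are actually more alike than the misspelt pair. No single threshold gets both pairs right in this example.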

There is no simple solution to this conundrum. It’s a case of being clear about the business rules that should apply to a particular application, and understanding and evaluating the risks associated with making an incorrect matching decision. This understanding can then be used to choose the right software solution for your business.