You are an individual. There is, by definition, only one of you in the entire universe. Fact.
But none of the information which defines you is unique. You live in a given town – so do lots of other people. You live on a street – it’s likely that there are lots of other streets in the country with the same name. You live at number 32 – 32 is not unique. Your name is John Smith – a very common name. Individually, none of these pieces of data is sufficient to uniquely identify you. But surely the combination of all these pieces of data is enough:
John Smith, 32 Acacia Avenue, Manchester
Statistically, there is a small probability that a second individual shares the same basic details as you but lives in a different postcode within the same town. A study of UK electoral roll data puts this probability at less than 1 in 50,000. In other words, there is a less than 0.002% chance that there is another person who shares your name, your house number, your street name and your town.
However, the electoral roll is just one data source, designed to hold the details of each individual once only. What happens if we repeat the study but include other data sources? In an ideal world with accurate data and accurate matching, whilst we would expect to find duplicates for you (you have more than one credit card – right?), we would not expect to find your details at a different address to the one you currently live at (strictly, there is still that 0.002% chance that we would). However, this is found not to be the case. The more additional data sources we throw into the mix, the higher the probability of finding your Doppelganger. Include enough data sources and the probability can increase to around 1 in 100. In other words, there is a 1% probability of an organisation having your address details incorrectly stored, i.e. they may have the bulk of the address correct but the postcode / district information is wrong. So the chances of you having a data Doppelganger are 500 times greater than the chances of you having a genuine one.
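The arithmetic behind those figures can be checked with a short sketch. It assumes the two quoted odds are taken at face value: "less than 1 in 50,000" is treated as an upper bound and "around 1 in 100" as a round number. The function name `odds_to_percent` is purely illustrative.

```python
from fractions import Fraction


def odds_to_percent(one_in: int) -> float:
    """Convert odds quoted as '1 in N' into a percentage."""
    return float(Fraction(100, one_in))


# Odds quoted in the text (assumed exact for the sake of the sum).
genuine_one_in = 50_000   # genuine Doppelganger, electoral roll only
data_one_in = 100         # data Doppelganger, many data sources combined

genuine_pct = odds_to_percent(genuine_one_in)
data_pct = odds_to_percent(data_one_in)

# How many times more likely the data Doppelganger is than the genuine one.
ratio = Fraction(1, data_one_in) / Fraction(1, genuine_one_in)

print(f"Genuine Doppelganger: {genuine_pct}%")
print(f"Data Doppelganger:    {data_pct}%")
print(f"Ratio:                {ratio} times")
```

On these assumptions, 1 in 50,000 is 0.002%, 1 in 100 is 1%, and the second is 500 times the first.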
How does this happen? It can only be down to data capture problems and erroneous data processing, particularly with address verification & enhancement services. I know this happens because I have seen it with my own eyes. Often addresses are presented for correction that are incomplete. The house, street and town may be present but that’s it. Now if that combination is unique – no problem. If there happens to be more than one instance of those elements on PAF (the Royal Mail Postcode Address File) then there is a problem. Which one does the software pick? The right thing to do (if there is genuinely no way to decide) is to pick neither. The address is unresolvable. It could be one of many. Unfortunately some applications may be set (deliberately or inadvertently) to pick one of the addresses at random. After all, with two candidates there is a 50% chance of being correct. And who’s to know? And if customers choose the service that reports the highest number of matched addresses…
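The difference between the two behaviours can be sketched in a few lines. This is only an illustration, not how any particular address product works: the `REFERENCE` list is a hypothetical stand-in for PAF, the postcodes are invented, and the names `resolve_strict` and `resolve_guess` are made up for the example.

```python
import random
from typing import List, NamedTuple, Optional


class Address(NamedTuple):
    number: str
    street: str
    town: str
    postcode: str


# Hypothetical reference file standing in for PAF; the postcodes are invented.
REFERENCE = [
    Address("32", "Acacia Avenue", "Manchester", "M1 1AA"),
    Address("32", "Acacia Avenue", "Manchester", "M20 2BB"),
    Address("7", "Station Road", "Manchester", "M3 3CC"),
]


def candidates(number: str, street: str, town: str) -> List[Address]:
    """Return every reference address matching house, street and town."""
    key = (number.lower(), street.lower(), town.lower())
    return [
        a for a in REFERENCE
        if (a.number.lower(), a.street.lower(), a.town.lower()) == key
    ]


def resolve_strict(number: str, street: str, town: str) -> Optional[Address]:
    """Accept a match only when it is unique; otherwise report unresolvable."""
    matches = candidates(number, street, town)
    return matches[0] if len(matches) == 1 else None


def resolve_guess(number: str, street: str, town: str) -> Optional[Address]:
    """The risky behaviour: pick any candidate, which inflates the match rate."""
    matches = candidates(number, street, town)
    return random.choice(matches) if matches else None


print(resolve_strict("32", "Acacia Avenue", "Manchester"))  # ambiguous input
print(resolve_strict("7", "Station Road", "Manchester"))    # unique input
print(resolve_guess("32", "Acacia Avenue", "Manchester"))   # a coin toss
```

The strict version returns nothing for 32 Acacia Avenue because two candidates exist, so the record stays flagged as unresolved. The guessing version always returns a postcode, which looks better in a match-rate report but attaches the wrong one roughly half the time in this two-candidate case.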