What is reference data anyway? And why does it matter?
Imagine a choir turning up for practice. The choirmaster says “We’re singing A mighty fortress is our God”. Everyone smiles, they all know this song. At the signal, they start to sing. Half sing “A mighty fortress is our God, a bulwark never failing”; the other half sing “A mighty fortress is our God, a trusty shield and weapon”. Oh dear. Perhaps it’s time to get the hymn sheets out.
Reference data serves more or less the same purpose as those hymn sheets. It won’t help you sing more harmoniously, but it will make sure you are all singing from the same hymn sheet.
So, that’s one reason we need reference data – consistency. Here’s another: imagine again we’re at choir practice and we want our group to be the best. We listen to each person sing and get those with similar singing voices to stand together. We can see we have mostly sopranos, maybe a few altos and a contralto. Putting things, people, concepts and more into groups helps us understand them better. It helps us find patterns, add rankings, hierarchies and more.
Consistency and classification are two great reasons to use reference data. Now comes the dilemma. In our own organisations, we have some control over the reference data we use. It could be titles like Mr, Mrs, Ms, etc. It could be genders like neutral, male, female. It could even be a list of months of the year. Outside our organisations, we usually need to find common ground. That’s where standards come in. They can be industry-wide, issued by public bodies or controlled by private organisations. Even better, they could be freely available as open data.
The secret to the success of reference data is consistency, and we get that by using a unique label – something that doesn’t change (at least not quickly) and is used to pinpoint exactly the reference we’re talking about. We call these identifiers. Think about your NHS number, your social security number, or your car’s VIN (vehicle identification number).
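For the more technically minded, here is a minimal Python sketch of what an identifier buys you. Everything in it is an assumption made up for illustration: the `TITLES` table, its codes and the `normalise_title` helper do not come from any published standard.

```python
# Hypothetical reference data for titles. The key is the stable identifier;
# the value is the label we show to people. The codes are invented here.
TITLES = {
    "MR": "Mr",
    "MRS": "Mrs",
    "MS": "Ms",
}


def normalise_title(raw):
    """Map free-text input to a reference identifier, or None if unknown."""
    # Ignore surrounding spaces, a trailing full stop and letter case.
    key = raw.strip().rstrip(".").upper()
    return key if key in TITLES else None


# Four people typed their title four different ways.
records = ["mr", "Mrs.", " MS ", "Dr"]
identifiers = [normalise_title(r) for r in records]
```

Three of the four spellings resolve to an identifier in the table, so everyone who stores that identifier means the same thing. "Dr" comes back as `None` because it is not in this toy list, which is exactly the moment to ask who owns the reference data and how it gets updated.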
What reference data do you use in your organisation and who owns it? I’m working with the Open Knowledge Foundation to curate, publish and, more importantly, maintain high-quality reference data. Is there reference data you need that isn’t readily available? Get involved: contact me in the comments or on Twitter: @ekoner.