Sometimes information is missing. Maybe it was never collected, maybe it was wrong. Whatever the case, let blanks be blank.
Don’t use placeholders when you mean “I don’t know”. Abbreviations like “N/K” or even words like “Unknown” can seem helpful, but they are only really useful for humans who speak your language (and understand what the abbreviation really means!).
Placeholders make it harder to work out that there is actually something missing. Think about all the ways you can write “Not Known”! It’s far better to leave a blank. Blanks are familiar and can be picked up by lots of tools.
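To see why blanks are easier for tools, here is a minimal sketch using Python's standard `csv` module. The column name `amount` and the sample rows are invented for illustration. Counting blanks takes a single check, while the placeholder version hides the gaps unless you already know every spelling of "Not Known" in use.

```python
import csv
import io

# Hypothetical sample data: one file leaves blanks, the other uses placeholders.
with_blanks = "name,amount\nAlpha,100\nBeta,\nGamma,\n"
with_placeholders = "name,amount\nAlpha,100\nBeta,N/K\nGamma,Unknown\n"

def count_missing(csv_text, column):
    """Count rows where the given column is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if row[column].strip() == "")

print(count_missing(with_blanks, "amount"))        # 2
print(count_missing(with_placeholders, "amount"))  # 0, the gaps are hidden
```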
The license, a text that describes how data can and can’t be used, is a must for open data. Without that clear statement of use, it’s impossible to tell whether the information is subject to copyright or other restrictions, so anyone using it risks a legal challenge.
A good open data license has few restrictions – it allows as many people as possible to use the data with as few conditions as possible. The more conditions on your data, the harder it is to use it with other data.
Let’s see some examples of excellent open data licenses:
The golden rule for useful open data is consistency.
Consistency means picking a naming strategy for your files then sticking to it. This makes it easy to spot that files are missing or out of place.
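As a rough sketch of how a naming strategy helps, suppose files are published monthly as `payments-YYYY-MM.csv`. That pattern is an assumption made up for this example, as is the `check_files` helper. With a predictable pattern, a few lines of Python can list the months that are missing and the files that don't belong.

```python
import re

# Hypothetical naming pattern: payments-YYYY-MM.csv
PATTERN = re.compile(r"^payments-(\d{4})-(\d{2})\.csv$")

def check_files(filenames, year):
    """Return (missing_months, out_of_place) for one year's files."""
    found = set()
    out_of_place = []
    for name in filenames:
        match = PATTERN.match(name)
        if match and int(match.group(1)) == year and 1 <= int(match.group(2)) <= 12:
            found.add(int(match.group(2)))
        else:
            out_of_place.append(name)
    missing = [month for month in range(1, 13) if month not in found]
    return missing, out_of_place

# Eleven well-named files (July is absent) plus one that ignores the pattern.
files = [f"payments-2016-{m:02d}.csv" for m in range(1, 13) if m != 7]
files.append("Payments July FINAL.csv")

missing, odd = check_files(files, 2016)
print(missing)  # [7]
print(odd)      # ['Payments July FINAL.csv']
```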
You’ll also want to keep your table headers the same for each new file so that anyone using your data, for example combining files, can do that easily. Changes to your headers break code and make your files harder to use.
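A quick header check before publishing can catch this kind of break. The sketch below is one possible approach, not a standard tool. The column names and the `headers_match` helper are hypothetical.

```python
import csv
import io

def read_header(csv_text):
    """Return the first row of a CSV as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))

def headers_match(reference_text, new_text):
    """True if the new file has exactly the same headers, in the same order."""
    return read_header(reference_text) == read_header(new_text)

# Hypothetical monthly files; March has renamed headers.
january = "date,amount,region\n2016-01-05,1200,Yorkshire\n"
february = "date,amount,region\n2016-02-09,800,Yorkshire\n"
march = "Date,Amount (GBP),Region\n2016-03-02,950,Yorkshire\n"

print(headers_match(january, february))  # True
print(headers_match(january, march))     # False
```

The comparison is deliberately strict: a change in capitalisation or column order counts as a mismatch, because either one can break code that combines the files.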
Finally, keep your contents the same. ‘12’ and ‘twelve’ aren’t the same thing. Mixing the two in one column makes it harder to use the information for analysis [see Tidy Data by Hadley Wickham].
Tip: If you can’t do maths on it, it’s text not a number.
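One way to apply that tip in Python is to ask whether a value can be converted to a number at all. The `is_number` helper below is a small illustrative function, not part of any library.

```python
def is_number(value):
    """Return True if the text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False

print(is_number("12"))      # True
print(is_number("twelve"))  # False
print(float("12") * 2)      # 24.0, you can do maths on it
```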
Good quality open data comes in the form of data not a report. What’s the difference?
For data, put one thing in each column so that the values are friendly for machines and easier to re-use. In the first row, you can see a payment of £18,000 was made on the 3rd of November 2016 in the Yorkshire region. It’s not as friendly for a human but it’s a great starting point for analysis and machines.
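As a sketch of what "one thing in each column" buys you, here is the payment described above written as a single CSV row. The column names `date`, `amount_gbp` and `region`, and the ISO date format, are assumptions for this example rather than the layout of the original table. Because each column holds one value, each one converts to a useful type in a single step.

```python
import csv
import io
from datetime import date

# Hypothetical tidy layout for the payment described above.
tidy = "date,amount_gbp,region\n2016-11-03,18000,Yorkshire\n"

row = next(csv.DictReader(io.StringIO(tidy)))
payment_date = date.fromisoformat(row["date"])  # a real date, not text
amount = int(row["amount_gbp"])                 # a real number, not text

print(payment_date.year, payment_date.month, payment_date.day)  # 2016 11 3
print(amount + 1000)                                            # 19000
print(row["region"])                                            # Yorkshire
```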
A report is friendly for humans and easy to read. The way the information is laid out makes it look nicer but is a headache for machines. To use this data, you’d have to do more work to make it tidy [see Tidy Data by Hadley Wickham].
Tip: Make your open data more usable by making it easy to use.
In it, I reveal how confusing it is to find out how many hospitals there are in the UK. So many public bodies publish their own, slightly different lists. As someone who supports people sharing who they’ve given money to, I’d like to see one single list with a hospital’s identifying number. I’d like that list to be complete, accurate and kept up-to-date so I can recommend it to people preparing open data.
Please suggest the tools you find useful and add your experiences with them in the toolbox or through the open contracting groups. There’s a wealth of tools and resources that help anyone involved in OGP, so do take a look around while you’re there.