You want people to use your data. They want confidence that they can trust your data and rely on it, now and in the future. A good open data policy can help with that.
An open data policy sets out your commitment to your open data ecosystem. It should detail how you will collect, process, publish and share data. It sets expectations for anyone using your open data and, if you stick to it, builds confidence about what to expect.
How do I make my open data as useful as possible? How do I connect it with other data to boost insight? How do I answer really tough questions with open data? Make it play well with other data – make it interoperable.
(Computer Science) of or relating to the ability to share data between different computer systems, esp on different machines: interoperable network management systems.
Why should you care about this?
If you want your open data to help answer questions, solve problems, boost the economy by fuelling innovation, or be used in research, you need to go beyond names and places.
Do these mean the same company?
How about now?
GB-COH-123456: ACME Limited
Bit more confident? You can take that code 123456* and find the company on Companies House (hint: that's what the GB-COH- tells the machine using your open data!). Go you, you've just opened up a whole new world of information! This example uses a shared, standard way of identifying organisations; find out more at org-id.guide.
(* P.S. This is just an example; ACME doesn't really exist!)
You can start to answer questions like this:
These codes, or identifiers, are a gold mine. Every country has agencies that give codes to businesses, charities, non-profits and more. Use those codes where you can.
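The GB-COH- example above can be pulled apart by a machine with a few lines of code. Here's a minimal sketch; the register lookup table is illustrative, not a complete copy of the org-id.guide list:

```python
# Illustrative subset of org-id.guide register prefixes.
REGISTERS = {
    "GB-COH": "Companies House (UK)",
    "GB-CHC": "Charity Commission for England and Wales",
}

def parse_org_id(identifier: str):
    """Split an identifier like 'GB-COH-123456' into its register
    prefix, the register's name, and the code that register assigned."""
    country, register, code = identifier.split("-", 2)
    prefix = f"{country}-{register}"
    return prefix, REGISTERS.get(prefix, "unknown register"), code

print(parse_org_id("GB-COH-123456"))
# ('GB-COH', 'Companies House (UK)', '123456')
```

Because the prefix tells the machine which register issued the code, anyone can follow it back to the authoritative source.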
Can I share codes for anything else?
Of course! You can identify places, things, categories, types and much, much more.
Tip: Make your open data more useful by making it easy to connect with other data.
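To show why that tip matters, here's a sketch of connecting two hypothetical datasets that only join up because both use the same org-id code. The field names and values are made up:

```python
# Two hypothetical datasets published by different organisations.
grants = [
    {"org_id": "GB-COH-123456", "grant_amount": 50000},
]
companies = [
    {"org_id": "GB-COH-123456", "name": "ACME Limited", "city": "London"},
]

# Index one dataset by the shared identifier, then enrich the other.
by_id = {c["org_id"]: c for c in companies}
enriched = [{**g, **by_id.get(g["org_id"], {})} for g in grants]

print(enriched[0]["name"], enriched[0]["grant_amount"])
# ACME Limited 50000
```

Without the shared identifier you'd be left matching on names, and "ACME Limited", "ACME Ltd" and "Acme" may or may not be the same company.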
Exploring how 360Giving underpins the data infrastructure for charitable grant making and how it supports an information commons for the sector. It all starts with a little clarity.
For 2 years, I helped funders open up information flow in the non-profit sector by publishing what they fund. This is powerful insight. Understanding the 360Giving data standard is crucial for more funders to adopt it.
Funders need to know:
What is the data standard?
What must be provided, what’s recommended and what’s optional? (and why?)
How does the standard fit together?
How does our data map to the standard?
How can we ensure we’re telling the true story of our funding?
Supporting the standard meant creating tools, reports, and visualisations to provide clarity and provoke discussion (the standard isn’t static, so your voices as funders, data users, tech and non-profits are hugely important).
One question I hadn’t answered to my satisfaction was “How does the standard fit together?”. So I created a data visualisation to explain what 360Giving helps you share and how it’s put together to support good quality open data on funding.
With funding information shared in a similar way, charitable grant making organisations can ask & answer questions like:
How can we share the story of our funding?
Can we find partners by sharing our grant making?
How can we tackle our shared missions together?
Sharing data openly connects organisations. That’s why open data is the basis of a shared charitable sector information commons. Historically, the non-profit sector had it tough – no-one wanted to fund infrastructure. Here’s what Friends Provident Foundation’s Danielle Walker-Palmour had to say at a social investment event:
No one wants to fund infrastructure – we need to think of infrastructure as a commons to achieve our sector’s collective goals.
Times and perceptions are shifting; Barrow Cadbury’s Connect Fund is making headway investing in infrastructure for social investment. Similar initiatives are expected to follow.
A shared commons of information needs standards that make information simple to combine, easy to understand and usable by organisations of every size. The 360Giving data standard is an integral part of the commons and the sector’s data infrastructure. The goal? A shared information commons that sees more of the non-profit sector working together, seamlessly.
A few years ago, I worked with an organisation that sells automotive intelligence to streamline the way they got insight from data. I came up with a generic data pipeline to explain to the board how their new data science process could work. It was a hit!
Visuals are a great way to explore a concept and explain a process that could otherwise lose folks along the way.
The key to a good data pipeline is that it’s part of an overall process (not shown here) in which you know what the problem is, why it’s important to solve, and that data will definitely help.
The pipeline focuses on continuous feedback – feedback at every key stage of the process. This could be to the problem owner, other teams, or any other stakeholder to keep them informed and fold their feedback back into the pipeline.
So, here’s my blast from the past – feel free to substitute out the Domain Data Science step for other processes that make sense, or drop it altogether; whatever works for your situation.
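The continuous-feedback idea above can be sketched as a toy pipeline: named stages run in order, and after every stage a feedback hook reports back to the problem owner or other stakeholders. The stage names and bodies here are illustrative, not the pipeline from the original diagram:

```python
# Toy stages; real ones would collect, clean and analyse actual data.
def collect(data):  return data + ["raw"]
def clean(data):    return [d for d in data if d]
def analyse(data):  return data + ["insight"]

STAGES = [("Collect", collect), ("Clean", clean), ("Analyse", analyse)]

def run_pipeline(data, feedback):
    """Run each stage, reporting back after every key step."""
    for name, stage in STAGES:
        data = stage(data)
        feedback(name, data)  # continuous feedback at every stage
    return data

result = run_pipeline([], print)
```

The design choice worth copying is the feedback callback: stakeholders see progress at every stage instead of only at the end, and their responses can be folded back into the pipeline.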
Good quality open data is accessible: easy to get hold of and easy to use.
Accessible means different things depending on your audience and how much data we’re talking about.
Is your audience mostly human?
Publish files humans can use, like spreadsheets for information in rows and columns, and shapefiles for geography.
Tip: Make sure the files aren’t too big to open on an everyday person’s computer.
Is your audience mostly machine?
Machines can work with spreadsheets, or with formats like JSON or XML that are designed for exchanging data.
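To make the human/machine distinction concrete, here's the same small dataset written both ways with Python's standard library. The field names are made up for the example:

```python
import csv
import io
import json

records = [
    {"org_id": "GB-COH-123456", "grant_amount": 50000},
    {"org_id": "GB-COH-654321", "grant_amount": 12000},
]

# CSV: rows and columns that open straight into a spreadsheet.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["org_id", "grant_amount"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())

# JSON: a structured format designed for exchanging data.
print(json.dumps(records, indent=2))
```

Publishing both costs very little and serves both audiences at once.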
Do you have a little data? Start with files, but think about making it even easier for machines with an API – a way of accessing information. Think of an API as a bartender that serves up your open data.
Do you have a lot of data? You definitely want a good API that lets machines ask for a little data or a lot, ask for new data or just what’s changed, depending on their needs.
Make it easy to get all your open data in one go – a bulk download. If you’ve got far too much for one download, break your open data into manageable chunks. Don’t forget to squish those files down as much as possible by zipping them up.
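The chunk-and-zip advice above can be sketched with Python's standard library: split the records into manageable parts, then compress everything into one archive. The chunk size and file names are arbitrary choices for the example:

```python
import io
import json
import zipfile

records = [{"id": i} for i in range(10)]
CHUNK = 4  # records per file; pick what suits your data and users

# Write each chunk as its own file inside a single compressed archive.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    for n, start in enumerate(range(0, len(records), CHUNK), 1):
        chunk = records[start:start + CHUNK]
        zf.writestr(f"data-part-{n}.json", json.dumps(chunk))

with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())
# ['data-part-1.json', 'data-part-2.json', 'data-part-3.json']
```

One archive, manageable chunks inside, and the compression means less bandwidth for you and faster downloads for your users.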
Sometimes information is missing. Maybe it was never collected, maybe it was wrong. Whatever the case, let blanks be blank.
Don’t use placeholders when you mean “I don’t know”. Abbreviations like “N/K”, or even words like “Unknown”, can seem helpful but are only really useful for humans who speak your language (and understand what the abbreviation really means!).
Placeholders make it harder to work out that there is actually something missing. Think about all the ways you can write “Not Known”! It’s far better to leave a blank. Blanks are familiar and can be picked up by lots of tools.
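Here's a sketch of the cleanup that data users end up doing when publishers use placeholders instead of blanks: every spelling of "not known" has to be guessed at and stripped out. The placeholder list is illustrative and inevitably incomplete, which is exactly the problem:

```python
# Every variant a data user has to anticipate; there are always more.
PLACEHOLDERS = {"n/k", "nk", "unknown", "not known", "n/a", "-"}

def normalise(value):
    """Return None for blank or placeholder values, else the value."""
    if value is None or value.strip().lower() in PLACEHOLDERS | {""}:
        return None
    return value

print([normalise(v) for v in ["Leeds", "N/K", "", "Unknown "]])
# ['Leeds', None, None, None]
```

If the publisher had simply left blanks, none of this guesswork would be needed: blank cells are understood out of the box by spreadsheets, databases and data-analysis tools.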
The license, a text that describes how data can and can’t be used, is a must for open data. Without that clear statement, there’s no way to tell whether the information is subject to copyright or other restrictions, so anyone using it risks litigation and legal challenge.
A good open data license has few restrictions – it allows as many people as possible to use the data with as few conditions as possible. The more conditions on your data, the harder it is to use it with other data.
Let’s see some examples of excellent open data licenses: