When we talk about coverage or completeness, we want to know a couple of things. First, what’s there? Second, what’s missing? We want to survey the land and get a short but complete overview. How do we do this? We look at our data from more than one angle.
A map is not the territory…
Data is a tool
It represents something we’re interested in. That thing could be cars, loans, flowers, or cups. Whatever it is, we want to record or review information about it. Knowing this can help us sell the right cars, guide our clients to the right loans, report on the state of the flower industry, or manufacture more Instagrammable cups.
Data describes concepts
It represents ideas we’re sharing. There are many styles and shapes of cups in the world, but the icon of a cup is pretty much universally understood. I may not know the style or shape of your cup but I understand “cup-ness”.
How does this help us understand completeness?
Let’s take a step back. We’re unlikely to be interested in every cup that ever existed, so we have a scope. Let’s say we’re interested in cups we make and sell. Our universe of cups is limited to just those cups.
We want to know a few things about our cups: the materials we used, how large or small they are, that sort of thing. So we decide on headings or columns for each of the attributes (information about cups) that we’re interested in.
This list of things about cups is the schema. It’s a template that describes what we want to know about cups. It isn’t our data on cups (we’ll add that under the headings) but it gives us some direction about what to record.
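To make the idea of a schema concrete, here is a minimal sketch in Python. The field names (`cup_id`, `name`, `type`, `material`, `size_ml`) are assumptions chosen for illustration, not a definitive list of cup attributes. Every field is optional because the schema only says what we would like to record, not what we actually have.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Cup:
    """A hypothetical cup schema: the headings, not the data."""
    cup_id: Optional[str] = None    # stands in for "cup#"
    name: Optional[str] = None
    type: Optional[str] = None
    material: Optional[str] = None
    size_ml: Optional[int] = None   # assumed unit: millilitres


# The schema is just the list of headings we have agreed on.
SCHEMA = [f.name for f in fields(Cup)]
print(SCHEMA)
```

Printing `SCHEMA` gives us the column headings only. An actual cup would be a `Cup(...)` instance filled in under those headings.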
Unlike the concept of a cup, the schema of a cup isn’t intuitive. We’d struggle to instantly recognise “cup-ness” by looking over this list. We’ve taken reality, abstracted it to a concept, then made that into a schema, which is the container for our data.
So back to completeness. When we talk about completeness, we could be talking about the concept or the schema. These are different questions, but together they give us insight into the state of our cup data.
- Concept – How many cups are we reporting?
- Schema – How many cup attributes are we reporting?
Concepts & Schemas: How are they different?
In general, when we talk about a concept of a cup, we have a list of information we need to understand “cup-ness”. So we may agree it’s not a “cup” unless we have these things: cup#, name and type. That’s close enough to our concept of a cup that we can ask questions about the number of cups. This is the sort of information we use to plan campaigns, make strategic decisions and launch new cups to the market.
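As a rough sketch of counting by concept, we can treat a record as a “cup” only when cup#, name and type are all present. The records and field names below are made up for illustration, and `None` is assumed to mean “not recorded”.

```python
# Hypothetical records; None marks a value that was never recorded.
cups = [
    {"cup_id": "C1", "name": "Classic", "type": "mug",
     "material": "ceramic", "size_ml": 300},
    {"cup_id": "C2", "name": None, "type": "tumbler",
     "material": "glass", "size_ml": None},
    {"cup_id": "C3", "name": "Espresso", "type": "demitasse",
     "material": None, "size_ml": 90},
]

# The attributes we agreed are needed for "cup-ness".
CONCEPT_FIELDS = ("cup_id", "name", "type")


def is_conceptual_cup(record):
    """True when every concept attribute has a recorded value."""
    return all(record.get(field) is not None for field in CONCEPT_FIELDS)


conceptual_cups = [c for c in cups if is_conceptual_cup(c)]
print(f"{len(conceptual_cups)} of {len(cups)} records count as cups")
```

Here the second record has no name, so it drops out of the count even though we know its type and material. The third record is missing a material, but that isn’t part of our concept, so it still counts.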
In reality, we don’t record everything diligently. We miss things out for a host of reasons. This is even more obvious when we aren’t recording the data ourselves.
Data has gaps
Understanding where those gaps are is important. Gaps affect how we report on concepts. If we’re missing cup names, that reduces the number of cups we report. We use information about gaps to improve our data collection so that we can make better strategic and planning decisions.
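To see where the gaps sit, we can count how often each attribute is filled in. This is only a sketch: the records, the attribute names and the use of `None` for “not recorded” are all assumptions for illustration.

```python
# Hypothetical records; None marks a value that was never recorded.
cups = [
    {"cup_id": "C1", "name": "Classic", "type": "mug",
     "material": "ceramic", "size_ml": 300},
    {"cup_id": "C2", "name": None, "type": "tumbler",
     "material": "glass", "size_ml": None},
    {"cup_id": "C3", "name": "Espresso", "type": "demitasse",
     "material": None, "size_ml": 90},
]

# The headings we decided on (assumed names).
SCHEMA = ["cup_id", "name", "type", "material", "size_ml"]


def fill_counts(records, schema):
    """Count, per attribute, how many records have a recorded value."""
    return {
        attribute: sum(1 for r in records if r.get(attribute) is not None)
        for attribute in schema
    }


counts = fill_counts(cups, SCHEMA)
for attribute, filled in counts.items():
    print(f"{attribute}: {filled}/{len(cups)} recorded")
```

Attributes with a low count are where data collection needs attention. A gap in a concept attribute such as `name` matters twice, because it also lowers the number of cups we can report.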
The upshot? To understand how complete our data is, we survey our data landscape in two ways: by concepts and by schema. We can count conceptual cups or count cup attributes to find what’s there and what’s missing. The two strands help us understand what’s going on in our data.
- Some things (or attributes) are more important than others because they map to concepts;
- Some things are conceptual (“cup-ness”), others are schematic (the cup attributes);
- Some things are more useful for planning and strategy (concept) and others for improving data quality (schema).