How can open data help our city? Part 1

Cities are important urban areas where many of us live, work, study and raise families. There’s a push to make cities smarter.

Why? A smart city is a connected city – a place where Anna can get around easily because she can see her bus and train schedules in real time. A smart city is effective, integrated and innovative – a city that embraces Gina’s startup and makes it easy for Ali to find the best school for his daughter.

A smart city uses open data. 

Open data is knowledge for everyone; it’s information that can be shared with anyone for any purpose without restriction. We aren’t talking about sensitive or personal information; we’re talking about the data that drives decisions and is the lifeblood of a city’s civic landscape.

Open data helps cities connect people and organisations, share information and create new tools, products and services. Using open data well means smarter cities. This series covers everything you need to know about open data for smarter cities. First, let’s get you started.

Where do I start with open data for my city?

It’s hard to know where to start with open data when you’re a city. With many moving parts, stakeholders and a huge wish list, getting started can be daunting. Here are the key things to consider:

  1. Start with why: Have some idea of why you’ll need open data and how you’ll measure the success of your initiative. It doesn’t have to be perfect at the start but it will keep you on track.
  2. Think about delivery: You’ll need a way to deliver open data to the people and organisations that will use it.  There are several platforms and tools available. The right one will play nicely with your existing platforms. Remember, don’t reinvent the wheel!
  3. Find your audience: Get to know who could use your open data so you can work out their needs. Check FOIA requests – that tells you what people want to know!
  4. Work with your community: You’ll need a community engaged around data, including developers, citizens & businesses – back to those FOIA requests and materials you already publish. What are people interested in? Who are these people?
  5. Make sure your open data all plays well together: Open data doesn’t have to be perfect. Start where you are and keep improving. Think about what’s needed to connect data together from the start: how would you connect data on parks to data on air quality?
  6. Think value, value, value: Think about the benefit of sharing and connecting data. Local government departments will probably be the biggest users and benefit the most from connected open data, so keep them onside.
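
To make point 5 a little more concrete, here is a minimal sketch of connecting two datasets through a shared code. Everything in it is made up for illustration: the `ward_code` field, the park names and the `no2` readings are assumptions, not real city data. The idea is only that two datasets can be joined when both carry the same identifier for a place.

```python
# Hypothetical sample data: both datasets carry the same area code.
parks = [
    {"ward_code": "W01", "park": "Riverside Park"},
    {"ward_code": "W02", "park": "Hilltop Green"},
]
air_quality = [
    {"ward_code": "W01", "no2": 21.5},
    {"ward_code": "W03", "no2": 34.0},
]


def join_on(left, right, key):
    """Return merged rows for every pair of left/right rows sharing the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


joined = join_on(parks, air_quality, "ward_code")
print(joined)
```

Only the park in ward W01 comes back, because it is the only one with an air quality reading under the same code. Without a shared code you would be left matching on names, which is much less reliable.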

For more resources, the Open Data Institute is a good place to start.

     

     

    3 data wrangling lessons from Arts Council England national portfolio

    What questions can we ask? Will this data help solve our problem? Can we use this algorithm or that one?

    Welcome to data wrangling 101. Exploring our data before we dive in and start playing with it or reshaping it means more productive data science or data analysis. If you’re lucky, you know enough about the domain to understand the quirks a dataset throws your way or you have someone to badger. On your own with an unfamiliar dataset? That happens too. So here’s 3 lessons from wrangling the Arts Council England 2018-2022 national portfolio dataset.

    First, a little bit about Arts Council England:

    Arts Council England is a public body supporting arts and culture in England. It is funded by public funds from the UK government and the National Lottery. Between 2015 and 2018, it will invest £1.8 billion in arts, museums and libraries. The funds will support art and culture experiences including theatre, digital art, reading, dance, music, literature, crafts and collections.

    Why on earth are we interested in the national portfolio dataset?

    The National Portfolio programme supports organisations considered by Arts Council England to represent the best of global arts practice. Funding is given over multiple years, currently 3. Between 2015 and 2018, £1 billion will be invested in 663 organisations.

    That’s a lot of money and a lot of prestige! I’m still exploring the dataset but here’s what I’ve learned so far.

    Lesson 1: Test your assumptions

    My first assumption was a bust. One thing it’s useful to know is “Which fields make the data unique?”. This helps us report on stuff like “How many grants were issued by the Arts Council?” and “To how many organisations?”. It was easy to jump in at first glance and say the organisation’s name, the Applicant Name. Unfortunately, an organisation can be awarded under multiple funds.

    Ah OK, so maybe Applicant Name and the type of fund, the Funding Band? At first that worked great but then 1 rogue entry popped up… It turns out that most of the time, an organisation gets 1 grant, sometimes 2 but Tyne & Wear Archives & Museums got 3!

    Arts Council England - Anomalies

    The upshot? Test your assumptions. This might be an anomaly or it might be legitimate. We can’t always tell, so we’re going to have to ask.
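
Here is a minimal sketch of how that kind of assumption could be tested. The column names Applicant Name and Funding Band come from the dataset as described above, but the rows are made up and `duplicate_keys` is a hypothetical helper, not the method I actually used. It counts how often each combination of fields appears and reports the ones that are not unique.

```python
from collections import Counter

# Made-up rows for illustration; only the column names follow the post.
rows = [
    {"Applicant Name": "Org A", "Funding Band": "Band 1"},
    {"Applicant Name": "Org B", "Funding Band": "Band 1"},
    {"Applicant Name": "Org B", "Funding Band": "Band 2"},
    {"Applicant Name": "Org C", "Funding Band": "Band 3"},
    {"Applicant Name": "Org C", "Funding Band": "Band 3"},
]


def duplicate_keys(rows, fields):
    """Return the field-value combinations that appear in more than one row."""
    counts = Counter(tuple(row[field] for field in fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Assumption 1: the name alone identifies a row.
by_name = duplicate_keys(rows, ["Applicant Name"])
# Assumption 2: name plus funding band identifies a row.
by_name_and_band = duplicate_keys(rows, ["Applicant Name", "Funding Band"])

print(by_name)
print(by_name_and_band)
```

An empty result means the chosen fields are unique for this data. Anything left over is a row to investigate or ask about.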

    Lesson 2: Don’t be afraid to ask

    📢 Data isn’t a perfect reflection of the real world.

    When we collect, share or use data, we curate it. We make decisions about what and how much detail to include. We can’t assume that data is perfect, so sometimes we have to ask the hard questions like “Why was Tyne & Wear Archives & Museums awarded 3 grants?”

    Other oddities cropped up in the data that needed that human touch. Arts Council England share a lot of geographic information. Check out what you can find:

    • Local Authority
    • ACE Region
    • Area
    • ONS Region

    They’re all slightly different. Some are clearly internal like ACE Region and others are official geographies like ONS Region. But what about Area? I was stumped, so I asked the very friendly Arts Council England support team.

    Here’s what I heard back:

    Dear Edafe,

    I have heard back from our Digital Team and they advised that the ‘area’ column on the sheet attached by the person making the enquiry refers to Arts Council areas, these are:

    • London – comprising NUTS 1 region of London
    • Midlands – comprising NUTS 1 regions of East Midlands and West Midlands
    • North – comprising NUTS 1 regions of North East, North West and Yorkshire and the Humber
    • South East – comprising NUTS 1 regions of East of England and South East (excluding the county of Hampshire, and Unitary authorities of Isle of Wight, Portsmouth and Southampton)
    • South West – comprising NUTS 1 regions of South West plus the county of Hampshire, and Unitary authorities of Isle of Wight, Portsmouth and Southampton

    More information on the areas can be found here: http://www.artscouncil.org.uk/about-us/your-area

    The organisations labelled National are certain Sector Support Organisations with a national remit.

    The NUTS 1 region which each organisation is located in can be found in the column headed ‘ONS region.’

    Hope this helps.

    Ah, that’s really handy to know. If we need to, we can map Area to Nomenclature of Units for Territorial Statistics (NUTS) regions or decide if we know enough about geography from other columns and can ignore Area.
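
If we did want to derive Area ourselves, the reply above could be turned into a lookup along these lines. This is a sketch under assumptions: the exact spelling of the ONS region values in the dataset may differ, `area_for` is a hypothetical helper, and the exceptions are matched by a simple place name. In real data the county of Hampshire would need expanding to its districts. Organisations labelled National can’t be derived from geography at all, so they are left out.

```python
# Arts Council area for each NUTS 1 (ONS) region, as described in the reply.
AREA_BY_ONS_REGION = {
    "London": "London",
    "East Midlands": "Midlands",
    "West Midlands": "Midlands",
    "North East": "North",
    "North West": "North",
    "Yorkshire and the Humber": "North",
    "East of England": "South East",
    "South East": "South East",
    "South West": "South West",
}

# Places in the South East ONS region that the reply assigns to the South West area.
# Assumption: matched by name; Hampshire would need expanding to its districts.
SOUTH_WEST_EXCEPTIONS = {"Hampshire", "Isle of Wight", "Portsmouth", "Southampton"}


def area_for(ons_region, place):
    """Return the Arts Council area for an ONS region and a place name."""
    if ons_region == "South East" and place in SOUTH_WEST_EXCEPTIONS:
        return "South West"
    return AREA_BY_ONS_REGION[ons_region]


print(area_for("South East", "Portsmouth"))
print(area_for("South East", "Brighton and Hove"))
print(area_for("North East", "Newcastle upon Tyne"))
```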

    The upshot? Don’t be afraid to ask. Making assumptions can come back to bite you. If you can, ask someone who knows so you understand their design choices. You don’t have to do this for every single column, focus on the ones that are most likely to solve your problem. You can also come back as you iterate. Remember, it’s a cycle.

    Lesson 3: Remember it’s a cycle

    There are a few methodologies, good practices and guidelines that help you punch through the worst bits of data wrangling so you can get to the good bits. You might be data mining or predicting or deep learning. No matter your intended application, you’ll most likely be iterating – going around in a cycle of try, test, understand till you have a good enough answer.

    When you first start working with data it can seem overwhelming. Remembering it’s a cycle will keep you sane. You might miss things the first time, that’s OK. That’s why we test and iterate.

    In conclusion

    I started exploring the Arts Council England 2018-2022 national portfolio dataset to answer a friend’s question and then to streamline my practice. Along the way I made assumptions, backtracked, tried data visualisations that didn’t work and rolled my eyes – a lot. Each iteration, I learned something new and useful about the story of national portfolio funding for the next 3 years. I hope you have too.

    Arts Council England - National Portfolio Dataset - Column Count
    Visualising Column Count

    Featured image: Arts Council England – Sign on the door by Howard Lake (CC BY-SA 2.0)

    Legacy Code Rocks: Open Data with Edafe Onerhime

    I just loved chatting with Andrea on the Legacy Code Rocks! podcast. Listen: Open Data with Edafe Onerhime

    Edafe Onerhime is a consultant on Data Science and Data Analysis who has over 20 years of experience answering difficult questions about open data. She has helped governments, charities and businesses make better decisions and build stronger relationships by understanding, using and sharing their data. In this episode, we discuss the history of open data, its importance in building communities and its similarities to open source and open science.

    Have a good open data policy

    Can I Trust Your Open Data?

    You want people to use your data. They want confidence that they can trust your data and rely on it, now and in the future. A good open data policy can help with that.

    An open data policy sets out your commitment to your open data ecosystem. It should detail how you will collect, process, publish and share data. It will set expectations for anyone using your open data and if you stick to it, lead to confidence about what to expect.

    You can create your own open data policy from the Open Data Services open data policy template, check out the Sunlight Foundation guidelines or Socrata’s How to develop your open data policy article. Here are some open data policies in the wild:

    Remember: It’s not enough to have a policy, you have to stick to it to build trust and confidence in you as an open data publisher and in your open data.

    Make It Play Well With Other Data

    How do I make my open data as useful as possible? How do I connect it with other data to boost insight? How do I answer really tough questions with open data? Make it play well with other data – make it interoperable.

    interoperable

    (ˌɪntərˈɒprəbəl)

    adj

    (Computer Science) of or relating to the ability to share data between different computer systems, esp on different machines: interoperable network management systems.

     Why should you care about this?

    If you want your open data to help answer questions, solve problems, boost the economy by fuelling innovation or be used in research, you need to go beyond names and places.
    Do these mean the same company?
    • ACME
    • ACME Limited
    • A.C.M.E
    How about now?
    • GB-COH-123456: ACME
    • GB-COH-123456: ACME Limited
    • GB-COH-123456: A.C.M.E
    Bit more confident? You can take that code 123456* and find the company on Companies House (Hint, that’s what the GB-COH- tells the machine using your open data!). Go you, you’ve just opened up a whole new world of information! This example uses a shared, standard way of identifying organisations; find out more on org-id.guide.
    (* P.S This is just an example, ACME doesn’t really exist!)
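
Here is a minimal sketch of what a machine could do with those identifiers. It assumes the simple shape used in the example above (a two-part prefix such as GB-COH followed by the identifier), and `split_org_id` is a hypothetical helper, not part of any org-id.guide tooling.

```python
from collections import defaultdict


def split_org_id(org_id):
    """Split 'GB-COH-123456' into its list prefix ('GB-COH') and identifier ('123456')."""
    parts = org_id.split("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a prefixed organisation identifier: {org_id!r}")
    jurisdiction, register, identifier = parts
    return f"{jurisdiction}-{register}", identifier


# The three records from the example above.
records = [
    "GB-COH-123456: ACME",
    "GB-COH-123456: ACME Limited",
    "GB-COH-123456: A.C.M.E",
]

# Group the different spellings of the name under the shared identifier.
names_by_id = defaultdict(set)
for record in records:
    org_id, name = record.split(": ", 1)
    names_by_id[org_id].add(name)

prefix, identifier = split_org_id("GB-COH-123456")
print(prefix, identifier)
print(dict(names_by_id))
```

Three spellings collapse into one organisation because the identifier does the matching, not the name.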

    Now what?

    You can start to answer questions like this:
    Answer tough questions with good quality open data
    These codes or identifiers are a gold mine. Every country has agencies that give codes to businesses, charities, non-profits and more. Use those codes where you can.

    Can I share codes for anything else?

    Of course! You can identify places, things, categories, types and much, much more.

    Tip: Make your open data more useful by making it easy to connect with other data.

    See all the tips in one place: Good Quality Open Data

    More on: interoperable
    Courtesy of Collins English Dictionary – Complete and Unabridged, 12th Edition 2014 © HarperCollins Publishers 1991, 1994, 1998, 2000, 2003, 2006, 2007, 2009, 2011, 2014

    Information commons for the UK charitable sector

    Exploring how 360Giving underpins the data infrastructure for charitable grant making and how it supports an information commons for the sector. It all starts with a little clarity.

    For 2 years, I helped funders open up information flow in the non-profit sector by publishing what they fund. This is powerful insight. Understanding the 360Giving data standard is crucial for more funders to adopt it.

    Funders need to know:

    1. What is the data standard?
    2. What must be provided, what’s recommended and what’s optional? (and why?)
    3. How does the standard fit together?
    4. How does our data map to the standard?
    5. How can we ensure we’re telling the true story of our funding?

    Supporting the standard meant creating tools, reports, and visualisations to provide clarity and provoke discussion (the standard isn’t static, so your voices as funders, data users, tech and non-profits are hugely important).

    One question I hadn’t answered to my satisfaction was “How does the standard fit together?”. So I created a data visualisation to explain what 360Giving helps you share and how it’s put together to support good quality open data on funding.

     

    360Giving Data Schema Visualised

     

    With funding information shared in a similar way, charitable grant making organisations can ask & answer questions like:

    • How can we share the story of our funding?
    • Can we find partners by sharing our grant making?
    • How can we tackle our shared missions together?

    Sharing data openly connects organisations. That’s why open data is the basis of a shared charitable sector information commons. Historically, the non-profit sector had it tough – no-one wanted to fund infrastructure. Here’s what Friends Provident Foundation‘s Danielle Walker-Palmour had to say at a social investment event:

    No one wants to fund infrastructure – we need to think of infrastructure as a commons to achieve our sector’s collective goals.

    Times and perceptions are shifting; Barrow Cadbury‘s Connect Fund is making headway investing in infrastructure for social investment. Similar initiatives are expected to follow.

    A shared commons of information needs standards that make information simple to combine, easy to understand and usable by organisations of every size. The 360Giving data standard is an integral part of the commons and the sector’s data infrastructure. The goal? A shared information commons that sees more of the non-profit sector working together, seamlessly.

    Here’s the original Twitter moment:

    Make It Easy To Get Hold Of

    Good quality open data is accessible; easy to get hold of and easy to use.

    Accessible means different things depending on your audience and how much data we’re talking about.

    Mostly Human

    Is your audience mostly human?

    Publish files humans can use, like spreadsheets for information in rows and columns, and shapefiles for geography.

    Tip: Make sure the files aren’t too big to open on an everyday person’s computer.

    Mostly Machines

    Is your audience mostly machine?

    Machines can work with spreadsheets or formats like JSON or XML, designed for exchanging data.

    Do you have a little data? Start with files but think about making it even easier for machines with an API – a way of accessing information. Think of an API as a bartender that serves up your open data.

    Do you have a lot of data? You definitely want a good API that lets machines ask for a little data or a lot, ask for new data or just what’s changed, depending on their needs.
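
To give a feel for “a little or a lot” and “just what’s changed”, here is a small in-memory sketch of the logic an API like that might sit on. It is not a real web service: the `query` function, its `since`, `limit` and `offset` parameters and the sample records are all assumptions for illustration.

```python
from datetime import date

# Made-up records, each with the date it was last updated.
RECORDS = [
    {"id": 1, "updated": date(2018, 1, 5)},
    {"id": 2, "updated": date(2018, 2, 10)},
    {"id": 3, "updated": date(2018, 3, 15)},
    {"id": 4, "updated": date(2018, 4, 20)},
]


def query(records, since=None, limit=2, offset=0):
    """Return one page of records, optionally only those updated after `since`."""
    matching = [r for r in records if since is None or r["updated"] > since]
    page = matching[offset:offset + limit]
    # Tell the caller where the next page starts, or None when there is no more.
    next_offset = offset + limit if offset + limit < len(matching) else None
    return {"results": page, "total": len(matching), "next_offset": next_offset}


first_page = query(RECORDS)                       # a little data
changed = query(RECORDS, since=date(2018, 3, 1))  # just what's changed
print(first_page)
print(changed)
```

A machine that wants everything keeps asking with `next_offset` until it comes back as `None`. One that only wants updates passes the date of its last visit as `since`.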

    Bulk Is Good!

    For everyone

    Make it easy to get all your open data in one go – a bulk download. If you’ve got far too much for one download, break your open data into manageable chunks. Don’t forget to squish those files down as much as possible by zipping them up.
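
As a rough sketch of chunking and zipping, the snippet below splits rows into CSV files of a fixed size and compresses them into a single archive using Python’s standard library. The `bulk_zip` function, the file names and the sample rows are assumptions for illustration. You might equally publish one zip per chunk.

```python
import csv
import io
import zipfile


def bulk_zip(rows, fieldnames, chunk_size):
    """Write rows into CSV files of at most chunk_size rows each, inside one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for start in range(0, len(rows), chunk_size):
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows[start:start + chunk_size])
            part = start // chunk_size + 1
            archive.writestr(f"open-data-part-{part:03d}.csv", text.getvalue())
    return buffer.getvalue()


# Made-up rows for illustration.
rows = [{"id": i, "name": f"record {i}"} for i in range(1, 6)]
data = bulk_zip(rows, ["id", "name"], chunk_size=2)

# Read the archive back to see what a downloader would get.
names = zipfile.ZipFile(io.BytesIO(data)).namelist()
print(names)
```

Every chunk repeats the header row, so each file can be opened on its own.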

    See all the tips in one place: Good Quality Open Data