3 data wrangling lessons from Arts Council England national portfolio

What questions can we ask? Will this data help solve our problem? Can we use this algorithm or that one?

Welcome to data wrangling 101. Exploring our data before we dive in and start playing with it or reshaping it means more productive data science or data analysis. If you’re lucky, you know enough about the domain to understand the quirks a dataset throws your way or you have someone to badger. On your own with an unfamiliar dataset? That happens too. So here are 3 lessons from wrangling the Arts Council England 2018-2022 national portfolio dataset.

First, a little bit about Arts Council England:

Arts Council England is a public body supporting arts and culture in England. It is funded by public funds from the UK government and the National Lottery. Between 2015 and 2018, it will invest £1.8 billion in arts, museums and libraries. The funds will support art and culture experiences including theatre, digital art, reading, dance, music, literature, crafts and collections.

Why on earth are we interested in the national portfolio dataset?

The National Portfolio programme supports organisations considered by Arts Council England to represent the best of global arts practice. Funding is given over multiple years, currently 3. Between 2015 and 2018, £1 billion will be invested in 663 organisations.

That’s a lot of money and a lot of prestige! I’m still exploring the dataset but here’s what I’ve learned so far.

Lesson 1: Test your assumptions

My first assumption was a bust. One useful thing to know is “Which fields make each row unique?”. This helps us report on stuff like “How many grants were issued by the Arts Council?” and “To how many organisations?”. At first glance, it was easy to jump in and say the organisation’s name, the Applicant Name. Unfortunately, an organisation can be awarded under multiple funds.

Ah OK, so maybe Applicant Name and the type of fund, the Funding Band? At first that worked great but then 1 rogue entry popped up… It turns out that most of the time, an organisation gets 1 grant, sometimes 2 but Tyne & Wear Archives & Museums got 3!

Arts Council England – Anomalies

The upshot? Test your assumptions. This might be an anomaly or it might be legitimate. We can’t always tell, so we’re going to have to ask.
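
One way to test a uniqueness assumption like this is to count how often each candidate key appears and list the repeats. Here is a minimal Python sketch. The column names Applicant Name and Funding Band come from the dataset, but the rows, organisation names and band labels below are invented for illustration, and duplicate_keys is a hypothetical helper of my own.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return {key: count} for every key value that appears more than once."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Invented sample rows, NOT real Arts Council England data.
rows = [
    {"Applicant Name": "Org A", "Funding Band": "Band 1"},
    {"Applicant Name": "Org A", "Funding Band": "Band 2"},
    {"Applicant Name": "Org B", "Funding Band": "Band 1"},
    {"Applicant Name": "Org C", "Funding Band": "Band 1"},
    {"Applicant Name": "Org C", "Funding Band": "Band 1"},
    {"Applicant Name": "Org C", "Funding Band": "Band 2"},
]

# Assumption 1: the name alone identifies a grant.
by_name = duplicate_keys(rows, ["Applicant Name"])

# Assumption 2: name plus funding band identifies a grant.
by_name_and_band = duplicate_keys(rows, ["Applicant Name", "Funding Band"])

print("Repeated names:", by_name)
print("Repeated name and band pairs:", by_name_and_band)
```

An empty result means the candidate key is unique in the data you have. A non-empty one gives you the exact entries to ask about.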

Lesson 2: Don’t be afraid to ask

📢 Data isn’t a perfect reflection of the real world.

When we collect, share or use data, we curate it. We make decisions about what and how much detail to include. We can’t assume that data is perfect, so sometimes we have to ask the hard questions like “Why was Tyne & Wear Archives & Museums awarded 3 grants?”

Other oddities cropped up in the data that needed that human touch. Arts Council England share a lot of geographic information. Check out what you can find:

  • Local Authority
  • ACE Region
  • Area
  • ONS Region

They’re all slightly different. Some are clearly internal like ACE Region and others are official geographies like ONS Region. But what about Area? I was stumped, so I asked the very friendly Arts Council England support team.

Here’s what I heard back:

Dear Edafe,

I have heard back from our Digital Team and they advised that the ‘area’ column on the sheet attached by the person making the enquiry refers to Arts Council areas, these are:

  • London – comprising NUTS 1 region of London
  • Midlands – comprising NUTS 1 regions of East Midlands and West Midlands
  • North – comprising NUTS 1 regions of North East, North West and Yorkshire and the Humber
  • South East – comprising NUTS 1 regions of East of England and South East (excluding the county of Hampshire, and Unitary authorities of Isle of Wight, Portsmouth and Southampton)
  • South West – comprising NUTS 1 regions of South West plus the county of Hampshire, and Unitary authorities of Isle of Wight, Portsmouth and Southampton

More information on the areas can be found here: http://www.artscouncil.org.uk/about-us/your-area

The organisations labelled National are certain Sector Support Organisations with a national remit.

The NUTS 1 region which each organisation is located in can be found in the column headed ‘ONS region.’

Hope this helps.

Ah, that’s really handy to know. If we need to, we can map Area to Nomenclature of Units for Territorial Statistics (NUTS) regions or decide if we know enough about geography from other columns and can ignore Area.
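
If we do want to map Area to NUTS 1 regions, the support team’s reply translates into a small lookup. This is a rough Python sketch under a few assumptions: the spellings of the values are taken from the email and may not match the dataset exactly, area_matches_region is a hypothetical helper, and the Hampshire exception is handled coarsely by letting the South West area accept the South East region.

```python
# Arts Council area -> NUTS 1 regions it may contain, as described in the reply.
AREA_TO_NUTS1 = {
    "London": {"London"},
    "Midlands": {"East Midlands", "West Midlands"},
    "North": {"North East", "North West", "Yorkshire and the Humber"},
    "South East": {"East of England", "South East"},
    # Hampshire, Isle of Wight, Portsmouth and Southampton sit in the
    # NUTS 1 South East region but belong to the Arts Council South West area.
    "South West": {"South West", "South East"},
}


def area_matches_region(area, ons_region):
    """Coarse check that an Area value is compatible with an ONS Region value."""
    if area == "National":
        # Organisations with a national remit are not tied to one region.
        return True
    return ons_region in AREA_TO_NUTS1.get(area, set())


print(area_matches_region("North", "North West"))
print(area_matches_region("London", "North West"))
```

Rows that fail the check are not necessarily wrong. They are just the ones worth a second look or another question.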

The upshot? Don’t be afraid to ask. Making assumptions can come back to bite you. If you can, ask someone who knows so you understand their design choices. You don’t have to do this for every single column; focus on the ones that are most likely to solve your problem. You can also come back as you iterate. Remember, it’s a cycle.

Lesson 3: Remember it’s a cycle

There are a few methodologies, good practices and guidelines that help you punch through the worst bits of data wrangling so you can get to the good bits. You might be data mining or predicting or deep learning. No matter your intended application, you’ll most likely be iterating – going around in a cycle of try, test, understand until you have a good enough answer.

When you first start working with data it can seem overwhelming. Remembering it’s a cycle will keep you sane. You might miss things the first time, that’s OK. That’s why we test and iterate.

In conclusion

I started exploring the Arts Council England 2018-2022 national portfolio dataset to answer a friend’s question and then to streamline my practice. Along the way I made assumptions, backtracked, tried data visualisations that didn’t work and rolled my eyes – a lot. Each iteration, I learned something new and useful about the story of national portfolio funding for the next 3 years. I hope you have too.

Arts Council England - National Portfolio Dataset - Column Count
Visualising Column Count

Featured image: Arts Council England – Sign on the door by Howard Lake (CC BY-SA 2.0)

Grazing the Open Data Skills Framework

Where are you on your Open Data journey?

From novice to expert, the Open Data Institute’s Open Data Skills Framework has evolved to help guide your Open Data learning experience. With everyone starting at the Explorer stage, learning is balanced so you gain skills and experiences without the fatigue of too much information.

As a trainer and foodie, this subtle tension was familiar; it whetted my appetite to explore a foodie approach to getting the best out of the Open Data Skills Framework.

Sitting comfortably? Let’s begin.

Explorers: an Open Data Explorer has a basic understanding of open data. They can define it, point to examples or case studies and explain how it can be used to create change.

Serving Suggestion

The Amuse Bouche

Focus on mini case studies

Explorers have just started on their open data journey. They may be enthusiastic or apprehensive, or somewhere in-between. New information and ideas may need to be integrated and mulled over.

For explorers, I recommend bite sized case studies to entice them to learn more and clear signposting to where to get more information.

  • The 24 of open data – how open data is changing how we live, work and learn
  • Open data in numbers – a look at open data adoption
  • Crouching tech, hidden data – the open data you use every day

Strategists: an Open Data Strategist is someone who integrates open data into a strategy or manages an open data project. They have the planning and management techniques to drive forward an open data initiative, and they understand the challenges inherent in this process.

Serving Suggestion

The Starter

Focus on Methodology

Strategists know the drill, now they want to deploy it. For strategists, I recommend tips on how to determine what will work for their strategy or project, and what won’t.

This is less about open data itself and more about managing the people, projects, processes and pitfalls that come with introducing new ways of thinking.

  • Open Data Policies and how to get them right
  • Black-box thinking with open data – experimenting your way to smoother adoption
  • Start with why – is Open Data really what you need?

Practitioners: Open Data Practitioners have the practical skills necessary to conduct basic operations on an open dataset. They get hands-on with the data, and are familiar with the tools and techniques necessary to manage and publish an open dataset.

Serving Suggestion

The Taster Plate

Focus on tooling and techniques

Practitioners may range from reluctant to enthusiastic adopters of Open Data, but they want to get the job done.

For practitioners, I recommend revealing what tooling and techniques are out there and what for, including what’s new, what’s hot and what’s not.

  • From understanding to deployment – getting to useful open data using CRISP-DM
  • Automate, Improve, Optimise – how to work smarter with open data
  • Quick and dirty – rapid techniques for open data insight

Pioneers: Open Data Pioneers apply their data knowledge to their sector to solve challenges. They can point to sector-specific case studies, identify future trends in the sector and understand the data challenges specific to their sector.

Serving Suggestion

The Pot Luck

Focus on future trends and sharing knowledge

Pioneers are veterans who’ve tackled the challenges of open data, so they are ideally placed to look at where new challenges and opportunities lie.

For pioneers, I recommend a cross-pollination of ideas, challenges and opportunities from other sectors. Here, a focused conversation and guided workshop around where open data challenges lie may encourage contributions from experts and build a shared understanding of challenges.

Suggestions, from the provocative:

  • What has open data ever done for us?
  • What is your open data return on investment?
  • Open data – has it failed?

To the exploratory:

  • What next for open data after Brexit?
  • What lessons can open data learn from open science?

The Open Data Skills Framework provides an ideal opportunity for learners to assess where they are and where they want to be on their open data journey. It also provides a landscape for trainers to adapt, create and innovate around sharing open data skills and techniques.

I hope to deliver one or more of these sessions at the ODI summit and look forward to continuing my own open data journey. Where are you on your Open Data journey?

Health Innovation Lab – Innovating Type-2 Diabetes

On Saturday 27.02.16, I volunteered at the Leeds Student Data Labs health hack – innovating around type 2 diabetes using open data. We were inspired by DJ Patil and his approach to data science.

It was a long but productive day, with students from multiple disciplines and all over Yorkshire. Here’s how it went: health innovation lab storify.

5 Tips From Wakefield Business Week: Thriving in the Northern Powerhouse

As a small business owner in Wakefield, a city in the Leeds city region, I was keen to get as much as possible out of the Wakefield Business Week.

Today’s F5 (Refresh) Your Digital & Creative Skills was on point. It hit the sweet spot between appealing to non-technical small and medium business owners and advocating for the productivity the tech and digital sectors offer.

Here are 5 tips to help business owners thrive in the Northern Powerhouse:

1. Collaborate
From large organisations like Google, Microsoft and BT to regional influencers like the LEP, White Rose credit union and locals like Wakefield Council and Cognitiv, there’s an abundant opportunity to collaborate and grow.

Councillor Jack Hemingway made it clear that Wakefield Council’s business support team were ready and willing to help businesses in Wakefield thrive.
Tip: Don’t go it alone, collaborate.

2. Go Local
Wakefield has a wealth of digital and creative companies that end up working outside the region. By getting in touch with a membership organisation like Cognitiv, you get access to expertise on your doorstep. From next month, the last Friday of the month will be a casual breakfast drop-in at Unity Works. Who knows who you’ll meet?

Dan Conboy of Cognitiv laid out the pillars of their mission to make Wakefield a thriving place for small and medium enterprises: Collaboration, Promotion, Representation. These, along with promoting Code Club, digital literacy and facing common challenges like skills and training, make Cognitiv a valuable addition to Wakefield.
Tip: Go local for great talent.

3. Think about the Cloud
Could the cloud and related technology help your business innovate and grow? Daniel Langton of Microsoft showed it’s not rocket science to transform your business, no matter its size, with technology.

Think about your vision for your business, are you:

  • Paying too much?
  • Working effectively?
  • Trapping business insights in legacy tools or paper?
  • Communicating quickly and effectively?
  • Giving clients and employees what they want?
  • Over or under covered for information security?

Tip: Think about how technology could help your business thrive.

4. Be Mobile Friendly
Google is the de-facto platform for search and mobile is now overtaking desktops and laptops. To thrive, you need a website that tells your story and sells your brand. More than that, your website needs to be mobile friendly to rocket up Google’s ranking.

Abbey from the Google digital garage covered a number of free tools for business that can help with everything search-related including SEO – search engine optimisation, SEM – search engine marketing and more. See Google’s business page for more.
Tip: Make yourself easy to find, especially on mobile devices.

5. Eat the Free Lunch
At the end of the talks, a trio of organisations: the LEP, Tech Partnership and Leeds Beckett University urged small businesses to take advantage of funding for training. This one is a no-brainer for any business that needs to improve their skills to grow.

Both the LEP and Tech Partnership will fund up to 50% (with some additional criteria) and Leeds Beckett University introduced a number of other organisations that can help with funding, research, training and more.
Tip: Fund your skills and grow, the money is out there.

What Next?
Firstly, a huge “thank you” to Wakefield First for organising the free event. I learned a lot and met several talented and lovely local people. Next for my small business? I’ll be joining Cognitiv, popping in for a free consultation at Google Garage Leeds and taking advantage of the Wakefield Business Week. See you at the next breakfast meeting?

Header Image: WakefieldFirst logo

Bursting out of the data bubble

Knowledge for everyone? Only if we brave communities outside our own


Seifenblase (Bulle de Savon, Soap Bubble) by Photo Clinique

Is your professional community your comfort zone? When was the last time you went outside it?

These questions and more have nagged at me for a few years. As a developer, project manager and all round data person, I was struck by how little my fellow “techies” understood. Not technology or how to do their jobs but how few really understood the businesses they worked in.

This will be a familiar story to anyone who’s worked a development role in certain types of organisations: Managers decide something needs to be done. They talk to a business analyst, who creates a spec. Who talks to a systems analyst, who creates another spec. Who hands it over to a project manager, who gets a high level estimate, maybe from a tech team leader. Who parcels out the work.

This “Chinese whispers” method was how I got most of my work when I started out. It was no wonder tensions ran high when the people actually doing the job came to do a user acceptance test and declared: “No, doesn’t work”.

Agile was meant to change all that.

How? By bringing the people with the need in contact with the people with the skills. For this to work, you do need to speak enough of the same language. That means you need some understanding of how the business works and they need some understanding of how the technology works.

What motivates the people you’re working with? What do they actually mean when they say “x”? Are you actually developing “y”? How realistic are their expectations? How realistic are yours?

To be fair, this isn’t limited to tech. Breaking down silos in any institution means playing outside your bubble, your team, your comfort zone.

These days, I work for myself. Mostly because it is a rare thing to find an organisation that wants you to work “with” not “for” them. One where you have autonomy, challenge, reward and support to be the best you, so they get your best work. So, these days, breaking out of my tech bubble isn’t about popping into other departments and spending time with them to understand what they do.

These days I go out of my data community comfort zone.

I pop into the Leeds Creative Timebank to meet creatives: artists, photographers and dance coaches. I learn what they do and how they do it, and in turn fine tune my language. This means I must keep translating “techie” concepts into useful, tangible tools that mean something and do something.

I speak with everyone: students, barbers, waiters, managers, farmers and for the next five days, enthusiastic readers and writers. Data, information and the knowledge that provides is for everyone, so why keep it to ourselves? Why make it such a complex, mythical thing that others can’t get as excited, inspired and buoyed up (but practical) as we are?

It helps that I’m fascinated by skillful competence and the potential for incorporating it into my practice. All in all, it makes me a better person, a better “techie” and boosts my understanding of how data can work for the people I meet.

Knowledge for everyone? Yes. As long as we don’t keep it to ourselves.