I think the crowdsourcing project that got the biggest response was a piece on the Olympic ticket ballot. Thousands of people in the UK tried to get tickets for the 2012 Olympics and there was a lot of fury that people hadn’t got them. People had ordered hundreds of pounds’ worth and were told that they’d get nothing. But no one really knew if it was just some people complaining quite loudly while actually most people were happy. So we tried to work out a way to find out.
We decided the best thing we could really do, in the absence of any good data on the topic, was to ask people. And we thought we’d have to treat it as a light thing because it wasn’t a balanced sample.
We created a Google form and asked very specific questions. It was actually a long form: it asked the value of the tickets people had ordered, how much their card had been debited, which events they went for, this kind of thing.
We put it up as a small picture on the front of the site and it was shared around really rapidly. I think this is one of the key things: you can’t just think ‘what do I want to know for my story’, you have to think ‘what do people want to tell me right now’. And it’s only when you tap into what people want to talk about that crowdsourcing is going to be successful. The volume of responses for this project, which was one of our first attempts at crowdsourcing, was huge. We had a thousand responses in less than an hour and seven thousand by the end of that day.
So obviously we took presenting the results a bit more seriously at this point. Initially we had no idea how well it would do. So we added some caveats: Guardian readers may be more wealthy than other people, people who got less than they expected might be more willing to talk to us, and so on.
We didn’t know how much value the results would have. We ended up having a good seven thousand records to base our piece on, and we found about half the people who’d asked for tickets had got nothing. We ran all of this stuff and because so many people had taken part the day before, there was a lot of interest in the results.
A few weeks later, the official summary report came out, and our numbers were shockingly close. They were almost exactly spot on. I think partly through luck but also because we got just so many people.
If you start asking your readers about something like this on a comments thread, you will be limited in what you can do with the results. So you have to start by thinking: ‘What is the best tool for what I want to know?’ Is it a comment thread? Or is it building an app? And if it is building an app, you have to think ‘Is this worth the wait? And is it worth the resources that are required to do it?’
In this case we thought of Google Forms. If someone fills in the form you can see the result as a row on a spreadsheet. This meant that even if it was still updating, even if results were still coming in, I could open up the spreadsheet and see all of the results straight away.
I could have tried to do the work in Google but I downloaded it into Microsoft Excel and then did things like sort it from low to high, found the people who had decided to write their answers out in words instead of putting digits for how much they spent, and fixed all of those. I decided to exclude as little as I could. So rather than taking only valid responses, I tried to fix the other ones. People had used foreign currencies so I converted them to sterling, all of which was a bit painstaking.
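The same kind of clean-up can be sketched in code. This is only an illustration in Python, not what was done at the time (that was manual work in Excel). The function name `to_sterling`, the currency markers and the exchange rates are all hypothetical, and the rates are placeholder values rather than real ones.

```python
import re
from typing import Optional

# Hypothetical exchange rates to sterling: placeholder values, not real rates.
RATES_TO_GBP = {"GBP": 1.0, "EUR": 0.8, "USD": 0.5}

# Hypothetical markers a respondent might type next to an amount.
MARKERS = {
    "£": "GBP", "gbp": "GBP", "pounds": "GBP",
    "€": "EUR", "eur": "EUR",
    "$": "USD", "usd": "USD", "dollars": "USD",
}


def to_sterling(raw: str) -> Optional[float]:
    """Turn a free-text amount into pounds, or None if it needs fixing by hand."""
    text = raw.strip().lower()

    # Assume sterling unless the respondent named another currency.
    currency = "GBP"
    for marker, code in MARKERS.items():
        if marker in text:
            currency = code
            break

    # Pull out the first number, allowing thousands separators and decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # e.g. "two hundred": no digits, so flag it for a human

    amount = float(match.group().replace(",", ""))
    return round(amount * RATES_TO_GBP[currency], 2)


for answer in ["£1,200", "300 euros", "$500", "two hundred"]:
    print(repr(answer), "->", to_sterling(answer))
```

Returning `None` instead of guessing keeps the written-out answers visible, so they can be fixed by hand rather than silently dropped.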
But the whole analysis was done in a few hours, and I knocked out the obviously silly entries. A lot of people decided to fill it out pointing out they spent nothing on tickets. That’s a bit facetious but fine. That was fewer than a hundred out of over seven thousand entries.
Then there were a few dozen who put in obviously fake high amounts to try to distort the results. Things like ten million pounds. So that left me with a set that I could use with the normal data principles we use every day. I did what’s called a ‘pivot table’. I did some averaging. That kind of thing.
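As a rough sketch of that filtering and pivot-table step, here is a Python stand-in for what was done in Excel. The rows are invented for illustration and are not the Guardian’s data. The field names, the `MAX_PLAUSIBLE_ORDER` cutoff and the function names are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Invented example rows (amounts in pounds), not real survey responses.
responses = [
    {"event": "Athletics", "ordered": 400.0, "debited": 0.0},
    {"event": "Athletics", "ordered": 200.0, "debited": 100.0},
    {"event": "Swimming", "ordered": 0.0, "debited": 0.0},           # ordered nothing
    {"event": "Swimming", "ordered": 10_000_000.0, "debited": 0.0},  # obviously fake
    {"event": "Swimming", "ordered": 300.0, "debited": 0.0},
]

# Assumed cutoff for an implausibly large order; pick your own from the data.
MAX_PLAUSIBLE_ORDER = 50_000.0


def is_plausible(row):
    """Keep rows where something was ordered and the amount is believable."""
    return 0 < row["ordered"] <= MAX_PLAUSIBLE_ORDER


def pivot_by_event(rows):
    """Group rows by event and summarise each group, like a simple pivot table."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["event"]].append(row)
    return {
        event: {
            "responses": len(group),
            "mean_ordered": mean(r["ordered"] for r in group),
            "share_got_nothing": sum(r["debited"] == 0 for r in group) / len(group),
        }
        for event, group in groups.items()
    }


valid = [row for row in responses if is_plausible(row)]
summary = pivot_by_event(valid)

for event, stats in summary.items():
    print(event, stats)
```

Keeping the cutoff as a named constant makes it easy to state in the write-up exactly which entries were excluded and why.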
We didn’t have any idea how much momentum the project would have, so it was just me working with the Sports blog editor. We put our heads together and thought this might be a fun project. We did it, start to finish, in 24 hours. We had the idea, we put something up at lunch time, we put it on the front of the site, we saw it was proving quite popular, we kept it on the front of the site for the rest of the day and we presented the results online the next morning.
We decided to use Google Docs because it gives complete control over the results. I didn’t have to use anyone else’s analytic tools. I can put it easily into database software or into spreadsheets. When you start using specialist polling software, you are often restricted to using their tools. If the information we’d been asking for was particularly sensitive, we might have hesitated before using Google and thought about doing something ‘in house’. But generally, it is very easy to drop a Google Form into a Guardian page and it’s virtually invisible to the user that we are using one. So it is very convenient.
In terms of advice for data journalists who want to use crowdsourcing: you have to have very specific things you want to know. Ask things that get multiple choice responses as much as possible. Try to get some basic demographics of who you are talking to so you can see if your sample might be biased. If you are asking for amounts and things like this, try in the guidance to specify that it’s in digits, that they have to use a specific currency and things like that. A lot won’t, but the more you hold their hand through, the better. And always, always add a comment box because a lot of people will fill out the other things but what they really want is to give you their opinion on the story. Especially on a consumer story or an outrage.
- Marianne Bouchart, Data Journalism Blog