The term ‘data journalism’ is misleading. It gives the impression of journalists as statisticians, crouched over computer databases doing SQL queries and writing code. This may be one aspect of data journalism but only a tiny one. And it is certainly not why data journalism is the future of news.
Data journalism is shorthand for being able to cope with information abundance. It is not just about numbers. Neither is it about being a mathmo or a techie. It means being able to combine three things: individual human intelligence, networked intelligence, and computing power.
We live in an age of information abundance – both static and dynamic. By static data I mean things like annual reports, historical crime data, censuses (censi?). This is information that is collected – often by public bodies – categorized, and published. By dynamic data I mean real-time information, flowing in through micro-blogs, social networks, live cameras.
Static data, which used to lie relatively dormant in archives and libraries, is increasingly being made public (on places like data.gov.uk and data.gov). On data.gov.uk there are already 5,600 data sets. In January most of the UK’s local councils (293 out of 326 at the last count) published all their spending records over £500.
Dynamic data comes at us in a torrent. 25 billion tweets were sent in 2010. 100 million new Twitter accounts were created in 2010. 35 hours of video are uploaded to YouTube every minute. There were 600 million people on Facebook by the end of 2010 (data from royal.pingdom). If you want, you can watch live CCTV cameras on the streets of London.
Data journalism is about coping with both of these. It’s about:
- being able to work out what is happening in Tahrir Square in real time from tweets, video footage, and social networks – while at the same time contextualising that with diplomatic news from Cairo and Washington (see services like Storyful and Sulia)
- being able to upload, add metadata and analyse thousands of pages of legal documents (e.g. via Document Cloud)
- being able to map crime data (e.g. see Oakland Crimespotters)
- being able to harness the intelligence of the ‘crowd’ to unearth stories from mountains of detailed data, as the Guardian did with MPs’ expenses, getting 170,000 documents read and checked in just over three days (and, separately, to identify all the Doctor Who baddies)
- knowing how to use metadata – in publishing, searching and using information (heard of hNews? Or RDFa? Or Open Calais?)
- building the tools that enable people to see the relevance of public information to them (as the New York Times did with its series on toxic waters)
A data journalist should have the news sense of a traditional journalist, a broad and deep social media presence, and be tech-savvy enough to be able to do pivot tables in Excel and know how to use tools like Google Refine. The ability to code and do database queries would be a bonus, but is not a prerequisite.
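The pivot table mentioned above is less exotic than it sounds: it is just grouping rows by one column and aggregating another. A minimal sketch in Python, using invented council spending records (the suppliers and amounts here are made up for illustration, not taken from any real data set):

```python
import csv
import io
from collections import defaultdict

# Invented sample of council spending records over £500 (supplier, amount in £).
raw = """supplier,amount
Acme Paving Ltd,1200.00
Acme Paving Ltd,830.50
Bloggs Catering,560.00
"""

# The pivot: total spend per supplier.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["supplier"]] += float(row["amount"])

for supplier, total in sorted(totals.items()):
    print(f"{supplier}: £{total:,.2f}")
```

Excel's pivot tables, Google Refine's facets, and SQL's `GROUP BY` all do essentially this group-and-sum, just with different interfaces.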
This is where Stieg Larsson and his ‘The Girl with/who..’ series comes in. Larsson got data journalism. He understood how rare it is for a journalist with news sense and story-telling skills to be a tech wiz as well. So he didn’t try to combine everything in one character. He created two: Mikael Blomkvist and Lisbeth Salander. Together they can source the data, analyse the data, and tell the story.
Compare Salander with WikiLeaks. WikiLeaks spent four years publishing leaked data without much public profile. Then it started to turn its data into stories (the edited footage of the US Apache helicopter attack in Iraq) and to partner with established news organisations and journalists (particularly Nick Davies and David Leigh at the Guardian), and it became one of the best-known organisations on the planet (the leaks of the Afghan war logs, the Iraq war logs, and the diplomatic cables helped, of course).
Data needs journalism. This is where the rather misleading phrase ‘data journalism’ is also quite helpful. There is a myth that all we need to do to make the world a better place is to make everything open and transparent. Openness will help, but it only gets us halfway there. Without people and organisations able and willing to take the open data, clean it, structure it, add metadata to it, create tools to analyse it, analyse it, and tell stories from it, the data might as well go back in the archive.
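The cleaning step above is where most of the work hides. Open data sets routinely spell the same entity several ways, and tools like Google Refine exist largely to merge those variants. A minimal sketch of the idea in Python, with invented rows (real clustering is fuzzier than this simple normalisation, but the principle is the same):

```python
import re
from collections import defaultdict

# Invented messy open-data rows: the same supplier spelled three ways.
rows = [
    ("ACME PAVING LTD.", 1200.00),
    ("Acme Paving Ltd", 830.50),
    ("acme paving ltd", 99.99),
]

def clean(name):
    """Normalise a supplier name: lower-case, strip punctuation,
    collapse surplus whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

totals = defaultdict(float)
for name, amount in rows:
    totals[clean(name)] += amount

# All three variants collapse into a single supplier before analysis.
print(dict(totals))
```

Without this step, the three rows would look like three small suppliers instead of one larger one – exactly the kind of distortion that makes raw openness only half the job.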
- Start with Jonathan Stray’s excellent reading list on his blog
- The Guardian’s Datablog is one of the pioneers in this area; particularly notable are the way it dealt with MPs’ expenses, how it maps things like swine flu, and how it handled the WikiLeaks war logs data
- ProPublica has published a series of guides on collecting data
- Conrad Quilty-Harper has a good rundown of open data and its uses (good and bad) at The Telegraph
- See panel discussion about data journalism at The Book Club, Shoreditch, with Mark Stephens and Ben Leapman, 9-2-11