Archive for the ‘data.gov.uk’ tag
The term ‘data journalism’ is misleading. It gives the impression of journalists as statisticians, crouched over computer databases doing SQL queries and writing code. This may be one aspect of data journalism but only a tiny one. And it is certainly not why data journalism is the future of news.
Data journalism is shorthand for being able to cope with information abundance. It is not just about numbers. Neither is it about being a mathmo or a techie. It means being able to combine three things: individual human intelligence, networked intelligence, and computing power.
We live in an age of information abundance – both static and dynamic. By static data I mean things like annual reports, historical crime data, censuses (censi?). This is information that is collected – often by public bodies – categorized, and published. By dynamic data I mean real time information, flowing in through micro-blogs, social networks, live cameras.
Static data, which used to lie relatively dormant in archives and libraries, is increasingly being made public (on places like data.gov.uk and data.gov). On data.gov.uk there are already 5,600 data sets. In January most of the UK’s local councils (293 out of 326 at the last count) published all their spending records over £500.
Dynamic data comes at us in a torrent. 25 billion tweets were sent in 2010. 100 million new Twitter accounts were created in 2010. 35 hours of video are uploaded to YouTube every minute. There were 600 million people on Facebook by the end of 2010 (data from royal.pingdom). If you want, you can watch live CCTV cameras on the streets of London.
Data journalism is about coping with both of these. It’s about:
- being able to work out what is happening in Tahrir Square in real time from tweets, video footage, and social networks – while at the same time contextualising that with diplomatic news from Cairo and Washington (see services like Storyful and Sulia)
- being able to upload, add metadata and analyse thousands of pages of legal documents (e.g. via Document Cloud)
- being able to map crime data (e.g. see Oakland Crimespotters)
- being able to harness the intelligence of the ‘crowd’ to unearth stories from mountains of detailed data, as the Guardian did with MPs’ expenses, getting 170,000 documents read and checked in just over three days (and, separately, to identify all the Doctor Who baddies)
- knowing how to use metadata – in publishing, searching and using information (heard of hNews? Or RDFa? Or Open Calais?)
- building the tools that enable people to see the relevance of public information to them (as the New York Times did with its series on toxic waters)
A data journalist should have the news sense of a traditional journalist, a broad and deep social media presence, and be tech-savvy enough to be able to do pivot tables in Excel and know how to use tools like Google Refine. The ability to code and do database queries would be an added bonus, but is not a pre-requisite.
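For anyone who has never built a pivot table, here is a rough sketch of what one does, written in Python rather than Excel. The spending records are invented for illustration (they are not real council data), and the function name `pivot_sum` is my own.

```python
from collections import defaultdict

# Invented sample rows standing in for council spending records over £500.
# These are illustrative values, not real published data.
SPENDING = [
    {"council": "Council A", "supplier": "Acme Paving", "amount": 1200.0},
    {"council": "Council A", "supplier": "Acme Paving", "amount": 800.0},
    {"council": "Council A", "supplier": "Bright Lighting", "amount": 650.0},
    {"council": "Council B", "supplier": "Acme Paving", "amount": 5000.0},
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {r: dict(cols) for r, cols in table.items()}


if __name__ == "__main__":
    totals = pivot_sum(SPENDING, "council", "supplier", "amount")
    for council, suppliers in totals.items():
        for supplier, total in suppliers.items():
            print(f"{council} | {supplier} | £{total:,.2f}")
```

The grouping is the whole trick: one total per council and supplier, which is often enough to spot the payment worth asking questions about. In a spreadsheet the same thing is a few clicks, which is why it sits on the ‘tech-savvy enough’ side of the line rather than the ‘can code’ side.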
This is where Stieg Larsson and his ‘The Girl with/who..’ series comes in. Larsson got data journalism. He understood how rare it is for a journalist with news sense and story-telling skills to be a tech wiz as well. So he didn’t try to combine everything in one character. He created two: Mikael Blomkvist and Lisbeth Salander. Together they can source the data, analyse the data, and tell the story.
Compare Salander with Wikileaks. Wikileaks spent four years publishing leaked data without much public profile. Then it started to turn its data into stories (the edited footage of the US Apache helicopter attack in Iraq) and to partner with existing news organisations and journalists (particularly Nick Davies and David Leigh at the Guardian), and it became one of the best-known organisations on the planet (the leak of the Afghan warlogs, the Iraq warlogs and the diplomatic cables helped of course).
Data needs journalism. This is where the rather misleading phrase ‘data journalism’ is also quite helpful. There is a myth that all we need to do to make the world a better place is to make everything open and transparent. Openness will help, but it only gets us halfway there. Without people and organisations able and willing to take the open data, clean it, structure it, add metadata to it, create tools to analyse it, analyse it, and tell stories from it, then the data might as well go back in the archive.
Start with Jonathan Stray’s excellent reading list on his blog
The Guardian’s Datablog is one of the pioneers in this area. Particularly notable are the way it dealt with MPs’ expenses, how it maps things like Swine Flu, and how it handled the Wikileaks warlogs data
Propublica has published a series of guides on collecting data
Conrad Quilty-Harper has a good rundown of open data and its uses (good and bad) at The Telegraph
See panel discussion about data journalism at The Book Club, Shoreditch, with Mark Stephens and Ben Leapman, 9-2-11
[A version of this article was first published at PBS MediaShift IdeasLab]
On a news organization’s list of priorities, publishing articles as ‘linked data’ probably comes slightly above remembering to turn the computer monitors off in the evening and slightly below getting a new coffee machine.
It shouldn’t, and I’ll list 10 reasons why.
Before I do, I should briefly explain what I mean by ‘linked data’. Linked data is a way of publishing information so that it can easily – and automatically – be linked to other, similar data on the web. For example, if I refer to ‘Paris’ in a news article it’s not immediately apparent to search engines whether that is Paris – France, Paris – Texas, or Paris Hilton (or indeed another Paris entirely). If published in linked data, Paris would be linked to another reference point that would make clear which one it referred to (e.g. to the entry for Paris, France on DBpedia – the structured data version of Wikipedia).
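To make the Paris example a little more concrete, here is a minimal sketch in Python of what ‘linking to a reference point’ amounts to. The lookup table, the `annotate` function and the shape of the record are all my own assumptions for illustration; the `sameAs` key is borrowed loosely from linked data convention, and the result is a plain dictionary, not a proper RDF or RDFa serialization.

```python
import json

# Illustrative lookup from a (mention, context) pair to a DBpedia resource URI.
# The table itself is an assumption made for this sketch; a real system would
# need some way of working out the context from the article.
DBPEDIA = {
    ("Paris", "France"): "http://dbpedia.org/resource/Paris",
    ("Paris", "Texas"): "http://dbpedia.org/resource/Paris,_Texas",
    ("Paris", "Hilton"): "http://dbpedia.org/resource/Paris_Hilton",
}


def annotate(mention, context):
    """Return a small machine-readable record tying a mention to one URI."""
    uri = DBPEDIA.get((mention, context))
    if uri is None:
        raise KeyError(f"no reference point known for {mention!r} ({context})")
    return {"name": mention, "sameAs": uri}


if __name__ == "__main__":
    # The same word in the article text, three different things on the web.
    for context in ("France", "Texas", "Hilton"):
        print(json.dumps(annotate("Paris", context)))
```

The word in the article stays the same in all three cases; only the URI it points to tells a machine which Paris is meant.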
Until a short while ago I was reasonably clueless as to both the meaning and the value of linked data. I’m still far from being an expert, but enough people who are far smarter than me have convinced me that it’s worth trying. This was especially the case a couple of months back, at a News Linked Data Summit that we (the Media Standards Trust) organized with the BBC (which you can read about on a previous blog).
So, 10 reasons why news organizations should bump linked data up their priority list:
1. Linked data can boost SEO (search engine optimization)
People who tell you they can boost your SEO usually sound like witch doctors, telling you to tag all sorts of hocus pocus that doesn’t make rational sense or just seems like cynical populism. But at its simplest, SEO works through links. The more something is linked to, the higher it will come in people’s search results. So publishing content as linked data should, quite naturally, increase its SEO. A great example of this is the BBC’s natural history output. Type ‘Lion’ into Google and, chances are, a BBC linked data page will come in the first 10 results. This never used to happen until the BBC started tagging their natural history content as linked data.
2. Linked data allows others to link to your site much more easily
The world wide web is, more and more, being powered by algorithms; the Google search algorithm is perhaps the most obvious. But most sites now take advantage of some mechanized intelligence. ‘If you liked reading this, you might enjoy this…’ sort of thing. Problem is, algorithms – though intelligent – aren’t that intelligent. They have trouble telling the difference between, for example, Martin Moore (me), Martin Moore (kitchens), and Daniel Martin Moore (the Kentucky singer songwriter). But use linked data and they can. And once they can, sites like the BBC can link externally much more easily and intelligently.
3. Helps you build services based on your content
As it becomes increasingly difficult to get people to pay for news, so news organizations will need to build services based on their news – and other content – that people will pay for. You could, for example, provide a service that enabled people to compare schools in different areas, based on inspection reports, league tables, news reports, and parents’ stories. Creating services to do this is lots and lots easier if content is already made machine-readable through linked data.
4. Enables other people to build services based on your content – that you could profit from
Other people often have ideas you haven’t thought of. Other people also often have the space and time to experiment that you don’t have. Give them the opportunity to build stuff through linked data and they might come up with ‘killer apps’ that make you money. iPhone apps, anyone?
5. Allows you to link direct to source
You’re a news organization. Your brand is based partly on how much people trust the stuff you publish. Publish stuff in linked data and it enables you to link directly back to the report / research or statistics on which it was based – especially if that source is itself linked data (like http://data.gov.uk). That way, if you cite a crime statistic, say, you can link it directly back to the original source.
6. Helps journalists with their work
As a news organisation publishes more of its news content in linked data, so it can start providing its journalists with more helpful information to inform the articles they’re writing, and to make suggestions as to what else to link to when it’s published.
7. Throws bait over the paywall
Once content is behind a paywall it becomes invisible – unless you pay (that’s sort of the point). This is the same for Joe Public as for a search engine. But how are you, Joe Public, supposed to work out whether you want to pay for something if it’s invisible? Publish in linked data and there will be enough visible bits of information to help people work out if they want to pay for it. [This will probably be less of a deal with big search engines like Google, but more relevant to other search engines and third party services. Mind you, one of these bit players will, most likely, be the next Google or Facebook].
8. Makes data associated with your content dynamic
There is an ever growing mountain of information on the net that never gets updated. Pages devoted to football teams whose last score was added in 2006. Topic pages about political issues that haven’t seen a new story in months. But if those pages were filled with linked data, and linked to others that were too, they’d be automatically updated – rising from the dead like Frankenstein without you having to do diddly squat.
9. Start defining news events in linked data now and you could become a ‘canonical reference point’ (CRP)
What the heck is a canonical reference point, I hear you ask. Well, it’s a little like a virtual Grand Central Station. It’s a junction point for linked data; a hub which hundreds or even thousands of other sites link to as a way of helping to define their references. Examples of such hubs include: http://musicbrainz.org for music and musicians, data.gov.uk for UK gov stuff, http://dbpedia.org for almost anything. If you’re a news organization, why would you not want to be a hub?
10. Raises the platform for all
A web of linked data is a more intelligent web. A more mature and less superficial web. Not quite a semantic web, but getting there.
Of course, some of these benefits will come disproportionately to first movers (as with the BBC’s natural history pages). Which is exactly why news organizations, who have previously been pretty slow when it comes to web innovation, need to get their skates on.
More on linked data:
‘Linked data is blooming – why you should care’ on the ever readable Read Write Web, May 2009 (325 retweets to date)
A graphic of the linked data web: http://linkeddat
Tim Berners-Lee talking about linked data at TED 2009
My blog about our linked data summit
On Friday we co-hosted a news linked data summit, along with the BBC (and with some help from the Guardian).
The purpose of the day was to talk about linked data – what a linked data future might look like, what role linked data had for news organizations, and what news organizations should do about it. I’ll note down what I can remember from it in this blog, though given I was probably the least technical person there, any tech references come with a big caveat (and I’d welcome being corrected on them).
The day was particularly opportune given that on Thursday Sir Tim Berners-Lee and Professor Nigel Shadbolt had launched data.gov.uk – a new site that provides a route into ‘a wealth of government data’.
Nigel Shadbolt was also at the news linked data summit, giving his vision of what a linked data future might look like – including examples of a ‘post code newspaper’, a mash-up of cycle route blackspots, and a clever illustration of how our income tax gets spent.
Martin Belam, of the Guardian and currybet.net, talked about the value of linked data to news organizations (which you can read on the Guardian blog here), and Richard Wallis, of Talis, gave an overview of where news organizations are now in terms of linked data and metadata standards (see Richard’s presentation here).
Those at the day included us (the Media Standards Trust), and people from the BBC, the Guardian, the Times, News International, the Telegraph, the Associated Press, Thomson Reuters, the Press Association, the New York Times, the FT, the Mail, the Newspaper Licensing Association (NLA), and the Association of Online Publishers (AOP).
The upshot was: everyone agreed that linked data could, potentially, be pretty exciting. It could enable much better and broader linking, it could help people discover the provenance of data, it could enable news to evolve much more dynamically than it does now, it could even do good things for SEO (though that’s a master art I won’t even try to figure out).
There was general agreement that the “One Ring To Rule Them All” approach doesn’t generally work on the web. In other words, you’ll never get 100% agreement between organisations on which things are actually events or concepts, so the best you can do is to try and provide some mapping where sensible.
Therefore there would, inevitably, be multiple vocabularies and multiple places to link, although one could imagine some sources being ‘canonical’, i.e. becoming the default reference for most linked data. A good example of this would be the names of UK schools. One could imagine, for example, there being a list of these at the department of education website which would act as a sort of central repository.
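With the usual caveat about my technical limits, the ‘mapping where sensible’ idea can be sketched roughly as follows in Python. Every URI here is hypothetical (they use the reserved example.org and example.com domains), as are the names `CANONICAL`, `MAPPINGS` and `same_thing`; this is not how any of the organizations at the summit actually do it.

```python
# A hypothetical canonical identifier for one school, of the kind a central
# repository might publish. Not a real URI.
CANONICAL = "http://schools.example.org/id/100001"

# Two hypothetical news organizations keep their own identifiers for the same
# school, and each publishes a mapping to the canonical one.
MAPPINGS = {
    "http://news-a.example.com/topics/st-marys-school": CANONICAL,
    "http://news-b.example.com/places/school/4417": CANONICAL,
}


def same_thing(uri_a, uri_b, mappings=MAPPINGS):
    """True if both URIs resolve to the same canonical reference."""
    # A URI with no mapping is treated as its own reference point.
    ref_a = mappings.get(uri_a, uri_a)
    ref_b = mappings.get(uri_b, uri_b)
    return ref_a == ref_b


if __name__ == "__main__":
    print(same_thing(
        "http://news-a.example.com/topics/st-marys-school",
        "http://news-b.example.com/places/school/4417",
    ))
```

Neither organization has to adopt the other’s vocabulary. Each keeps its own identifiers and only agrees on where they point.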
There was also agreement that it would be a good thing if people started dipping their toe in the water. No-one is going to know how valuable – or not – linked data is without giving it a try.
For some of the news organizations the forthcoming general election seemed like a good place to start. There could be a lot of public value in linking, for example, parliamentary candidates.
If you want to know more about the day, or keep in touch with the progress of linked data and news, you can contact me at martin DOT moore AT mediastandardstrust DOT org.