
The Data Journalism Handbook

Version 0.2

Contributors
Contributors to this book include:

David Banisar, Article 19
Caelainn Barr, EU Data Journalist
Mariano Blejman, Hacks/Hackers
Marianne Bouchart, Data Journalism Blog
Liliana Bounegru, European Journalism Centre
Brian Boyer, Chicago Tribune
Jane Park, Creative Commons
Paul Bradshaw, Birmingham City University, City University London
Lucy Chambers, Open Knowledge Foundation
Helen Darbishire, Access Info Europe
Steve Doig, Cronkite School of Journalism
David Erwin, New York Times
Lisa Evans, Guardian Datablog
Tom Fries, Bertelsmann Stiftung
Duncan Geere, Wired.co.uk
Rich Gordon, Northwestern University
Jonathan Gray, Open Knowledge Foundation
Ted Han, DocumentCloud
Kate Hudson, Open Journalism
Francis Irving, ScraperWiki
Lizzie Jackson, Ravensbourne College
Nicolas Kayser-Bril, Data Journalist
John Keefe, New York Public Radio
Friedrich Lindenberg, Open Knowledge Foundation
Lorenz Matzat, OpenDataCity
Aidan McGuire, ScraperWiki
Philip Meyer, University of North Carolina at Chapel Hill
Cynthia O'Murchu, Financial Times

Aron Pilhofer, New York Times
Anthony Reuben, BBC
Simon Rogers, Guardian Datablog
Amanda Rossi, freelance journalist
Fabrizio Scrollini, London School of Economics
Adam Thomas, Source Fabric
Andrew Vande Moere, infosthetics.com
Sascha Venohr, Zeit Online
Jerry Vermanen, De Stentor
César Viana, Estácio de Sá University
Farida Vis, University of Leicester

Coordinators
European Journalism Centre Open Knowledge Foundation

This work is licensed under a Creative Commons Attribution Sharealike license.

Table of contents
The Data Journalism Handbook
Contributors
Table of contents
0. Preface
0.1 The purpose of this book
0.2 Add to this book
0.3 Share this book
1. Introduction
1.1 What is data journalism?
1.2 Why is it important?
2. Introducing data journalism in the newsroom
2.1 Changes in the newsroom
2.2 How is it done: journo-developers vs. coders for hire
3 Types of outcomes/projects and case studies
3.1 Data powered stories
3.2 Data served with stories
3.3 Data driven applications
4. Working on the data story
4.1. Step 1: Getting your data
4.1.1 Where does data live?
4.1.2 Asking for data
4.1.3 Getting your own data
4.2 Step 2: Understanding your data
4.2.1 Data literacy
4.2.2 Working with data tips
4.2.3 Tools and techniques for analysing data
4.2.4 Harnessing external expertise
4.3 Step 3: Finding a story in your data
4.3.1 From datasets to stories - approaches
4.4 Step 4: Delivering your data project
4.4.1 Serving data with stories
4.4.2 Visualising data
4.4.3 Data driven applications
5. Engagement, outreach and community
6. How to make data journalism sustainable
6.1 Measuring impact
6.2 Business models
7. Appendix
7.1 Further resources
7.2 Glossary:

Project hashtag: #ddjbook
Overview of progress: http://bit.ly/ssiDYe (as of Sunday 6 November). More recent updates in text below in yellow highlight.
Questions? Want to contribute? Get in touch:
Liliana Bounegru (bounegru@ejc.net)
Lucy Chambers (lucy.chambers@okfn.org)

0. Preface
0.1 The purpose of this book
Overview: Explain what this book does and doesn't aim to do
Authors: Lucy Chambers, Liliana Bounegru
Length: 0.5-1 page

0.2 Add to this book


Overview: Explain how to contribute to future versions of this book
Authors: Lucy Chambers, Liliana Bounegru
Length: 0.5 page

0.3 Share this book


Overview: Encourage people to share this book
Authors: Lucy Chambers, Liliana Bounegru

Length: 0.5 page

1. Introduction
1.1 What is data journalism?
Overview: Define and describe data journalism and how it is different from other forms of journalism.
Authors: Paul Bradshaw, Jonathan Gray, Aron Pilhofer, Jerry Vermanen, Philip Meyer, Duncan Geere, David Anderton, Federica Cocco, Brian Boyer, JV Chamary, [Heather Brooke], [Simon Rogers], [Richard Gordon]
Length: 4 pages (with quotes from different people)
Editor: Liliana Bounegru (European Journalism Centre)
Peer-reviewer: Jonathan Gray (Open Knowledge Foundation)
UPDATE: First draft of chapter finished.
STILL NEED: Peer-review.

1.2 Why is it important?


Overview: Put data journalism into context and explain why it matters and what potential it has.
Authors: Tom Fries, [Heather Brooke], [Simon Rogers], Nicolas Kayser-Bril, [Richard Gordon], Jerry Vermanen
Length: 2.5 pages (with quotes)
Editor: Liliana Bounegru (European Journalism Centre)
Peer-reviewer: Jonathan Gray (Open Knowledge Foundation)
UPDATE: First draft finished.
STILL NEED: Peer-review.

2. Introducing data journalism in the newsroom


2.1 Changes in the newsroom
Overview: Explain what transformations a newsroom needs to undergo when data journalism is integrated, in terms of staff, management, workflow and newsroom tools.
Authors: Justin Arenstein
Length: 1-2 pages
Editor: Liliana Bounegru (EJC)
UPDATE: Pending input from Justin Arenstein
STILL NEED: Input from Justin Arenstein

2.2 How is it done: journo-developers vs. coders for hire


Overview: Explain different ways of doing data journalism (e.g. journalists who can code vs. coders for hire, off the shelf tools vs. custom web applications, in house graphics departments vs. hired data visualisation experts, etc). Give examples of how it is being done in different newsrooms.
Authors: Brian Boyer, Lucy Chambers, Sascha Venohr, Jerry Vermanen
Length: 2-3 pages (with examples and quotes)
UPDATE: Ready for review
EDITOR: Lucy

3 Outcomes, projects and case studies


3.0 Notes on classification of data stories
Overview: A taxonomy of the types of data stories that are out there.
Author: Martin Rosenbaum, BBC
UPDATE: Ready for review
EDITOR: Lucy

3.1 Data powered stories


Overview: Give and describe successful examples of data powered stories you worked on. Describe how you produced these stories. The aim is to give journalists and decision-makers in newsrooms who might be interested in data journalism a sense of what the potential of data powered stories is and how they could go about producing them.
What data did you use and how did you obtain it?
What prompted you to start this project?
What did the project aim to achieve?
How long did you work on the project?
How many people worked on it?
What was the cost of the project?
What were the skills necessary for this project? (domain knowledge, coding, research, visualisation, etc.)
What is the role of datasets in these stories? (e.g. give rise to new stories, enrich stories, contextualize stories, help journalists explore topics in new ways, etc.)
What was your approach? (exploratory vs. hypothesis approach)
What techniques and tools did you use?
How did you present the data powered story?
What is the potential of data powered stories? Why should journalists/newsrooms be interested in producing such projects?
What were the challenges in producing these stories?
What tips and advice would you give to journalists who want to work on similar projects?
Please include relevant links, videos and images.
Authors: Steve Doig, Cynthia O'Murchu, Caelainn Barr, Sascha Venohr, Amanda Rossi
Length: 1.5-3 pages per example
UPDATE: Ready for review
EDITOR: Lucy/Kat

3.2 Data served with stories


Overview: Give and describe successful examples of data served with stories you worked on. Describe how you produced these projects. The aim is to give journalists and decision-makers in newsrooms who might be interested in data journalism a sense of what the potential of data served with stories is and how they could go about producing them.
What data did you use and how did you obtain it?
What prompted you to start this project?
What did the project aim to achieve?
How long did you work on the project?
How many people worked on it?
What was the cost of the project?
What were the skills necessary for this project? (domain knowledge, coding, research, visualisation, etc.)
What is the role of datasets in these stories? (e.g. provide additional context or insight, etc.)
What was your approach? (exploratory vs. hypothesis approach)
What techniques and tools did you use?
How did you present the story and the data served with it?
What is the potential of such projects? Why should journalists/newsrooms be interested in producing such projects?
What were the challenges in producing these projects?
What tips and advice would you give to journalists who want to work on similar projects?
Include relevant links, videos and images.
Authors: Martin Rosenbaum (BBC), Esa Mäkinen (Helsingin Sanomat)
Length: 1.5-3 pages per example
UPDATE: Ready for review
EDITOR: Lucy/Kat

3.3 Data driven applications

Overview: Give and describe successful examples of data driven applications you worked on. Describe how you produced these applications. The aim is to give journalists and decision-makers in newsrooms who might be interested in data journalism a sense of what the potential of data driven applications is and how they could go about producing them.
What data did you use and how did you obtain it?
What prompted you to start this project?
What did the project aim to achieve?
How long did you work on the project?
How many people worked on it?
What was the cost of the project?
What were the skills necessary for this project? (domain knowledge, coding, research, visualisation, etc.)
What was your approach?
What techniques and tools did you use?
How did you present the outcome?
What is the potential of such projects? Why should journalists/newsrooms be interested in producing such projects?
What were the challenges in producing these projects?
What tips and advice would you give to journalists who want to work on similar projects?
Include relevant links, videos and images.
Authors: Aron Pilhofer, Matt Stiles
Length: 1.5-3 pages per example
UPDATE: Needs doing!
STILL NEED: Guardian, NYT, BBC
EDITOR: Lucy/Kat

4. Working on the data story


4.1. Step 1: Getting your data
4.1.1 Where does data live?
Open data
Overview: An overview of open data sources, what they contain, how to find them, how to search them, examples of open data being used by journalists
Authors: Jonathan Gray, Brian Boyer
Length: 1-3 pages (with links and examples)

Social data services
Overview: An overview of community driven websites which aim to help you find the data you need - such as GetTheData.org and TheDataHub.org - and their function in enabling collaboration around datasets
Authors: Jonathan Gray
Length: 0.5-1 page (with links and examples)

Research data
Overview: An overview of sites to find research data
Authors:
Length: 0.5-1 page (with links and examples)

UPDATE: Great input and notes from Brian Boyer/Chicago Tribune, Jane Park/Creative Commons, John Keefe/WNYC, Chrys Wu/HacksHackers.
STILL NEED: Needs to be written up and expanded.
EDITOR: Friedrich

4.1.2 Asking for data


Your right to data
Freedom of Information laws
Your Right to Reuse the Information Received
When things go wrong...
The Journalist-Public Official Relationship
Successful investigations based on FOI requests
Useful resources

Overview: An overview of FOI legislation, an example of making an FOI request, information on resources in this area, and how to get help from FOI experts; how talking directly with public servants or engaging with official open data initiatives might help you to find the data you need
Authors: Helen Darbishire (Access Info), Fabrizio Scrollini (London School of Economics)
Length: 18 pages (with links and examples)
Peer-reviewers: Martin Rosenbaum, BBC
Editor: Sam Leon (Open Knowledge Foundation), Liliana Bounegru (European Journalism Centre)
UPDATE: First draft done except 2.2.6. Sent for peer-review.
STILL NEED:

4.1.3 Getting your own data


Scraping data
Overview: Explaining the basic idea of web scraping, why it can be necessary, examples of how it has been used by journalists, and a guide for absolute beginners on how it can be done, based on an interesting case study
Authors: Friedrich Lindenberg
Length: 2-3 pages (with links, examples, and a basic tutorial)
UPDATE: Ready for review
STILL NEED: Old version (multiple authors) needs breaking up into useful resources and putting on DDJnet
EDITOR: Lucy

Crowdsourcing data collection
Overview: Explaining the basic idea of crowdsourcing data, how various projects have used this, and how to do this (e.g. using Google Spreadsheets, forms, maps, Twitter hashtags, etc)
Authors: [Simon Rogers], [Lisa Evans]
Length: 1-3 pages (with links and examples)
UPDATE: Input from Marianne Bouchart and others (not in the Google doc yet), Guardian (notes)
STILL NEED: Nicolas Kayser-Bril (water data) and other examples
EDITOR: Liliana/Friedrich

4.2 Step 2: Understanding your data


4.2.1 Data literacy
Overview: Explaining data literacy and its importance (including statistical/numerical literacy, use of mathematics, technical literacy, etc)
Authors: Nicolas Kayser-Bril (J++), Michael Blastland (BBC)
Length: 6 pages
Editor: Liliana Bounegru (European Journalism Centre)
UPDATE: First draft done with input from Nicolas Kayser-Bril and Michael Blastland
STILL NEED: -

4.2.2 Working with data tips


Overview: What you need to work with datasets: background knowledge, technical ability, etc. (case study approach with lessons learned from each project presented)

Authors: Steve Doig (Cronkite School of Journalism), Lisa Evans (Guardian), Richard Gordon (Medill School of Journalism), Lizzie Jackson (Ravensbourne College), Amanda Rossi (freelance journalist), JV Chamary (BBC), Fabrizio Scrollini (London School of Economics), Ted Han (DocumentCloud), Claire Miller (Wales Online)
Length: 9 pages
Editor: Liliana Bounegru (European Journalism Centre)
Peer-reviewer:
UPDATE: Input mainly from Steve Doig (Cronkite School of Journalism) and Claire Miller (Wales Online)
STILL NEED: Input from Friedrich Lindenberg on types of errors to look for when working with scraped / extracted / manipulated data

4.2.3 Tools and techniques for cleaning and analysing data
Overview: Overview of different types of tools for analysing and working with datasets, examples of how they can be used, examples of how they have been used by journalists.
Authors: Liliana Bounegru, Lucy Chambers, Claire Miller
Length: 1-2 pages per case study
UPDATE: Needs doing!
STILL NEED: Input from Friedrich.
EDITOR: Friedrich.

4.2.4 Harnessing external expertise


Overview: How to enable people to annotate and comment on datasets
Authors: Ted Han [asked]
Length: 1 page
UPDATE: Needs doing!
STILL NEED: Input from Guardian, OWNI, NYT?
EDITOR: Liliana

4.3 Step 3: Finding a story in your data


4.3.1 From datasets to stories - approaches

Overview: Explaining how to find stories in datasets (various approaches), including examples and case studies. Also looking at the broader role of data journalists in the newsroom, how they work with other journalists, etc.
Authors: Caelainn Barr, Claire Miller
Length: 0.5-1 page per approach/case study
UPDATE: Ready for review
EDITOR: Lucy

4.4 Step 4: Delivering your data project


4.4.1 Serving data with stories
Overview: Overview of ways to publish data, including examples. Embedding data, raw data (formats), live data, updating data, APIs. Who is your data for? Also a section on knowing the law, ethics and privacy, and open licensing.
Authors:
Length: 1-2 pages
UPDATE: To be merged with 3.2
EDITOR: Lucy

4.4.2 Visualising data

Overview: Roles of visualisation in journalism: what function(s) visualisations play in reportage (what journalists use visualisations for): (1) to find stories, (2) to tell a story

Tools, tutorials and good examples of using visualisations to find stories
When do you need to visualise a dataset to explore it and find a story? When don't you need to?
How do you go about discovering a story? What tools do you use? What protocol do you follow? What clues do you follow, what do you pay attention to? (lessons, tips, advice).
Examples of how to explore a dataset with a visualisation tool, with a step by step description of the protocol followed to find the story.

Tools, tutorials and good examples of using visualisations to tell stories
When do you need to visualise a story and when don't you need to?
What types of visualisations are good for presenting what types of stories?
How do you go about visualising a story? What tools do you use? What steps do you take? (lessons, tips, advice).
What makes a good visualisation, what makes a bad visualisation?
Examples of good and bad use of visualisations to tell a story, with an explanation of what makes them a good/bad case.

Note: The aim of this chapter is not to show journalists how to do a data visualisation but to explain when a visualisation could be useful in their work, what visualisations could help them with, and how they could assess the quality of a visualisation; to familiarise them with the vocabulary so they know what to ask for from designers; and to introduce visualisation tools for non-experts and show how to use them.
Authors: Sarah Cohen (Knight Professor of the Practice of Journalism and Public Policy, Sanford), Geoff McGhee, David Erwin (New York Times), Aron Pilhofer (New York Times), Farida Vis (University of Leicester), Kate Hudson (openjournalism.ca), Lulu Pinney (infographics specialist), Mariano Blejman (Hacks/Hackers)
Length: 1-2 pages per case study
UPDATE: Good start!
STILL NEED: Needs expanding and editing, and more examples.
EDITOR: Liliana Bounegru (European Journalism Centre)

4.4.3 Data driven applications


Overview: Step by step guide, tips and tricks for how newsrooms can produce data driven applications
What are the resources (skills, costs, etc.) needed?
What are the steps to take when you want to build a data driven application?
What useful lessons did you learn from your own experience?
Why should newsrooms be interested in producing data driven applications? What is the potential of such projects?
Authors: Aron Pilhofer, Matt Stiles
Length: 2-3 pages (including examples)
UPDATE: Pending content from Matt Stiles
STILL NEED: Input from Matt Stiles
EDITOR: Liliana Bounegru (EJC)

4.4.4 Telling Stories through Social Media
UPDATE: Pending content from Luca Dello Iacovo
STILL NEED: Above content
EDITOR: Lucy Chambers

5. Engagement, outreach and community


Overview: Knowing your audience (and pitching appropriately), dissemination and outreach, social media, building community, engaging with existing communities (designers, developers, etc).
Authors:
Length: 1-2 pages
UPDATE: Duncan (Wired) working on it now. Needs more input.
EDITOR: Jonathan

6. How to make data journalism sustainable


6.1 Measuring impact
Overview: Give an overview of the potential of data journalism (e.g. engaging with new audiences, the future of journalism on the web) and how it could be measured.
Authors: Mirko Lorenz (Deutsche Welle), [Lorenz Matzat]
Length: 1 page

6.2 Business models


Overview: Discuss costs, sustainability and business models for data journalism. Provide successful and less successful examples and explain what lessons can be learned from them.
Authors: Mirko Lorenz (Deutsche Welle), Sascha Venohr (Zeit Online), Lorenz Matzat
Length: 1-2 pages
UPDATE: Case study on measuring impact from Sascha Venohr (Zeit Online). Excellent input from Mirko Lorenz (Deutsche Welle) on making the case for data journalism: facts to keep in mind when thinking about sustainability and business models for data journalism. Case studies from Lorenz Matzat (OpenDataCity), Mark Hunter on Kaas & Mulvad (pending permission), Clement Renaud on OWNI (in progress).
STILL NEED: More case studies on sustainability, business models and measuring the impact of data journalism from Guardian, NYT, Chicago Tribune, etc.
EDITOR: Liliana Bounegru (European Journalism Centre)

7. Appendix
7.1 Further resources
Overview: Lists of links, resources, examples and other bits and pieces that don't fit in the handbook
Authors: Everyone!
Length: 5 pages

7.2 Glossary:
Link: 5.2 Glossary
UPDATE: Needs doing!
STILL NEED: Lots of ideas from everyone.
EDITOR: Jonathan
