
The Beauty of Quality Data
How we can improve open data quality, and why we must.

John Grin

OK, but we need to ask questions:

Is there any missing data?

How was the data collected?

How are terms defined? E.g. SME

What is the license - how am I allowed to use it?

What do we mean by data quality?

This is why we can't have nice things

CSV Files

Open format
Simple
Everyone gets them
We're already using them

x You can put any old crap in them

1. Full dataset spread over multiple files


2000

Sector     Size   Sales (M)
Education  SME
Education  Large
Health     SME
Health     Large  11

2001

Sector     Size   Sales (M)
Education  SME
Education  Large
Health     SME    13
Health     Large  18

Sector     Size  Year  Sales (M)
Education  SME   2000
Education  SME   2001
Education  SME   2002  19
Education  SME   2003  42
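
Combining per-year files into a single table with an explicit Year column can be sketched as below. This is a minimal sketch using only the Python standard library; the function name `combine_yearly_files` and the in-memory sample inputs are assumptions made for illustration, not part of the original slides.

```python
import csv
import io

def combine_yearly_files(files_by_year):
    """Merge per-year CSV texts into one list of rows with a Year column."""
    combined = []
    for year, text in sorted(files_by_year.items()):
        for row in csv.DictReader(io.StringIO(text)):
            combined.append({
                "Sector": row["Sector"],
                "Size": row["Size"],
                "Year": year,  # the year lived in the file name, now it is data
                "Sales (M)": row["Sales (M)"],
            })
    return combined

# Illustrative inputs: file contents keyed by the year each file covers.
files_by_year = {
    2000: "Sector,Size,Sales (M)\nHealth,Large,11\n",
    2001: "Sector,Size,Sales (M)\nHealth,Large,18\n",
}
rows = combine_yearly_files(files_by_year)
```

In practice the texts would be read from the separate files; the key point is that the year moves from the file name into a column.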

2. Character Encoding Issues
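
One common defensive approach when the encoding of a CSV file is unknown is to try a short list of likely encodings in order. The sketch below assumes UTF-8 (with or without a byte order mark) and Windows-1252 as the candidates; that list, the helper name `decode_csv_bytes` and the sample header are assumptions for illustration.

```python
def decode_csv_bytes(raw, encodings=("utf-8-sig", "cp1252")):
    """Try each candidate encoding in turn and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails,
    # but the characters it produces may be wrong.
    return raw.decode("latin-1"), "latin-1"

# Hypothetical file saved from a spreadsheet as Windows-1252,
# with a pound sign (U+00A3) in the header.
raw = "Sector,Sales (\u00a3M)\nEducation,19\n".encode("cp1252")
text, used = decode_csv_bytes(raw)
```

Guessing is a last resort for consumers; publishing as UTF-8 and saying so avoids the problem.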

3. Non-normalised schema
Sector     Size   2000  2001  2002  2003
Education  SME                  19    42
Education  Large          10    23    45
Health     SME            18    29    67
Health     Large    11    28    36    80

Sector     Size  Year  Sales (M)
Education  SME   2000
Education  SME   2001
Education  SME   2002  19
Education  SME   2003  42
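
Turning a table with one column per year into one row per observation can be sketched as follows. This is a minimal standard-library sketch; the function name `wide_to_long` and its parameters are assumptions, and the sample row reuses the Health / Large figures from the slide.

```python
import csv
import io

def wide_to_long(text, id_columns=("Sector", "Size"), value_name="Sales (M)"):
    """Turn one-column-per-year rows into one row per (Sector, Size, Year)."""
    rows = []
    reader = csv.DictReader(io.StringIO(text))
    # Every column that is not an identifier is treated as a year.
    year_columns = [c for c in reader.fieldnames if c not in id_columns]
    for record in reader:
        for year in year_columns:
            row = {c: record[c] for c in id_columns}
            row["Year"] = year
            row[value_name] = record[year]
            rows.append(row)
    return rows

wide = "Sector,Size,2000,2001,2002,2003\nHealth,Large,11,28,36,80\n"
long_rows = wide_to_long(wide)
```

With the long form, adding 2004 means adding rows, not changing the schema.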

Introductory Text Before Header

This dataset is subject to the Open Government License
Year Sales (M)

Sector     Size  Year  Sales (M)
Education  SME   2000
Education  SME   2001
Education  SME   2002  19
Education  SME   2003  42
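
A consumer faced with text before the header has to hunt for the real header row. The sketch below shows one way to do that, assuming the expected column names are known in advance; the helper name `read_skipping_preamble` is an assumption, and the approach does not handle quoted fields that contain line breaks.

```python
import csv
import io

def read_skipping_preamble(text, expected_header):
    """Discard lines before the header row, then parse the rest as CSV."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        # Parse the single line as CSV and compare it with the known header.
        if next(csv.reader([line]), []) == list(expected_header):
            body = "\n".join(lines[index:])
            return list(csv.DictReader(io.StringIO(body)))
    raise ValueError("header row not found")

text = (
    "This dataset is subject to the Open Government License\n"
    "\n"
    "Sector,Size,Year,Sales (M)\n"
    "Education,SME,2002,19\n"
    "Education,SME,2003,42\n"
)
rows = read_skipping_preamble(text, ["Sector", "Size", "Year", "Sales (M)"])
```

Every consumer having to write this is the cost of the publisher not keeping license text in separate metadata.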

Empty cells where there should be a

Sector     Size  Year  Sales (M)
Education  SME   2000
Education        2001
Education  SME   2002  19
Education  SME   2003  42
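
Blank cells are easy to detect mechanically before publishing. A minimal sketch, assuming every column is required; the function name `find_empty_cells` and the sample data are illustrative assumptions.

```python
import csv
import io

def find_empty_cells(text):
    """Return (line_number, column_name) for every blank or missing cell."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for record in reader:
        for column in reader.fieldnames:
            value = record.get(column)
            # Short rows give None; blank cells give an empty string.
            if value is None or value.strip() == "":
                problems.append((reader.line_num, column))
    return problems

text = (
    "Sector,Size,Year,Sales (M)\n"
    "Education,SME,2002,19\n"
    "Education,,2003,42\n"
)
problems = find_empty_cells(text)
```

Line numbers count the header as line 1 and are only reliable when no field contains a line break.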

Duplicate misspelt category terms

Sector     Size  Year  Sales (M)
Education  SME   2000
education  SME   2001
Education  SME   2002  19
EDU        SME   2003  42
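
Variants of the same category can be mapped onto one canonical term with an explicit lookup. The sketch below is illustrative only: the alias table `SECTOR_ALIASES` and the function `canonical_sector` are hypothetical, and deciding which variants mean the same thing is a judgement for the publisher.

```python
# Hypothetical alias table: lower-cased variant -> canonical term.
SECTOR_ALIASES = {
    "education": "Education",
    "edu": "Education",
}

def canonical_sector(value, aliases=SECTOR_ALIASES):
    """Map a raw category value onto its canonical spelling."""
    key = value.strip().lower()
    if key not in aliases:
        # Fail loudly rather than let a new variant slip through.
        raise ValueError(f"unrecognised sector: {value!r}")
    return aliases[key]

sectors = [canonical_sector(v) for v in ["Education", "education", "EDU"]]
```

Rejecting unknown values, rather than passing them through, stops new misspellings from creating new categories.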

OpenRefine can help if you're in this situation

What can we do to improve machine-readability?

Schemas - csvlint.io
(Also see: CSV On the Web W3C Primer)
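
The idea behind schema validation can be sketched in a few lines. This is not the csvlint.io API or the CSV on the Web metadata format; the `SCHEMA` dictionary, its rules and the `validate` function are hypothetical, and only show the kind of check a schema makes possible.

```python
import csv
import io

# Hypothetical schema: column name -> check that each cell value must pass.
SCHEMA = {
    "Sector": lambda v: v in {"Education", "Health"},
    "Size": lambda v: v in {"SME", "Large"},
    "Year": lambda v: v.isdigit() and len(v) == 4,
    "Sales (M)": lambda v: v.isdigit(),  # assumes whole numbers only
}

def validate(text, schema=SCHEMA):
    """Return a list of error messages; an empty list means the file passed."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(schema):
        return [f"unexpected header: {reader.fieldnames}"]
    for record in reader:
        for column, check in schema.items():
            value = record[column]
            if value is None or not check(value):
                errors.append(
                    f"line {reader.line_num}: bad {column!r} value {value!r}"
                )
    return errors

header = "Sector,Size,Year,Sales (M)\n"
good_errors = validate(header + "Education,SME,2003,42\n")
bad_errors = validate(header + "EDU,SME,2003,42\n")
```

A published schema lets both the publisher and the consumer run the same checks automatically.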

Registers

ODI Certificates

Why Bother?
This Code is issued to meet the Government's desire to place
more power into citizens' hands to increase democratic
accountability and make it easier for local people to
contribute to the local decision making process and help
shape public services.
Local Government Transparency Code 2015

Low-hanging fruit

John Grin
Principal Consultant

atchai.com

twitter.com/johngrin

getdataseed.com

john@atchai.com
