
Part 2: Data Visualization

How to communicate complex ideas with simple, efficient and accurate data graphics
Why visualize data?

The human eye is extremely sensitive to differences in:

Pattern, Colors, Format

[Figure: rows of digits repeated with differences in pattern, color and format]

Because of our amazing ability to decipher these differences instantly,
representing complex data sets with data graphics is an efficient method to
communicate what the numbers are saying.

The visual display of quantitative information serves as a vehicle to
traverse a complex data world. Graphics reveal data.
What is the best way to display the data?

Let the data instruct you

Do not have a pre-specified mode of displaying the data. Do whatever it takes
to display data in the most appropriate way. Design should be content-driven,
not methodology-driven.
CONTEXT, CONTEXT, CONTEXT!
Put the data into a human context.
What are we comparing the data to?
Previous rounds (historical context): Has the clinic performance rate improved over time?
Other similar clinics: How well is the clinic performing compared to other clinics:
In the same district/province/region (geographic context)
With the same caseload
With the same resources

[Figure: flow diagram: Care Provided, Documented, Chart Selected, Data Collected, Data Analyzed, Data Visualized, Data Reported, Data Interpreted, Decisions Made]
Graphical Excellence
Have the audience in mind. What is the purpose of the graphic? Description, exploration
Make large data sets coherent
Reveal the data at several levels of detail
Induce the reader to think about the content, not the methodology
Encourage the eye to compare different pieces of data: spatial orientation, patterns, colors, formatting
Avoid distortion of the data: axes, scaling, labeling
Clear and easy to read
Integrate words and numbers with graphics

Tufte, Edward. The Visual Display of Quantitative Information. Connecticut, Graphics Press: 2001. Page 13.
Theory of Data Graphics
Above all else show the data

1) Maximize data-ink ratio.
I. Erase non-data-ink
II. Erase redundant data-ink
2) Remove Chart Junk.
I. Shadows
II. 3D-rendering
III. Other ornaments
3) Avoid Optical Vibration

[Figure: "Before" and "After" versions of the bar chart "Clinical Visits: Percentage of adult patients who had at least one visit in each half of the year"; y-axis: Performance Rate (0 to 1); x-axis: Clinic (1 to 9)]

Tufte, Edward. The Visual Display of Quantitative Information. Connecticut, Graphics Press: 2001. Page 13.
Examples
Bar Charts
Good for comparing a set of categorical values. Best when there are not too
many categories and/or variables.
[Figure: bar chart "Clinical Visits: Percentage of adult patients who had at least one visit in each half of the year"; y-axis: Performance Rate (0 to 1); x-axis: Clinic (1 to 9)]
Tips:
Organizing data from largest to smallest may be helpful in highlighting data.
Keep it simple: do not use shadows or 3D rectangles.
Too many categories can make bar charts messy. When there are this many bars on a bar
graph, make sure to ask yourself if it is contextually appropriate to compare all of the
values on the bar chart.

[Figure: bar chart "Clinical visits (2011): Percentage of eligible adult patients who had at least one clinical visit in each half of the year"; y-axis: Performance Rate (%), 0 to 100; x-axis: Clinic (1 to 28)]
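The tip about organizing data from largest to smallest can be sketched in a few lines of Python. This is only an illustration, not part of the original material: the clinic names and rates are hypothetical, and `text_bar_chart` is a helper name made up for this sketch. It draws undecorated plain-text bars, in keeping with the advice to keep bar charts simple.

```python
def text_bar_chart(rates, width=40):
    """Return plain-text bars, sorted from largest to smallest rate.

    rates maps a category label to a performance rate in percent (0 to 100).
    """
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    label_width = max(len(label) for label in rates)
    lines = []
    for label, rate in ordered:
        # Scale the bar so that 100% fills the full width.
        bar = "#" * round(rate / 100 * width)
        lines.append(f"{label:<{label_width}} {bar} {rate}%")
    return lines


# Hypothetical performance rates (%), for illustration only.
example_rates = {"Clinic 1": 55, "Clinic 2": 90, "Clinic 3": 70}
for line in text_bar_chart(example_rates):
    print(line)
```

Sorting before drawing puts the highest-performing clinic on the first line, so the ranking is visible without reading the numbers.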
Too many variables per category can also make bar charts messy. Is it appropriate to
compare all of the variables within a category?

[Figure: grouped bar chart "Mean Clinic Scores by Indicator (2011)"; y-axis: Performance Rate (%), 0 to 100; x-axis: Clinic (A to E); series: Clinical Visits, TB Screening, CTX, Nutritional Assessment, Prevention Education, Alcohol Screening]
Pie Charts
Work well if you want to compare individual slices of the pie with the whole
pie. It may be difficult to compare different sections of a given pie chart or
to compare data across different pie charts. A bar chart (histogram or stack
chart) or table may be more appropriate in that case.
Too many variables make a pie chart hard to manage. If the variables are
numerical, consider using a histogram instead. You can also consider combining
categories but remember that this could hide variation and alter how the data
are interpreted.

[Figure: two pie charts titled "CD4 Count Distribution": one with many narrow CD4 ranges (from <50 up to 1000+), and one with combined categories (<50, 51-100, 101-200, 201-250, 400+)]
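To make the trade-off of combining categories concrete, here is a small sketch in Python. The patient counts are hypothetical and the helper names `shares` and `combine` are invented for this illustration. It computes each slice's share of the whole (what a pie chart encodes) before and after merging categories.

```python
def shares(counts):
    """Return each category's share of the whole, in percent (one decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def combine(counts, groups):
    """Merge fine categories into broader ones.

    groups maps each new label to the list of original labels it absorbs.
    """
    return {new: sum(counts[old] for old in olds) for new, olds in groups.items()}


# Hypothetical patient counts per CD4 range, for illustration only.
fine = {"<50": 10, "51-100": 30, "101-150": 40, "151-200": 20}
coarse = combine(fine, {"0-100": ["<50", "51-100"], "101-200": ["101-150", "151-200"]})

print(shares(fine))    # four slices of different sizes
print(shares(coarse))  # two slices; differences inside each group are no longer visible
```

The merged chart is easier to read, but the threefold difference between the two lowest ranges disappears, which is the hidden variation the text warns about.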
Tables
Tables often work better than bar charts and pie charts when there are too many data
points and too many descriptors of those data points. Many people may not consider
this as a way to visualize data, but tables still use specific formatting and spatial
orientation to communicate the data more easily. In terms of data ink, every piece of
a table is critical information. However, tables may not be good at showing patterns
over time.
[Table: "CD4 Monitoring Mean Clinic Scores: Percentage of eligible patients who had at least one CD4 count during the review period"]
Table Formatting Tips
Do not use gridlines. The space between the numbers visually separates categories.
Underline the column headers.
Consider zebra striping: light shading to separate specific groups you want to
highlight.

Before

CD4 Monitoring Indicator Results
Clinic   Performance Rate   Denominator
A        60%                100
B        75%                150
C        50%                120

After

CD4 Monitoring Indicator Results
Clinic   Performance Rate (%)   Denominator
A        60                     100
B        75                     150
C        50                     120
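The table formatting tips can be sketched as a small Python helper. This is one possible rendering under stated assumptions, not the method used to produce the original slides: `format_table` is a hypothetical name, the header is underlined with dashes, columns are separated by white space only, and the unit goes in the column header once rather than after every number.

```python
def format_table(headers, rows):
    """Render a plain-text table with no gridlines and an underlined header."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def render(row):
        # Left-align the first column (labels), right-align the numbers.
        cells = [row[0].ljust(widths[0])]
        cells += [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        return "   ".join(cells)

    underline = "   ".join("-" * w for w in widths)
    return "\n".join([render(table[0]), underline] + [render(r) for r in table[1:]])


# Values taken from the "After" table above.
print(format_table(
    ["Clinic", "Performance Rate (%)", "Denominator"],
    [("A", 60, 100), ("B", 75, 150), ("C", 50, 120)],
))
```

Right-aligning the numeric columns lines up the digits, which makes it easier for the eye to compare values down a column.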
Line Charts
Line charts work well to show trends over intervals of time (time series). The more
data points, the better. Line charts show a continuous line even though data may be
discrete.

Tips:
Use different colors to differentiate between different lines. Remember that our eyes
will naturally compare two different lines on the same chart. If two data points are not
comparable, then maybe they should not be on the same graph.
Label the lines directly on the chart instead of using a legend.
Line charts are very prone to distortion.
[Figure: four line charts of "Percentage of eligible patients screened for tuberculosis" (y-axis: Performance Rate (%); x-axis: Jan to June), showing the same data with different y-axis scales (0 to 25, 0 to 100, 15 to 20) and, with the y-axis scale 0 to 25, a chart whose height is greater than its width]
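The effect of the y-axis scale on how dramatic a trend looks can be put into numbers. The sketch below is an illustration in Python with hypothetical monthly rates; `visual_span` is a made-up helper that reports what fraction of the plot height the data's spread occupies for a given axis range. The three axis ranges are the ones named in the charts above.

```python
def visual_span(values, y_min, y_max):
    """Fraction of the plot height covered by the spread of the values.

    Points outside the axis range are not handled in this sketch.
    """
    return (max(values) - min(values)) / (y_max - y_min)


# Hypothetical monthly screening rates (%), Jan to June.
rates = [16, 17, 16, 18, 19, 18]

for y_min, y_max in [(0, 25), (0, 100), (15, 20)]:
    share = visual_span(rates, y_min, y_max)
    print(f"Y axis {y_min} to {y_max}: change fills {share:.0%} of the plot height")
```

The same three-point change fills 12% of the plot on a 0 to 25 axis, 3% on a 0 to 100 axis, and 60% on a 15 to 20 axis, which is why the choice of scale needs to be deliberate and clearly labeled.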
Box-and-whisker Plots
Are a great way to compare different sets of data. Several
different descriptive statistics can be compared: Max, min,
upper quartile, median, lower quartile, range and interquartile
range.
[Figure: horizontal box-and-whisker plots "Namibia Food Security"; x-axis: Performance Rate (%), 0 to 100; y-axis: Review Period (Jan - Jun 08, Jul - Dec 08, Jan - Jun 09, Jul - Dec 09, Jan - Jun 10, Oct 10 - Mar 11)]
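The descriptive statistics that a box-and-whisker plot displays can be computed with Python's standard library. This is a minimal sketch with hypothetical clinic scores; `five_number_summary` is an invented helper name. Note that there are several conventions for computing quartiles, and the `inclusive` method used here is an assumption, so plotting software may give slightly different quartile values.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Return the statistics a box-and-whisker plot displays."""
    # method="inclusive" treats the scores as the whole population of clinics.
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "max": max(scores),
        "range": max(scores) - min(scores),
        "interquartile range": q3 - q1,
    }


# Hypothetical clinic performance rates (%) for one review period.
scores = [20, 35, 40, 50, 55, 60, 70, 80, 95]
for name, value in five_number_summary(scores).items():
    print(f"{name}: {value}")
```

Computing one summary per review period gives the values behind each box, so the plots can be compared period by period.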
The next few examples illustrate how important labeling is.
Labeling provides more context to the data, allowing for
more rigorous and accurate interpretations of the data.

[Figure: bar chart "Mortality Rates of People Actively Playing Popular Sports in 2011"; y-axis: Mortality Rate (# deaths / 1000 people/year); x-axis: Soccer, Rugby, Cricket, Golf]

Is playing golf more dangerous than other sports?


[Figure: the same bar chart, "Mortality Rate of People Actively Playing Popular Sports in 2011", with each bar labeled with the players' average age: Soccer, Average Age = 23; Rugby, Average Age = 20; Cricket, Average Age = 25; Golf, Average Age = 60; y-axis: Mortality Rate (# deaths / 1000 people/year), 0 to 12]
What can we conclude?
[Figure: bar chart "Percent of Adults who received a TB assessment during the review period (Adult, 2008)"; y-axis: Performance Rate (%), 0 to 100; x-axis: Clinic A, Clinic B, Clinic C]
[Figure: the same bar chart, "Percent of Adults who received a TB assessment during the review period (Adult, 2008)", with denominator labels n = 2, n = 150 and n = 200 on the bars]

Clinic C only has 2 eligible patients!
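One way to make sure denominators always travel with percentages is to build them into the bar labels. The Python sketch below is illustrative only: `bar_label` and the `small_n` cutoff of 30 are hypothetical, the numerators are invented, and the assignment of n = 150 and n = 200 to Clinics A and B is an assumption (the chart only makes clear that Clinic C has n = 2).

```python
def bar_label(name, numerator, denominator, small_n=30):
    """Build a bar label that always shows the denominator.

    small_n is a hypothetical cutoff below which the rate is flagged.
    """
    rate = 100 * numerator / denominator
    label = f"{name}: {rate:.0f}% (n = {denominator})"
    if denominator < small_n:
        label += " [small denominator, interpret with caution]"
    return label


# Denominators follow the chart above; numerators are hypothetical, and which
# of Clinics A and B has n = 150 or n = 200 is assumed.
for name, num, den in [("Clinic A", 120, 150), ("Clinic B", 150, 200), ("Clinic C", 2, 2)]:
    print(bar_label(name, num, den))
```

With the label in place, a 100% rate based on two patients is no longer mistaken for the best performance among the three clinics.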


Write on Graphs: Use words, numbers and graphics in combinations
Use words directly on graphs to provide more context. For example, on a clinic level run
chart, use words and arrows to denote when a QI project was implemented. Here's an
example from Namibia.
Graph/Table Combinations
Graphs and tables can be utilized together. The table provides more context and detail
while the graph reveals any patterns of the data. Here's an example using data from
Uganda.
Sparklines: Intense, Simple, Word-Sized Graphics
Invented by Edward Tufte, these powerful graphics add tremendously to the meaning of
numbers. They provide context. For example, I can say that the current temperature is
30 degrees Celsius. However, if I include a sparkline that shows the weather during the
previous 24 hours, it immediately puts that 30 degrees into context. The sparklines I
showed in the previous slide show the spread of the data. Each little tick mark
represents an individual clinic's score. The red mark is the mean of those scores. Since I
oriented the spreads in the same column, I can quickly see how the spread changes from
round to round.
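The temperature example can be sketched as a text sparkline in Python. This is a simplified, time-series style sparkline built from Unicode block characters, not the tick-mark spread sparklines described above; the temperature readings are hypothetical and `sparkline` is a helper name invented for this illustration.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def sparkline(values):
    """Return a word-sized text graphic of the values, scaled min to max."""
    low, high = min(values), max(values)
    if high == low:
        return BLOCKS[0] * len(values)  # flat series: no variation to show
    chars = []
    for value in values:
        position = (value - low) / (high - low)  # 0.0 to 1.0
        index = min(int(position * len(BLOCKS)), len(BLOCKS) - 1)
        chars.append(BLOCKS[index])
    return "".join(chars)


# Hypothetical temperatures (degrees Celsius) over the previous 24 hours.
temperatures = [22, 21, 20, 21, 24, 27, 30, 31, 30]
print(f"Current temperature: 30 {sparkline(temperatures)}")
```

Placed next to the number, the sparkline shows at a glance that 30 degrees sits near the top of the day's range.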
Small Multiples
When clinic-level data are aggregated, detail at the clinic level is lost. Looking at
longitudinal mean clinic scores, individual clinic trends cannot be recovered. There are
several visualization techniques that encourage the eye to examine both clinic-level and
aggregate-level patterns. Small multiples, a series of graphics that show the same
combination of variables, is one such technique. Here is an example of what it would look
like.

[Figure: small multiples example, a grid of small line charts sharing the same time axis. Created by Jorge Camoes]
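The loss of clinic-level detail under aggregation, and how small multiples bring it back, can be sketched in Python. Everything here is hypothetical: the scores are invented to make the point, and `mean_by_round` and `small_multiples` are illustrative helper names. Each clinic gets its own tiny text panel drawn on the same fixed 0 to 100 scale.

```python
from statistics import mean

BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest

# Hypothetical performance rates (%) per clinic over four review rounds.
clinic_scores = {
    "Clinic A": [20, 40, 60, 80],
    "Clinic B": [80, 60, 40, 20],
    "Clinic C": [50, 50, 50, 50],
}


def mean_by_round(scores_by_clinic):
    """Average the clinics' scores round by round (the aggregate view)."""
    return [mean(round_scores) for round_scores in zip(*scores_by_clinic.values())]


def small_multiples(scores_by_clinic):
    """One tiny text panel per clinic, all drawn on the same 0 to 100 scale."""
    panels = []
    for clinic, scores in scores_by_clinic.items():
        spark = "".join(BLOCKS[min(score * 8 // 100, 7)] for score in scores)
        panels.append(f"{clinic} {spark}")
    return panels


print("Mean by round:", mean_by_round(clinic_scores))  # the mean is 50 in every round
for panel in small_multiples(clinic_scores):
    print(panel)
```

The aggregate line is flat, yet the panels show one clinic improving, one declining and one unchanged. Using the same scale in every panel is what makes the panels comparable.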
Heat Maps
Use color to encourage the eye to examine both clinic level and aggregate
level patterns. In this example, each color represents a range of performance
rates. The more red the color, the closer the performance rate is to 0%. The
more green the color, the closer the performance rate is to 100%.

[Figure: heat map "Namibia Food Security Indicator Results: Percentage of eligible adult patients assessed for food security by clinic and review period"; rows: Clinic (A to P); key to swatch colors, Rate (%): 0 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, 71 to 80, 81 to 90, 91 to 100]
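The mapping from a performance rate to a swatch can be sketched in Python. The ten ranges are the ones listed in the key (0 to 10, 11 to 20, up to 91 to 100); the linear red-to-green blend is an assumption, since the exact palette of the original heat map is not known, and `swatch_range` and `swatch_color` are hypothetical helper names. Rates are assumed to be whole-number percentages.

```python
def swatch_range(rate):
    """Return (index, key label) for a whole-number rate between 0 and 100."""
    if not 0 <= rate <= 100:
        raise ValueError("rate must be between 0 and 100")
    index = max(0, (rate - 1) // 10)  # 0 for "0 to 10", 9 for "91 to 100"
    low = 0 if index == 0 else index * 10 + 1
    return index, f"{low} to {(index + 1) * 10}"


def swatch_color(index, steps=10):
    """Blend from red (lowest range) to green (highest range) as an RGB hex code.

    The linear blend is an assumption; the original palette is unknown.
    """
    t = index / (steps - 1)
    red, green = round(255 * (1 - t)), round(255 * t)
    return f"#{red:02x}{green:02x}00"


# Hypothetical rates (%) for one clinic across review periods.
for rate in [8, 35, 62, 97]:
    index, label = swatch_range(rate)
    print(f"{rate}% -> {label} -> {swatch_color(index)}")
```

Binning rates into a small number of ranges keeps the key readable, at the cost of hiding differences within a range.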
Summary
Context is essential for graphical integrity.
Provide historical data when available.
Label axes properly.
Always provide denominators to percentages.

Do whatever it takes to display the data in the best way with integrity and clarity.
Data visualization should be content-driven, not methodology-driven.
Use combinations of words, numbers and graphics.
Combine tables and charts together.

Creating an excellent data graphic takes time. Like good writing, it requires
revising and editing.
