Lecture 1
Serafeim Tsoukas
Graphical techniques
Education and employment data
              Higher      A levels    Other           No              Total
              education               qualification   qualification
In work        9,713       5,479      10,173           1,965          27,330
Unemployed       394         432       1,166             382           2,374
Inactive       1,256       1,440       3,277           2,112           8,084
Total         11,362       7,352      14,615           4,458          37,788
[Figure: Bar chart of educational qualifications of people in work]
Note: The height of each bar is determined by the associated frequency. The first bar is 9,713 units high,
the second is 5,479 and so on. The ordering of the bars could be reversed (no qualifications becoming
the first category) without altering the message.
[Figure: Multiple bar chart of educational qualifications and employment status (in work, unemployed, inactive)]
Figure 1.3 Stacked bar chart of educational qualifications and employment status
[Figure: Percentage stacked bar chart of employment status by educational qualification]
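A percentage chart re-expresses the counts in the table as shares. Below is a minimal Python sketch, assuming the chart shows each employment status as a percentage of its qualification group (column percentages); the dictionary layout and the name `column_percentages` are illustrative choices, not part of the lecture:

```python
# Counts from the education and employment table (same units as the table).
counts = {
    "Higher education": {"In work": 9_713, "Unemployed": 394, "Inactive": 1_256},
    "A levels": {"In work": 5_479, "Unemployed": 432, "Inactive": 1_440},
    "Other qualification": {"In work": 10_173, "Unemployed": 1_166, "Inactive": 3_277},
    "No qualification": {"In work": 1_965, "Unemployed": 382, "Inactive": 2_112},
}


def column_percentages(table):
    """Return each status as a percentage of its qualification group's total."""
    shares = {}
    for qualification, by_status in table.items():
        # Summing the cells can differ from the printed total by 1 because of rounding.
        group_total = sum(by_status.values())
        shares[qualification] = {
            status: 100 * n / group_total for status, n in by_status.items()
        }
    return shares


for qualification, pct in column_percentages(counts).items():
    print(qualification, {status: round(p, 1) for status, p in pct.items()})
```

Stacking these shares gives bars of equal height, so the chart compares the composition of each group rather than its size.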
Wealth range (£)          Numbers (thousands)
0–9,999                          1,668
10,000–24,999                    1,318
25,000–39,999                    1,174
40,000–49,999                      662
50,000–59,999                      627
60,000–79,999                    1,095
80,000–99,999                    1,195
100,000–149,999                  3,267
150,000–199,999                  2,392
200,000–299,999                  2,885
300,000–499,999                  1,480
500,000–999,999                    628
1,000,000–1,999,999                198
2,000,000 or more                   88
Total                           18,677
Figure 1.7 Bar chart of the distribution of wealth in the UK, 2005
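Range (£)    Number, or frequency    Class width    Frequency density
0                   1,668               10,000           0.1668
10,000              1,318               15,000           0.0879
25,000              1,174               15,000           0.0783
40,000                662               10,000           0.0662
50,000                627               10,000           0.0627
60,000              1,095               20,000           0.0548
80,000              1,195               20,000           0.0598

Frequency density = frequency ÷ class width. In a histogram it sets the height of each block, so that the area of a block (rather than its height) is proportional to the frequency.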
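The density column can be reproduced with a short Python sketch. It assumes each class runs from its lower limit up to the next lower limit (as the class widths in the table imply) and keeps frequencies in thousands, as the table does; `frequency_densities` is a name chosen for illustration:

```python
# Class boundaries (£) and frequencies (thousands) for the seven classes in the table;
# the final boundary, 100,000, closes the last class shown.
boundaries = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000]
frequencies = [1_668, 1_318, 1_174, 662, 627, 1_095, 1_195]


def frequency_densities(bounds, freqs):
    """Return (class width, frequency density) for each class.

    Width is the gap between consecutive boundaries; density = frequency / width.
    """
    widths = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
    return [(width, f / width) for width, f in zip(widths, freqs)]


for lower, (width, density) in zip(boundaries, frequency_densities(boundaries, frequencies)):
    print(f"{lower:>7,}  width {width:>6,}  density {density:.4f}")
```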
Numerical techniques
We examine the measures of
Location
Dispersion
Skewness.
Measures of location
Mean: strictly, the arithmetic mean, the well-known "average"
Median: the wealth of the person in the middle of the distribution
Mode: the level of wealth that occurs most often
Range (£)      Mid-point x (£000)   Frequency f (000)        fx
0–                    5.0                1,668               8,340
10,000–              17.5                1,318              23,065
25,000–              32.5                1,174              38,155
40,000–              45.0                  662              29,790
50,000–              55.0                  627              34,485
60,000–              70.0                1,095              76,650
80,000–              90.0                1,195             107,550
100,000–            125.0                3,267             408,375
150,000–            175.0                2,392             418,600
200,000–            250.0                2,885             721,250
300,000–            400.0                1,480             592,000
500,000–            750.0                  628             471,000
1,000,000–         1500.0                  198             297,000
2,000,000–         3000.0                   88             264,000
Total                                   18,677           3,490,260

$$\mu = \frac{\sum fx}{\sum f} = \frac{3{,}490{,}260}{18{,}677} = 186.875$$
i.e. mean wealth of £186,875.
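The calculation above can be checked with a minimal Python sketch. It treats every person in a class as holding the class mid-point, as the table does; the function name `grouped_mean` is illustrative:

```python
# Class mid-points x (£000) and frequencies f (thousands) from the table above.
midpoints = [5.0, 17.5, 32.5, 45.0, 55.0, 70.0, 90.0, 125.0, 175.0,
             250.0, 400.0, 750.0, 1500.0, 3000.0]
freqs = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267, 2392,
         2885, 1480, 628, 198, 88]


def grouped_mean(x, f):
    """Mean of grouped data: sum(f * x) / sum(f)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


mu = grouped_mean(midpoints, freqs)
print(round(mu, 3))  # 186.875 (£000), i.e. about £186,875
```

Note that the result depends on the mid-point chosen for the open-ended top class (3,000 here).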
The median
The wealth of the middle person, i.e. the one located halfway through the distribution when everyone is ranked from poorest to richest.
Simple example
Range (£)    Frequency    Cumulative frequency
0              1,668             1,668
10,000         1,318             2,986
25,000         1,174             4,160
40,000           662             4,822
50,000           627             5,449
60,000         1,095             6,544
80,000         1,195             7,739
100,000        3,267            11,006
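The middle person is ranked 18,677,000/2 = 9,338,500. Cumulative frequency is 7,739,000 below £100,000 and 11,006,000 below £150,000, so the median lies in the £100,000–£150,000 class. Interpolating within that class:
$$\text{median} = x_L + (x_U - x_L)\left(\frac{N/2 - F}{f}\right) = 100{,}000 + (150{,}000 - 100{,}000)\left(\frac{18{,}677{,}000/2 - 7{,}739{,}000}{3{,}267{,}000}\right) = 124{,}480$$
where x_L and x_U are the lower and upper limits of the median class, N is the total number of people, F is the cumulative frequency up to the median class and f is the frequency in the median class.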
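The interpolation can be sketched in Python under stated assumptions: frequencies stay in thousands (the ratios are unaffected), each class's upper limit is taken as the next lower limit, and the open-ended top class is closed at £4,000,000, which is consistent with the 3,000 (£000) mid-point used for the mean but is our assumption. The name `interpolate_rank` is hypothetical:

```python
# Class boundaries (£) and frequencies (thousands) for the full wealth table.
# ASSUMPTION: the open-ended "2,000,000 or more" class is closed at 4,000,000.
boundaries = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000,
              150_000, 200_000, 300_000, 500_000, 1_000_000, 2_000_000, 4_000_000]
freqs = [1_668, 1_318, 1_174, 662, 627, 1_095, 1_195, 3_267,
         2_392, 2_885, 1_480, 628, 198, 88]


def interpolate_rank(bounds, freqs, rank):
    """Value of the person at position `rank`, interpolating linearly within
    the class that contains it: x_L + (x_U - x_L) * (rank - F) / f."""
    cumulative = 0  # F: cumulative frequency below the current class
    for x_l, x_u, f in zip(bounds, bounds[1:], freqs):
        if cumulative + f >= rank:
            return x_l + (x_u - x_l) * (rank - cumulative) / f
        cumulative += f
    raise ValueError("rank exceeds the total frequency")


n = sum(freqs)
median = interpolate_rank(boundaries, freqs, n / 2)
print(round(median))  # 124480
```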
The mode
The mode is the observation with the highest frequency. For example, with these sales by size:

Size    Sales
8          7
10        25
12        36
14        11
16         3
18         1

the mode is size 12 (36 sales).
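For grouped data with unequal class widths, the modal class is the one with the highest frequency density: 0–10,000 (density 0.1668).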
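Both rules can be illustrated in a few lines of Python. This is a minimal sketch: for the wealth data it uses only the classes up to £150,000 (every later class is wider and has a lower density), and the variable names are our own:

```python
sizes = [8, 10, 12, 14, 16, 18]
sales = [7, 25, 36, 11, 3, 1]

# Ungrouped data: the mode is the value with the highest frequency.
mode_size = sizes[sales.index(max(sales))]

# Grouped data with unequal widths: compare frequency densities, not raw frequencies.
boundaries = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000, 150_000]
freqs = [1_668, 1_318, 1_174, 662, 627, 1_095, 1_195, 3_267]
densities = [f / (upper - lower) for f, lower, upper in zip(freqs, boundaries, boundaries[1:])]
best = densities.index(max(densities))
modal_class = (boundaries[best], boundaries[best + 1])

print(mode_size, modal_class)  # 12 (0, 10000)
```

The 100,000–150,000 class has the largest raw frequency (3,267), but it is five times as wide as the first class, so its density is lower; this is why densities, not frequencies, identify the modal class.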
Figure 1.12 The histogram with the mean, median and mode marked
Measures of dispersion
The range: the difference between the smallest and largest observations. Not very informative for wealth.
Inter-quartile range: contains the middle half of the observations.
Variance: based on all the observations in the sample.
Inter-quartile range
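First quartile: one-quarter of the way through the distribution, i.e. the person ranked 18,677/4 = 4,669.25 (thousand). Cumulative frequency is 4,160 below £40,000 and 4,822 below £50,000, so Q1 lies in the £40,000–£50,000 class:
$$Q_1 = 40{,}000 + (50{,}000 - 40{,}000)\left(\frac{4{,}669.25 - 4{,}160}{662}\right) = 47{,}692.6$$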
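The same interpolation gives either quartile. A minimal Python sketch, under the same assumptions as the median calculation (upper limit of each class = next lower limit; the open top class closed at £4,000,000, an assumption); `interpolate_rank` is a hypothetical helper, and the third quartile it prints is computed here rather than taken from the lecture:

```python
boundaries = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000,
              150_000, 200_000, 300_000, 500_000, 1_000_000, 2_000_000, 4_000_000]
freqs = [1_668, 1_318, 1_174, 662, 627, 1_095, 1_195, 3_267,
         2_392, 2_885, 1_480, 628, 198, 88]


def interpolate_rank(bounds, freqs, rank):
    """x_L + (x_U - x_L) * (rank - F) / f for the class containing `rank`."""
    cumulative = 0
    for x_l, x_u, f in zip(bounds, bounds[1:], freqs):
        if cumulative + f >= rank:
            return x_l + (x_u - x_l) * (rank - cumulative) / f
        cumulative += f
    raise ValueError("rank exceeds the total frequency")


n = sum(freqs)
q1 = interpolate_rank(boundaries, freqs, n / 4)      # person ranked 4,669.25 (thousand)
q3 = interpolate_rank(boundaries, freqs, 3 * n / 4)  # person ranked 14,007.75 (thousand)
iqr = q3 - q1
print(round(q1, 1))  # 47692.6
print(round(q3, 1), round(iqr, 1))
```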
[Figure: Box plot showing the first quartile, median, third quartile, inter-quartile range (IQR) and an outlier]
The variance
The variance is the average of all squared deviations from the mean:
$$\sigma^2 = \frac{\sum f (x - \mu)^2}{\sum f}$$
[Figure: Distributions with a small variance and a large variance]
Range (£)    Mid-point x (£000)   Frequency f (000)   Deviation (x − μ)    (x − μ)²        f(x − μ)²
0                   5.0               1,668              −181.9          33,078.4       55,174,821.9
10,000             17.5               1,318              −169.4          28,687.8       37,810,535.3
25,000             32.5               1,174              −154.4          23,831.6       27,978,261.2
40,000             45.0                 662              −141.9          20,128.4       13,325,033.3
50,000             55.0                 627              −131.9          17,391.0       10,904,128.1
…
1,000,000        1500.0                 198             1,313.1       1,724,297.9      341,410,980.4
2,000,000        3000.0                  88             2,813.1       7,913,673.6      696,403,275.4
Totals                               18,677                                          1,499,890,455.1
$$\sigma^2 = \frac{\sum f (x - \mu)^2}{\sum f} = \frac{1{,}499{,}890{,}455.1}{18{,}677} = 80{,}306.8$$
$$\sigma = \sqrt{80{,}306.8} = 283.385$$
or £283,385.
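The variance and standard deviation above can be checked with a minimal Python sketch that applies the formula directly to the mid-points and frequencies (x in £000, f in thousands); `grouped_variance` is an illustrative name:

```python
import math

midpoints = [5.0, 17.5, 32.5, 45.0, 55.0, 70.0, 90.0, 125.0, 175.0,
             250.0, 400.0, 750.0, 1500.0, 3000.0]
freqs = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267, 2392,
         2885, 1480, 628, 198, 88]


def grouped_variance(x, f):
    """Population variance of grouped data: sum(f * (x - mu)^2) / sum(f)."""
    n = sum(f)
    mu = sum(fi * xi for xi, fi in zip(x, f)) / n
    return sum(fi * (xi - mu) ** 2 for xi, fi in zip(x, f)) / n


variance = grouped_variance(midpoints, freqs)
sigma = math.sqrt(variance)
print(round(variance, 1), round(sigma, 3))  # 80306.8 283.385
```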
Sample measures
For sample data, use
$$s^2 = \frac{\sum f (x - \bar{x})^2}{n - 1}$$
to calculate the sample variance.
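The difference between the two divisors can be seen on a small hypothetical sample (the values and frequencies below are invented for illustration). The sketch computes the grouped sample variance directly and compares it with Python's standard `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor n) applied to the expanded data:

```python
import statistics


def grouped_sample_variance(x, f):
    """Sample variance of grouped data: sum(f * (x - xbar)^2) / (n - 1)."""
    n = sum(f)
    xbar = sum(fi * xi for xi, fi in zip(x, f)) / n
    return sum(fi * (xi - xbar) ** 2 for xi, fi in zip(x, f)) / (n - 1)


# Hypothetical sample: values 2, 4, 5, 7, 9 observed 1, 3, 2, 1, 1 times.
x = [2, 4, 5, 7, 9]
f = [1, 3, 2, 1, 1]
raw = [xi for xi, fi in zip(x, f) for _ in range(fi)]  # expands to [2, 4, 4, 4, 5, 5, 7, 9]

s2 = grouped_sample_variance(x, f)
print(s2, statistics.variance(raw))  # both divide by n - 1
print(statistics.pvariance(raw))     # divides by n instead
```

With only eight observations the two divisors give noticeably different answers (32/7 against 32/8); with 18,677 thousand observations the difference is negligible.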
Measuring skewness
[Figure: Right-skewed distribution (CS > 0) and left-skewed distribution (CS < 0)]
Coefficient of skewness:
$$CS = \frac{\sum f (x - \mu)^3}{N \sigma^3}$$
x          x − μ        f          (x − μ)³            f(x − μ)³
5.0       −181.9     1,668        −6,016,132         −10,034,907,815
17.5      −169.4     1,318        −4,858,991          −6,404,150,553
…
750.0      563.1       628       178,572,660         112,143,630,236
1500.0   1,313.1       198     2,264,219,059         448,315,373,613
3000.0   2,813.1        88    22,262,154,853       1,959,069,627,104
Totals   3,898.8    18,677    24,692,431,323       2,506,882,551,023

$$CS = \frac{2{,}506{,}882{,}551{,}023}{18{,}677 \times 22{,}757{,}714} = 5.898$$
where 22,757,714 = σ³. The positive value shows that the wealth distribution is skewed to the right.
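A minimal Python sketch of the same coefficient, using population moments of the grouped data (x in £000, f in thousands) as in the table; `coefficient_of_skewness` is an illustrative name:

```python
import math

midpoints = [5.0, 17.5, 32.5, 45.0, 55.0, 70.0, 90.0, 125.0, 175.0,
             250.0, 400.0, 750.0, 1500.0, 3000.0]
freqs = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267, 2392,
         2885, 1480, 628, 198, 88]


def coefficient_of_skewness(x, f):
    """sum(f * (x - mu)^3) / (N * sigma^3), with mu and sigma from the same grouped data."""
    n = sum(f)
    mu = sum(fi * xi for xi, fi in zip(x, f)) / n
    sigma = math.sqrt(sum(fi * (xi - mu) ** 2 for xi, fi in zip(x, f)) / n)
    cubed_deviations = sum(fi * (xi - mu) ** 3 for xi, fi in zip(x, f))
    return cubed_deviations / (n * sigma ** 3)


cs = coefficient_of_skewness(midpoints, freqs)
print(round(cs, 3))  # 5.898
```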
Summary
We can use graphical and numerical measures to
summarise data.
The aim is to simplify without distorting the
message.
Measures of location, dispersion and skewness
provide a good description of the data.
Year   Investment (£m)        Year   Investment (£m)
1988        97,956            1999       161,722
1989       113,478            2000       167,172
1990       117,027            2001       171,782
1991       107,838            2002       180,551
1992       103,913            2003       186,700
1993       103,997            2004       200,415
1994       111,623            2005       209,758
1995       121,364            2006       227,234
1996       130,346            2007       249,517
1997       138,307            2008       240,361
1998       155,997            2009       204,270
[Figure: Time-series graph of investment, 1977–2009]
[Figure: Time-series graph of the change in investment, 1977–2009]
[Figure: Time-series graph of log investment, 1977–2009]
[Figure: Investment expenditures by type (dwellings, transport, machinery, other buildings), 1977–2009]
Figure 1.21 Time-series graph using two vertical scales: investment (LH scale) and the interest rate (RH scale), 1985–2005
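The average annual growth rate of investment over 1977–2009 (32 years of growth) is found from the ratio of the last observation to the first:
$$\frac{x_T}{x_1} = \frac{204{,}270}{28{,}351} = 7.2050, \qquad \sqrt[32]{7.205} = 1.0637$$
i.e. an average growth rate of 6.37% per year.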
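A minimal Python sketch of this geometric-mean calculation, assuming x_1 = 28,351 is the 1977 observation and x_T = 204,270 the 2009 value from the table, giving 32 annual growth periods; the variable names are illustrative:

```python
x_first = 28_351   # ASSUMPTION: investment in 1977 (£m)
x_last = 204_270   # investment in 2009 (£m), from the table
periods = 32       # annual growth periods, 1977 to 2009

ratio = x_last / x_first
growth_factor = ratio ** (1 / periods)  # the 32nd root of the overall ratio
print(round(ratio, 4), round(growth_factor, 4))  # 7.205 1.0637
```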
An approximate alternative
The average growth rate can also be calculated
as the arithmetic mean of the annual growth rates:
$$\frac{1.142 + 1.190 + \dots + 0.963 + 0.850}{32} = 1.0664$$
i.e. 6.6%.
This gives approximately the right answer, as long
as the growth rate is not too big.
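The two averages can be compared on the 1988–2009 figures in the table. This Python sketch uses the standard `statistics.fmean` and `statistics.geometric_mean` (Python 3.8+); it covers a shorter period than the lecture's 1977–2009 calculation, so its numbers will differ from 1.0637 and 1.0664:

```python
import statistics

investment = {
    1988: 97_956, 1989: 113_478, 1990: 117_027, 1991: 107_838, 1992: 103_913,
    1993: 103_997, 1994: 111_623, 1995: 121_364, 1996: 130_346, 1997: 138_307,
    1998: 155_997, 1999: 161_722, 2000: 167_172, 2001: 171_782, 2002: 180_551,
    2003: 186_700, 2004: 200_415, 2005: 209_758, 2006: 227_234, 2007: 249_517,
    2008: 240_361, 2009: 204_270,
}

years = sorted(investment)
# Annual growth factors, e.g. 1.142 means growth of 14.2% on the previous year.
factors = [investment[y] / investment[y - 1] for y in years[1:]]

arith = statistics.fmean(factors)          # the approximate method
geom = statistics.geometric_mean(factors)  # the exact average growth factor
endpoint = (investment[2009] / investment[1988]) ** (1 / len(factors))

print(f"arithmetic {arith:.4f}, geometric {geom:.4f}, endpoint formula {endpoint:.4f}")
```

The geometric mean of the annual factors equals the endpoint formula, and the arithmetic mean is never smaller than it; the gap widens as the annual growth rates become more variable.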
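To measure the variability of growth, let x be the annual growth rate (e.g. 0.142 for growth of 14.2%). The squared growth rates are

x²: 0.020, 0.036, 0.017, …, 0.007, 0.010, 0.001, 0.023, with Σx² = 0.3230.

Using the mean growth rate x̄ = 0.0664:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{0.323 - 32 \times 0.0664^2}{31} = 0.0059$$
$$s = 0.0766$$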
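The short-cut form of the sample variance can be sketched in Python. The first part reproduces the lecture's figures from its summary totals; the second checks, on the 1988–2009 growth rates in the table, that the short-cut agrees with the definition as implemented by the standard `statistics.variance`. The function name is illustrative:

```python
import statistics


def sample_variance_shortcut(sum_x2, n, mean):
    """s^2 = (sum(x^2) - n * mean^2) / (n - 1), equal to sum((x - mean)^2) / (n - 1)."""
    return (sum_x2 - n * mean ** 2) / (n - 1)


# Lecture totals: sum of squared growth rates 0.3230, n = 32, mean growth 0.0664.
s2 = sample_variance_shortcut(0.3230, 32, 0.0664)
print(round(s2, 4), round(s2 ** 0.5, 4))  # 0.0059 0.0766

# Cross-check on the 1988-2009 investment figures (£m) from the table.
investment = [97_956, 113_478, 117_027, 107_838, 103_913, 103_997, 111_623,
              121_364, 130_346, 138_307, 155_997, 161_722, 167_172, 171_782,
              180_551, 186_700, 200_415, 209_758, 227_234, 249_517, 240_361, 204_270]
rates = [later / earlier - 1 for earlier, later in zip(investment, investment[1:])]
shortcut = sample_variance_shortcut(sum(r * r for r in rates), len(rates), statistics.fmean(rates))
print(abs(shortcut - statistics.variance(rates)) < 1e-12)  # True
```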
Bivariate data
We examine the relationship between investment and
GDP.
Figure 1.24 Scatter diagram of investment (vertical axis) against GDP (horizontal axis) (nominal values)
Figure 1.25 The relationship between real investment and real output
Summary
Slightly different graphical and numerical techniques are used for time-series data.
A variety of time-series charts are available, both for single and multiple series.
The mean and variance are both useful descriptive
devices, but it makes more sense to apply them to the
growth rate, rather than the level, of a trended variable.
Data transformations can be useful, e.g. taking logs or
differences and deflating to real terms.