
Quantitative methods for finance

Lecture 1

Serafeim Tsoukas

Chapter 1: Descriptive statistics


Descriptive statistics summarises a mass of information.
We may use graphical and/or numerical methods.

Examples of the former are the bar chart and the XY chart; examples of the latter are averages and standard deviations.

Graphical techniques
Education and employment data

              Higher education   A levels   Other qualification   No qualification    Total
In work                  9,713      5,479                10,173              1,965   27,330
Unemployed                 394        432                 1,166                382    2,374
Inactive                 1,256      1,440                 3,277              2,112    8,084
Total                   11,362      7,352                14,615              4,458   37,788

Table 1.1 Economic status and educational qualifications, 2009 (numbers in 000s)
Source: Adapted from Department for Children, Schools and Families, Education and Training Statistics for the UK 2009, http://www.education.gov.uk/rsgateway/DB/VOL/v000891/, contains public sector information licensed under the Open Government Licence (OGL) v1.0. http://www.nationalarchives.gov.uk/doc/open-government-licence/open-government

The bar chart

[Figure: bar chart of the number of people in work (000s) by qualification: Higher education, Advanced level, Other qualifications, No qualifications]

Note: The height of each bar is determined by the associated frequency. The first bar is 9,713 units high, the second is 5,479 and so on. The ordering of the bars could be reversed (no qualifications becoming the first category) without altering the message.

Figure 1.1 Educational qualifications of people in work in the UK, 2009

A multiple bar chart

[Figure: number of people (000s) by qualification (Higher education, Advanced level, Other qualifications, No qualifications), with separate bars for In work, Unemployed and Inactive]

Figure 1.2 Educational qualifications by employment category

The stacked bar chart

[Figure: one bar per qualification (Higher education, Advanced level, Other qualifications, No qualifications) showing the number of people (000s), each bar divided into In work, Unemployed and Inactive]

Figure 1.3 Stacked bar chart of educational qualifications and employment status

A stacked bar chart (percentages)

[Figure: one bar per qualification (Higher education, Advanced level, Other qualifications, No qualifications), each scaled to 100% and divided into In work, Unemployed and Inactive]

Figure 1.4 Percentages in each employment category, by educational qualification
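Figure 1.4 rescales each qualification's bar to 100%, i.e. it divides every count in Table 1.1 by its column total. A minimal Python sketch of that calculation follows. It assumes the column totals are recomputed from the three status rows (these can differ by one from the printed totals, which were rounded independently); `column_percentages` is an illustrative name, not something from the source.

```python
# Counts (000s) from Table 1.1; list positions follow QUALIFICATIONS.
QUALIFICATIONS = ["Higher education", "A levels", "Other qualification", "No qualification"]
COUNTS = {
    "In work": [9713, 5479, 10173, 1965],
    "Unemployed": [394, 432, 1166, 382],
    "Inactive": [1256, 1440, 3277, 2112],
}


def column_percentages(counts):
    """Percentage of each employment status within each qualification column."""
    n_cols = len(next(iter(counts.values())))
    # Totals are summed from the cells, so each column adds up to 100% (up to rounding).
    col_totals = [sum(row[j] for row in counts.values()) for j in range(n_cols)]
    return {
        status: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for status, row in counts.items()
    }


def main():
    pct = column_percentages(COUNTS)
    for j, qual in enumerate(QUALIFICATIONS):
        shares = ", ".join(f"{status} {pct[status][j]:.1f}%" for status in pct)
        print(f"{qual}: {shares}")


if __name__ == "__main__":
    main()
```

The printed shares are the heights of the segments in each 100% bar; for example, about 85% of people with higher education are in work, against about 44% of those with no qualification.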

The pie chart

Figure 1.5 Educational qualifications of those in work

Data on wealth in the UK

Class interval (£)        Numbers (thousands)
0–9,999                          1,668
10,000–24,999                    1,318
25,000–39,999                    1,174
40,000–49,999                      662
50,000–59,999                      627
60,000–79,999                    1,095
80,000–99,999                    1,195
100,000–149,999                  3,267
150,000–199,999                  2,392
200,000–299,999                  2,885
300,000–499,999                  1,480
500,000–999,999                    628
1,000,000–1,999,999                198
2,000,000 or more                   88
Total                           18,677

Table 1.3 The distribution of wealth, UK, 2005

Source: Adapted from HM Revenue and Customs Statistics, 2005, contains public sector information licensed under the Open Government Licence (OGL) v1.0. http://www.nationalarchives.gov.uk/doc/open-government-licence/open-government

A (misleading!) bar chart

[Figure: bar chart of the number of individuals (thousands) in each wealth class, with classes labelled by their lower boundary from £10,000 to £2,000,000]

Figure 1.7 Bar chart of the distribution of wealth in the UK, 2005

The histogram: the correct picture

Figure 1.9 Histogram of the distribution of wealth in the UK, 2005

Histogram versus bar chart

The bar chart gives the wrong picture because the class widths vary.

Class interval (£)     Numbers (thousands)
10,000–24,999                 1,318
:                               :
300,000–499,999               1,480

These two classes have similar frequencies (and hence similar heights in the bar chart), but the second is over 13 times wider (200,000 versus 15,000). Adjusting for width, its comparable frequency is only 1,480 × 15/200 = 111.

The frequency density

Applying this principle leads to the calculation of the frequency densities, i.e. frequency divided by class width:

Range       Number, or frequency   Class width   Frequency density
0-                 1,668              10,000           0.1668
10,000-            1,318              15,000           0.0879
25,000-            1,174              15,000           0.0783
40,000-              662              10,000           0.0662
50,000-              627              10,000           0.0627
60,000-            1,095              20,000           0.0548
80,000-            1,195              20,000           0.0598
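The densities above are simply frequency divided by class width. The Python sketch below recomputes them for all fourteen classes of Table 1.3 (the slide shows only the first seven) and repeats the width adjustment used for the 300,000–499,999 class. Two assumptions: the open-ended top class is closed at £4,000,000, which matches the £3,000k midpoint used in the mean calculation, and the helper names are purely illustrative.

```python
# Wealth classes from Table 1.3 (pounds). ASSUMPTION: the open-ended top class
# "2,000,000 or more" is closed at 4,000,000, matching its 3,000 (£000) midpoint.
LOWER = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000,
         150_000, 200_000, 300_000, 500_000, 1_000_000, 2_000_000]
UPPER = LOWER[1:] + [4_000_000]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]  # thousands of people


def frequency_densities(lower, upper, freq):
    """Frequency / class width for each class (thousands of people per £)."""
    return [f / (hi - lo) for lo, hi, f in zip(lower, upper, freq)]


def width_adjusted(freq, width, reference_width):
    """Frequency the class would have if it were only reference_width wide."""
    return freq * reference_width / width


def main():
    for lo, hi, d in zip(LOWER, UPPER, frequency_densities(LOWER, UPPER, FREQ)):
        print(f"{lo:>9,} - {hi:<9,} density {d:.4f}")
    # The 300,000-499,999 class rescaled to the 15,000 width of 10,000-24,999.
    print(width_adjusted(1480, 200_000, 15_000))


if __name__ == "__main__":
    main()
```

Plotting these densities, rather than the raw frequencies, as bar heights is what turns the misleading bar chart into the histogram.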

Numerical techniques
We examine measures of:
location
dispersion
skewness.

Measures of location
Mean: strictly the arithmetic mean, the well-known average
Median: the wealth of the person in the middle of the distribution
Mode: the level of wealth that occurs most often

These different measures can give different answers.

The mean of the wealth distribution

Range        Midpoint, x (£000)   Frequency, f (000s)          fx
0-                   5.0                 1,668             8,340
10,000-             17.5                 1,318            23,065
25,000-             32.5                 1,174            38,155
40,000-             45.0                   662            29,790
50,000-             55.0                   627            34,485
60,000-             70.0                 1,095            76,650
80,000-             90.0                 1,195           107,550
100,000-           125.0                 3,267           408,375
150,000-           175.0                 2,392           418,600
200,000-           250.0                 2,885           721,250
300,000-           400.0                 1,480           592,000
500,000-           750.0                   628           471,000
1,000,000-        1500.0                   198           297,000
2,000,000-        3000.0                    88           264,000
Total                                   18,677         3,490,260

$$\mu = \frac{\sum fx}{\sum f} = \frac{3{,}490{,}260}{18{,}677} = 186.875$$

Mean wealth is £186,875.
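The grouped mean treats every person in a class as if they held the class midpoint. A short Python sketch that reproduces Σfx and μ from the table above; the function name `grouped_mean` is illustrative.

```python
# Class midpoints x (£000) and frequencies f (thousands) from the table above.
MIDPOINTS = [5.0, 17.5, 32.5, 45.0, 55.0, 70.0, 90.0, 125.0,
             175.0, 250.0, 400.0, 750.0, 1500.0, 3000.0]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]


def grouped_mean(x, f):
    """Mean of grouped data, sum(f*x) / sum(f): every observation in a class
    is treated as if it sat at the class midpoint."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def main():
    total_fx = sum(fi * xi for xi, fi in zip(MIDPOINTS, FREQ))
    mu = grouped_mean(MIDPOINTS, FREQ)
    print(f"sum fx = {total_fx:,.0f}, sum f = {sum(FREQ):,}")
    print(f"mean = {mu:.3f} (£000), i.e. about £{mu * 1000:,.0f}")


if __name__ == "__main__":
    main()
```

The result depends on the midpoint chosen for the open-ended top class: because that class sits so far above the rest, moving its midpoint shifts the mean noticeably.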

Locating the mean

The median
The wealth of the middle person, i.e. the one located halfway through the distribution.

[Figure: people ranked from poorest to richest; the median is the wealth of the person in the middle]

The median is little affected by outliers, unlike the mean.

Simple example

Values: 45, 12, 33, 80, 77

What is the median value?

Put the values in order: 12, 33, 45, 77, 80

The median is the middle (third) of the five values: 45.

Calculating the median wealth

There are 18,677 (thousand) observations, hence person 9,338.5 in rank order has the median wealth. This person is somewhere in the £100–150k interval.

Range       Frequency   Cumulative frequency
0-             1,668            1,668
10,000-        1,318            2,986
25,000-        1,174            4,160
40,000-          662            4,822
50,000-          627            5,449
60,000-        1,095            6,544
80,000-        1,195            7,739   (number with wealth less than £100k)
100,000-       3,267           11,006   (number with wealth less than £150k)

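Calculating the median (continued)

To find the precise median value, use

$$\text{median} = x_L + (x_U - x_L)\left\{\frac{N/2 - F}{f}\right\}$$

where x_L and x_U are the lower and upper boundaries of the class containing the median, F is the cumulative frequency up to x_L and f is the frequency of that class:

$$\text{median} = 100{,}000 + (150{,}000 - 100{,}000)\left\{\frac{18{,}677{,}000/2 - 7{,}739{,}000}{3{,}267{,}000}\right\} = 124{,}480$$

Median wealth is £124,480.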
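The interpolation formula can be sketched in Python as follows: walk through the cumulative frequencies until the class holding person N/2 is found, then interpolate linearly inside it. Frequencies are kept in thousands, which leaves the ratio (N/2 − F)/f unchanged. As before, closing the top class at £4,000,000 is an assumption (it does not affect the median), and `grouped_median` is a hypothetical helper name.

```python
# Wealth classes (pounds) and frequencies (thousands) from Table 1.3.
# ASSUMPTION: the open-ended top class is closed at 4,000,000.
LOWER = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000,
         150_000, 200_000, 300_000, 500_000, 1_000_000, 2_000_000]
UPPER = LOWER[1:] + [4_000_000]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]


def grouped_median(lower, upper, freq):
    """Median = x_L + (x_U - x_L) * (N/2 - F) / f, where F is the cumulative
    frequency below the median class and f is that class's frequency."""
    half = sum(freq) / 2
    cumulative = 0
    for lo, hi, f in zip(lower, upper, freq):
        if f > 0 and cumulative + f >= half:
            return lo + (hi - lo) * (half - cumulative) / f
        cumulative += f
    raise ValueError("empty distribution")


def main():
    print(f"Median wealth = £{grouped_median(LOWER, UPPER, FREQ):,.0f}")


if __name__ == "__main__":
    main()
```

Linear interpolation assumes people are spread evenly across the £100–150k class; the answer is an estimate, not an observed value.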

The mode
The mode is the observation with the highest frequency.

Size    Sales
8          7
10        25
12        36
14        11
16         3
18         1

Modal dress size = 12

The mode: wealth data

For grouped data, the mode corresponds to the interval with the greatest frequency density.

Range       Number, or frequency   Class width   Frequency density
0-                 1,668              10,000           0.1668   (modal class)
10,000-            1,318              15,000           0.0879
25,000-            1,174              15,000           0.0783
40,000-              662              10,000           0.0662
50,000-              627              10,000           0.0627

Modal class = £0–£10,000
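Both mode examples can be sketched in a few lines of Python: for ungrouped data take the value with the largest frequency, and for grouped data take the class with the largest frequency density. The helper names are illustrative, and closing the top wealth class at £4,000,000 is again an assumption. The last line shows why density matters: picking the largest raw frequency would point at the wide £100,000–150,000 class instead.

```python
# Dress sizes and sales from the first mode example.
SALES = {8: 7, 10: 25, 12: 36, 14: 11, 16: 3, 18: 1}

# Wealth classes (pounds) and frequencies (thousands) from Table 1.3.
# ASSUMPTION: the open-ended top class is closed at 4,000,000.
LOWER = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000,
         150_000, 200_000, 300_000, 500_000, 1_000_000, 2_000_000]
UPPER = LOWER[1:] + [4_000_000]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]


def modal_value(counts):
    """Ungrouped data: the value with the highest frequency."""
    return max(counts, key=counts.get)


def modal_class(lower, upper, freq):
    """Grouped data: the class with the highest frequency density."""
    densities = [f / (hi - lo) for lo, hi, f in zip(lower, upper, freq)]
    best = max(range(len(densities)), key=densities.__getitem__)
    return lower[best], upper[best]


def main():
    print("Modal dress size:", modal_value(SALES))
    print("Modal wealth class:", modal_class(LOWER, UPPER, FREQ))
    # Ignoring class widths would pick the 100,000-150,000 class instead.
    largest = max(range(len(FREQ)), key=FREQ.__getitem__)
    print("Class with largest raw frequency:", (LOWER[largest], UPPER[largest]))


if __name__ == "__main__":
    main()
```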

Differences between mean, median and mode

[Figure: histogram of wealth (£000), with some class widths squeezed, marking the mode, median and mean]

Figure 1.12 The histogram with the mean, median and mode marked

Measures of dispersion
The range: the difference between the smallest and largest observations. Not very informative for wealth.
Inter-quartile range: the range containing the middle half of the observations.
Variance: based on all observations in the sample.

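Inter-quartile range
First quartile: one-quarter of the way through the distribution, i.e. person ranked 4,669.25, hence

$$Q_1 = 40{,}000 + (50{,}000 - 40{,}000)\left\{\frac{4{,}669.25 - 4{,}160}{662}\right\} = 47{,}692.6$$

Third quartile: three-quarters of the way through the distribution, i.e. person ranked 14,007.75, hence

$$Q_3 = 200{,}000 + (300{,}000 - 200{,}000)\left\{\frac{14{,}007.75 - 13{,}398}{2{,}885}\right\} = 221{,}135.2$$

IQR = Q3 − Q1 = 221,135 − 47,693 = 173,442.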
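The quartiles use the same interpolation rule as the median, with p·N in place of N/2. A minimal Python sketch generalising it to any proportion p (p = 0.5 reproduces the median); `grouped_quantile` is a hypothetical name, and the £4,000,000 cap on the top class is an assumption that does not affect Q1 or Q3.

```python
# Wealth classes (pounds) and frequencies (thousands) from Table 1.3.
# ASSUMPTION: the open-ended top class is closed at 4,000,000.
LOWER = [0, 10_000, 25_000, 40_000, 50_000, 60_000, 80_000, 100_000,
         150_000, 200_000, 300_000, 500_000, 1_000_000, 2_000_000]
UPPER = LOWER[1:] + [4_000_000]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]


def grouped_quantile(lower, upper, freq, p):
    """p-quantile of grouped data: find the class containing the person ranked
    p*N and interpolate linearly inside it (p = 0.5 gives the median)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    target = p * sum(freq)
    cumulative = 0
    for lo, hi, f in zip(lower, upper, freq):
        if f > 0 and cumulative + f >= target:
            return lo + (hi - lo) * (target - cumulative) / f
        cumulative += f
    raise ValueError("empty distribution")


def main():
    q1 = grouped_quantile(LOWER, UPPER, FREQ, 0.25)
    q3 = grouped_quantile(LOWER, UPPER, FREQ, 0.75)
    print(f"Q1 = {q1:,.1f}  Q3 = {q3:,.1f}  IQR = {q3 - q1:,.1f}")


if __name__ == "__main__":
    main()
```

Working with the unrounded quartiles gives an IQR of about 173,443; the slide's 173,442 comes from subtracting the rounded values.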

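Box and whiskers plot

[Figure: box-and-whiskers plot of wealth (£000); the box runs from the first quartile to the third quartile (its length is the IQR), the median is marked inside it, and an outlier is shown separately]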

The variance
The variance is the average of all squared deviations from the mean:

$$\sigma^2 = \frac{\sum f(x-\mu)^2}{\sum f}$$

The larger this value, the greater the dispersion of the observations.

The variance (continued)

[Figure: two distributions, one with a small variance and one with a large variance]

Calculation of the variance

Range       Midpoint, x (£000)   Frequency, f (000s)   Deviation (x − μ)      (x − μ)²           f(x − μ)²
0-                  5.0                 1,668              −181.9             33,078.4         55,174,821.9
10,000-            17.5                 1,318              −169.4             28,687.8         37,810,535.3
25,000-            32.5                 1,174              −154.4             23,831.6         27,978,261.2
40,000-            45.0                   662              −141.9             20,128.4         13,325,033.3
50,000-            55.0                   627              −131.9             17,391.0         10,904,128.1
:                    :                     :                   :                   :                    :
1,000,000-       1500.0                   198             1,313.1          1,724,297.9        341,410,980.4
2,000,000-       3000.0                    88             2,813.1          7,913,673.6        696,403,275.4
Totals                                 18,677                                               1,499,890,455.1

$$\sigma^2 = \frac{\sum f(x-\mu)^2}{\sum f} = \frac{1{,}499{,}890{,}455.1}{18{,}677} = 80{,}306.8$$

The standard deviation

The variance is measured in squared £s (because we used squared deviations). Hence take the square root to get back to £s. This gives the standard deviation:

$$\sigma = \sqrt{80{,}306.8} = 283.385$$

or £283,385.
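The whole variance table collapses to one weighted sum. The Python sketch below computes the population variance (dividing by Σf, as on the slide) and its square root from the midpoints and frequencies; `grouped_variance` is an illustrative name, and the £3,000k midpoint for the top class is inherited from the mean calculation.

```python
import math

# Class midpoints x (£000) and frequencies f (thousands) from the wealth table.
MIDPOINTS = [5.0, 17.5, 32.5, 45.0, 55.0, 70.0, 90.0, 125.0,
             175.0, 250.0, 400.0, 750.0, 1500.0, 3000.0]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]


def grouped_variance(x, f):
    """Population variance of grouped data: sum f*(x - mu)^2 / sum f."""
    n = sum(f)
    mu = sum(fi * xi for xi, fi in zip(x, f)) / n
    return sum(fi * (xi - mu) ** 2 for xi, fi in zip(x, f)) / n


def main():
    var = grouped_variance(MIDPOINTS, FREQ)
    sd = math.sqrt(var)
    print(f"variance = {var:,.1f} (£000 squared)")
    print(f"standard deviation = {sd:.3f} (£000), i.e. about £{sd * 1000:,.0f}")


if __name__ == "__main__":
    main()
```

Computing the deviations from the unrounded mean, as here, avoids the small rounding drift that builds up when the table's rounded deviations are squared and summed by hand.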

Sample measures
For sample data, use

$$s^2 = \frac{\sum f(x-\bar{x})^2}{n-1}$$

to calculate the sample variance.

This gives an unbiased estimate of the population variance.
Take the square root of this for the sample standard deviation.

Measuring skewness

[Figure: a right-skewed distribution (CS > 0) and a left-skewed distribution (CS < 0)]

$$\text{Coefficient of skewness} = \frac{\sum f(x-\mu)^3}{N\sigma^3}$$

Skew of the wealth distribution

Range            x        x − μ        f          (x − μ)³              f(x − μ)³
0-             5.0       −181.9    1,668        −6,016,132        −10,034,907,815
10,000-       17.5       −169.4    1,318        −4,858,991         −6,404,150,553
:               :           :         :              :                      :
500,000-     750.0        563.1      628       178,572,660        112,143,630,236
1,000,000-  1500.0      1,313.1      198     2,264,219,059        448,315,373,613
2,000,000-  3000.0      2,813.1       88    22,262,154,853      1,959,069,627,104
Total                   3,898.8   18,677    24,692,431,323      2,506,882,551,023

$$\text{Coefficient of skewness} = \frac{\sum f(x-\mu)^3}{N\sigma^3} = \frac{2{,}506{,}882{,}551{,}023}{18{,}677 \times 22{,}757{,}714} = 5.898$$
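The coefficient of skewness needs only the mean, the population variance and the weighted sum of cubed deviations. A minimal Python sketch using the same midpoints and frequencies; σ is the population standard deviation, as on the slide, and the helper name is illustrative.

```python
# Class midpoints x (£000) and frequencies f (thousands) from the wealth table.
MIDPOINTS = [5.0, 17.5, 32.5, 45.0, 55.0, 70.0, 90.0, 125.0,
             175.0, 250.0, 400.0, 750.0, 1500.0, 3000.0]
FREQ = [1668, 1318, 1174, 662, 627, 1095, 1195, 3267,
        2392, 2885, 1480, 628, 198, 88]


def coefficient_of_skewness(x, f):
    """sum f*(x - mu)^3 / (N * sigma^3), with sigma the population standard deviation."""
    n = sum(f)
    mu = sum(fi * xi for xi, fi in zip(x, f)) / n
    variance = sum(fi * (xi - mu) ** 2 for xi, fi in zip(x, f)) / n
    third = sum(fi * (xi - mu) ** 3 for xi, fi in zip(x, f))
    return third / (n * variance ** 1.5)


def main():
    print(f"coefficient of skewness = {coefficient_of_skewness(MIDPOINTS, FREQ):.3f}")


if __name__ == "__main__":
    main()
```

A symmetric set of values such as 1, 2, 3 gives exactly zero, while the wealth data give a large positive value: the long right tail of very wealthy people dominates the cubed deviations.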

Summary
We can use graphical and numerical measures to
summarise data.
The aim is to simplify without distorting the
message.
Measures of location, dispersion and skewness
provide a good description of the data.

Descriptive statistics: Time series data


Slightly different techniques are used for time-series data: data on one or more variables over time.
We look at investment data in the UK by way of example.

Investment data: 1977–2009

Year   Investment (£m)   Year   Investment (£m)   Year   Investment (£m)
1977        28,351       1988        97,956       1999       161,722
1978        32,387       1989       113,478       2000       167,172
1979        38,548       1990       117,027       2001       171,782
1980        43,612       1991       107,838       2002       180,551
1981        43,746       1992       103,913       2003       186,700
1982        47,935       1993       103,997       2004       200,415
1983        52,099       1994       111,623       2005       209,758
1984        59,278       1995       121,364       2006       227,234
1985        65,181       1996       130,346       2007       249,517
1986        69,581       1997       138,307       2008       240,361
1987        80,344       1998       155,997       2009       204,270

Not very informative: we need a graph.

Source: Data adapted from the Office for National Statistics licensed under the Open Government Licence v.1.0.
http://www.nationalarchives.gov.uk/doc/open-government-licence/open-government

Time-series chart of investment

[Figure: investment plotted against year, 1977–2009]

Figure 1.16 Time-series graph of investment in the UK, 1977–2009

Chart of the change in investment

[Figure: change in investment from the previous year, 1977–2009]

Figure 1.17 Time-series graph of the change in investment

The logarithm of investment

[Figure: log investment, 1977–2009, on a vertical scale from 8.0 to 13.0]

Figure 1.18 Time-series graph of the logarithm of investment expenditures
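The change chart plots first differences (this year's value minus last year's), and the log chart plots logarithms of the levels. Natural logarithms are assumed here, since they fit the 8 to 13 vertical scale (ln 28,351 ≈ 10.25). The Python sketch applies both transformations to the first four years of the investment table; the helper name is illustrative. The change in the log is close to the growth rate when growth is modest, which is why a series growing at a steady rate looks roughly like a straight line in logs.

```python
import math

# First four years of UK investment (£m), 1977-1980, from the investment table.
YEARS = [1977, 1978, 1979, 1980]
INVESTMENT = [28_351, 32_387, 38_548, 43_612]


def first_differences(values):
    """Change from the previous period: x_t - x_{t-1}."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


def main():
    logs = [math.log(v) for v in INVESTMENT]  # natural logarithms (assumed base e)
    rows = zip(YEARS[1:], first_differences(INVESTMENT), first_differences(logs),
               INVESTMENT, INVESTMENT[1:])
    for year, change, dlog, prev, cur in rows:
        print(f"{year}: change {change:+,}, change in log {dlog:.4f}, "
              f"growth rate {cur / prev - 1:.4f}")


if __name__ == "__main__":
    main()
```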

Graphing several series

[Figure: investment expenditures, 1977–2009, plotted as separate lines for Dwellings, Transport, Machinery, Intangible fixed assets and Other buildings]

Figure 1.20 A multiple time-series graph of investment

An area graph of the same data

[Figure: area graph of the investment categories] (These are Excel's default colours!)

Figure 1.22 Area graph of investment categories, 1977–2009

Using separate axes: investment and the interest rate

[Figure: investment (£m, left-hand scale) and the interest rate (%, right-hand scale), 1985–2005]

Figure 1.21 Time-series graph using two vertical scales: investment (LH scale) and the interest rate (RH scale), 1985–2005

Numerical summary measures


It makes more sense to calculate the average growth rate of investment than the average level: the growth rate is similar from year to year, whereas the level rises continuously.

The growth rate of investment

Calculate the growth factor over the whole time period:

$$\frac{x_T}{x_1} = \frac{204{,}270}{28{,}351} = 7.2050$$

Take the (T − 1)th root, here T − 1 = 32:

$$\sqrt[32]{7.205} = 1.0637$$

Subtract 1: 1.0637 − 1 = 0.0637.

The average growth rate is 6.4% p.a.
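The three steps (growth factor, (T − 1)th root, subtract 1) fit in one line of Python. This sketch uses only the first and last observations, which is all the geometric average needs; `average_growth_rate` is a hypothetical name.

```python
def average_growth_rate(first, last, periods):
    """Geometric average growth rate: the periods-th root of last/first, minus 1."""
    return (last / first) ** (1 / periods) - 1


def main():
    # Investment in 1977 and 2009 (£m): T - 1 = 32 annual steps.
    factor = 204_270 / 28_351
    g = average_growth_rate(28_351, 204_270, 32)
    print(f"growth factor = {factor:.4f}")
    print(f"average growth rate = {g:.4f}, i.e. {g:.1%} p.a.")


if __name__ == "__main__":
    main()
```

Compounding the 1977 value at this rate for 32 years returns exactly the 2009 value, which is the defining property of the geometric average.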

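An approximate alternative
The average growth rate can also be calculated as the arithmetic mean of the annual growth factors (one plus each year's growth rate):

$$\frac{1.142 + 1.190 + \cdots + 0.963 + 0.850}{32} = 1.0664$$

i.e. 6.6%.
This gives approximately the right answer as long as the growth rates are not too large or too variable. The arithmetic mean of the growth factors is never smaller than their geometric mean, so it slightly overstates average growth here (6.6% against 6.4%).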

Variance of the growth rate

The stability of growth can be measured by calculating the variance of the growth rate.

Year    Investment    Growth rate, x      x²
1978       32,387          0.142        0.020
1979       38,548          0.190        0.036
1980       43,612          0.131        0.017
:             :              :            :
2006      227,234          0.083        0.007
2007      249,517          0.098        0.010
2008      240,361         −0.037        0.001
2009      204,270         −0.150        0.023
Totals                     2.1253       0.3230

$$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{0.323 - 32 \times 0.066^2}{31} = 0.0059$$

Taking the square root (using the unrounded mean), s = 0.0766.
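The growth-rate calculations can be reproduced from the full investment series. The Python sketch below computes the 32 annual growth rates, their arithmetic mean, and the sample variance and standard deviation. It relies on the standard library's `statistics.variance` and `statistics.stdev`, which divide by n − 1 as in the sample formula; `growth_rates` is an illustrative helper name.

```python
import statistics

# UK investment (£m), 1977-2009, from the investment table.
INVESTMENT = [
    28_351, 32_387, 38_548, 43_612, 43_746, 47_935, 52_099, 59_278, 65_181,
    69_581, 80_344, 97_956, 113_478, 117_027, 107_838, 103_913, 103_997,
    111_623, 121_364, 130_346, 138_307, 155_997, 161_722, 167_172, 171_782,
    180_551, 186_700, 200_415, 209_758, 227_234, 249_517, 240_361, 204_270,
]


def growth_rates(levels):
    """Annual proportional growth rates x_t / x_{t-1} - 1."""
    return [cur / prev - 1 for prev, cur in zip(levels, levels[1:])]


def main():
    g = growth_rates(INVESTMENT)
    print(f"n = {len(g)}, sum x = {sum(g):.4f}, sum x^2 = {sum(x * x for x in g):.4f}")
    print(f"arithmetic mean growth = {statistics.mean(g):.4f}")
    # statistics.variance and statistics.stdev divide by n - 1 (sample formula).
    print(f"s^2 = {statistics.variance(g):.4f}, s = {statistics.stdev(g):.4f}")


if __name__ == "__main__":
    main()
```

The arithmetic mean printed here (about 0.066) is the approximate alternative to the geometric rate of about 0.064, and s is roughly 0.077, so year-to-year growth typically deviates from its average by several percentage points.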

Bivariate data
We examine the relationship between investment and GDP.

[Figure: scatter diagram of investment (£m, vertical axis) against GDP (£m, horizontal axis)]

Figure 1.24 Scatter diagram of investment (vertical axis) against GDP (horizontal axis) (nominal values)

Bivariate data (continued)

High values of investment seem to be associated with high values of GDP: there is a close relationship.
Because both variables grow over time, the later observations lie at the top right of the graph, but this need not be the case.

Both variables are influenced by inflation, so it might be better to graph the real series, after adjusting for inflation.

Real investment versus real GDP

[Figure: scatter diagram of real investment (vertical axis) against real GDP (horizontal axis)]

Figure 1.25 The relationship between real investment and real output

Summary
Slightly different graphical and numerical techniques are used for time-series data.
A variety of time-series charts are available, for both single and multiple series.
The mean and variance are both useful descriptive devices, but it makes more sense to apply them to the growth rate, rather than the level, of a trended variable.
Data transformations can be useful, e.g. taking logs or differences, and deflating to real terms.
