
3. The Methodology of Descriptive Statistics


The purpose of a descriptive statistical investigation is to clarify a number of
characteristics of a given variable, measured over time or as a cross section
A descriptive statistical analysis consists of:
Setting up a histogram (or a time series plot)
Calculating descriptive statistics
Measures of location and position
Inspection for outliers
Classification of the distribution of the examined data set
The range of statistical techniques utilized has not provided us with
anything more than we would have got by taking the [...] variables and
looking at their graphs
Statistics EUS & Negot Chinese 1
In statistics, we consider the following types of data:
Cross-section:
Many sectors/categories/regions at a given point in time
Time series:
One sector/category/region over a period of time, e.g. a year
Panel:
A combination of time series and cross section
Census:
Statistics provided through a questionnaire
4. Histogram
A histogram displays the classification of a quantitative variable
into intervals
The horizontal axis (x-axis) is the interval scale
The vertical axis (y-axis) is used to display the frequency
Data set with 20 observations of incomes in 1,000 DKK:
9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17

Ranked:
6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24

How can the data set be divided into efficient categories or groups?

Ad hoc method:

Interval                  Below 5    6 to 10   11 to 15   16 to 20 21 or more      Total
Frequency                       0          3          5          9          3         20
Relative frequency           0.00       0.15       0.25       0.45       0.15       1.00
Cumulative frequency         0.00       0.15       0.40       0.85       1.00

More mathematical approach: choose the number of categories k so that $2^k = n$, i.e. $k = \log_2 n$. With n = 20, $k = \log_2 20 \approx 4.3$, which gives k = 4 categories of width (24 - 6)/4 = 4.5:

Interval                6 to 10.5 10.5 to 15 15 to 19.5 19.5 to 24      Total
Frequency                       3          5          8          4         20
Relative frequency           0.15       0.25       0.40       0.20       1.00
Cumulative frequency         0.15       0.40       0.80       1.00
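The "more mathematical approach" can be sketched in Python as follows. This is a minimal illustration under two assumptions: k is log2(n) rounded to the nearest integer, and each class includes its upper limit (so the observation 15 is counted in the 10.5 to 15 class, as in the table). The function name `equal_width_classes` is hypothetical.

```python
import bisect
import math

# Income data from the example (1,000 DKK)
incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def equal_width_classes(data):
    """Return class edges and frequencies for k = round(log2(n)) equal-width classes."""
    n = len(data)
    k = round(math.log2(n))            # solve 2**k = n for k, then round
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    freq = [0] * k
    for x in data:
        # Classes include their upper edge; the minimum goes into the first class
        i = bisect.bisect_left(edges, x, 1) - 1
        freq[i] += 1
    return edges, freq


edges, freq = equal_width_classes(incomes)
n = len(incomes)
cumulative = 0
for a, b, f in zip(edges, edges[1:], freq):
    cumulative += f
    print(f"{a:>5} to {b:<5} freq={f:>2} rel={f / n:.2f} cum={cumulative / n:.2f}")
```

Rounding k is a design choice; rounding up instead would give 5 classes here.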
[Figure: Monthly Income. Histogram with frequency on the y-axis and interval (1,000 DKK) on the x-axis: Under 5, 5 to 10, 11 to 15, 16 to 20, Over 20]

Construction of a Histogram by use of Excel
A Special Histogram


Age                     0 to 4    5 to 14   15 to 29   30 to 49   50 to 69 70 or more      Total
Persons, mill           116.60     196.90     350.50     283.10     147.90      36.80    1131.90
Persons, %               10.30      17.40      30.97      25.01      13.07       3.25     100.00

Units of 5 years             1          2          3          4          4       [4]*         18
% per 5-year unit        10.30       8.70      10.32       6.25       3.27       0.81
* = assumed

Using data from the first part of the table, the following graph can be drawn:

[Figure: Population China 1990. Percent of persons by age group, 0 to 4 through 70 or more]

[Figure: Population China 1.7.1990. Percent of persons per 5-year unit by age, 0 to 85 plus]

Age, years                     0     5    10    15    20    25    30    35    40    45    50    55    60    65    70    75    80    85   >85
Persons, % per 5-year unit  10.3   8.7   8.7  10.3  10.3  10.3  6.25  6.25  6.25  6.25   3.3   3.3   3.3   3.3   0.8   0.8   0.8   0.8   0.8
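Because the age groups have different widths, the bars are rescaled to percent per 5-year unit. A minimal Python sketch of that division, using the shares from the table and the assumed width of 4 units for the open top group (`percent_per_unit` is a hypothetical helper):

```python
# China 1990: (age group, share of persons in %, number of 5-year units)
age_groups = [
    ("0 to 4", 10.30, 1),
    ("5 to 14", 17.40, 2),
    ("15 to 29", 30.97, 3),
    ("30 to 49", 25.01, 4),
    ("50 to 69", 13.07, 4),
    ("70 or more", 3.25, 4),  # width assumed, as marked in the table
]


def percent_per_unit(groups):
    """Divide each share by its width so unequal age groups can be compared."""
    return [(label, pct / units) for label, pct, units in groups]


for label, height in percent_per_unit(age_groups):
    print(f"{label:>10}: {height:5.2f} % per 5-year unit")
```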

5. Measures of Location
Measures of location describe the most frequent or typical observation:
Sample mean (MB page 26)
Modus or Mode (MB page 32)
Median (MB page 31)
Geometric Mean (MB page 57)
Relation among the mean, mode and median
Quartiles and Percentiles
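The mean
Uses information from all observations:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
From the example: $\bar{x} = 317/20 = 15.85$
Grouped data set, with k categories of values $x_i$ and frequencies $f_i$:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$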
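As a quick check of the income example, the sample mean in Python (plain built-ins, using only the data listed earlier):

```python
# Income data from the example (1,000 DKK)
incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

total = sum(incomes)          # uses every observation
x_bar = total / len(incomes)  # sample mean
print(total, x_bar)           # 317 15.85
```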
Example of Grouped data set on Grades
Exam in the course International Economics that was held in February 2011 at the BA-int study in Flensburg

Grades of passed (7-point DK scale)      2     4     7    10    12  Total
Frequency                               10    26    33    19     4     92

Grouped mean: $\bar{x} = (2 \cdot 10 + 4 \cdot 26 + 7 \cdot 33 + 10 \cdot 19 + 12 \cdot 4)/92 = 593/92 = 6.45$

Modus or Mode
This is the most frequently observed value (the one with the highest frequency)
Income data example: mode = 16
Grouped data example: mode = 7
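A minimal Python sketch of the grouped mean and the mode for the grade table; it treats each grade as the value $x_i$ with frequency $f_i$, which is exact here because grades are discrete:

```python
# Grades on the 7-point Danish scale and the number of students with each grade
grades = [2, 4, 7, 10, 12]
freqs = [10, 26, 33, 19, 4]

n = sum(freqs)                                                # number of students
grouped_mean = sum(x * f for x, f in zip(grades, freqs)) / n  # sum of f_i * x_i divided by n
mode = max(zip(grades, freqs), key=lambda pair: pair[1])[0]   # grade with the highest frequency
print(n, round(grouped_mean, 2), mode)                        # 92 6.45 7
```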

Median
The middlemost observation:
Median = 0.50(n + 1) ordered position
Income data: 0.50(20 + 1) = 10.5th ordered observation = 16
Example with grades: 0.50(92 + 1) = 46.5th ordered observation = 7
Important measure because it is not sensitive to outliers

Data (ranked)          6    9   10   12   13   14   14   15   16   16   16   17   17   18   18   19   20   21   22   24
Relative frequency   .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05
Cumulative           .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95 1.00
Rank                   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
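The ordered-position rule can be written as a small helper. This is a sketch: `value_at_position` is a hypothetical name, positions are 1-based as in the text, and a fractional position is interpolated linearly between its two neighbours (which makes no difference here because both neighbours are equal).

```python
import math


def value_at_position(sorted_data, pos):
    """Value at a 1-based ordered position; fractional positions are interpolated linearly."""
    lower = math.floor(pos)
    frac = pos - lower
    x_low = sorted_data[lower - 1]
    if frac == 0:
        return x_low
    return x_low + frac * (sorted_data[lower] - x_low)


incomes = sorted([9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17])
pos = 0.50 * (len(incomes) + 1)
print(pos, value_at_position(incomes, pos))       # 10.5 16.0

# Grouped grades: expand the frequency table into its 92 ordered observations
grades = [2, 4, 7, 10, 12]
freqs = [10, 26, 33, 19, 4]
ordered = [g for g, f in zip(grades, freqs) for _ in range(f)]
pos_g = 0.50 * (len(ordered) + 1)
print(pos_g, value_at_position(ordered, pos_g))   # 46.5 7.0
```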
Sum function (cumulative distribution):
Dealing with symmetry
Summing up (where $M_0$ is the mode and $M_d$ the median)
Symmetry: $M_0 = M_d = \bar{x}$
Skewed to the right: $M_0 < M_d < \bar{x}$ (bulk of data to the left)
Skewed to the left: $\bar{x} < M_d < M_0$ (bulk of data to the right)
Income data set: $\bar{x} = 15.85 < M_0 = 16$ and $M_d = 16$: data is skewed to the left
Grade data set: $\bar{x} = 6.45 < M_0 = 7$ and $M_d = 7$: data is skewed to the left
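A rough Python check of the direction of skewness. It is a simplification that compares only the mean with the median (the function `skew_direction` is hypothetical), but it agrees with both conclusions above:

```python
from statistics import mean, median, mode

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
grades = [2, 4, 7, 10, 12]
freqs = [10, 26, 33, 19, 4]
grade_obs = [g for g, f in zip(grades, freqs) for _ in range(f)]   # 92 individual grades


def skew_direction(data):
    """Rule of thumb: mean below the median suggests left skew, above suggests right skew."""
    m, md = mean(data), median(data)
    if m < md:
        return "left"
    if m > md:
        return "right"
    return "symmetric"


for name, data in [("Income", incomes), ("Grades", grade_obs)]:
    print(name, round(mean(data), 2), median(data), mode(data), skew_direction(data))
```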
Quartiles and Percentiles
Quartile = q(n + 1) ordered position
Percentile = p(n + 1) ordered position
5-point summary:
1st decile is the 0.10-percentile (called $D_1$)
Lower quartile is the 0.25-percentile (called $Q_1$)
Median is the 0.50-percentile
Upper quartile is the 0.75-percentile (called $Q_3$)
9th decile is the 0.90-percentile (called $D_9$)
Example
10th percentile: (20 + 1)(10/100) = 2.10th observation, i.e. 9 + 0.10(10 - 9) = 9.10
25th percentile: (20 + 1)(25/100) = 5.25th observation, i.e. 13 + 0.25(14 - 13) = 13.25
50th percentile: (20 + 1)(50/100) = 10.50th observation, i.e. 16 + 0.50(16 - 16) = 16.00
75th percentile: (20 + 1)(75/100) = 15.75th observation, i.e. 18 + 0.75(19 - 18) = 18.75
90th percentile: (20 + 1)(90/100) = 18.90th observation, i.e. 21 + 0.90(22 - 21) = 21.90
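A minimal Python sketch of the p(n + 1) rule with linear interpolation (the helper name `percentile` is hypothetical). Python's `statistics.quantiles` with `method="exclusive"` uses the same positions, so it serves as a cross-check; applying the rule exactly gives 13.25 and 18.75 for the quartiles.

```python
import math
import statistics

incomes = sorted([9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17])


def percentile(sorted_data, p):
    """p-quantile (0 < p < 1) at ordered position p(n + 1), with linear interpolation."""
    n = len(sorted_data)
    pos = p * (n + 1)
    if not 1 <= pos <= n:
        raise ValueError("p(n + 1) must lie between 1 and n")
    lower = math.floor(pos)
    frac = pos - lower
    x_low = sorted_data[lower - 1]
    x_high = sorted_data[min(lower, n - 1)]   # at pos == n there is no next value
    return x_low + frac * (x_high - x_low)


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:.2f}: {percentile(incomes, p):.2f}")

# The standard library's "exclusive" method uses the same p(n + 1) positions
print(statistics.quantiles(incomes, n=4, method="exclusive"))   # [13.25, 16.0, 18.75]
```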

Geometric (multiplicative) Mean
Defined as:
$G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
For positive observations, the geometric mean is never larger than the arithmetic mean; the two are equal only when all observations are equal
Example:
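A short Python sketch comparing the two means on the income data, using `statistics.geometric_mean` and `statistics.fmean` (available from Python 3.8) and, as a cross-check, the n-th root of the product computed directly:

```python
import math
from statistics import fmean, geometric_mean

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(incomes)
direct = math.prod(incomes) ** (1 / n)   # n-th root of the product x_1 * x_2 * ... * x_n
g = geometric_mean(incomes)             # same quantity, computed via logarithms
a = fmean(incomes)                      # arithmetic mean
print(round(direct, 4), round(g, 4), a)
```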
6. Measures of Dispersion
Range, interquartile range, decile range and box-plot
Variance and standard deviation
Coefficient of variation
Skewness and kurtosis
Range = maximum - minimum
Interquartile range = $Q_3 - Q_1$, which covers the middle 50 % of observations
Decile range = $D_9 - D_1$, which covers the middle 80 % of observations
Box-plot
A box-plot is used to identify outliers
Outlier: an observation more than 3 IQR below $Q_1$ or above $Q_3$
Suspected outlier: an observation more than 1.5 IQR (but less than 3 IQR) below $Q_1$ or above $Q_3$
For our small data set we get:
[Figure: Box-plot of the income data on a scale from 0 to 30, marking a (suspected) outlier, $Q_1$, the median and $Q_3$]
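The quartiles used for the box-plot, $Q_1 = 13.75$ and $Q_3 = 18.25$ (so IQR = 4.5), follow the $(n - 1)q + 1$ ordered-position rule used by Excel's QUARTILE function, not the $q(n + 1)$ rule.
Lower inner fence: $Q_1 - 1.5\,IQR = 13.75 - 1.5(4.5) = 7.00$
Lower outer fence: $Q_1 - 3.0\,IQR = 13.75 - 3.0(4.5) = 0.25$
Upper inner fence: $Q_3 + 1.5\,IQR = 18.25 + 1.5(4.5) = 25.00$
Upper outer fence: $Q_3 + 3.0\,IQR = 18.25 + 3.0(4.5) = 31.75$
The minimum, 6, lies between the lower outer and lower inner fences, so it is a suspected outlier; no observation exceeds the upper inner fence.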
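Different software places quartiles differently, which moves the fences. This sketch (helper names `fences` and `classify` are hypothetical) computes both conventions with `statistics.quantiles`: `method="inclusive"` reproduces the quartiles 13.75 and 18.25 used above, while `method="exclusive"` follows the q(n + 1) rule. Only the first flags the minimum, 6, as a suspected outlier.

```python
import statistics

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def fences(q1, q3):
    """Inner (1.5 IQR) and outer (3 IQR) fences around the quartiles."""
    iqr = q3 - q1
    return {
        "lower outer": q1 - 3.0 * iqr,
        "lower inner": q1 - 1.5 * iqr,
        "upper inner": q3 + 1.5 * iqr,
        "upper outer": q3 + 3.0 * iqr,
    }


def classify(data, f):
    """Split observations outside the fences into outliers and suspected outliers."""
    outliers = [x for x in data if x < f["lower outer"] or x > f["upper outer"]]
    suspected = [x for x in data
                 if x not in outliers and (x < f["lower inner"] or x > f["upper inner"])]
    return outliers, suspected


for method in ("inclusive", "exclusive"):
    q1, _, q3 = statistics.quantiles(incomes, n=4, method=method)
    f = fences(q1, q3)
    print(method, q1, q3, f, classify(incomes, f))
```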
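Variance and Standard Deviation
Make use of all observations:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ or $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$, with $s = \sqrt{s^2}$
Example on data set for incomes: $\sum x_i^2 = 5403$, so $s^2 = (5403 - 20 \cdot 15.85^2)/19 = 378.55/19 = 19.92$ and $s = 4.46$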
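Both variance formulas give the same result; a minimal Python check on the income data, compared against `statistics.variance`, which also divides by n - 1:

```python
import statistics

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
n = len(incomes)
x_bar = sum(incomes) / n

ss = sum((x - x_bar) ** 2 for x in incomes)                          # sum of squared deviations
s2 = ss / (n - 1)                                                     # definition formula
s2_short = (sum(x * x for x in incomes) - n * x_bar ** 2) / (n - 1)  # shortcut formula
s = s2 ** 0.5
print(round(s2, 2), round(s2_short, 2), round(s, 2))   # 19.92 19.92 4.46
```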
Grouped data set
Example:
The Coefficient of Variation:
$CV = s/\bar{x}$, often stated in percent
Gives the relative dispersion
Recommended for comparing the dispersion of different data sets
If the distribution has large variation relative to its mean (is very flat), CV takes a large value.
If the distribution has small variation relative to its mean (is very steep), CV takes a small value.
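A minimal Python sketch of the coefficient of variation, assuming the sample standard deviation (divisor n - 1) is used. It compares the income data with the grade data, whose dispersion is larger relative to its mean (the function name `coefficient_of_variation` is hypothetical):

```python
import statistics

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
grades = [2, 4, 7, 10, 12]
freqs = [10, 26, 33, 19, 4]
grade_obs = [g for g, f in zip(grades, freqs) for _ in range(f)]


def coefficient_of_variation(data):
    """Sample standard deviation relative to the mean (unit-free)."""
    return statistics.stdev(data) / statistics.mean(data)


for name, data in [("Income", incomes), ("Grades", grade_obs)]:
    print(f"{name}: CV = {coefficient_of_variation(data):.1%}")
```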
Some examples:
SK > 0: skewed to the right
SK = 0: symmetric
SK < 0: skewed to the left
KU large: peaked, with observations concentrated around the centre
KU low: flat, close to a uniform distribution
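The skewness and kurtosis formulas are not reproduced on the slide. As an assumption, the sketch below uses the adjusted sample formulas behind Excel's `SKEW` and `KURT` functions; on the income data they reproduce the values -0.35 and 0.12 in the outlier table of Section 9. KU here is excess kurtosis, so 0 corresponds to the normal distribution. The function names are hypothetical.

```python
import statistics

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def excel_skew(data):
    """Adjusted sample skewness (the formula behind Excel's SKEW)."""
    n = len(data)
    m, s = statistics.fmean(data), statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def excel_kurt(data):
    """Adjusted sample excess kurtosis (the formula behind Excel's KURT)."""
    n = len(data)
    m, s = statistics.fmean(data), statistics.stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


print(round(excel_skew(incomes), 2), round(excel_kurt(incomes), 2))   # -0.35 0.12
```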
7. Descriptive statistics on a Computer or Calculator
Use of Excel
Use of Megastat
Use of pocket calculator
8. Descriptive Statistics in Grouped Data Sets
More complex data set for the distribution of income, Denmark

Disposable household incomes, Denmark, 1987

i  Interval     Households Mean income Income mass Deviation      Square
   (1,000 DKK)     (1,000) (1,000 DKK)  (mio. DKK)
                       f_i         x_i     f_i x_i   x_i - x̄ (x_i - x̄)^2 f_i (x_i - x̄)^2
1  0 - 49.9            146        36.9       5,387    -128.7    16563.69         2418298
2  50 - 99.9           590        73.2      43,202     -92.4     8537.76         5036983
3  100 - 149.9         414       123.7      51,224     -41.9     1755.61          726822
4  150 - 199.9         323       175.1      56,568       9.5       90.25           29151
5  200 - 249.9         325       225.9      73,435      60.3     3636.09         1181729
6  250 - 299.9         210       273.6      57,446     108.0    11664.00         2449440
7  300 - 399.9         139       340.6      47,339     175.0    30625.00         4256875
8  400 or more          55       548.3      30,156     382.7   146459.29         8055261
Sum                  2,202                 364,757                              24154559
Source: Statistics Denmark, Annual Statistical Review, 1994, page 220-221.
Mean and Standard Deviation
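There are 8 categories, i.e. k = 8. By insertion in the formulas:

Mean: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{364{,}757}{2{,}202} = 165.648 \approx 165.6$ (1,000 DKK), i.e. 165,648 DKK

Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n}} = \sqrt{\frac{24{,}154{,}559}{2{,}202}} = 104.73$ (1,000 DKK)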
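A minimal Python sketch of the grouped calculation, using the class mean incomes as $x_i$ and dividing by n as in the text. Because the mean incomes in the table are rounded, the mean from $\sum f_i x_i$ comes out at about 165.63 rather than 165.648 (the income-mass column presumably uses unrounded values).

```python
# Disposable household incomes, Denmark 1987 (values from the table above)
households = [146, 590, 414, 323, 325, 210, 139, 55]                   # f_i, 1,000 households
mean_income = [36.9, 73.2, 123.7, 175.1, 225.9, 273.6, 340.6, 548.3]  # x_i, 1,000 DKK

n = sum(households)
x_bar = sum(f * x for f, x in zip(households, mean_income)) / n
# Divide by n (not n - 1), as in the calculation in the text
variance = sum(f * (x - x_bar) ** 2 for f, x in zip(households, mean_income)) / n
sd = variance ** 0.5
print(n, round(x_bar, 1), round(sd, 1))   # 2202 165.6 104.7
```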
Histogram, Quartiles, Median and Box-plot
Consider the relative and cumulative distribution of data

Disposable household incomes, Denmark, 1987

i  Interval     Households      Relative    Cumulative
   (1,000 DKK)     (1,000)  frequency, %  frequency, %
                       f_i         f_i/n
1  0 - 49.9            146           6.6           6.6
2  50 - 99.9           590          26.8          33.4
3  100 - 149.9         414          18.8          52.2
4  150 - 199.9         323          14.7          66.9
5  200 - 249.9         325          14.8          81.7
6  250 - 299.9         210           9.5          91.2
7  300 - 399.9         139           6.3          97.5
8  400 or more          55           2.5         100.0
Sum                  2,202         100.0
Source: Statistics Denmark, Annual Statistical Review, 1994, page 220-221
Histogram
[Figure: Distribution of income, Denmark, 1987. Percent of households by income interval (1,000 DKK), from 0 - 49 to above 400]

Sum Function
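How to do the interpolation
We use a formula, for example given as:
$$\text{Value} = \text{End value of interval} - \frac{\text{Cumulative \% at end of interval} - \text{Fractile \%}}{\text{Total \% in interval}} \times \text{Width of interval}$$
Illustration:
[Illustration: the cumulative frequency rises from 33.4 % at income 100 to 52.2 % at income 149 (1,000 DKK); the median (50 %) lies at the unknown income "?" between them]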
Median: $150{,}000 - \frac{52.2 - 50}{18.8} \times 50{,}000 = 150{,}000 - 5{,}851 = 144{,}149$ DKK

Similarly for the other quartiles and deciles:

Lower quartile ($Q_1$): $100{,}000 - \frac{33.4 - 25}{26.8} \times 50{,}000 = 84{,}328$

Upper quartile ($Q_3$): $250{,}000 - \frac{81.7 - 75}{14.8} \times 50{,}000 = 227{,}365$

Lower decile: $100{,}000 - \frac{33.4 - 10}{26.8} \times 50{,}000 = 56{,}343$

Upper decile: $300{,}000 - \frac{91.2 - 90}{9.5} \times 50{,}000 = 293{,}684$
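The interpolation formula can be sketched in Python as below, using the rounded percentages from the table. `grouped_fractile` is a hypothetical helper; it assumes incomes are spread evenly within each class and refuses fractiles in the open top class. It reproduces the five values computed above.

```python
# Class limits in DKK; the top class (400,000 DKK or more) is open-ended
lower_limits = [0, 50_000, 100_000, 150_000, 200_000, 250_000, 300_000, 400_000]
upper_limits = [50_000, 100_000, 150_000, 200_000, 250_000, 300_000, 400_000, None]
pct = [6.6, 26.8, 18.8, 14.7, 14.8, 9.5, 6.3, 2.5]            # % of households
cum_pct = [6.6, 33.4, 52.2, 66.9, 81.7, 91.2, 97.5, 100.0]    # cumulative %


def grouped_fractile(p):
    """Income below which p % of households lie, by linear interpolation within a class."""
    for lo, hi, share, cum in zip(lower_limits, upper_limits, pct, cum_pct):
        if cum >= p:
            if hi is None:
                raise ValueError("the fractile falls in the open top class")
            # End value of the class minus the part of the width 'overshot' by the cumulative %
            return hi - (cum - p) / share * (hi - lo)
    raise ValueError("p must be at most 100")


for name, p in [("D1", 10), ("Q1", 25), ("Median", 50), ("Q3", 75), ("D9", 90)]:
    print(f"{name:>6}: {grouped_fractile(p):,.0f} DKK")
```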

Interquartile Range (IQR): $Q_3 - Q_1 = 227{,}365 - 84{,}328 = 143{,}037$

Lower inner fence: $Q_1 - 1.5\,IQR = 84{,}328 - 1.5(143{,}037) = -130{,}228$
Lower outer fence: $Q_1 - 3.0\,IQR = 84{,}328 - 3.0(143{,}037) = -344{,}783$

Upper inner fence: $Q_3 + 1.5\,IQR = 227{,}365 + 1.5(143{,}037) = 441{,}921$
Upper outer fence: $Q_3 + 3.0\,IQR = 227{,}365 + 3.0(143{,}037) = 656{,}476$

Box-plot
[Figure: Box-plot on a scale from -300 to 600 (1,000 DKK): LOF = -345, LIF = -130, Q1 = 84, M = 144, Q3 = 227, UIF = 442, UOF = 656]
9. Descriptive Statistics: an Example of Outliers
Outliers are extreme observations
Outliers make distributions non-normal
Outliers change the mean, standard deviation and skewness
However, the median remains unchanged
The "Basic" column is the income data set; in the other columns the largest observation, 24, is replaced by 34, 44 and 54.

                            Basic   Max=34   Max=44   Max=54
Mean                        15.85    16.35    16.85    17.35   Increases
Standard error               1.00     1.29     1.69     2.13
Median                         16       16       16       16   Constant!!
Modus / Mode                   16       16       16       16
Standard deviation           4.46     5.79     7.56     9.52
Sample variance             19.92    33.50    57.08    90.66
Kurtosis                     0.12     3.88     8.99    12.55
Skewness                    -0.35     1.19     2.43     3.16   Increases
Range                          18       28       38       48
Minimum                         6        6        6        6
Maximum                        24       34       44       54
Sum                           317      327      337      347
Observations                   20       20       20       20
CI half-width (95 %)         2.09     2.71     3.54     4.46   Increases
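The experiment in the table can be repeated in Python by replacing the largest observation; a minimal sketch with the standard `statistics` module, which divides by n - 1 for the standard deviation as in the table:

```python
import statistics

# Ranked income data; the last (largest) value is replaced to create an outlier
basic = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

summary = {}
for new_max in (24, 34, 44, 54):
    data = basic[:-1] + [new_max]
    summary[new_max] = (
        statistics.mean(data),
        statistics.median(data),
        statistics.stdev(data),
    )
    mean, median, sd = summary[new_max]
    print(f"max={new_max}: mean={mean:.2f} median={median} sd={sd:.2f}")
```

The mean and standard deviation move with the outlier while the median stays at 16.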