Chapter 1: Introduction
Contents
- What is 'Statistics'? Definition
- Key words: population, parameter, sample, statistic, population size, sample size, individuals, objects
- Types of variables: categorical (ordinal, nominal) and numerical (discrete, continuous)
- Why sample? Definition of a simple random sample
- Frequencies and frequency distribution/table: absolute, cumulative absolute, relative, cumulative relative. Properties.
Recommended reading
- Peña, D., Romo, J., 'Introducción a la Estadística para las Ciencias Sociales'
  - Chapters 1, 2, 3
- Newbold, P., 'Estadística para los Negocios y la Economía' (2009)
  - Chapter 1
  - Sections 2.1, 2.4, 2.7. How to lie with Statistics
Definition of Statistics
Examples
Data (Variable)
- Categorical (Qualitative)
  - Ordinal: classes can be ranked. Example: clothes size, L > M > S
  - Nominal: no natural order. Example: blood type, A, B, AB, O
- Numerical (Quantitative)
  - Discrete: integer values. Example: # of children, 0, 1, 2, ...
  - Continuous: non-integer values. Example: height, 1.55 m, 1.71 m
Why sample?
                                  Cumulative    Cumulative
           Absolute  Relative     Absolute      Relative
Class, xi  Freq, ni  Freq, fi     Freq, Ni      Freq, Fi
x1         n1        f1 = n1/n    N1 = n1       F1 = f1
x2         n2        f2 = n2/n    N2 = N1 + n2  F2 = F1 + f2
...        ...       ...          ...           ...
xk         nk        fk = nk/n    Nk = n        Fk = 1
Total      n         1
Note:
- ni = number of xi in the sample, fi = (number of xi)/n = ni/n
- Ni = N(i−1) + ni, Fi = F(i−1) + fi
- 0 ≤ fi, Fi ≤ 1
- Fi and Ni do not make sense for categorical-nominal variables
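The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, its arguments, and the small clothes-size sample are assumptions made up for this sketch, not part of the course material.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values, classes):
    """Return rows (class, n_i, f_i, N_i, F_i), with classes in the given order.

    Assumes every observation in `values` belongs to one of `classes`.
    """
    counts = Counter(values)
    n = len(values)                               # sample size
    abs_freq = [counts[c] for c in classes]       # n_i
    cum_abs = list(accumulate(abs_freq))          # N_i = N_(i-1) + n_i
    return [(c, ni, ni / n, Ni, Ni / n)
            for c, ni, Ni in zip(classes, abs_freq, cum_abs)]

# Made-up illustrative sample (hypothetical data, not from the slides)
sample = ["S", "M", "M", "L", "S", "M", "L", "M"]
table = frequency_table(sample, classes=["S", "M", "L"])
for row in table:
    print(row)
```

The class order is passed explicitly because cumulative frequencies only have a meaning when the classes can be ranked.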
Grouping by classes
Example 1: The data below shows blood types reported for a sample of
40 individuals.
AB, A, B, O, A, A, A, B, O, AB,
B, O, B, B, B, A, A, A, AB, B,
O, A, A, A, AB, AB, O, B, B, AB,
O, B, O, O, A, A, O, B, AB, AB
Example 1 cont.:
- Categorical, nominal with 4 different classes. The frequency distribution is:

       Absolute   Relative
Class  Frequency  Frequency
A      12         0.300
B      11         0.275
AB     8          0.200
O      9          0.225
Total  40         1

- 30%
- 100% − 22.5% = 77.5%
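As a check on Example 1, the counts can be tallied directly from the 40 observations. The sketch below uses only Python's standard library; the variable names are chosen here for illustration.

```python
from collections import Counter

# The 40 blood types from Example 1, copied as listed
blood_types = """
AB, A, B, O, A, A, A, B, O, AB,
B, O, B, B, B, A, A, A, AB, B,
O, A, A, A, AB, AB, O, B, B, AB,
O, B, O, O, A, A, O, B, AB, AB
"""
data = [t.strip() for t in blood_types.replace("\n", " ").split(",") if t.strip()]

n = len(data)                       # sample size
counts = Counter(data)              # absolute frequencies n_i
rel = {c: counts[c] / n for c in ["A", "B", "AB", "O"]}   # relative frequencies f_i

for c in ["A", "B", "AB", "O"]:
    print(f"{c:<3} {counts[c]:>3} {rel[c]:.3f}")

share_A = rel["A"]                  # 0.300, the 30% quoted above
share_not_O = 1 - rel["O"]          # 1 - 0.225, the 77.5% quoted above
```

No cumulative columns are computed, since the variable is nominal.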
Grouping by classes
Absolute
Class Frequency
VU 62
U 108
S 319
VS 412
Total 901
Example 2 cont.:
- Categorical, ordinal with 4 different classes. The frequency distribution is:

                             Cumulative  Cumulative
       Absolute   Relative   Absolute    Relative
Class  Frequency  Frequency  Frequency   Frequency
VU     62         0.07       62          0.07
U      108        0.12       170         0.19
S      319        0.35       489         0.54
VS     412        0.46       901         1
Total  901        1

- 35%
- 170, 19%
- 319 + 412 = 731 or 901 − 170 = 731; 35% + 46% = 81% or 100% − 19% = 81%
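The cumulative columns of Example 2 can be reproduced from the absolute frequencies alone. This is a minimal sketch using `itertools.accumulate`; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["VU", "U", "S", "VS"]          # ordered classes from Example 2
abs_freq = [62, 108, 319, 412]            # n_i

n = sum(abs_freq)                         # sample size
rel_freq = [ni / n for ni in abs_freq]    # f_i = n_i / n
cum_abs = list(accumulate(abs_freq))      # N_i = N_(i-1) + n_i
cum_rel = [Ni / n for Ni in cum_abs]      # F_i = N_i / n

for c, ni, fi, Ni, Fi in zip(classes, abs_freq, rel_freq, cum_abs, cum_rel):
    print(f"{c:<3} {ni:>4} {fi:>5.2f} {Ni:>4} {Fi:>5.2f}")

# Observations in class S or VS, computed by complement: n - N_U
s_or_vs = n - cum_abs[1]
```

Computing Fi as Ni/n rather than summing the rounded fi avoids accumulating rounding error in the last column.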
Grouping by classes
Example 3: To evaluate the performance of a new pesticide, a sample of
50 plants, from those treated by the new pesticide, was selected. The
number of leaves attacked by a pest was counted for each of the sampled
plants. The results are shown below.
Absolute
xi Frequency
0 6
1 10
2 12
3 8
4 5
5 4
6 3
8 1
10 1
Total 50
Example 3 cont.:
- What can you say about the variable in the study? Find its frequency distribution.
- What percentage of the sampled plants had exactly 3 leaves attacked?
- How many plants had no more than 3 leaves attacked?
- How many plants had at least 6 leaves attacked?
- What percentage of plants had between 3 and 5 leaves attacked?
- What percentage of plants had at least 8 leaves attacked?
- What percentage of plants had at most 2 leaves attacked?
Example 3 cont.:
- Numerical, discrete with 9 different values. The frequency distribution is:

                             Cumulative  Cumulative
       Absolute   Relative   Absolute    Relative
xi     Frequency  Frequency  Frequency   Frequency
0      6          0.12       6           0.12
1      10         0.20       16          0.32
2      12         0.24       28          0.56
3      8          0.16       36          0.72
4      5          0.10       41          0.82
5      4          0.08       45          0.90
6      3          0.06       48          0.96
8      1          0.02       49          0.98
10     1          0.02       50          1
Total  50         1
Example 3 cont.:
- 16%
- 36
- 3 + 1 + 1 = 5 or 50 − 45 = 5
- 16% + 10% + 8% = 34% or (8 + 5 + 4)/50 = 34%
- 2% + 2% = 4% or 100% − 96% = 4%
- 56%
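The six answers above can be verified from the absolute frequencies of Example 3. In this sketch the helper `count_where` is a hypothetical name introduced for illustration; it sums the frequencies of all values that satisfy a condition.

```python
# Absolute frequencies from Example 3: leaves attacked -> number of plants
freq = {0: 6, 1: 10, 2: 12, 3: 8, 4: 5, 5: 4, 6: 3, 8: 1, 10: 1}
n = sum(freq.values())                        # sample size, 50 plants

def count_where(cond):
    """Number of plants whose value x satisfies cond(x)."""
    return sum(ni for x, ni in freq.items() if cond(x))

pct_exactly_3 = 100 * count_where(lambda x: x == 3) / n
n_at_most_3 = count_where(lambda x: x <= 3)
n_at_least_6 = count_where(lambda x: x >= 6)
pct_3_to_5 = 100 * count_where(lambda x: 3 <= x <= 5) / n
pct_at_least_8 = 100 * count_where(lambda x: x >= 8) / n
pct_at_most_2 = 100 * count_where(lambda x: x <= 2) / n

print(pct_exactly_3, n_at_most_3, n_at_least_6,
      pct_3_to_5, pct_at_least_8, pct_at_most_2)
```

Here "between 3 and 5" is read as inclusive of both end values, which is the reading that matches the 34% given above.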
Grouping by class intervals: continuous (and discrete) data
Note:
- Left end-point is included, but right end-point is excluded (typical convention)
- Reverse end-point convention can be applied; check your software for its definition
- Useful for tabulating discrete data if X takes many values
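The end-point convention can be made concrete with a short sketch. The function `class_index`, the edges 10, 20, ..., 60, and the test observations are all assumptions chosen for illustration. The last interval is treated as closed on the right so that the maximum value still falls in a class.

```python
from bisect import bisect_right

def class_index(x, edges):
    """Index i of the interval [edges[i], edges[i+1]) that contains x.

    The last interval is closed on the right, [edges[-2], edges[-1]].
    """
    if x < edges[0] or x > edges[-1]:
        raise ValueError(f"{x} lies outside the class intervals")
    if x == edges[-1]:
        return len(edges) - 2          # right end-point of the last class
    return bisect_right(edges, x) - 1  # left end-point included, right excluded

edges = [10, 20, 30, 40, 50, 60]       # hypothetical class limits

# Hypothetical observations sitting on or near the class limits
for x in [10, 19.9, 20, 35, 60]:
    i = class_index(x, edges)
    print(f"{x} falls in class {i}: from {edges[i]} to {edges[i + 1]}")
```

Under this convention the value 20 belongs to the second class, not the first. Software that uses the reverse convention would place it in the first.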
Example 4 cont.:
Class Interval Midpoint ni fi Ni Fi
[10, 20) 15 3 0.15 3 0.15
[20, 30) 25 6 0.30 9 0.45
[30, 40) 35 5 0.25 14 0.70
[40, 50) 45 4 0.20 18 0.90
[50, 60] 55 2 0.10 20 1
Total 20 1
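Starting from the class limits and the absolute frequencies ni of the table above, the remaining columns follow mechanically. The sketch below recomputes the midpoints, fi, Ni and Fi; the variable names are illustrative.

```python
from itertools import accumulate

edges = [10, 20, 30, 40, 50, 60]   # class limits from the Example 4 table
counts = [3, 6, 5, 4, 2]           # absolute frequencies n_i from the table

n = sum(counts)                                              # sample size
midpoints = [(a + b) / 2 for a, b in zip(edges, edges[1:])]  # class midpoints
rel = [ni / n for ni in counts]                              # f_i
cum_abs = list(accumulate(counts))                           # N_i
cum_rel = [Ni / n for Ni in cum_abs]                         # F_i

for m, ni, fi, Ni, Fi in zip(midpoints, counts, rel, cum_abs, cum_rel):
    print(f"{m:>5.0f} {ni:>3} {fi:>5.2f} {Ni:>3} {Fi:>5.2f}")
```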