SCIENCES
STATISTICS
There are different definitions of Statistics and each researcher has
defined it in their own terms. For example;
TYPES OF STATISTICS
Descriptive Statistics
    Tabulation and Classification
    Presentation of Data (Graphs and Diagrams)
    Measures of Central Tendency and Dispersion
Inferential Statistics
    Estimation of Parameters
    Testing of Hypothesis
TYPES OF STATISTICS
Descriptive Statistics
Descriptive statistics deals with concepts and methods related to the
summarization and description of the important aspects of numerical data. It
consists of the condensation of data, their graphical display and the computation of
numerical quantities that provide information about the centre and
spread of the observations in a data set.
Inferential Statistics
Inferential statistics deals with methods and procedures used for drawing
inferences about the true but unknown characteristics of a population based on
the sample data derived from the same population. Inferential statistics can be
further classified into estimation of parameters and testing of hypothesis.
Sample
[Figure: a population of items labelled a to z, and a sample made up of some of those items (a, d, e, f, i, m, n, o, q, u, v, w, x, z)]
Variable
Qualitative Variable
Categorical Variable
Quantitative Variable
Discrete Variable
Continuous Variable
Dependent Variable
Independent Variable
DATA AND ITS COLLECTION
Sources of Data
Data Sources
Print or Electronic
Observation
Survey
Experimentation
Dr. Yousaf Hayat
Types of Data
Primary Data
Data that are collected for the first time from their source are
called primary data.
Secondary Data
Data that are collected and compiled by an outside source,
or by someone in the organization who may later provide access to
the data to other users, are called secondary data.
Collection of Data
Collection of Primary Data
Indirect Investigation
In some cases, it is not possible to obtain information directly from the respondents
due to certain limitations. In such circumstances, indirect
investigation is carried out by involving a third party to collect the
required information. This method is useful for inquiries in which
the information required is complex.
According to this method, trained people (enumerators) are sent to the area under study to
collect information on a pre-specified pro forma. Information collected
through this method is more useful than that collected through the questionnaire
method. In this method, the enumerators can take information directly from the
respondents or, if they are not available on the spot, from their close relatives.
As the name indicates, in this method data are collected through local
sources. This means that the information is not collected directly from
the respondents; instead, the desired information is collected from people
who belong to the area about which information is required.
Year    Profit    Sale    Production
1990    12        120     110
1991    13        140     132
1992    14        150     145
1993    13.5      140     123
1994    10        103     90
1995    11        115     100
1996    12.5      123     122
1997    13.8      140     135
1998    15        160     145
2000    11.6      120     115
2001    15        162     150
2002    16        165     145
of commodities (or
No.    Profit    Sale    Production
1      12        120     110
2      13        140     132
3      14        150     145
4      13.5      140     123
5      10        103     90
6      11        115     100
7      12.5      123     122
8      13.8      140     135
9      15        160     145
10     11.6      120     115
11     15        162     150
12     16        165     145
Panel Data
Year    Sale    Production
2000    120     110
2001    140     132
2002    150     145

2000    140     123
2001    103     90
2002    115     100

2000    123     122
2001    140     135
2002    160     145

2000    120     115
2001    162     150
2002    165     145
like individuals, households, firms, or governments.
Primary Data
Questionnaire methods
Secondary Data
Printed materials
Bureau of Statistics
Frequency
The number of times an observation is repeated in a data set is called the frequency of that particular
observation/data point/individual. OR
The total number of observations in a class is called the frequency of that class. For
example, consider the following data showing the monthly salaries of 50
employees of a certain University. In this example, 20 is the frequency of
the class (employees) having a salary of Rs. 40,000 per month, and 3 is the
frequency of the employees drawing a salary of Rs. 90,000 per month.
Salary (000)           40    50    60    70    80    90
Number of employees    20    10    8     5     4     3
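The idea of frequency as "number of repetitions" can be sketched in Python. This is only an illustration: the variable names are hypothetical, and the frequencies 8, 5 and 4 are the ones implied by the fx column of the same salary table used later in these notes (480/60, 350/70, 320/80).

```python
from collections import Counter

# Salary table from the text: salary (thousand Rs.) -> number of employees.
salary_table = {40: 20, 50: 10, 60: 8, 70: 5, 80: 4, 90: 3}

# Expand the table into individual observations, one per employee.
observations = [salary
                for salary, count in salary_table.items()
                for _ in range(count)]

# Counting how often each value is repeated gives its frequency.
frequency = Counter(observations)

print(len(observations))             # total number of employees
print(frequency[40], frequency[90])  # frequencies quoted in the text
```

Counting the expanded list returns the original table, which is exactly what the definition says: the frequency of a value is how many times it occurs.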
Classes                5-10    10-15    15-20    20-25    25-30    30-35
Number of employees    20      10       8        5        4        3

Classes                5-9     10-14    15-19    20-24    25-29    30-34
Number of employees    20      10       8        5        4        3
Classes    Class Boundaries
5-9        4.5-9.5
10-14      9.5-14.5
15-19      14.5-19.5
20-24      19.5-24.5
25-29      24.5-29.5
30-34      29.5-34.5
PRESENTATION OF DATA
Diagrams and Graphs
1.
2.
Pie Chart
Frequency Curves
Frequency Distribution
Arrangement of data into different classes or groups, in such a way that each class/group has its own
frequency, is called a frequency distribution. For example, the following data show the frequency
distribution of the salary of 50 employees of a firm. This frequency distribution is called a discrete
frequency distribution.
Salary (000)           40    50    60    70    80    90
Number of employees    20    10    8     5     4     3
Whereas the data below show the grouped/continuous frequency distribution
of the salary of 50 employees.
Salary (000)           5-9    10-14    15-19    20-24    25-29    30-34
Number of employees    20     10       8        5        4        3
City Name     No. of Industries    No. of Banks
Peshawar      50                   35
Islamabad     40                   45
Karachi       120                  90
Lahore        70                   55
Faisalabad    90                   30
Quetta        15 [only one of the two values survives]
Figure: Summary of the number of industries and Banks in different cities of Pakistan
Pie Chart
GPA    Number of Students
2.8    [missing]
2.9    18
3.1    14
3.2    20
3.3    40
3.4    35
3.5    25
3.6    13
3.7    [missing]
3.8    [missing]
3.9    [missing]
GPA    Starting Salary
2.8    20000
2.9    20000
3.0    25000
3.1    30000
3.2    15000
3.3    26000
3.4    30000
3.5    24000
3.6    18000
3.7    30000
3.8    45000
3.9    35000
Measures of Dispersion
Reliability Analysis
Inferential Statistics
Tests of Association
Non-parametric Tests
SIGNIFICANCE OF STATISTICS
Statistical methods are used for analyzing data from field and laboratory experiments.
Statistical methods are used for conducting sample surveys, and the data coming from
surveys can be analyzed by statistical methods to find solutions to the problems
under study.
Statistical methods are used in every field of scientific study, such as
agriculture, business, medicine, biology, genetics, and the physical and social sciences.
Arithmetic Mean
Median
Mode
Harmonic Mean
Geometric Mean
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$
[Population data]
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
[Sample data]
Example: The following data show the consumption (in thousands of Rs.) of 9 MBA
students per semester in a certain University. Compute the arithmetic mean and interpret the
result. The data are: 39, 36, 48, 36, 41, 37, 32, 46 and 45.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{39 + 36 + 48 + 36 + 41 + 37 + 32 + 46 + 45}{9} = \frac{360}{9} = 40$$
It indicates that, on average, each MBA student consumes Rs. 40,000
per semester.
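The arithmetic in the example above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative only.

```python
# Consumption (thousand Rs.) of the 9 MBA students from the example.
consumption = [39, 36, 48, 36, 41, 37, 32, 46, 45]

total = sum(consumption)   # sum of all observations
n = len(consumption)       # number of observations
mean = total / n           # arithmetic mean = sum / n

print(total, n, mean)
```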
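$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fx}{\sum f}$$
where $\sum f$ is the total number of observations.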
Salary (000) [x]    f     fx
40                  20    800
50                  10    500
60                  8     480
70                  5     350
80                  4     320
90                  3     270
Total               50    2720
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{2720}{50} = 54.4$$
It shows that, on average, an employee of the University earns a salary of 54.4 (thousand).
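The fx column and the resulting mean can be reproduced with a short Python sketch. The list names are hypothetical; the frequencies 8, 5 and 4 are the ones implied by the fx values 480, 350 and 320 in the table.

```python
# Discrete frequency distribution from the salary example.
salaries = [40, 50, 60, 70, 80, 90]    # x, in thousand Rs.
employees = [20, 10, 8, 5, 4, 3]       # f

# fx column: each salary multiplied by its frequency.
fx = [f * x for f, x in zip(employees, salaries)]

sum_f = sum(employees)                 # total number of employees
sum_fx = sum(fx)                       # total of the fx column
mean_salary = sum_fx / sum_f           # mean = sum(fx) / sum(f)

print(fx, sum_f, sum_fx, mean_salary)
```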
Example: Using the following data showing the profits (in thousands of Rs.) of 60
different industries, calculate the mean profit (average profit) of the industries.
Profit (000)    Number of industries
65-84           9
85-104          10
105-124         17
125-144         10
145-164         5
165-184         4
185-204         5
Total           60
To compute the mean, first we convert the class intervals into mid points (X).
Profit (000)    f     X        fX
65-84           9     74.5     670.5
85-104          10    94.5     945
105-124         17    114.5    1946.5
125-144         10    134.5    1345
145-164         5     154.5    772.5
165-184         4     174.5    698
185-204         5     194.5    972.5
Total           60             7350
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{7350}{60} = 122.5$$
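The grouped-data calculation above can be sketched in Python as follows. The class limits for the last three classes and the frequencies 9, 5, 4 and 5 are assumptions inferred from the mid points and the fX column (for example 670.5 / 74.5 = 9); the names used are illustrative only.

```python
# Class limits (profit in thousand Rs.) and number of industries per class.
classes = [(65, 84), (85, 104), (105, 124), (125, 144),
           (145, 164), (165, 184), (185, 204)]
industries = [9, 10, 17, 10, 5, 4, 5]

# Mid point of each class: average of its lower and upper limit.
midpoints = [(low + high) / 2 for low, high in classes]

# fX column and the mean of the grouped data.
fx = [f * x for f, x in zip(industries, midpoints)]
mean_profit = sum(fx) / sum(industries)

print(midpoints)
print(sum(industries), sum(fx), mean_profit)
```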
DEFINITIONS
Class Boundaries
Class Limits
In a grouped frequency distribution, if the upper limit of a class is not repeated as the lower
limit of the next class, as in 5-9, 10-14, 15-19, 20-24, 25-29, 30-34 and 35-39, the end points
of the classes are called class limits. These class limits can be converted into class boundaries by
taking the following steps:
1. Find the mid-way value
= (lower limit of the succeeding class - upper limit of the preceding class)/2 = (10 - 9)/2 = 0.5
2. Now subtract the mid-way value from each lower class limit and add it to each
upper class limit. The classes/groups so formed are the class
boundaries.
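The two steps for turning class limits into class boundaries can be sketched in Python. The function name `class_boundaries` is hypothetical, and the sketch assumes that the gap between consecutive classes is the same throughout the table.

```python
def class_boundaries(limits):
    """Convert (lower, upper) class limits into class boundaries."""
    # Step 1: mid-way value = (lower limit of the next class
    #                          - upper limit of the current class) / 2
    midway = (limits[1][0] - limits[0][1]) / 2
    # Step 2: subtract it from every lower limit, add it to every upper limit.
    return [(low - midway, high + midway) for low, high in limits]

limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
boundaries = class_boundaries(limits)
print(boundaries)
```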
Mid Point
The mid point is the average of the upper and lower class limits/boundaries of a particular
class. The following example illustrates the mid point.
Profit (000)    Mid point (X)
65-84           (65+84)/2 = 74.5
85-104          (85+104)/2 = 94.5
105-124         (105+124)/2 = 114.5
125-144         (125+144)/2 = 134.5
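$$\sum (X - \bar{X}) = 0$$
$$\sum (X - \bar{X})^2 < \sum (X - a)^2, \quad a \neq \bar{X}$$
If k groups of sizes $n_1, n_2, \ldots, n_k$ have means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$
respectively, then the combined mean (mean of all the groups) can be calculated as:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
$$y = a + bx \;\Rightarrow\; \bar{y} = a + b\bar{x}$$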
MODE
The mode is the value which has the maximum frequency compared with the other items of a data
set. OR, the most frequent value of a data set is called the mode.
A distribution/data set having only one mode is called a uni-modal distribution. Similarly,
a distribution is said to be bi-modal if it has two modes. Generally, a
distribution having more than one mode is called a multi-modal distribution. For
example:
a). 2, 4, 6, 4, 8, 10 (mode = 4)
b). 2, 4, 6, 4, 8, 10, 8 (mode = 4 and 8)
c). 2, 4, 6, 4, 8, 10, 8, 10 (mode = 4, 8 and 10)
If all the observations of a data set have the same frequency (are repeated the same
number of times), the data set has no mode. For example: 2, 4, 6, 4, 8, 10, 8,
10, 6, 2: this data set has no mode because every observation is repeated the
same number of times (twice).
The mode is the appropriate average for qualitative/nominal data.
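The mode rules described above can be sketched in Python. The function name `modes` is hypothetical, and returning an empty list for "no mode" is a choice made for this illustration; the last call uses a data set in which every value occurs exactly twice.

```python
from collections import Counter

def modes(data):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    # If several distinct values all share the same frequency, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([2, 4, 6, 4, 8, 10]))                # uni-modal
print(modes([2, 4, 6, 4, 8, 10, 8]))             # bi-modal
print(modes([2, 4, 6, 4, 8, 10, 8, 10]))         # multi-modal
print(modes([2, 4, 6, 4, 8, 10, 8, 10, 6, 2]))   # every value occurs twice
```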
[Figure: frequency curves of a uni-modal, a bi-modal and a tri-modal distribution]
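$$\text{Mode} = l + \frac{(f_m - f_1)\,h}{2 f_m - f_1 - f_2}$$
where l is the lower boundary of the modal class, $f_m$ its frequency, $f_1$ the frequency of the
preceding class, $f_2$ the frequency of the following class, and h the class width.
Consumption    f
4.5-9.5        10
9.5-14.5       8
14.5-19.5      20
19.5-24.5      5
24.5-29.5      [missing]
29.5-34.5      [missing]
Total          50
$$\text{Mode} = 14.5 + \frac{(20 - 8) \times 5}{2 \times 20 - 8 - 5} = 14.5 + \frac{60}{27} = 14.5 + 2.22 = 16.72$$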
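The grouped-data mode formula can be checked numerically with a small Python sketch. The function name and argument names are hypothetical; the values l = 14.5, f_m = 20, f_1 = 8, f_2 = 5 and h = 5 are the ones used in the worked calculation.

```python
def grouped_mode(l, f_m, f_1, f_2, h):
    """Mode of grouped data: l + (f_m - f_1) * h / (2*f_m - f_1 - f_2)."""
    return l + (f_m - f_1) * h / (2 * f_m - f_1 - f_2)

# Modal class 14.5-19.5 (frequency 20), neighbours with frequencies 8 and 5.
mode = grouped_mode(l=14.5, f_m=20, f_1=8, f_2=5, h=5)
print(round(mode, 2))
```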
MEDIAN
The median is the value which divides an arranged data set into two equal parts, i.e.
half (50%) of the observations lie below that value and half (50%) lie
above it.
For example: what is the median of the following data showing the weekly
profit (000) of seven stores: 10, 20, 15, 13, 14, 9 and 12?
Arranged data (increasing order): 9, 10, 12, 13, 14, 15, 20
Median = 13
Similarly, for a data set whose size is an even number (divisible by 2), the median
is the average of the two middle values.
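Both cases (odd and even number of observations) can be sketched in Python. The function name `median` is illustrative, and the six-store data set in the second call is a hypothetical example made by dropping the last store from the data in the text.

```python
def median(data):
    """Median of ungrouped data: middle value of the arranged data."""
    ordered = sorted(data)            # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd size: single middle value
        return ordered[mid]
    # even size: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

profits = [10, 20, 15, 13, 14, 9, 12]      # weekly profit (000) of seven stores
print(sorted(profits), median(profits))

six_stores = profits[:-1]                  # hypothetical even-sized data set
print(sorted(six_stores), median(six_stores))
```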
1. Arithmetic mean
2. Median
Salary (000)           5-9    10-14    15-19    20-24    25-29    30-34
Number of employees    20     10       8        5        4        3
Classes    Class boundary    f     cf    X     fX
5-9        4.5-9.5           20    20    7     140
10-14      9.5-14.5          10    30    12    120
15-19      14.5-19.5         8     38    17    136
20-24      19.5-24.5         5     43    22    110
25-29      24.5-29.5         4     47    27    108
30-34      29.5-34.5         3     50    32    96
Total                        50                710
$$\text{AM} = \frac{\sum fX}{\sum f} = \frac{710}{50} = 14.2$$
For the median, n/2 = 50/2 = 25. It implies that the median lies in the
group (9.5-14.5), so
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 9.5 + \frac{5}{10}\left(\frac{50}{2} - 20\right) = 12$$
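The search for the median group and the interpolation step can be sketched in Python. The function name `grouped_median` is hypothetical; the frequencies 8, 5, 4 and 3 are the ones implied by the cumulative frequencies 38, 43, 47 and 50.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: l + (h / f) * (n/2 - c)."""
    n = sum(freqs)
    half = n / 2
    c = 0                                   # cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= half:         # first class whose cf reaches n/2
            h = upper - lower               # class width
            return lower + (h / f) * (half - c)
        c += f
    raise ValueError("frequencies must contain at least one observation")

boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5),
              (19.5, 24.5), (24.5, 29.5), (29.5, 34.5)]
freqs = [20, 10, 8, 5, 4, 3]
median_salary = grouped_median(boundaries, freqs)
print(median_salary)
```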
QUANTILES
Quartiles, deciles and percentiles are collectively called quantiles. Generally,
quantiles are also called measures of position.
Quartiles: The three points which divide an arranged (ascending order) data
set into four equal parts are called quartiles. Quartiles are denoted by Q1,
Q2 and Q3.
Q1 = lower quartile or first quartile
Q2 = second quartile = median
Q3 = upper quartile or third quartile
[Figure: an arranged data set split into four 25% segments by Q1, Q2 and Q3]
Deciles: The nine points which divide an arranged (ascending order) data set
into 10 equal parts are called deciles. Deciles are denoted by D1, D2,
..., D9.
D1 = first decile, D2 = second decile, ..., D9 = ninth decile
[Figure: an arranged data set split into ten 10% segments by D1, D2, ..., D9, with D5 = Q2 = Median]
D1 is the value below which 10% of the observations lie and above which 90% lie;
D2 is the value below which 20% of the observations lie and above which 80% lie;
and so on, up to D9, below which 90% of the observations lie and above which 10% lie.
Percentiles: Percentiles divide an arranged data set into 100 equal parts. There
are 99 points which do so. Percentiles are denoted by Pi (i = 1, 2, 3, ..., 99).
[Figure: an arranged data set split into one hundred 1% segments by P1, ..., P99, with P50 = Q2 = D5 = Median and P75 = Q3]
P1 is the value below which 1% of the observations lie and above which 99% lie;
P2 is the value below which 2% of the observations lie and above which 98% lie;
and so on, up to P99, below which 99% of the observations lie and above which 1% lie.
COMPUTATIONAL FORMULAE
$$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - c\right)$$
l = lower limit of the j th quartile group
h = width of class
f = frequency of the j th quartile group
c = cumulative frequency preceding the j th quartile group
$$D_j = l + \frac{h}{f}\left(\frac{jn}{10} - c\right)$$
l = lower limit of the j th decile group
h = width of class
f = frequency of the j th decile group
c = cumulative frequency preceding the j th decile group
$$P_j = l + \frac{h}{f}\left(\frac{jn}{100} - c\right)$$
Examples-Quantiles
Example: The following data shows the frequency distribution of the salary of
50 employees of a firm. Calculate the following:
1.
Median
2.
3.
4.
Classes    5-9    10-14    15-19    20-24    25-29    30-34
f          5      8        20       10       4        3
Classes    Class boundary    f     cf
5-9        4.5-9.5           5     5
10-14      9.5-14.5          8     13
15-19      14.5-19.5         20    33
20-24      19.5-24.5         10    43
25-29      24.5-29.5         4     47
30-34      29.5-34.5         3     50
Total                        50
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right); \quad \text{put } l = 14.5,\ h = 5,\ f = 20,\ c = 13$$
$$\text{Median} = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right); \quad \text{put } l = 19.5,\ h = 5,\ f = 10,\ c = 33$$
$$Q_3 = 19.5 + \frac{5}{10}(37.5 - 33) = 21.75$$
For 5th Decile: 5n/10 = 5*50/10 = 25. It implies that D5 lies in the group
(14.5-19.5).
$$D_5 = l + \frac{h}{f}\left(\frac{5n}{10} - c\right); \quad \text{put } l = 14.5,\ h = 5,\ f = 20,\ c = 13$$
$$D_5 = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
For 7th Decile: 7n/10 = 7*50/10 = 35. It implies that D7 lies in the group
(19.5-24.5).
$$D_7 = l + \frac{h}{f}\left(\frac{7n}{10} - c\right); \quad \text{put } l = 19.5,\ h = 5,\ f = 10,\ c = 33$$
$$D_7 = 19.5 + \frac{5}{10}(35 - 33) = 19.5 + 1 = 20.5$$
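$$P_{25} = l + \frac{h}{f}\left(\frac{25n}{100} - c\right) = 9.5 + \frac{5}{8}(12.5 - 5) = 9.5 + 4.69 = 14.19$$
Q1 = P25
$$P_{50} = l + \frac{h}{f}\left(\frac{50n}{100} - c\right) = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
$$P_{75} = l + \frac{h}{f}\left(\frac{75n}{100} - c\right); \quad \text{put } l = 19.5,\ h = 5,\ f = 10,\ c = 33$$
$$P_{75} = 19.5 + \frac{5}{10}(37.5 - 33) = 21.75$$
Q3 = P75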
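All of the quantile calculations in this example follow one pattern, which can be sketched as a single Python function. The name `grouped_quantile` and its parameters are hypothetical: `j` is the order of the quantile and `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles. The frequencies 5, 8, 4 and 3 are the ones implied by the cumulative frequencies in the example.

```python
def grouped_quantile(boundaries, freqs, j, parts):
    """j-th quantile of grouped data: l + (h / f) * (j*n/parts - c)."""
    n = sum(freqs)
    target = j * n / parts
    c = 0                                   # cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and c + f >= target:       # quantile group found
            return lower + ((upper - lower) / f) * (target - c)
        c += f
    raise ValueError("quantile lies outside the distribution")

boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5),
              (19.5, 24.5), (24.5, 29.5), (29.5, 34.5)]
freqs = [5, 8, 20, 10, 4, 3]

median = grouped_quantile(boundaries, freqs, 1, 2)
q3 = grouped_quantile(boundaries, freqs, 3, 4)
d5 = grouped_quantile(boundaries, freqs, 5, 10)
d7 = grouped_quantile(boundaries, freqs, 7, 10)
p25 = grouped_quantile(boundaries, freqs, 25, 100)
p75 = grouped_quantile(boundaries, freqs, 75, 100)
print(median, q3, d5, d7, round(p25, 2), p75)
```

Because the median, D5 and P50 share the same target position n/2, and Q3 and P75 share 3n/4, the function returns identical values for them, which matches the relations between quartiles, deciles and percentiles.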
Q1 = P25
Q2 = Median = D5 = P50
Q3 = P75
P10 = D1; P20 = D2; P30 = D3; ...; P90 = D9