
STATISTICS FOR SOCIAL SCIENCES

Dr. Yousaf Hayat

STATISTICS
There are different definitions of Statistics, and each researcher has defined it in their own terms. For example:

Statistics is a science of sampling and estimation.

Statistics is a science of probability.

Statistics is a science of collecting information/data.

Statistics is a science of presentation of data either in qualitative or quantitative form.

Statistics is a science of analyzing the data.

Statistics is a science of collection, presentation, analysis and interpretation of numerical data.


STATISTICS FOR SOCIAL SCIENCES


Statistics for social sciences is the branch of Statistics that deals with the statistical methods and techniques used to find solutions to different social problems. OR, it deals with the procedures used to collect, present, analyze and interpret information pertaining to the social problems of a society.
For example: the impact of different attributes on the study behaviour of male and female students, parents' attitude towards female education, the effect of organizational justice on job satisfaction, and the effect of different marketing policies on the sale of a commodity. All these questions can be answered if one has the data in hand and analyzes them with appropriate statistical methods, keeping in mind the nature of the data.
A variety of statistical methods are available to answer different business-related problems.


TYPES OF STATISTICS
Statistics is divided into two types:
1. Descriptive Statistics: tabulation and classification; presentation of data (graphs and diagrams); measures of central tendency and dispersion.
2. Inferential Statistics: estimation of parameters; testing of hypotheses.

TYPES OF STATISTICS
Descriptive Statistics
Descriptive statistics deals with concepts and methods related to the summarization and description of the important aspects of numerical data. It consists of the condensation of data, their graphical display, and the computation of numerical quantities that provide information about the centre and spread of the observations in a data set.
Inferential Statistics
Inferential statistics deals with the methods and procedures used for drawing inferences about the true but unknown characteristics of a population based on sample data drawn from the same population. Inferential statistics can be further classified into estimation of parameters and testing of hypotheses.


SOME BASIC CONCEPTS


Population: An aggregate or totality of units having some common characteristics of interest is called a population. It is also called the universe. For example, all students enrolled in im|Sciences-Peshawar, all markets in Peshawar city, all banks in Hayatabad, all industries in the province, the monthly/yearly sales of the stores in Peshawar district, etc.
There are different types of population, e.g. finite and infinite populations, homogeneous and heterogeneous populations, etc.
Sample: A small representative part of a population is called a sample. For example, a small portion of the students of im|Sciences-Peshawar constitutes a sample of students. Similarly, a number of markets selected randomly or purposively from all the markets is called a sample of markets.


SOME BASIC CONCEPTS


Figure: Population vs Sample. Measures used to describe a population are called parameters; measures computed from sample data are called statistics.

SOME BASIC CONCEPTS-Contd.


Parameter: Any numerical quantity, such as the mean or standard deviation, computed from population data is known as a parameter. For example, the average monthly/yearly sale of all the stores located in district Peshawar. Parameters are generally used to specify the distribution of data.
Statistic: Any numerical quantity, such as the mean or standard deviation, computed from sample data is called a statistic. For example, the average GPA of 50 students selected from a population of 300 students, or the average sale of 100 stores selected from 1000 stores.
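To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a hypothetical population of 300 GPAs generated at random between 2.0 and 4.0 (illustrative only, not real data), computes the population mean (a parameter), and then the mean of a simple random sample of 50 students (a statistic). The seed and GPA range are arbitrary choices.

```python
import random
import statistics

# Hypothetical population: 300 GPAs drawn uniformly between 2.0 and 4.0 (illustrative only).
rng = random.Random(42)
population = [round(rng.uniform(2.0, 4.0), 2) for _ in range(300)]

# Parameter: computed from every unit in the population.
population_mean = statistics.mean(population)

# Statistic: computed from a simple random sample of 50 students (drawn without replacement).
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean GPA): {population_mean:.3f}")
print(f"Statistic (sample mean GPA):     {sample_mean:.3f}")
```

The population mean is fixed for a given population, while the sample mean changes from one sample to another.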


VARIABLE AND ITS TYPES


Variable
Any characteristic of interest that takes on different values is called a variable. For example, the price of a commodity at different places in Peshawar city, the profit of a business firm in different months of a year, production, cost, temperature, the sale of a market, consumption, etc. Variables are broadly divided into qualitative and quantitative variables.


SOME BASIC CONCEPTS-Contd.

Variable
1. Qualitative Variable (Categorical Variable)
2. Quantitative Variable
(a) Discrete Variable
(b) Continuous Variable

SOME BASIC CONCEPTS-Contd.


Qualitative and Quantitative Variables
A variable is said to be qualitative if it is not capable of numerical measurement, but one can observe the presence or absence of a particular attribute. For example, honesty, beauty, race, like and dislike, pass or fail, gender, etc.
A variable is said to be quantitative if it is capable of numerical measurement. For example, cost of production, price of a commodity, monthly consumption of households, etc.


SOME BASIC CONCEPTS-Contd.


Discrete and Continuous Variables
A variable is said to be discrete if it takes isolated (usually integer) values, i.e. its values change in jumps. For example, number of rooms in a house, number of students in a class, number of banks in different cities, size of a household, number of shops in a market, etc.
A variable is said to be continuous if it can take any value within a given interval/range (a, b). For example, consumption, production, temperature, monthly sale of a market, height, weight, age, etc.


SOME BASIC CONCEPTS-Contd.

Other Variables used in Research
1. Dependent Variable
2. Independent Variable
3. Dummy Variable/Categorical Variable, etc.

Dependent and Independent Variables


A variable that is influenced by one or more other variables is called a dependent variable. It is also called a random or stochastic variable. OR
A variable that depends on one or more other variables is called a dependent variable. OR
A variable of primary interest that is investigated as a function of other (causal) variables is known as a dependent variable.
For example, in economics, the consumption of a commodity (say, apples) depends on income, household size, the price of the commodity, etc. In this example, consumption of apples is the dependent variable, which varies from one family to another, while income, household size and price are independent variables.
A variable that influences a dependent variable in either direction (positive or negative) is called an independent variable.

DATA AND ITS COLLECTION

Meaning and Purpose of Data


Data means observations or evidence. OR, the raw facts and figures (a collection of meaningful information) are called data. Data may be qualitative or quantitative in nature.
Data are needed in research work for the following purposes:
1. The quality of the data determines the quality of the research.
2. Data provide direction and answers to a research inquiry; they are essential for conducting research.
3. The main purpose of data collection is to verify the hypotheses.
4. Data are necessary to provide the solution to the problem.
5. Data are also used to ascertain the effectiveness of a new device for its practical utility.
6. Statistical data are used in two basic problems of any investigation:
(a). Estimation of population parameters, which helps in drawing generalizations about population characteristics.
(b). Testing of hypotheses: the hypotheses of any investigation are tested with the help of data.


Sources of Data

Data sources:
1. Print or electronic
2. Observation
3. Survey
4. Experimentation

Types of Data

Primary Data
Data collected for the first time from their original source are called primary data.
Secondary Data
When primary data have passed through any sort of statistical or mathematical treatment, they are known as secondary data. OR, secondary data are data collected and compiled by an outside source, or by someone in the organization, who may later give other users access to them.

Collection of Data
Collection of Primary Data
Primary data can be collected through:
i. Direct personal investigation
ii. Indirect investigation or personal interviews
iii. Collection through questionnaire
iv. Collection through enumerators
v. Collection through local sources
Collection of Secondary Data
Secondary data can be collected from:
i. Official records
ii. Semi-official records

Collection of Primary Data


i. Direct Personal Investigation
In this method, the researcher/investigator collects information in person from the selected respondents. The investigator has a degree of freedom and an open choice of asking a variety of questions (open-ended, closed-ended or mixed). The data collected through this method are complete; however, personal bias may be present due to the personal involvement of the investigator. This method is also very costly and time-consuming.
ii. Indirect Investigation
In some cases, it is not possible to obtain information directly from the respondents due to certain limitations. In such circumstances, an indirect investigation is carried out by involving a third party to collect the required information. This method is useful for inquiries in which the information required is complex.

iii. Collection through Questionnaire
In this method, a list of questions (called a questionnaire) covering all aspects of the study is prepared by the researcher/investigator. The questionnaire is sent to the respondents by mail or email with a request to return it after answering all the questions. Respondents may leave some questions blank because they do not understand them or do not want to give that information. In addition, some respondents may not be willing to give any of the information contained in the questionnaire at all.
iv. Collection through Enumerators
In this method, trained persons (enumerators) are sent to the area under study to collect information on a pre-specified proforma. Information collected through this method is more useful than that collected through the questionnaire method. The enumerators can take information from the respondents directly or, if the respondents are not available on the spot, from their close relatives.

v. Collection through Local Sources
As the name indicates, in this method data are collected through local sources. This means that the information is not collected directly from the respondents but from people belonging to the area about which the information is required.

Time Series Data
Data collected at different intervals of time regarding a commodity or group of commodities (or an organization/firm) are called time series data. For example:
Table: Time series data of a company showing profit, production and sale.
Year   Profit   Production   Sale
1990   12       120          110
1991   13       140          132
1992   14       150          145
1993   13.5     140          123
1994   10       103          90
1995   11       115          100
1996   12.5     123          122
1997   13.8     140          135
1998   15       160          145
2000   11.6     120          115
2001   15       162          150
2002   16       165          145

Cross Sectional Data
Data relating to one period but spread across many units, such as households, stores or companies, are called cross-sectional data. Examples are data collected from a field survey, the monthly profit of selected stores, or the monthly profit of different companies for a single period.
Table: Cross-sectional data of 12 different households showing profit, production and sale.
Household   Profit   Production   Sale
1           12       120          110
2           13       140          132
3           14       150          145
4           13.5     140          123
5           10       103          90
6           11       115          100
7           12.5     123          122
8           13.8     140          135
9           15       160          145
10          11.6     120          115
11          15       162          150
12          16       165          145

Panel Data
Panel data are data from a (usually small) number of observations over time on a (usually large) number of cross-sectional units like individuals, households, firms, or governments.
Table: Panel data of four different firms showing production and sale during 2000-2002.
Firm   Year   Production   Sale
1      2000   120          110
1      2001   140          132
1      2002   150          145
2      2000   140          123
2      2001   103          90
2      2002   115          100
3      2000   123          122
3      2001   140          135
3      2002   160          145
4      2000   120          115
4      2001   162          150
4      2002   165          145
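One way to see how panel data combine the two other types is to store each observation under a (firm, year) key. The sketch below is a minimal Python illustration using the twelve rows of the panel table (four blocks of 2000-2002); the firm labels 1 to 4 and the helper names `time_series` and `cross_section` are assumptions made for this example. Fixing a firm gives a time series; fixing a year gives a cross-section.

```python
# Panel data from the table: (firm, year) -> (production, sale).
# Firm labels 1-4 are assumed for the four blocks of rows.
panel = {
    (1, 2000): (120, 110), (1, 2001): (140, 132), (1, 2002): (150, 145),
    (2, 2000): (140, 123), (2, 2001): (103, 90),  (2, 2002): (115, 100),
    (3, 2000): (123, 122), (3, 2001): (140, 135), (3, 2002): (160, 145),
    (4, 2000): (120, 115), (4, 2001): (162, 150), (4, 2002): (165, 145),
}


def time_series(firm):
    """All years for one firm: a time series."""
    return {year: values for (f, year), values in sorted(panel.items()) if f == firm}


def cross_section(year):
    """All firms in one year: a cross-section."""
    return {f: values for (f, y), values in sorted(panel.items()) if y == year}


print(time_series(1))
print(cross_section(2000))
```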

Data Collection Tools

Primary Data

Direct personal investigation (Schedule)

Questionnaire methods

Through Enumerators etc.

Secondary Data

Printed materials

Bureau of Statistics

State Bank of Pakistan

Commercial and Research Journals

Business trades etc


Frequency
The number of times an observation is repeated in a data set is called the frequency of that observation/data point. OR
The total number of observations in a class is called the frequency of that class. For example, consider the following data showing the monthly salaries of 50 employees of a certain University. Here, 20 is the frequency of the class of employees drawing a salary of Rs. 40,000 per month, and 3 is the frequency of the employees drawing Rs. 90,000 per month.

Salary (000):          40   50   60   70   80   90
Number of employees:   20   10    8    5    4    3
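Counting frequencies is mechanical, so a short Python sketch can show it. It assumes the salary frequencies 20, 10, 8, 5, 4 and 3 for Rs. 40 to 90 thousand (the same figures appear in the fx column of the arithmetic-mean example for the University's 50 employees), rebuilds a raw list of 50 salaries from them, and counts each distinct value with the standard-library `collections.Counter`.

```python
from collections import Counter

# Raw monthly salaries (Rs. 000) of 50 employees, rebuilt from the assumed frequencies.
salaries = [40] * 20 + [50] * 10 + [60] * 8 + [70] * 5 + [80] * 4 + [90] * 3

# Counter tallies how many times each distinct value occurs, i.e. its frequency.
frequency = Counter(salaries)

for value in sorted(frequency):
    print(value, frequency[value])
```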


Class Boundaries: In a grouped frequency distribution, if the upper limit of a class is repeated as the lower limit of the next class, such classes are called class boundaries. For example, consider the following data set:
Salary (000):          5-10   10-15   15-20   20-25   25-30   30-35
Number of employees:   20, 10, ...
Class Limits: In a grouped frequency distribution, if the upper limit of a class is not repeated as the lower limit of the next class, such classes are called class limits. For example, consider the following data set:
Salary (000):          5-9   10-14   15-19   20-24   25-29   30-34
Number of employees:   20, 10, ...

How to convert class limits into class boundaries:
Classes (limits)   Class boundaries
5-9                4.5-9.5
10-14              9.5-14.5
15-19              14.5-19.5
20-24              19.5-24.5
25-29              24.5-29.5
30-34              29.5-34.5
In the given data, the classes show class limits. To convert class limits into class boundaries, calculate the mid-way value as (lower limit of the next class - upper limit of the preceding class)/2 = (10 - 9)/2 = 0.5. Then subtract 0.5 from each lower class limit and add 0.5 to each upper class limit, as in the table above.
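The conversion rule can be written as a small Python function. This is a sketch under the assumption that all classes are equally spaced (the gap between consecutive classes is constant) and that at least two classes are given; the function name `limits_to_boundaries` is hypothetical.

```python
def limits_to_boundaries(classes):
    """Convert class limits such as (5, 9), (10, 14) into class boundaries.

    The mid-way value is half the gap between the lower limit of the next
    class and the upper limit of the current class, e.g. (10 - 9) / 2 = 0.5.
    Assumes equally spaced classes and at least two classes.
    """
    gap = (classes[1][0] - classes[0][1]) / 2
    return [(lower - gap, upper + gap) for lower, upper in classes]


limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
print(limits_to_boundaries(limits))
# [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5)]
```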


PRESENTATION OF DATA
1. Diagrams and Graphs: simple bar diagram, multiple bar diagram, component bar diagram, pie chart, frequency curves and other graphical displays of data.
2. Tabulation and Classification

Frequency Distribution
Arrangement of data into different classes or groups in such a way that each class/group has its own frequency is called a frequency distribution. For example, the following data show the frequency distribution of the salaries of 50 employees of a firm. This frequency distribution is called a discrete frequency distribution.
Salary (000):          40   50   60   70   80   90
Number of employees:   20   10    8    5    4    3
Whereas the data below show the grouped/continuous frequency distribution of the salaries of 50 employees.
Salary (000):          5-9   10-14   15-19   20-24   25-29   30-34
Number of employees:   20, 10, ...

PRESENTATION OF DATA: DIAGRAMS
City Name    No. of Industries    No. of Banks
Peshawar     50                   35
Islamabad    40                   45
Karachi      120                  90
Lahore       70                   55
Faisalabad   90                   30
Quetta       15

Simple Bar Diagram:

Figure: Summary of the number of industries in different cities of Pakistan


Multiple Bar Diagram:

Figure: Summary of the number of industries and Banks in different cities of Pakistan


Component Bar Diagram:

Figure: Summary of the number of industries and Banks in different cities of Pakistan


Pie Chart

Figure: Summary of the number of industries in different cities of Pakistan
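In a pie chart each category gets a slice whose angle is its share of the total times 360 degrees. The following Python sketch computes those angles for the industries column of the city table (Peshawar 50, Islamabad 40, Karachi 120, Lahore 70, Faisalabad 90, Quetta 15); it uses only the standard library, and drawing the chart itself is left to whatever plotting tool is available.

```python
# Industries per city, copied from the city table.
industries = {"Peshawar": 50, "Islamabad": 40, "Karachi": 120,
              "Lahore": 70, "Faisalabad": 90, "Quetta": 15}

total = sum(industries.values())  # total number of industries

# Each city's slice angle is its share of the total times 360 degrees.
angles = {city: count / total * 360 for city, count in industries.items()}

for city, angle in angles.items():
    share = industries[city] / total
    print(f"{city:<11} {share:6.1%} {angle:7.2f} degrees")
```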


Figure: A line graph of grade point average (GPA on the horizontal axis, number of students on the vertical axis).
GPA    Number of Students
2.8
2.9    18
3.0
3.1    14
3.2    20
3.3    40
3.4    35
3.5    25
3.6    13
3.7
3.8
3.9

Figure: Scatter plot of GPA and the starting salary.
GPA    Starting Salary
2.8    20000
2.9    20000
3.0    25000
3.1    30000
3.2    15000
3.3    26000
3.4    30000
3.5    24000
3.6    18000
3.7    30000
3.8    45000
3.9    35000

Statistical Tools and Methods

Measures of Central Tendency

Measures of Dispersion

Reliability Analysis

Inferential Statistics

Mean Comparison (statistical tests)

Analysis of Variance (ANOVA)

Tests of Association

Regression and Correlation Analysis

Non-parametric Tests


SIGNIFICANCE OF STATISTICS

Statistical methods are used for the summarization of large data sets.

Statistical methods are used for analyzing data from field and lab experiments.

Statistical methods are used for conducting sample surveys, and the data coming from surveys can be analyzed using statistical methods to find solutions to the problems under study.

Statistical methods are helpful for effective planning in any field of inquiry.

Statistical methods are used in every field of scientific discipline, such as agriculture, business, medicine, biology, genetics, and the physical and social sciences.

Banks, insurance companies, and government and semi-government organizations use statistical techniques as a tool for data analysis.

Statistics helps in drawing general conclusions about the characteristics of a population/aggregate on the basis of sample data.

Statistical methods are also helpful in making predictions (forecasting).

MEASURES OF CENTRAL TENDENCY


A data set can be summarized by a single value that usually lies somewhere in the centre and represents the whole data set. Such a single value representing the central part of a data set is called a central value. The tendency of observations to cluster in the central part of the data set is called central tendency. The most commonly used measures of central tendency are:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
4. Median
5. Mode
MEASURES OF CENTRAL TENDENCY


Arithmetic Mean
It is simply called the mean or average and is the most widely used measure of central tendency in every field of research. The arithmetic mean is the value obtained by dividing the sum of all observations in a data set by the number of observations:
Mean = (Sum of all observations) / (Total number of observations)

Mathematical Description of Arithmetic Mean


Mathematically, the arithmetic mean is expressed as
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \text{[Population data]}$$
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \text{[Sample data]}$$
Example: The following data show the consumption (in thousands of Rs.) of 9 MBA students per semester in a certain University. Compute the arithmetic mean and interpret the result. The data are: 39, 36, 48, 36, 41, 37, 32, 46 and 45.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{39+36+48+36+41+37+32+46+45}{9} = \frac{360}{9} = 40$$
This indicates that, on average, an MBA student spends Rs. 40,000 per semester.
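The same calculation as a minimal Python sketch: the consumption figures are those of the 9 MBA students in the example, and `arithmetic_mean` is a hypothetical helper that divides the sum of the observations by their count (the standard-library `statistics.mean` gives the same result).

```python
consumption = [39, 36, 48, 36, 41, 37, 32, 46, 45]  # Rs. 000 per semester


def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


print(arithmetic_mean(consumption))  # 40.0
```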


Arithmetic Mean for Grouped Data


Mathematically, the arithmetic mean for grouped data (a frequency distribution) is expressed as
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum fx}{n}, \qquad \text{where } \sum f = n \text{ (total number of observations)}$$
where $f_1, f_2, \ldots, f_k$ are the frequencies corresponding to $x_1, x_2, \ldots, x_k$.

Example: Consider the following frequency distribution of the salaries of 50 employees of a certain University. Compute the arithmetic mean.
Salary (000) [x]:          40    50    60    70    80    90   Total
Number of employees [f]:   20    10     8     5     4     3   50
fx:                       800   500   480   350   320   270   2720
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{2720}{50} = 54.4$$
This shows that, on average, an employee of the University earns a salary of Rs. 54.4 thousand.
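A minimal Python sketch of the grouped-mean formula, sum(f * x) / sum(f). The frequencies 20, 10, 8, 5, 4 and 3 are read off the fx column of the example (for instance 480 / 60 = 8); `grouped_mean` is a hypothetical helper name.

```python
salary = [40, 50, 60, 70, 80, 90]     # x, Rs. 000
employees = [20, 10, 8, 5, 4, 3]      # f


def grouped_mean(x, f):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(fi * xi for fi, xi in zip(f, x)) / sum(f)


print(grouped_mean(salary, employees))  # 54.4
```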


Example: Using the following data showing the profits (in thousands of Rs.) of 60 different industries, calculate the mean (average) profit of the industries.
Profit (000):           65-84   85-104   105-124   125-144   145-164   165-184   185-204   Total
Number of industries:   9       10       17        10        5         4         5         60
To compute the mean, first convert the class intervals into mid points (X):
Profit (000):               65-84   85-104   105-124   125-144   145-164   165-184   185-204   Total
Number of industries (f):   9       10       17        10        5         4         5         60
Mid point (X):              74.5    94.5     114.5     134.5     154.5     174.5     194.5
fX:                         670.5   945      1946.5    1345      772.5     698       972.5     7350
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{7350}{60} = 122.5$$
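The mid-point method can be sketched in Python as follows. The class frequencies 9, 10, 17, 10, 5, 4 and 5 are recovered from the fX row of the example as fX / X (for instance 670.5 / 74.5 = 9); the variable names are illustrative.

```python
classes = [(65, 84), (85, 104), (105, 124), (125, 144),
           (145, 164), (165, 184), (185, 204)]          # profit, Rs. 000
industries = [9, 10, 17, 10, 5, 4, 5]                   # f

# Mid point of each class: (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]
fx = [f * x for f, x in zip(industries, midpoints)]
mean_profit = sum(fx) / sum(industries)

print(midpoints)              # [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
print(sum(fx), mean_profit)   # 7350.0 122.5
```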

DEFINITIONS

Class Boundaries
In a grouped frequency distribution, if the upper limit of a class is repeated as the lower limit of the next class, such classes are called class boundaries. For example, the classes 5-10, 10-15, 15-20, 20-25, 25-30 and 30-35 are class boundaries.
Class Limits
In a grouped frequency distribution, if the upper limit of a class is not repeated as the lower limit of the next class, as in 5-9, 10-14, 15-19, 20-24, 25-29, 30-34 and 35-39, the classes are called class limits. Class limits can be converted into class boundaries by the following steps:
1. Compute the mid-way value = (lower limit of the next class - upper limit of the preceding class)/2 = (10 - 9)/2 = 0.5.
2. Subtract the mid-way value from each lower class limit and add it to each upper class limit. The classes so formed are class boundaries.

Mid Point
Mid point is the average of the upper and lower class limits/boundaries of a particular class. The following example illustrates the mid point.
Profit (000)   Mid point (X)
65-84          (65+84)/2 = 74.5
85-104         (85+104)/2 = 94.5
105-124        (105+124)/2 = 114.5
125-144        (125+144)/2 = 134.5
Mid points are used in the calculation of various statistical measures, e.g. measures of central tendency and measures of dispersion.

Properties of Arithmetic Mean


1. The sum of deviations taken from the mean is zero, i.e.
$$\sum (X - \bar{X}) = 0$$
2. The sum of squared deviations of the X's from the mean is a minimum, i.e. for any constant a,
$$\sum (X - \bar{X})^2 \le \sum (X - a)^2$$
3. In the case of k groups containing $n_1, n_2, \ldots, n_k$ observations with means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ respectively, the combined mean (mean of all the groups) can be calculated as:
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
4. If two variables X and Y are linearly related as $y = a + bx$, then the mean of Y can be computed as:
$$\bar{y} = a + b\bar{x}$$
where a and b are constants.
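The four properties can be checked numerically. This Python sketch reuses the MBA consumption data from the earlier example; the two groups passed to `combined_mean` (20 and 30 observations with means 50 and 60) and the constants a = 5, b = 2 are hypothetical values chosen only for illustration.

```python
data = [39, 36, 48, 36, 41, 37, 32, 46, 45]
mean = sum(data) / len(data)                     # 40.0

# Property 1: deviations from the mean sum to zero.
print(sum(x - mean for x in data))               # 0.0


# Property 2: the sum of squared deviations is smallest about the mean.
def ssd(a):
    return sum((x - a) ** 2 for x in data)


print(ssd(mean), ssd(35), ssd(45))               # 232.0 457 457


# Property 3: combined mean = sum(n_i * mean_i) / sum(n_i).
def combined_mean(sizes, means):
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


print(combined_mean([20, 30], [50.0, 60.0]))     # hypothetical groups: 56.0

# Property 4: if y = a + b*x, then mean(y) = a + b*mean(x).
a, b = 5, 2                                      # hypothetical constants
y = [a + b * x for x in data]
print(sum(y) / len(y), a + b * mean)             # 85.0 85.0
```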


MODE
Mode is the value that has the maximum frequency compared to the other items of a data set. OR, the most frequent value of a data set is called the mode.
A distribution/data set having only one mode is called a uni-modal distribution. Similarly, a distribution is said to be bi-modal if it has two modes. Generally, a distribution having more than one mode is called a multi-modal distribution. For example:
a). 2, 4, 6, 4, 8, 10 (mode = 4)
b). 2, 4, 6, 4, 8, 10, 8 (modes = 4 and 8)
c). 2, 4, 6, 4, 8, 10, 8, 10 (modes = 4, 8 and 10)
If all the observations of a data set have the same frequency (are repeated the same number of times), the data set has no mode. For example, 2, 4, 6, 4, 8, 10, 8, 10, 6, 2 has no mode because every observation is repeated the same number of times (twice).
Mode is the appropriate average for qualitative/nominal data.
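The rules above (one mode, several modes, or no mode when every value is equally frequent) can be sketched in Python with `collections.Counter`. The function name `modes` is hypothetical; it returns all values sharing the highest frequency, and an empty list stands for "no mode".

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency, sorted.

    An empty list means "no mode": every distinct value occurs
    the same number of times (or there are no values at all).
    """
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 4, 6, 4, 8, 10]))          # [4]
print(modes([2, 4, 6, 4, 8, 10, 8]))       # [4, 8]
print(modes([2, 4, 6, 4, 8, 10, 8, 10]))   # [4, 8, 10]
```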

Figure: Uni-modal, bi-modal and tri-modal distributions.

MODE FOR CONTINUOUS SERIES
For a continuous series/grouped frequency distribution, the mode is defined as:
$$\text{Mode} = l + \frac{(f_m - f_1)}{(2f_m - f_1 - f_2)} \times h$$
where
l = lower limit (boundary) of the modal group
$f_m$ = frequency of the modal group
$f_1$ = frequency of the group preceding the modal group
$f_2$ = frequency of the group following the modal group
h = width of class
Consumption   Frequency
4.5-9.5       10
9.5-14.5      8
14.5-19.5     20
19.5-24.5     5
24.5-29.5
29.5-34.5
Total         50
The mode lies in the group (14.5-19.5) as it has the maximum frequency, so
$$\text{Mode} = 14.5 + \frac{(20 - 8) \times 5}{(2 \times 20 - 8 - 5)} = 14.5 + \frac{60}{27} = 14.5 + 2.22 = 16.72$$
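The grouped-mode formula translates directly into a small Python function. This sketch takes the modal class and its neighbours as inputs rather than searching for them; the parameter names mirror the symbols above and the values are those of the worked example (l = 14.5, f_m = 20, f_1 = 8, f_2 = 5, h = 5).

```python
def grouped_mode(l, f_m, f_1, f_2, h):
    """Mode of a continuous frequency distribution.

    l   : lower boundary of the modal class
    f_m : frequency of the modal class
    f_1 : frequency of the class preceding the modal class
    f_2 : frequency of the class following the modal class
    h   : class width
    """
    return l + (f_m - f_1) * h / (2 * f_m - f_1 - f_2)


print(round(grouped_mode(l=14.5, f_m=20, f_1=8, f_2=5, h=5), 2))  # 16.72
```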

MEDIAN

Median is the value that divides an arranged data set into two equal parts, i.e. half (50%) of the observations lie below it and half (50%) lie above it.
For example, find the median of the following data showing the weekly profit (000) of seven stores: 10, 20, 15, 13, 14, 9 and 12.
Arranged data (increasing order): 9, 10, 12, 13, 14, 15, 20
Median = 13 (the 4th value)
Similarly, for a data set whose size n is even (divisible by 2), the median is the average of the two middle values. For example:
9, 10, 12, 13, 14, 15, 16, 20 (here n = 8), so
Median = (13+14)/2 = 13.5
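A minimal Python sketch of the median rule for ungrouped data: sort the values, take the middle one when n is odd, and average the two middle ones when n is even. The helper name `median` is illustrative; the standard-library `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the arranged data; mean of the two middle values when n is even."""
    arranged = sorted(values)
    n = len(arranged)
    mid = n // 2
    if n % 2 == 1:
        return arranged[mid]
    return (arranged[mid - 1] + arranged[mid]) / 2


print(median([10, 20, 15, 13, 14, 9, 12]))       # 13
print(median([9, 10, 12, 13, 14, 15, 16, 20]))   # 13.5
```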



Numerical Examples-Continuous Frequency Distribution


The following data show the frequency distribution of the salaries of 50 employees of a firm. Calculate:
1. Arithmetic mean
2. Median
Salary (000):          5-9   10-14   15-19   20-24   25-29   30-34
Number of employees:   20    10      8       5       4       3

To calculate the required quantities, we take the following steps:
Salary (000)   Class boundary   f    cf   X    fX
5-9            4.5-9.5          20   20   7    140
10-14          9.5-14.5         10   30   12   120
15-19          14.5-19.5        8    38   17   136
20-24          19.5-24.5        5    43   22   110
25-29          24.5-29.5        4    47   27   108
30-34          29.5-34.5        3    50   32   96
Total                           50             710
$$\text{AM} = \frac{\sum fX}{\sum f} = \frac{710}{50} = 14.2$$
For the median, n/2 = 50/2 = 25, which implies that the median lies in the group (9.5-14.5), so
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 9.5 + \frac{5}{10}\left(\frac{50}{2} - 20\right) = 9.5 + 2.5 = 12$$
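The two calculations can be reproduced in Python as a sketch. It assumes the frequencies 20, 10, 8, 5, 4 and 3 read from the f and cf columns of the worked table, equal class width h = 5, and mid points X = lower boundary + h/2; the median class is taken as the first class whose cumulative frequency reaches n/2.

```python
from itertools import accumulate

lower = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]   # lower class boundaries
h = 5                                        # class width
f = [20, 10, 8, 5, 4, 3]                     # number of employees
n = sum(f)

# Arithmetic mean from class mid points X = lower boundary + h / 2.
mid = [lb + h / 2 for lb in lower]
mean = sum(fi * xi for fi, xi in zip(f, mid)) / n

# Median: first class whose cumulative frequency reaches n / 2.
cf = list(accumulate(f))
target = n / 2
k = next(i for i, cum in enumerate(cf) if cum >= target)
c = cf[k - 1] if k > 0 else 0                # cumulative frequency before the median class
median = lower[k] + h / f[k] * (target - c)

print(round(mean, 2), median)   # 14.2 12.0
```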

QUANTILES
Quartiles, deciles and percentiles are collectively called quantiles. Quantiles are also called measures of position.
Quartiles: The three points that divide an arranged (ascending order) data set into four equal parts are called quartiles. Quartiles are denoted by Q1, Q2 and Q3.
Q1 = lower quartile or first quartile
Q2 = second quartile = median
Q3 = upper quartile or third quartile
Figure: Q1, Q2 and Q3 divide the arranged data into four parts, each containing 25% of the observations.

Deciles: The nine points that divide an arranged (ascending order) data set into 10 equal parts are called deciles. Deciles are denoted by D1, D2, ..., D9, where D1 = first decile, D2 = second decile, ..., D9 = ninth decile.
Figure: D1, D2, ..., D9 divide the arranged data into ten parts, each containing 10% of the observations; D5 coincides with Q2 (the median).
D1 is the value below which 10% of the observations lie and above which 90% lie;
D2 is the value below which 20% of the observations lie and above which 80% lie;
...
D9 is the value below which 90% of the observations lie and above which 10% lie.

Percentiles: Percentiles divide an arranged data set into 100 equal parts; 99 points are needed to do so. Percentiles are denoted by Pi (i = 1, 2, 3, ..., 99).
Figure: P1, ..., P99 divide the arranged data into 100 parts, each containing 1% of the observations; P50 coincides with Q2, D5 and the median, and P75 with Q3.
P1 is the value below which 1% of the observations lie and above which 99% lie;
P2 is the value below which 2% of the observations lie and above which 98% lie;
...
P99 is the value below which 99% of the observations lie and above which 1% lie.

RELATION AMONG QUANTILES


Q1 = P25
Q2 = Median = D5 = P50
P10 = D1; P20 = D2; P30 = D3; ...; P90 = D9
Q3 = P75

COMPUTATIONAL FORMULAE
$$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - c\right), \qquad j = 1, 2, 3$$
l = lower limit (boundary) of the j-th quartile group
h = width of class
f = frequency of the j-th quartile group
c = cumulative frequency preceding the j-th quartile group
$$D_j = l + \frac{h}{f}\left(\frac{jn}{10} - c\right), \qquad j = 1, 2, \ldots, 9$$
l = lower limit (boundary) of the j-th decile group
h = width of class
f = frequency of the j-th decile group
c = cumulative frequency preceding the j-th decile group

COMPUTATIONAL FORMULA FOR PERCENTILES
$$P_j = l + \frac{h}{f}\left(\frac{jn}{100} - c\right), \qquad j = 1, 2, \ldots, 99$$
l = lower limit (boundary) of the j-th percentile group
h = width of class
f = frequency of the j-th percentile group
c = cumulative frequency preceding the j-th percentile group

Examples-Quantiles
Example: The following data show the frequency distribution of the salaries of 50 employees of a firm. Calculate the following:
1. Median
2. Lower and upper quartiles
3. 5th and 7th deciles (D5 and D7)
4. P25, P50 and P75
Also describe the relationship among the computed quantities.
Salary (000):          5-9   10-14   15-19   20-24   25-29   30-34
Number of employees:   5     8       20      10      4       3

To calculate the required quantities, we take the following steps:
Salary (000)   Class boundary   f    cf
5-9            4.5-9.5          5    5
10-14          9.5-14.5         8    13
15-19          14.5-19.5        20   33
20-24          19.5-24.5        10   43
25-29          24.5-29.5        4    47
30-34          29.5-34.5        3    50
Total                           50
To compute the required quantities, first find the group/class in which each one lies.
For the median: n/2 = 50/2 = 25, which implies that the median lies in the group (14.5-19.5).
For Q1: n/4 = 50/4 = 12.5, which indicates that Q1 lies in the group (9.5-14.5).
For Q3: 3n/4 = 3*50/4 = 37.5, which indicates that Q3 lies in the group (19.5-24.5).
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right); \quad l = 14.5,\ h = 5,\ f = 20,\ c = 13$$
$$\text{Median} = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right); \quad l = 9.5,\ h = 5,\ f = 8,\ c = 5$$
$$Q_1 = 9.5 + \frac{5}{8}(12.5 - 5) = 9.5 + 4.69 = 14.19$$
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right); \quad l = 19.5,\ h = 5,\ f = 10,\ c = 33$$
$$Q_3 = 19.5 + \frac{5}{10}(37.5 - 33) = 19.5 + 2.25 = 21.75$$

For the 5th decile: 5n/10 = 5*50/10 = 25, which implies that D5 lies in the group (14.5-19.5).
$$D_5 = l + \frac{h}{f}\left(\frac{5n}{10} - c\right); \quad l = 14.5,\ h = 5,\ f = 20,\ c = 13$$
$$D_5 = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
For the 7th decile: 7n/10 = 7*50/10 = 35, which implies that D7 lies in the group (19.5-24.5).
$$D_7 = l + \frac{h}{f}\left(\frac{7n}{10} - c\right); \quad l = 19.5,\ h = 5,\ f = 10,\ c = 33$$
$$D_7 = 19.5 + \frac{5}{10}(35 - 33) = 19.5 + 1 = 20.5$$

For P25: 25n/100 = 25*50/100 = 12.5, which implies that P25 lies in the group (9.5-14.5).
$$P_{25} = l + \frac{h}{f}\left(\frac{25n}{100} - c\right); \quad l = 9.5,\ h = 5,\ f = 8,\ c = 5$$
$$P_{25} = 9.5 + \frac{5}{8}(12.5 - 5) = 9.5 + 4.69 = 14.19$$
For P50: 50n/100 = 50*50/100 = 25, which implies that P50 lies in the group (14.5-19.5).
$$P_{50} = l + \frac{h}{f}\left(\frac{50n}{100} - c\right); \quad l = 14.5,\ h = 5,\ f = 20,\ c = 13$$
$$P_{50} = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
For P75: 75n/100 = 75*50/100 = 37.5, which indicates that P75 lies in the group (19.5-24.5).
$$P_{75} = l + \frac{h}{f}\left(\frac{75n}{100} - c\right); \quad l = 19.5,\ h = 5,\ f = 10,\ c = 33$$
$$P_{75} = 19.5 + \frac{5}{10}(37.5 - 33) = 19.5 + 2.25 = 21.75$$
From the analyses, it has been verified that:
Q1 = P25
Q2 = Median = D5 = P50
Q3 = P75
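Because quartiles, deciles and percentiles all use the same formula l + (h/f)(jn/m - c) with m = 4, 10 or 100, one function can compute all of them. The Python sketch below assumes equal class widths and uses the frequencies 5, 8, 20, 10, 4 and 3 of the example (recoverable from its cf column); `grouped_quantile` and its parameters are hypothetical names. It also checks the relations Q1 = P25, Q2 = D5 = P50 and Q3 = P75.

```python
from itertools import accumulate


def grouped_quantile(lower, h, f, j, parts):
    """j-th quantile when the data are divided into `parts` equal parts.

    Implements l + (h / f) * (j * n / parts - c), where the quantile class is
    the first class whose cumulative frequency reaches j * n / parts.
    lower : lower class boundaries; h : class width; f : class frequencies.
    """
    n = sum(f)
    target = j * n / parts
    cf = list(accumulate(f))
    k = next(i for i, cum in enumerate(cf) if cum >= target)
    c = cf[k - 1] if k > 0 else 0
    return lower[k] + h / f[k] * (target - c)


lower = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
f = [5, 8, 20, 10, 4, 3]

q1 = grouped_quantile(lower, 5, f, 1, 4)     # Q1
q2 = grouped_quantile(lower, 5, f, 2, 4)     # Q2 = median
q3 = grouped_quantile(lower, 5, f, 3, 4)     # Q3
d7 = grouped_quantile(lower, 5, f, 7, 10)    # D7
print(round(q1, 2), q2, q3, d7)              # 14.19 17.5 21.75 20.5

# Relations among quantiles.
print(q1 == grouped_quantile(lower, 5, f, 25, 100),
      q2 == grouped_quantile(lower, 5, f, 5, 10) == grouped_quantile(lower, 5, f, 50, 100),
      q3 == grouped_quantile(lower, 5, f, 75, 100))
```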
