Statistics for Social Sciences Overview

STATISTICS FOR SOCIAL SCIENCES
Dr. Yousaf Hayat
STATISTICS
Statistics has been defined in many ways, and each researcher tends to define it in their own terms. For example:
Statistics is the science of sampling and estimation.
Statistics is the science of probability.
Statistics is the science of collecting information/data.
Statistics is the science of presenting data in either qualitative or quantitative form.
Statistics is the science of analyzing data.
Statistics is the science of collection, presentation, analysis and interpretation of numerical data.
STATISTICS FOR SOCIAL SCIENCES

Statistics for social sciences is the branch of statistics that deals with the statistical methods and techniques used to find solutions to social problems. OR
Statistics deals with the procedures used to collect, present, analyze and interpret information pertaining to the social problems of a society.
For example: the impact of different attributes on the study behaviour of male and female students, parents' attitudes towards female education, the effect of organizational justice on job satisfaction, and the effect of different marketing policies on the sale of a commodity. All of these questions can be answered if one has the data in hand and analyzes them with statistical methods appropriate to the nature of the data.
A variety of statistical methods is available to answer different business-related problems.
TYPES OF STATISTICS
STATISTICS
  Descriptive Statistics
    Presentation of Data (Graphs and Diagrams)
    Tabulation and Classification
    Measures of Central Tendency and Dispersion
  Inferential Statistics
    Estimation of Parameters
    Testing of Hypothesis
TYPES OF STATISTICS
Descriptive Statistics
Descriptive statistics deals with concepts and methods for summarizing and describing the important aspects of numerical data. It consists of the condensation of data, their graphical display, and the computation of numerical quantities that provide information about the centre and spread of the observations in a data set.
Inferential Statistics
Inferential statistics deals with methods and procedures for drawing inferences about the true but unknown characteristics of a population, based on sample data drawn from that population. Inferential statistics can be further classified into estimation of parameters and testing of hypotheses.
SOME BASIC CONCEPTS

Population: An aggregate or totality of units having some common characteristic of interest is called a population. It is also called the universe. For example: all the students enrolled in im|Sciences-Peshawar, all the markets in Peshawar city, all the banks in Hayatabad, the industries in the province, the monthly/yearly sales of the stores in Peshawar district, etc.
There are different types of population, e.g. finite and infinite populations, homogeneous and heterogeneous populations, etc.
Sample: A small representative part of a population is called a sample. For example, a small portion of the students of im|Sciences-Peshawar constitutes a sample of students. Similarly, a number of markets selected randomly or purposively from all the markets is called a sample of markets.
SOME BASIC CONCEPTS

Population vs Sample
[Figure: a population shown as the letters a to z, and a sample made up of some of those letters.]
Measures used to describe a population are called parameters.
Measures computed from sample data are called statistics.
SOME BASIC CONCEPTS-Contd.

Parameter: Any numerical quantity, such as a mean or standard deviation, computed from population data is known as a parameter. For example, the average monthly/yearly sale of all the stores located in district Peshawar. Parameters are generally used to specify the distribution of data.
Statistic: Any numerical quantity, such as a mean or standard deviation, computed from sample data is called a statistic. For example, the average GPA of 50 students selected from a population of 300 students, or the average sale of 100 stores selected from 1000 stores.
VARIABLE AND ITS TYPES

Variable
Any characteristic of interest which takes on different values is called a variable. For example: the price of a commodity at different places in Peshawar city, the profit of a business firm in different months of a year, production, cost, temperature, the sale of a market, consumption, etc. Variables are broadly divided into qualitative and quantitative variables.
Variable
  Qualitative Variable (Categorical Variable)
  Quantitative Variable
    Discrete Variable
    Continuous Variable

Qualitative and Quantitative Variables
A variable is said to be qualitative if it is not capable of numerical measurement, although the presence or absence of the characteristic can be observed. For example: honesty, beauty, race, like and dislike, pass or fail, gender, etc.
A variable is said to be quantitative if it is capable of numerical measurement. For example: cost of production, price of a commodity, monthly consumption of households, etc.

Discrete and Continuous Variables
A variable is said to be discrete if it takes isolated integral values, i.e. if its values change in jumps. For example: the number of rooms in a house, the number of students in a class, the number of banks in different cities, the size of a household, the number of shops in a market, etc.
A variable is said to be continuous if it can take any value within a given interval/range (a, b). For example: consumption, production, temperature, monthly sale of a market, height, weight, age, etc.
Other Variables used in Research

1. Dependent Variable
2. Independent Variable
3. Dummy Variable/Categorical Variable etc.
Dependent and Independent Variables

A variable which is influenced by one or more other variables is called a dependent variable. It is also called a random or stochastic variable. OR
A variable which depends on one or more other variables is called a dependent variable. OR
A variable of primary interest that lends itself to investigation as a function of other (cause) variables is known as a dependent variable.
For example, in economics the consumption of a commodity (say, apples) depends upon income, household size, the price of the commodity, etc. In this example the consumption of apples is the dependent variable, which varies from one family to another, while income, household size and price are independent variables.
A variable which influences a dependent variable in either direction (positive or negative) is called an independent variable.
DATA AND ITS COLLECTION
Meaning and Purpose of Data

Data means observations or evidence. OR, raw facts and figures, or a collection of meaningful information, are called data. Data can be qualitative or quantitative in nature.
Data are needed in research work for the following reasons:
1. The quality of the data determines the quality of the research.
2. Data provide direction and answers to a research inquiry, and are essential for conducting research.
3. The main purpose of data collection is to verify the hypotheses.
4. Data are necessary to provide the solution to the problem.
5. Data are also used to ascertain the effectiveness and practical utility of a new device.
6. Statistical data are used in the two basic problems of any investigation:
(a) Estimation of population parameters, which helps in drawing generalizations about the population characteristics.
(b) Testing the hypotheses of the investigation.
Sources of Data
Data Sources
  Print or Electronic
  Observation
  Survey
  Experimentation
Types of Data
Primary Data
Data collected for the first time from their source are called primary data.
Secondary Data
When primary data have passed through any sort of statistical or mathematical treatment, they are known as secondary data. OR, secondary data are data collected and compiled by an outside source, or by someone in the organization, who may later provide other users with access to them.
Collection of Data
Collection of Primary Data
Primary data can be collected through:
i. Direct personal investigation
ii. Indirect investigation or personal interviews
iii. Collection through questionnaire
iv. Collection through enumerators
v. Collection through local sources
Collection of Secondary Data
Secondary data can be collected from:
i. Official records
ii. Semi-official records

i. Direct Personal Investigation
In this method, the researcher/investigator collects information in person from the selected respondents. The investigator has a degree of freedom and an open choice to ask a variety of questions (open-ended, closed-ended or mixed). The data collected through this method are complete; however, personal bias can be present because of the personal involvement of the investigator. This method is also very costly and time-consuming.
ii. Indirect Investigation
In some cases it is not possible to obtain information directly from the respondents because of certain limitations. In such circumstances, indirect investigation is carried out by involving a third party to collect the required information. This method is useful when the inquiry or the information required is complex.

iii. Collection through questionnaire
In this method, the researcher/investigator prepares a list of questions (called a questionnaire) covering all aspects of the study. The questionnaire is sent to the respondents by mail or email with a request to send it back after answering all the listed questions. Respondents may leave some questions blank, either because they do not understand them or because they do not want to give that information. In addition, some respondents are not willing to give any of the information asked for in the questionnaire.
iv. Collection through enumerators
In this method, trained people are sent to the area under study to collect information on a pre-specified proforma. Information collected through this method is more useful than that collected by the questionnaire method. The enumerators can take information from the respondents directly or, if a respondent is not available on the spot, from his/her close relatives.

v. Collection through local sources
As the name indicates, in this method data are collected through local sources. That is, the information is not collected directly from the respondents but from people belonging to the area about which the information is required.
Time Series Data
Data collected at different intervals of time regarding a commodity or group of commodities (or an organization/firm) are called time series data. For example:
Time series data of a company showing profit, production and sale.
Year    Profit    Production    Sale
1990    12        120           110
1991    13        140           132
1992    14        150           145
1993    13.5      140           123
1994    10        103           90
1995    11        115           100
1996    12.5      123           122
1997    13.8      140           135
1998    15        160           145
2000    11.6      120           115
2001    15        162           150
2002    16        165           145
Cross Sectional Data
Cross-sectional data are widely dispersed data relating to one period, for example data on households collected in a field survey, the monthly profit of selected stores, or the monthly profit of different companies in a single period.
Cross-sectional data of 12 different households showing profit, production and sale.
Household    Profit    Production    Sale
1            12        120           110
2            13        140           132
3            14        150           145
4            13.5      140           123
5            10        103           90
6            11        115           100
7            12.5      123           122
8            13.8      140           135
9            15        160           145
10           11.6      120           115
11           15        162           150
12           16        165           145
Panel Data
Panel data are data from a (usually small) number of observations over time on a (usually large) number of cross-sectional units such as individuals, households, firms, or governments.
Panel data of four different firms showing production and sale during 2000-2002.
Firm    Year    Production    Sale
1       2000    120           110
1       2001    140           132
1       2002    150           145
2       2000    140           123
2       2001    103           90
2       2002    115           100
3       2000    123           122
3       2001    140           135
3       2002    160           145
4       2000    120           115
4       2001    162           150
4       2002    165           145
Data Collection Tools
Primary Data
Direct personal investigation (Schedule)
Questionnaire methods
Through Enumerators etc.
Secondary Data
Printed materials
Bureau of Statistics
State Bank of Pakistan
Commercial and Research Journals
Business trades etc
Frequency
The repetition of an observation in a data set is called the frequency of that observation (data point/individual). OR
The total number of observations in a class is called the frequency of that class. For example, consider the following data showing the monthly salaries of 50 employees of a certain University. Here 20 is the frequency of the class of employees earning Rs. 40,000 per month, and 3 is the frequency of the employees drawing Rs. 90,000 per month.
Salary (000)           40    50    60    70    80    90
Number of employees    20    10     8     5     4     3
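As a rough illustration of counting frequencies from raw observations, the sketch below tallies a short list of salaries. The list `raw_salaries` is hypothetical and is not taken from the slides; it only shows the idea that the frequency of a value is the number of times it repeats.

```python
from collections import Counter

# Hypothetical raw observations (monthly salary in thousand Rs.), for illustration only.
raw_salaries = [40, 50, 40, 90, 60, 40, 50]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(raw_salaries)

for salary in sorted(frequency):
    print(salary, frequency[salary])
```

Here the value 40 occurs three times, so its frequency is 3.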
Class Boundaries: In a grouped frequency distribution, if the upper limit of a class is repeated as the lower limit of the next class, the classes are said to be expressed as class boundaries. For example, consider the following data set:
Salary (000): 5-10, 10-15, 15-20, 20-25, 25-30, 30-35
Number of employees: 20, 10
Class Limits: In a grouped frequency distribution, if the upper limit of a class is not repeated as the lower limit of the next class, the classes are said to be expressed as class limits. For example, consider the following data set:
Salary (000): 5-9, 10-14, 15-19, 20-24, 25-29, 30-34
Number of employees: 20, 10
How to convert class limits into class boundaries:
In the given data, the classes are expressed as class limits. To convert class limits into class boundaries, calculate the mid-way value as half the gap between the upper limit of one class and the lower limit of the next: (10 - 9)/2 = 0.5.
Now subtract 0.5 from the lower limit of each class and add 0.5 to the upper limit of each class, as shown in the table.
Classes    Class Boundaries
5-9        4.5-9.5
10-14      9.5-14.5
15-19      14.5-19.5
20-24      19.5-24.5
25-29      24.5-29.5
30-34      29.5-34.5
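The conversion from class limits to class boundaries can be sketched in a few lines of Python. The function name `limits_to_boundaries` is an illustrative choice, and the sketch assumes that every pair of neighbouring classes is separated by the same gap.

```python
def limits_to_boundaries(class_limits):
    """Convert class limits such as (5, 9) into class boundaries such as (4.5, 9.5)."""
    # Gap between the upper limit of the first class and the lower limit of the second.
    gap = class_limits[1][0] - class_limits[0][1]
    half_gap = gap / 2  # the mid-way value, 0.5 for the salary classes
    # Subtract the mid-way value from each lower limit and add it to each upper limit.
    return [(lower - half_gap, upper + half_gap) for lower, upper in class_limits]


limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
boundaries = limits_to_boundaries(limits)
print(boundaries)
```

After the conversion, the upper boundary of each class equals the lower boundary of the next one.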
PRESENTATION OF DATA
1. Tabulation and Classification
2. Graphical Display of Data (Diagrams and Graphs)
   Simple Bar Diagram
   Multiple Bar Diagram
   Component Bar Diagram
   Pie Chart
   Frequency Curves
Frequency Distribution
The arrangement of data into different classes or groups, in such a way that each class/group has its own frequency, is called a frequency distribution. For example, the following data show the frequency distribution of the salary of 50 employees of a firm. This kind of frequency distribution is called a discrete frequency distribution.
Salary (000)           40    50    60    70    80    90
Number of employees    20    10     8     5     4     3
Whereas the data below show the grouped/continuous frequency distribution of the salary of 50 employees.
Salary (000): 5-9, 10-14, 15-19, 20-24, 25-29, 30-34
Number of employees: 20, 10
PRESENTATION OF DATA: DIAGRAMS
City Name     No. of Industries    No. of Banks
Peshawar      50                   35
Islamabad     40                   45
Karachi       120                  90
Lahore        70                   55
Faisalabad    90                   30
Quetta
15
Simple Bar Diagram:
Figure: Summary of the number of industries in different cities of Pakistan
Multiple Bar Diagram:
Figure: Summary of the number of industries and Banks in different cities of Pakistan
Component Bar Diagram:
Figure: Summary of the number of industries and Banks in different cities of Pakistan
Pie Chart
Figure: Summary of the number of industries in different cities of Pakistan
A line graph of grade point average
GPA
Number of Students
2.8
2.9
18
3.1
14
3.2
20
3.3
40
3.4
35
3.5
25
3.6
13
3.7
3.8
3.9
Scatter plot of GPA and the Starting Salary
GPA    Starting Salary
2.8    20000
2.9    20000
3.0    25000
3.1    30000
3.2    15000
3.3    26000
3.4    30000
3.5    24000
3.6    18000
3.7    30000
3.8    45000
3.9    35000
Statistical Tools and Methods
Measures of Central Tendency
Measures of Dispersion
Reliability Analysis
Mean Comparison (statistical tests)
Analysis of Variance (ANOVA)
Tests of Association
Regression and Correlation Analysis
Non-parametric Tests
SIGNIFICANCE OF STATISTICS
Statistical methods are used to summarize large sets of data.
Statistical methods are used to analyze data from field and laboratory experiments.
Statistical methods are used to conduct sample surveys, and the data coming from surveys can be analyzed with statistical methods to find solutions to the problems under study.
Statistical methods are helpful for effective planning in any field of inquiry.
Statistical methods are used in every scientific discipline, such as agriculture, business, medicine, biology, genetics, and the physical and social sciences.
Banks, insurance companies, and government and semi-government organizations use statistical techniques as a tool for data analysis.
Statistics helps in drawing general conclusions about the characteristics of a population/aggregate on the basis of sample data.
Statistical methods are also helpful in making predictions (forecasting).
MEASURES OF CENTRAL TENDENCY

A data set can be summarized by a single value that usually lies somewhere in the centre and represents the whole data set. Such a single value, representing the central part of a data set, is called a central value. The tendency of observations to cluster in the central part of the data set is called central tendency. The most commonly used measures of central tendency are:
Measures of Central Tendency
  Arithmetic Mean
  Median
  Mode
  Harmonic Mean
  Geometric Mean
MEASURES OF CENTRAL TENDENCY

Arithmetic Mean
The arithmetic mean, often simply called the mean or average, is the most widely used measure of central tendency in every field of research. It is the value obtained by dividing the sum of all observations in a data set by the number of observations.
$$\text{Mean} = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$$
Mathematical Description of Arithmetic Mean

Mathematically, the arithmetic mean is expressed as
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \quad \text{[Population data]}$$
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{[Sample data]}$$
Example: The following data show the consumption (in thousand Rs.) of 9 MBA students per semester in a certain University. Compute the arithmetic mean and interpret the result. The data are: 39, 36, 48, 36, 41, 37, 32, 46 and 45.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{39 + 36 + 48 + 36 + 41 + 37 + 32 + 46 + 45}{9} = \frac{360}{9} = 40$$
It indicates that, on average, each MBA student consumes Rs. 40,000 per semester.
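The same calculation can be checked with a minimal Python sketch. The variable names are illustrative; the data are the nine consumption values from the example.

```python
# Consumption (in thousand Rs.) of the 9 MBA students from the example.
consumption = [39, 36, 48, 36, 41, 37, 32, 46, 45]

total = sum(consumption)        # sum of all observations
n = len(consumption)            # number of observations
mean_consumption = total / n    # arithmetic mean

print(total, n, mean_consumption)  # 360 9 40.0
```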
Arithmetic Mean for Grouped Data

Mathematically, the arithmetic mean for grouped data (a frequency distribution) is expressed as
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum fx}{\sum f}, \quad \text{where } \sum f = n \text{ (total number of observations)}$$
where $f_1, f_2, \ldots, f_k$ are the frequencies corresponding to $x_1, x_2, \ldots, x_k$.
Example: Consider the following frequency distribution of the salaries of 50 employees of a certain University and compute the arithmetic mean.
Salary (000) [x]           40    50    60    70    80    90    Total
Number of employees [f]    20    10     8     5     4     3    50
fx                         800   500   480   350   320   270   2720
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{2720}{50} = 54.4$$
It shows that, on average, each employee of the University earns a salary of 54.4 thousand.
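A short Python sketch of the grouped-data mean follows. The frequencies 8, 5, 4 and 3 are the ones implied by the fx row of the example (fx divided by x); the variable names are illustrative.

```python
# Salary values (thousand Rs.) and the number of employees earning each one.
salaries = [40, 50, 60, 70, 80, 90]
employees = [20, 10, 8, 5, 4, 3]

# Multiply each value by its frequency, then divide the total by the sum of frequencies.
fx = [f * x for f, x in zip(employees, salaries)]
total_fx = sum(fx)
total_f = sum(employees)
grouped_mean = total_fx / total_f

print(fx, total_fx, total_f, grouped_mean)
```

The products reproduce the fx row of the table, and the result agrees with the 54.4 obtained by hand.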
Example: Using the following data showing the profits (in thousand Rs.) of 60 different industries, calculate the mean profit (average profit) of the industries.
To compute the mean, we first convert the class intervals into mid points (X).
Profit (000)                65-84    85-104    105-124    125-144    145-164    165-184    185-204    Total
Number of industries (f)    9        10        17         10         5          4          5          60
Mid point (X)               74.5     94.5      114.5      134.5      154.5      174.5      194.5
fx                          670.5    945       1946.5     1345       772.5      698        972.5      7350
$$\bar{X} = \frac{\sum fx}{\sum f} = \frac{7350}{60} = 122.5$$
DEFINITIONS
Class Boundaries
In a grouped frequency distribution, if the upper limit of a class is repeated as the lower limit of the next class, the classes are said to be expressed as class boundaries. For example, the classes 5-10, 10-15, 15-20, 20-25, 25-30 and 30-35 are class boundaries.
Class Limits
In a grouped frequency distribution, if the upper limit of a class is not repeated as the lower limit of the next class, as in 5-9, 10-14, 15-19, 20-24, 25-29, 30-34 and 35-39, the classes are said to be expressed as class limits. Class limits can be converted into class boundaries by taking the following steps:
1. Compute the mid-way value = (lower limit of the succeeding class - upper limit of the preceding class)/2 = (10 - 9)/2 = 0.5
2. Subtract the mid-way value from each lower class limit and add it to each upper class limit. The classes/groups so formed are class boundaries.
Mid Point
The mid point is the average of the upper and lower class limits (or boundaries) of a class. The following example illustrates the mid point.
Profit (000)    Mid point (X)
65-84           (65+84)/2 = 74.5
85-104          (85+104)/2 = 94.5
105-124         (105+124)/2 = 114.5
125-144         (125+144)/2 = 134.5
The mid point is helpful in calculating various statistical measures, e.g. measures of central tendency and measures of dispersion.
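The mid-point step can be sketched in Python for the profit classes used in the slides. The frequencies 9, 10, 17, 10, 5, 4 and 5 are those implied by the fx row of the profit example (fx divided by the mid point), so they should be read as an assumption; the variable names are illustrative.

```python
# Class limits of the profit distribution (thousand Rs.).
profit_classes = [(65, 84), (85, 104), (105, 124), (125, 144),
                  (145, 164), (165, 184), (185, 204)]
# Frequencies implied by the fx row of the profit example (an assumption).
industries = [9, 10, 17, 10, 5, 4, 5]

# Mid point of a class = average of its lower and upper limits.
mid_points = [(lower + upper) / 2 for lower, upper in profit_classes]

# The mid points then stand in for x in the grouped-mean formula.
mean_profit = sum(f * x for f, x in zip(industries, mid_points)) / sum(industries)

print(mid_points)
print(mean_profit)
```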
Properties of Arithmetic Mean

1. The sum of the deviations taken from the mean is zero, i.e.
$$\sum (X - \bar{X}) = 0$$
2. The sum of squared deviations of the x's from the mean is a minimum, i.e. for any value $a$,
$$\sum (X - \bar{X})^2 \le \sum (X - a)^2$$
3. If $k$ groups contain $n_1, n_2, \ldots, n_k$ observations with means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ respectively, then the combined mean (the mean of all the groups) can be calculated as:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
4. If two variables X and Y are linearly related as $y = a + bx$, where $a$ and $b$ are constants, then the mean of Y can be computed as:
$$\bar{y} = a + b\bar{x}$$
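These properties can be checked numerically. The sketch below reuses the nine consumption values from the arithmetic mean example; the split into two groups, the trial values of a, and the constants a = 5 and b = 2 are arbitrary choices made only for illustration.

```python
import math

data = [39, 36, 48, 36, 41, 37, 32, 46, 45]
mean = sum(data) / len(data)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in data)

# Property 2: squared deviations are smallest when taken from the mean.
def sum_of_squares(a):
    return sum((x - a) ** 2 for x in data)

trial_values = [35, 38, 42, 50]  # arbitrary alternatives to the mean
is_minimum = all(sum_of_squares(mean) <= sum_of_squares(a) for a in trial_values)

# Property 3: combined mean of two groups (first 3 and remaining 6 values).
groups = [data[:3], data[3:]]
sizes = [len(g) for g in groups]
group_means = [sum(g) / len(g) for g in groups]
combined_mean = sum(n * m for n, m in zip(sizes, group_means)) / sum(sizes)

# Property 4: if y = a + b*x, then mean(y) = a + b*mean(x).
a, b = 5, 2
y = [a + b * x for x in data]
mean_y = sum(y) / len(y)

print(deviation_sum, is_minimum, combined_mean, mean_y)
```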
MODE
The mode is the value which has the maximum frequency compared with the other items of a data set. OR, the most frequent value of a data set is called the mode.
A distribution/data set having only one mode is called a uni-modal distribution. Similarly, a distribution is said to be bi-modal if it has two modes. Generally, a distribution having more than one mode is called a multi-modal distribution. For example:
a). 2, 4, 6, 4, 8, 10 (mode = 4)
b). 2, 4, 6, 4, 8, 10, 8 (mode = 4 and 8)
c). 2, 4, 6, 4, 8, 10, 8, 10 (mode = 4, 8 and 10)
If all the observations of a data set have the same frequency (are repeated the same number of times), the data set has no mode. For example: 2, 4, 6, 4, 8, 10, 8, 10, 6, 2. This data set has no mode because every observation is repeated the same number of times (twice).
The mode is the appropriate average for qualitative/nominal data.
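One way to find the mode or modes of ungrouped data is sketched below. The function name `modes` and the convention of returning an empty list when every value is equally frequent are illustrative choices that follow the "no mode" rule stated above.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all share the same frequency, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 4, 6, 4, 8, 10]))         # [4]
print(modes([2, 4, 6, 4, 8, 10, 8]))      # [4, 8]
print(modes([2, 4, 6, 4, 8, 10, 8, 10]))  # [4, 8, 10]
# A data set in which every value occurs exactly twice has no mode.
print(modes([2, 4, 6, 4, 8, 10, 8, 10, 6, 2]))  # []
```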
[Figure: frequency curves of a uni-modal distribution, a bi-modal distribution and a tri-modal distribution.]
MODE FOR CONTINUOUS SERIES
For a continuous series/grouped frequency distribution, the mode is defined as:
$$\text{Mode} = l + \frac{(f_m - f_1)\,h}{2 f_m - f_1 - f_2}$$
l = lower limit of the modal group
f_m = frequency of the modal group
f_1 = frequency of the group preceding the modal group
f_2 = frequency of the group following the modal group
h = width of class
Consumption    f
4.5-9.5        10
9.5-14.5       8
14.5-19.5      20
19.5-24.5      5
24.5-29.5
29.5-34.5
Total          50
The mode lies in the group (14.5-19.5) as it has the maximum frequency, so
$$\text{Mode} = 14.5 + \frac{(20 - 8) \times 5}{2 \times 20 - 8 - 5} = 14.5 + \frac{60}{27} = 14.5 + 2.22 = 16.72$$
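The grouped-data mode formula is easy to sketch as a small function. The function and parameter names below are illustrative; the numbers passed in are the ones used in the worked example (l = 14.5, h = 5, modal frequency 20, neighbouring frequencies 8 and 5).

```python
def grouped_mode(lower, width, f_modal, f_prev, f_next):
    """Mode = l + (f_m - f_1) * h / (2 * f_m - f_1 - f_2)."""
    return lower + (f_modal - f_prev) * width / (2 * f_modal - f_prev - f_next)


mode_value = grouped_mode(lower=14.5, width=5, f_modal=20, f_prev=8, f_next=5)
print(round(mode_value, 2))  # 16.72
```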
MEDIAN
The median is the value which divides an arranged data set into two equal parts, i.e. half (50%) of the observations lie below it and half (50%) lie above it.
For example: what is the median of the following data showing the weekly profit (000) of seven stores: 10, 20, 15, 13, 14, 9 and 12?
Arranged data (increasing order): 9, 10, 12, 13, 14, 15, 20
Median = 13
Similarly, for a data set with an even number of observations, the median is the average of the two middle values. For example:
9, 10, 12, 13, 14, 15, 16, 20 (here n = 8), so
Median = (13+14)/2 = 13.5
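A minimal sketch of the median for ungrouped data, covering both the odd and the even case, is given below. The function name `median_of` is an illustrative choice; Python's standard library also offers `statistics.median` for the same purpose.

```python
def median_of(values):
    """Median of ungrouped data: the middle value, or the average of the two middle values."""
    ordered = sorted(values)          # arrange in increasing order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd n: single middle value
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2   # even n


print(median_of([10, 20, 15, 13, 14, 9, 12]))        # 13
print(median_of([9, 10, 12, 13, 14, 15, 16, 20]))    # 13.5
```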
Numerical Examples-Continuous Frequency Distribution

The following data show the frequency distribution of the salary of 50 employees of a firm. Calculate the following:
1. Arithmetic mean
2. Median
Salary (000)           5-9    10-14    15-19    20-24    25-29    30-34
Number of employees    20     10       8        5        4        3
To calculate the required quantities, we take the following steps:

Salary (000)    Class boundary    f     cf    X     fX
5-9             4.5-9.5           20    20    7     140
10-14           9.5-14.5          10    30    12    120
15-19           14.5-19.5         8     38    17    136
20-24           19.5-24.5         5     43    22    110
25-29           24.5-29.5         4     47    27    108
30-34           29.5-34.5         3     50    32    96
Total                             50                710
$$\text{AM} = \frac{\sum fX}{\sum f} = \frac{710}{50} = 14.2$$
For the median, n/2 = 50/2 = 25. It implies that the median lies in the group (9.5-14.5), so
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 9.5 + \frac{5}{10}\left(\frac{50}{2} - 20\right) = 12$$
QUANTILES
Quartiles, deciles and percentiles are collectively called quantiles. Quantiles are also called measures of position.
Quartiles: The three points which divide an arranged (ascending order) data set into four equal parts are called quartiles. Quartiles are denoted by Q1, Q2 and Q3.
Q1 = lower quartile or first quartile
Q2 = second quartile = median
Q3 = upper quartile or third quartile
[Figure: an arranged data set divided into four parts of 25% each by Q1, Q2 and Q3.]
Deciles: The nine points which divide an arranged (ascending order) data set into 10 equal parts are called deciles. Deciles are denoted by D1, D2, ..., D9.
D1 = first decile, D2 = second decile, ..., D9 = ninth decile
[Figure: an arranged data set divided into ten parts of 10% each by D1 to D9; D5 coincides with Q2 and the median.]
D1 is the value below which 10% of the observations lie and above which 90% lie;
D2 is the value below which 20% of the observations lie and above which 80% lie;
...
D9 is the value below which 90% of the observations lie and above which 10% lie.
Percentiles: Percentiles divide an arranged data set into 100 equal parts. There are 99 such points. Percentiles are denoted by Pi (i = 1, 2, 3, ..., 99).
[Figure: an arranged data set divided into one hundred parts of 1% each by P1 to P99; P50 coincides with Q2, D5 and the median, and P75 coincides with Q3.]
P1 is the value below which 1% of the observations lie and above which 99% lie;
P2 is the value below which 2% of the observations lie and above which 98% lie;
...
P99 is the value below which 99% of the observations lie and above which 1% lie.
RELATION AMONG QUANTILES

Q1 = P25
Q2 = Median = D5 = P50
P10 = D1; P20 = D2; P30 = D3; ...; P90 = D9
Q3 = P75
COMPUTATIONAL FORMULAE
$$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - c\right)$$
l = lower limit of the j-th quartile group
h = width of class
f = frequency of the j-th quartile group
c = cumulative frequency preceding the j-th quartile group
$$D_j = l + \frac{h}{f}\left(\frac{jn}{10} - c\right)$$
l = lower limit of the j-th decile group
h = width of class
f = frequency of the j-th decile group
c = cumulative frequency preceding the j-th decile group
COMPUTATIONAL FORMULA FOR PERCENTILES
$$P_j = l + \frac{h}{f}\left(\frac{jn}{100} - c\right)$$
l = lower limit of the j-th percentile group
h = width of class
f = frequency of the j-th percentile group
c = cumulative frequency preceding the j-th percentile group
Examples-Quantiles
Example: The following data show the frequency distribution of the salary of 50 employees of a firm. Calculate the following:
1. Median
2. Lower and upper quartiles
3. 5th and 7th deciles (D5 and D7)
4. P25, P50 and P75
Also describe the relationship among the computed quantities.

Salary (000)           5-9    10-14    15-19    20-24    25-29    30-34
Number of employees    5      8        20       10       4        3
To calculate the required quantities, we take the following steps:

Salary (000)    Class boundary    f     cf
5-9             4.5-9.5           5     5
10-14           9.5-14.5          8     13
15-19           14.5-19.5         20    33
20-24           19.5-24.5         10    43
25-29           24.5-29.5         4     47
30-34           29.5-34.5         3     50
Total                             50
To compute the required quantities, we first find the group/class in which each of them lies.
For the median: n/2 = 50/2 = 25. It implies that the median lies in the group (14.5-19.5). Putting l = 14.5, h = 5, f = 20, c = 13:
$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
For Q1: n/4 = 50/4 = 12.5. It indicates that Q1 lies in the group (9.5-14.5). Putting l = 9.5, h = 5, f = 8, c = 5:
$$Q_1 = l + \frac{h}{f}\left(\frac{n}{4} - c\right) = 9.5 + \frac{5}{8}(12.5 - 5) = 9.5 + 4.69 = 14.19$$
For Q3: 3n/4 = 3*50/4 = 37.5. It indicates that Q3 lies in the group (19.5-24.5). Putting l = 19.5, h = 5, f = 10, c = 33:
$$Q_3 = l + \frac{h}{f}\left(\frac{3n}{4} - c\right) = 19.5 + \frac{5}{10}(37.5 - 33) = 21.75$$
For the 5th decile: 5n/10 = 5*50/10 = 25. It implies that D5 lies in the group (14.5-19.5). Putting l = 14.5, h = 5, f = 20, c = 13:
$$D_5 = l + \frac{h}{f}\left(\frac{5n}{10} - c\right) = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
For the 7th decile: 7n/10 = 7*50/10 = 35. It implies that D7 lies in the group (19.5-24.5). Putting l = 19.5, h = 5, f = 10, c = 33:
$$D_7 = l + \frac{h}{f}\left(\frac{7n}{10} - c\right) = 19.5 + \frac{5}{10}(35 - 33) = 19.5 + 1 = 20.5$$
For P25: 25n/100 = 25*50/100 = 12.5. It implies that P25 lies in the group (9.5-14.5). Putting l = 9.5, h = 5, f = 8, c = 5:
$$P_{25} = l + \frac{h}{f}\left(\frac{25n}{100} - c\right) = 9.5 + \frac{5}{8}(12.5 - 5) = 9.5 + 4.69 = 14.19$$
For P50: 50n/100 = 50*50/100 = 25. It implies that P50 lies in the group (14.5-19.5). Putting l = 14.5, h = 5, f = 20, c = 13:
$$P_{50} = l + \frac{h}{f}\left(\frac{50n}{100} - c\right) = 14.5 + \frac{5}{20}(25 - 13) = 14.5 + 3 = 17.5$$
For P75: 75n/100 = 75*50/100 = 37.5. It indicates that P75 lies in the group (19.5-24.5). Putting l = 19.5, h = 5, f = 10, c = 33:
$$P_{75} = l + \frac{h}{f}\left(\frac{75n}{100} - c\right) = 19.5 + \frac{5}{10}(37.5 - 33) = 21.75$$
From the analyses, it has been verified that:
Q1 = P25
Q3 = P75
Q1 = P25
P10 = D1; P20 = D2; P30 = D3; ...; P90 = D9
Q3 = P75
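The quartile, decile and percentile formulas share one pattern, l + (h/f)(jn/m - c) with m = 4, 10 or 100, so they can be sketched as a single function. The function name `grouped_quantile` and its parameters are illustrative; the class boundaries and the frequencies 5, 8, 20, 10, 4, 3 are those of the worked quantile example, and the sketch assumes equal class widths.

```python
def grouped_quantile(lower_boundaries, width, frequencies, j, parts):
    """j-th quantile of grouped data when the data are divided into `parts` equal parts."""
    n = sum(frequencies)
    target = j * n / parts            # position jn/4, jn/10 or jn/100
    cumulative = 0                    # c: cumulative frequency before the current group
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:  # the quantile lies in this group
            return lower + width / f * (target - cumulative)
        cumulative += f
    raise ValueError("j must not exceed parts")


lowers = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
freqs = [5, 8, 20, 10, 4, 3]

q1 = grouped_quantile(lowers, 5, freqs, 1, 4)
q2 = grouped_quantile(lowers, 5, freqs, 2, 4)
q3 = grouped_quantile(lowers, 5, freqs, 3, 4)
d5 = grouped_quantile(lowers, 5, freqs, 5, 10)
d7 = grouped_quantile(lowers, 5, freqs, 7, 10)
p25 = grouped_quantile(lowers, 5, freqs, 25, 100)
p50 = grouped_quantile(lowers, 5, freqs, 50, 100)
p75 = grouped_quantile(lowers, 5, freqs, 75, 100)

print(round(q1, 2), q2, q3, d5, d7)
print(q1 == p25, q2 == d5 == p50, q3 == p75)
```

Because Q1 and P25 use the same position (12.5), as do Q2, D5 and P50 (25) and Q3 and P75 (37.5), the function returns identical values for each of those groups.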
