
STATISTICS IN EDUCATION

Introduction
We are living in an information age, which is invariably bound up with the notion of
counting and measurement. It is no exaggeration to say that counting and measurement
are close to our daily lives, and to the field of education as well. In fact, one can
easily establish that counting and measurement have been with us ever since the
human race stepped towards civilization. It is the sheer importance and range of applications
of counting that led to the emergence of the discipline of ‘STATISTICS’.

The term ‘statistics’ seems to have been derived from the Latin word ‘status’, the
Italian word ‘statista’ or the German word ‘statistik’, each of which means ‘political state’.
Statistics was born as the ‘Science of Kings’. It had its origin in the needs of the ruling chiefs
of olden days to collect data on vital matters such as population, manpower, and
wealth in the form of land, buildings and other assets, with a view to framing their military
and fiscal policies.

Statistics – Definition
“Statistics is the science which deals with collection, classification and tabulation of numerical
facts as a basis for explanation, description and comparison of phenomena”
- Lovitt

“Statistics may be defined as the collection, presentation, analysis and interpretation of
numerical data”
- Croxton and Cowden

Basic Terms in Statistics

Discrete Variable: Variables which can assume only distinct or particular values
are called discrete variables. Their values are exact and are not normally fractions.
Continuous Variable: Variables which can take any numerical value within a range are known as
continuous variables.
Series: A series, as used statistically, may be defined as things or attributes of things
arranged according to some logical order.
Discrete Series: Any series represented by a discrete variable is called a discrete series.
Continuous Series: Any series represented by a continuous variable is called a continuous series.
Raw Data: A mass of statistical data in its original form is called raw data or ungrouped data.
Class: It is a definite group of magnitudes. Eg. 0 – 10, 10 – 20 etc.
Open-end Class: A lowest class lacking a lower limit, or a highest class lacking an upper
limit, is called an open-end class.
Eg. Below 5 -------- Open-end class
5 – 10
10 – 15
15 – 20
20 above ------ Open-end Class

Inclusive type classes or working classes: Classes of the form 1 – 5, 6 – 10, 11 – 15, ----
are called inclusive type classes. Here both limits (lower limit and upper limit) are included in the
class itself.
Exclusive type classes or actual classes: Classes in which the upper limit is not included in the
class, such as 0 – 10, 10 – 20, 20 – 30, ---- etc., are called exclusive type classes.
Prepared by KSK 8
Class limit: The class limits are the lowest and highest values of the variable that can be
included in that class.
Class boundaries: The class limits of the exclusive type classes or actual classes are
called actual limits or class boundaries.
Mid point of the class or class mark: The mid point of a class is the average of the
upper and lower limits of the class.
Class interval: The class interval or class width is the difference between the upper limit
and lower limit of the class.
Conversion of inclusive type classes into exclusive type classes:
• Note the difference between one upper limit and the next lower limit of the inclusive classes.
• Divide the difference by 2.
• Subtract that value from the lower limits and add the same to the upper limits.
• Do the same in all classes.
For example, the inclusive classes 1 – 5, 6 – 10 become the exclusive classes 0.5 – 5.5, 5.5 – 10.5.
Frequency: The number of times a given value appears in a set of observations is its frequency.
Class frequency: The number of values in each of the quantitative classes is called the
class frequency.
Total frequency: The sum total of the frequencies is known as the total frequency.

SCORING AND TABULATION OF SCORES


Frequency Distribution
Frequency distribution is an important method of condensing and presenting data.
This representation is also called a ‘frequency table’.
There are two types of frequency distribution: discrete frequency distribution and
continuous (grouped) frequency distribution.

Discrete frequency distribution

It is a frequency distribution in which we make an array by listing all the values
occurring in the series and noting the number of times each value occurs.

Steps

• Note the different values in the series


• Arrange three columns with heads scores, tally mark and frequency
• Go through the series and put tally marks against respective scores.
• Write the sum of the tally marks of each score in the frequency column.
• Note that the sum of the frequencies of all scores should be equal to the total number
of observations.
Eg.

The following data give the number of children per family in each of 25 families. Construct a
Discrete Frequency Distribution: 1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1, 0, 2, 3, 0, 3, 2, 1, 2, 2, 1, 4,
2

Number of Children   Tally Marks   No. of Families
0                    III           3
1                    IIII I        6
2                    IIII IIII     10
3                    IIII          4
4                    II            2
Total                              25
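
The tallying steps above can be checked with a short script. This is only a minimal sketch: it assumes the 25 observations are typed in as a Python list (the name `children` is chosen here for illustration) and uses the standard-library `Counter` to do the counting.

```python
from collections import Counter

# Number of children in each of the 25 families (data from the example above).
children = [1, 4, 3, 2, 1, 2, 0, 2, 1, 2, 3, 2, 1,
            0, 2, 3, 0, 3, 2, 1, 2, 2, 1, 4, 2]

# Counter does the tallying: it maps each distinct value to its frequency.
freq = Counter(children)

for value in sorted(freq):
    print(value, freq[value])

# The frequencies must add up to the total number of observations.
total = sum(freq.values())
print("Total", total)
```

The final check mirrors the last step of the procedure: the sum of the frequencies equals the number of observations.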

Continuous (Grouped) Frequency Distribution

Continuous (Grouped) Frequency Distribution is a table in which the data are grouped into
different classes and the number of observations falling in each class is noted.

Eg. Construct a Continuous frequency distribution for the following set of observations

70 45 33 64 50 25 65 75 30 20
55 60 65 58 52 36 45 42 35 40
51 47 39 61 53 59 49 41 20 55
42 53 78 65 45 49 64 52 48 46

Classes   Tally Marks    Frequency
20 – 29   III            3
30 – 39   IIII           5
40 – 49   IIII IIII II   12
50 – 59   IIII IIII      10
60 – 69   IIII II        7
70 – 79   III            3
Total                    40
Class Construction – Points to remember
• Class interval should be uniform throughout.
• As far as possible, the class interval should be a multiple of 5.
• As far as possible, the number of classes should vary from 4 to 20.
(We have a rule for determining the number of classes, known as ‘Sturges’ rule’. It is
given by k = 1 + 3.322 log N, where ‘k’ denotes the number of classes, N is the total
number of observations and log is the logarithm to base 10.)
• The class limits should be chosen so as to give mid points which are representative of the
observations in the class.
• Classes should be mutually exclusive.
• As far as possible, ‘open-end classes’ should be avoided.
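
As a rough illustration of these points, the sketch below applies Sturges' rule to the 40 marks of the earlier example and then counts the observations in inclusive classes of width 10. The function names `sturges` and `grouped_frequency` are hypothetical helpers written for this note, and the constant 3.322 is taken as 1 / log10(2).

```python
import math

marks = [70, 45, 33, 64, 50, 25, 65, 75, 30, 20,
         55, 60, 65, 58, 52, 36, 45, 42, 35, 40,
         51, 47, 39, 61, 53, 59, 49, 41, 20, 55,
         42, 53, 78, 65, 45, 49, 64, 52, 48, 46]

def sturges(n):
    """Suggested number of classes for n observations (Sturges' rule)."""
    return 1 + 3.322 * math.log10(n)

def grouped_frequency(data, start, width, n_classes):
    """Count observations in the classes start..start+width-1, and so on."""
    counts = [0] * n_classes
    for x in data:
        counts[(x - start) // width] += 1
    return counts

k = sturges(len(marks))                      # a little over 6 for N = 40
counts = grouped_frequency(marks, start=20, width=10, n_classes=6)

print(round(k), counts)
```

The rule is only a guide; here it suggests about six classes, which agrees with the classes 20 – 29 to 70 – 79 used in the example.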
Cumulative Frequency Distribution
Cumulative Frequency Distribution is a table which shows how many observations lie
below or above a particular value.

There are two types of cumulative frequency distribution: Less than Cumulative frequency
distribution and Greater than Cumulative frequency distribution.

Less than Cumulative frequency distribution

Less than Cumulative frequency distribution is a table which gives the number of
observations falling below the upper limit of each class.
Eg. Construct Less than Cumulative Frequency Distribution

Class     Frequency   <CF
0 – 5     4           4
5 – 10    7           11   (4 + 7)
10 – 15   12          23   (4 + 7 + 12)
15 – 20   5           28   (4 + 7 + 12 + 5)
20 – 25   2           30   (4 + 7 + 12 + 5 + 2)

Greater than Cumulative frequency distribution

Greater than Cumulative frequency distribution is a table which gives the number of observations
lying at or above the lower limit of each class.

Eg. Construct Greater than Cumulative Frequency Distribution

Class     Frequency   >CF
0 – 5     4           30   (4 + 7 + 12 + 5 + 2)
5 – 10    7           26   (7 + 12 + 5 + 2)
10 – 15   12          19   (12 + 5 + 2)
15 – 20   5           7    (5 + 2)
20 – 25   2           2
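
Both cumulative columns are running totals, so they can be sketched in a few lines. The example below is an illustration only; it uses the frequencies 4, 7, 12, 5, 2 from the tables above and the standard-library function `itertools.accumulate`.

```python
from itertools import accumulate

classes = ["0-5", "5-10", "10-15", "15-20", "20-25"]
freq = [4, 7, 12, 5, 2]

# Less than CF: running total from the first class downwards.
less_than_cf = list(accumulate(freq))

# Greater than CF: running total from the last class upwards.
greater_than_cf = list(accumulate(reversed(freq)))[::-1]

for c, f, lt, gt in zip(classes, freq, less_than_cf, greater_than_cf):
    print(c, f, lt, gt)
```

The last Less than value and the first Greater than value are both equal to the total frequency.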

Graphical and Diagrammatic representation of data

Apart from tabulation, data can also be presented through diagrams and graphs.
Graphs and diagrams are visual aids for the presentation of data. They are among the most
convincing and appealing methods by which statistical data can be presented.

The following are commonly used graphs and Diagrams.

1. Histogram

2. Frequency Polygon

3. Frequency Curve

4. Cumulative Frequency Curve (Ogive)

a. Less than Cumulative Frequency Curve (Less than Ogive)

b. Greater than Cumulative Frequency Curve (Greater than Ogive)

5. Pie Diagram (Sector Diagram)

6. Bar Diagram

1. Histogram

• Graphical representation of continuous frequency distribution

• It is a graph consisting of vertical rectangles with no space between the rectangles.

• The class intervals are taken along the horizontal axis (X – axis) and the respective class
frequencies along the vertical axis (Y – axis), using a suitable scale on each axis.

• For each class a rectangle is drawn with base as width of the class and height as the
class frequency.

• The area of each rectangle will be proportional to or equal to the frequency of the
respective class.

• The total area of the histogram will be proportional or equal to the total frequency of
the distribution.

Construct histogram for the following frequency distribution.

Class     Frequency
0 – 10    4
10 – 20   10
20 – 30   21
30 – 40   9
40 – 50   4
50 – 60   2
Total     50

[Figure: histogram of the above distribution. X axis: Classes; Y axis: Frequency.]
2. Frequency Polygon

• It is a graphical representation of continuous frequency distribution

• It can be constructed by drawing Histogram or directly plotting the points

• To draw a Frequency Polygon from a Histogram, join the mid-points of the tops of the
rectangles of the Histogram using straight lines.

• A Frequency Polygon can also be drawn by joining the consecutive points plotted by taking
the mid-points of the classes on the X-axis and the corresponding frequencies on the Y-axis.

• The end points are extended at each end to meet the X-axis.

• The total area under the Frequency Polygon is equal to or proportional to (numerically)
the total frequency of the given distribution.

Construct Frequency Polygon for the following frequency distribution

Class Frequency
0 – 10 4
10 – 20 10
20 – 30 21
30 – 40 9
40 – 50 4
50 – 60 2
Total 50

[Figure: Frequency Polygon drawn by three methods (First Method, Second Method, Third Method).
X axis: Classes, 1 cm = 10 units; Y axis: Frequency, 1 cm = 5 units.]

3. Frequency Curve
• It is a graphical representation of continuous frequency distribution

• It can be constructed by drawing Histogram or directly plotting the points

• To draw a Frequency Curve from a Histogram, join the mid-points of the tops of the
rectangles of the Histogram using a smooth free-hand curve.

• A Frequency Curve can also be drawn by joining smoothly the consecutive points plotted by taking
the mid-points of the classes on the X-axis and the corresponding frequencies on the Y-axis.

• The end points are extended at each end to meet the X-axis.

• The total area under the Frequency Curve is equal to or proportional to (numerically)
the total frequency of the given distribution.

Construct Frequency Curve for the following frequency distribution

Class Frequency
0 – 10 4
10 – 20 10
20 – 30 21
30 – 40 9
40 – 50 4
50 – 60 2
Total 50

[Figure: Frequency Curve drawn by three methods (First Method, Second Method, Third Method).
X axis: Classes, 1 cm = 10 units; Y axis: Frequency, 1 cm = 5 units.]
4. Cumulative Frequency Curve (Ogive)

It is the graphical representation of a cumulative frequency distribution. There are two types.

a). Less than Cumulative Frequency Curve (Less than Ogive)
• It is the graphical representation of Less than Cumulative Frequency distribution.
• Less than Cumulative Frequency Curve is drawn by joining smoothly the points
obtained by plotting the upper limits of the actual classes against their Less than
cumulative frequencies.
Construct Less than Cumulative Frequency Curve for the following frequency distribution


Class     Frequency   <CF
0 – 10    5           5
10 – 20   12          17
20 – 30   28          45
30 – 40   40          85
40 – 50   21          106
50 – 60   10          116
60 – 70   4           120

[Figure: Less than Ogive. X axis: Upper limit of Classes, 1 cm = 10 units;
Y axis: Less than Cumulative frequency, 1 cm = 20 units.]

b). Greater than Cumulative Frequency Curve (Greater than Ogive)
• It is the graphical representation of Greater than Cumulative Frequency distribution.
• Greater than Cumulative Frequency Curve is drawn by joining smoothly the points
obtained by plotting the lower limits of the actual classes against their Greater than
cumulative frequencies.

Construct Greater than Cumulative Frequency Curve for the following frequency distribution

Class     Frequency   >CF
0 – 10    5           120
10 – 20   12          115
20 – 30   28          103
30 – 40   40          75
40 – 50   21          35
50 – 60   10          14
60 – 70   4           4

[Figure: Greater than Ogive. X axis: Lower limit of Classes, 1 cm = 10 units;
Y axis: Greater than Cumulative frequency, 1 cm = 20 units.]

5. Pie Diagram

• A pie diagram consists of a circle whose area is proportional to the magnitude of the variable
it represents.
• The component parts of the variable are represented by means of sectors of the circle.
• The area of each sector is proportional to the frequency of the corresponding component part
of the variable.
• If A1 and A2 are the total magnitudes of two variables, to represent the data by
means of pie diagrams, draw two circles with radii r1 and r2 such that
r1 : r2 = √A1 : √A2, so that the areas of the circles are in the ratio A1 : A2.
Draw Pie Diagram for the following data

Category       No. of Students   Angle of the Sector (degrees)
Distinction    20                40
First class    40                80
Second class   50                100
Third class    45                90
Failure        25                50
Total          180               360

Angle of the sector = (No. of Students / Total) × 360
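
The sector angles follow from simple proportion, each category receiving its share of the 360 degrees. A minimal sketch of that arithmetic, with the dictionary name `students` chosen here only for illustration:

```python
# Number of students in each category (data from the table above).
students = {
    "Distinction": 20,
    "First class": 40,
    "Second class": 50,
    "Third class": 45,
    "Failure": 25,
}

total = sum(students.values())

# Each sector gets the same share of 360 degrees as its share of the total.
angles = {name: count * 360 / total for name, count in students.items()}

for name, angle in angles.items():
    print(name, angle)
```

The angles always add up to 360 degrees, which is a quick check on the calculation.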

6. Bar Diagram (Simple Bar Diagram)

• Bar diagram is the simplest diagrammatic representation of data.


• They are also called one dimensional diagram.
• These diagrams are generally drawn in the shape of horizontal or vertical bars.
• The bars should be of equal breadth and the height of the bars should be proportional
to the magnitude of each quantity.
• Leave equal space between the bars.

Draw simple bar diagram


Category       No. of Students
Distinction    20
First class    40
Second class   50
Third class    45
Failure        25
Total          180

Diagrammatic and Graphic representation – Merits


• It permits easy visualization
• Easy to understand the nature of the data
• Comparative study of different aspects of a given data is possible.
• Helps analysis of the data
• Helps to interpret and draw conclusions
• They are interesting, attractive, and impressive
• They are the simplest method of presenting data
• They have universal validity; they are used to supply information to the common man
• They give a bird’s eye-view of the entire data
• They have a great memorizing effect
Diagrammatic and Graphic representation – Limitations

• It is difficult to show minor differences with their help
• Diagrams can be used only to show a limited amount of information
• Diagrams show only approximate values
• It is subjective in character; its interpretation varies from person to person.
• Diagrams and graphs can be misused very easily
• Diagrams and graphs are not a substitute for the original data
MEASURES OF CENTRAL TENDENCY
• When we collect data from a sample, the majority of the scores in the collected
data tend to lie close to the average. This phenomenon is called
‘central tendency’.
• The value of the point around which the scores tend to cluster is called a ‘Measure of Central
Tendency’.
• A measure of central tendency may be defined as a single measure representing all the
scores of the given data.
Commonly used Measures of Central Tendency are
1. Mean
2. Median
3. Mode
1. MEAN (ARITHMETIC MEAN)
Case – I: Ungrouped Data (Discrete data)
If x1, x2, x3, ………….., xN are N observations, then
A.M (X̄) = Sum of the observations / Total No. of observations
= (x1 + x2 + x3 + …………… + xN) / N
= ∑x / N
x – Observations (Scores)
N – Total number of observations
Eg. Calculate A.M of the observations: 12, 18, 14, 15, 16
A.M (X̄) = (12 + 18 + 14 + 15 + 16) / 5
= 75 / 5
= 15
Case – II: Ungrouped Frequency Distribution (Discrete Frequency Distribution)
If x1, x2, x3, ………….., xn are observations and f1, f2, f3, ………….., fn are their respective
frequencies, then A.M is given by

A.M (X̄) = (f1x1 + f2x2 + …… + fnxn) / (f1 + f2 + …… + fn) = ∑fx / N
x – Observations (Scores)
f – Frequency
N – Total frequency
Eg. Calculate A.M of the following data

Answer

Observations (x)   Frequency (f)   fx
5                  3               15
6                  8               48
7                  12              84
8                  10              80
9                  7               63
                   N = 40          ∑fx = 290

A.M (X̄) = ∑fx / N
= 290 / 40
= 7.25
Case – III: Grouped Frequency Distribution (Continuous Frequency Distribution)
Direct Method
A.M (X̄) = ∑fx / N
x – Mid-value of classes
f – Frequency, N – Total frequency
Assumed Mean Method
A.M (X̄) = A + (∑fd / N) × c
A – Assumed Mean
d – deviations, d = (x – A) / c, x – Mid-value of classes
f – Frequency
c – Class width
N – Total frequency

Question

Class     f
0 – 10    3
10 – 20   12
20 – 30   20
30 – 40   10
40 – 50   5

Answer – Direct Method

Class     f        Mid-value (x)   fx
0 – 10    3        5               15
10 – 20   12       15              180
20 – 30   20       25              500
30 – 40   10       35              350
40 – 50   5        45              225
          N = 50                   ∑fx = 1270

A.M (X̄) = ∑fx / N
= 1270 / 50
= 25.4

Answer – Assumed Mean Method (A = 25, c = 10)

Class     f        Mid-value (x)   d     fd
0 – 10    3        5               -2    -6
10 – 20   12       15              -1    -12
20 – 30   20       25 (A)          0     0
30 – 40   10       35              1     10
40 – 50   5        45              2     10
          N = 50                         ∑fd = 2

A.M (X̄) = A + (∑fd / N) × c
= 25 + (2 / 50) × 10
= 25.4
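
The two methods for a grouped distribution can be compared side by side. The following is a minimal sketch using the classes and frequencies of the example above; the variable names are illustrative, and A = 25 and c = 10 are the assumed mean and class width used in the worked answer.

```python
# Classes (lower limit, upper limit) and their frequencies from the example.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [3, 12, 20, 10, 5]

mids = [(lo + hi) / 2 for lo, hi in classes]   # mid-values x
n = sum(freq)                                  # total frequency N

# Direct method: mean = sum(f * x) / N
direct = sum(f * x for f, x in zip(freq, mids)) / n

# Assumed mean method: mean = A + (sum(f * d) / N) * c, with d = (x - A) / c
A, c = 25, 10
d = [(x - A) / c for x in mids]
assumed = A + sum(f * di for f, di in zip(freq, d)) / n * c

print(direct, assumed)
```

Both routes give the same mean; the assumed mean method only keeps the intermediate numbers small, which matters when the work is done by hand.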
Arithmetic Mean – Merits
• It is rigidly defined
• AM is easy to understand
• Simple to calculate
• Based on all observations
• It is capable of further algebraic treatment.
Arithmetic Mean – Demerits
• AM is affected by extreme values
• AM may lead to wrong conclusions if the figures from which it is computed are not known.
• AM can’t be calculated for a distribution having open-end classes.

2. MEDIAN
• Median is defined as the middle-most observation when the observations are arranged
in ascending or descending order of magnitude.

CALCULATION OF MEDIAN
1. Discrete Data & Discrete Frequency Distribution
Let N be the total number of observations.
Case I: N is odd
Median = ((N + 1)/2)th observation when the data are arranged in ascending or descending
order of magnitude

Eg.1 Calculate Median: 8, 12, 16, 10, 9, 6, 17, 20, 25

Data in ascending order of magnitude: 6, 8, 9, 10, 12, 16, 17, 20, 25

Here N = 9, then Median = ((9 + 1)/2)th observation = 5th observation
= 12
Case II: N is even

Median = Average of the (N/2)th observation and the (N/2 + 1)th observation when the data are
arranged in ascending or descending order of magnitude.

Eg.2 Calculate Median: 30, 26, 42, 28, 35, 20, 32, 50
Data in ascending order of magnitude: 20, 26, 28, 30, 32, 35, 42, 50

Here N = 8, Median = Average of the 4th and 5th observations
= (30 + 32) / 2
= 31
Eg.3 Calculate Median

Observation   Frequency   <CF
5             3           3
6             8           11
7             12          23
8             10          33
9             8           41
Total         41

Here N = 41
Median = ((N + 1)/2)th observation = ((41 + 1)/2)th observation
= 21st observation
= 7 (the cumulative frequency first reaches 21 at the observation 7)

2. Grouped (Continuous) Frequency Distribution
Median = lm + ((N/2 – cfm) / fm) × c
lm – Actual lower limit of Median Class
(Median Class – Class in which the (N/2)th observation falls)
N – Total Frequency
cfm – Cumulative frequency up to, but not including, the Median Class
fm – frequency of Median Class
c – Class interval
Eg. Calculate Median

Class     Frequency   <CF
0 – 5     5           5
5 – 10    10          15
10 – 15   15          30    (Median Class)
15 – 20   12          42
20 – 25   8           50
          N = 50

Answer:
Here N/2 = 25, so the Median Class is 10 – 15, and lm = 10, cfm = 15, fm = 15, c = 5
Median = lm + ((N/2 – cfm) / fm) × c
= 10 + ((25 – 15) / 15) × 5
= 10 + (10 / 15) × 5
= 13.33

Graphical determination of Median

I Method
Steps:
• Draw Less than or Greater than Ogive.
• Locate N/2 on the Y – Axis
• At N/2 draw a perpendicular to the Y – Axis and extend it to meet the Ogive
• From that point of intersection draw a perpendicular to the X – Axis
• The point at which the perpendicular meets the X – Axis will be the Median.

II Method
Steps:
• Draw Less than and Greater than Ogives on the same graph
• Draw a perpendicular from their point of intersection to the X – Axis
• The point at which the perpendicular meets the X – Axis will be the Median.

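
The median calculations can be sketched as follows. For raw scores the standard-library function `statistics.median` already covers both the odd and the even case; for a grouped distribution the helper `grouped_median` below is a hypothetical function written for this note, which locates the class containing the (N/2)th observation and interpolates within it.

```python
import statistics

# Raw scores: statistics.median sorts the data and picks the middle value
# (or the average of the two middle values when N is even).
odd_case = statistics.median([8, 12, 16, 10, 9, 6, 17, 20, 25])
even_case = statistics.median([30, 26, 42, 28, 35, 20, 32, 50])

def grouped_median(lower_limits, freq, width):
    """Median = lm + ((N/2 - cfm) / fm) * c for a grouped distribution."""
    n = sum(freq)
    cf = 0  # cumulative frequency of the classes before the current one
    for lm, fm in zip(lower_limits, freq):
        if cf + fm >= n / 2:          # the (N/2)th observation falls here
            return lm + (n / 2 - cf) / fm * width
        cf += fm

# Classes 0-5, 5-10, 10-15, 15-20, 20-25 from the worked example.
median = grouped_median([0, 5, 10, 15, 20], [5, 10, 15, 12, 8], 5)

print(odd_case, even_case, round(median, 2))
```

The interpolation assumes that the observations are spread evenly within the median class, which is the same assumption the formula makes.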
Median – Merits
• It is rigidly defined
• It is easy to understand
• Simple to calculate
• It can be located by mere inspection
• It is not affected by extreme values
• It can be calculated for a distribution having open-end classes
• It can be determined graphically.
Median – Demerits
• It is not based on all observations
• Median is a non-algebraic measure and hence not suitable for further algebraic treatment
• It can’t be used for computing other statistical measures such as Standard Deviation,
Coefficient of correlation etc.
• When there are wide variations between the values of different scores, a Median may not be
representative of the distribution.
3. MODE
• Mode is the value of the variable which occurs most frequently.
• In certain cases an exact Mode may not exist, or there may be two or three
Modes in a distribution.
• When there are two Modes, we call it a Bi-Modal Distribution.
• If there are three Modes, we call it a Tri-Modal Distribution.
Calculation of Mode

1. Discrete Distribution

Eg: Calculate Mode

Observation   Frequency
5             3
6             8
7             12
8             10
9             8
Total         41

The observation 7 has the maximum frequency (12).
Mode = 7

2. Continuous Distribution
Mode = lm + (f2 / (f1 + f2)) × c
lm – Actual lower limit of Modal Class
(Modal Class – Class having maximum frequency)
f1 – Frequency of the class just below the Modal Class
f2 – Frequency of the class just above the Modal Class
c – Class interval
Eg: Find Mode:

Class     Frequency
80 – 84   4
75 – 79   8
70 – 74   8     (f2)
65 – 69   12    (Modal Class)
60 – 64   9     (f1)
55 – 59   7
50 – 54   5
45 – 49   3

Answer:
Mode = lm + (f2 / (f1 + f2)) × c
= 64.5 + (8 / (9 + 8)) × 5
= 66.9
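
A short sketch of the grouped calculation is given below. It assumes the formula Mode = lm + (f2 / (f1 + f2)) × c, which is the form that reproduces the worked answer 66.9 with f1 and f2 taken as the frequencies of the classes on either side of the modal class; other textbooks use a different interpolation formula. The lowest class is read here as 45 – 49, and the variable names are illustrative.

```python
# Inclusive classes in ascending order, with their frequencies.
classes = [(45, 49), (50, 54), (55, 59), (60, 64),
           (65, 69), (70, 74), (75, 79), (80, 84)]
freq = [3, 5, 7, 9, 12, 8, 8, 4]
c = 5  # class interval

# Modal class: the class with the maximum frequency.
# (Assumes it is neither the first nor the last class.)
i = freq.index(max(freq))

lm = classes[i][0] - 0.5   # actual lower limit of the modal class
f1 = freq[i - 1]           # frequency of the class just below the modal class
f2 = freq[i + 1]           # frequency of the class just above the modal class

mode = lm + f2 / (f1 + f2) * c

print(classes[i], round(mode, 1))
```

Subtracting 0.5 converts the inclusive limit 65 into the actual lower limit 64.5, in line with the conversion rule for inclusive classes.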

Mode – Merits
• Easy to locate
• Not affected by extreme values
• Mode can be calculated for a distribution having open-end classes, if the open-end
classes have lesser frequencies
• It is useful in business matters.
Mode – Demerits
• It is not based on all observations
• It is not capable of further algebraic treatment
• A slight change in the distribution may extensively disturb the Mode
• In ungrouped data, if no score is repeated, it may lead to the wrong conclusion that the
distribution has no mode.
• As there may be 2 or 3 modal values, it becomes impossible to set a definite
value of the Mode.
EMPIRICAL RELATION

• In a large distribution that is almost Normal, Mode can be calculated by using the
relation
• Mean – Mode = 3(Mean – Median)
• Mode = 3 Median – 2 Mean
MEASURES OF DISPERSION (MEASURES OF VARIABILITY)

• Measures of central tendency need not give an exact picture of the distribution.
• If we compare two groups merely on the basis of the average, there is a possibility of
being misled into an incorrect judgment.
Eg: Consider the marks of two groups
2, 8, 20, 28, 42 ------------------ Group 1
18, 19, 20, 21, 22 ------------------ Group 2
Here when we calculate the Mean for both groups, we get Mean = 20.
But when we examine the scores, we find that Group 1 is a heterogeneous
group and Group 2 is a homogeneous group.
• The statistical measures used to determine the nature and extent of dispersion of the
scores are known as Measures of Dispersion or Measures of Variability.
• Measures of Dispersion measure the spread of the observations about the central value
of the distribution.
Commonly used Measures of Dispersion
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation
1. RANGE
Range is the difference between the highest and lowest scores in a distribution.
Range (R) = H – L
H – Highest Value
L – Lowest Value

Eg: Find Range: 53, 51, 70, 45, 60, 62, 40, 53, 71, 55
Range (R) = H – L
= 71 – 40
= 31

Eg:

Observation   Frequency
5             3
6             8
7             12
8             10
9             8
Total         41

Range (R) = H – L
= 9 – 5
= 4

In a continuous distribution, Range is the difference between
the upper limit of the highest class and the lower limit of the lowest class.

Eg:

Class     f
10 – 20   12
20 – 30   20
30 – 40   10
40 – 50   5

Range (R) = H – L
= 50 – 10
= 40

Range – Merits
• Simplest measure of dispersion
• Easy to calculate and easy to understand.
Range – Demerits
• Not based on all observations.
• Very much affected by extreme values.
• It is influenced by fluctuations of sampling.
• For open-end classes, calculation of Range is impossible.
2. QUARTILE DEVIATION (SEMI INTER QUARTILE RANGE)

• The quartile deviation is half the difference between the upper and lower quartiles in a
distribution.
• It is a measure of the spread through the middle half of a distribution.
• It can be useful because it is not influenced by extremely high or extremely low
scores.

• Quartile: One of the three points of division of observations which have been grouped into
four equal-sized sets based on their statistical rank.
• Lower Quartile (First quartile) Q1: First point of division of observations which have
been grouped into four equal-sized sets based on their statistical rank.
• Second Quartile Q2: Second point of division of observations which have been
grouped into four equal-sized sets based on their statistical rank.
• Second Quartile is called Median
• Upper Quartile (Third quartile) Q3: Third point of division of observations which
have been grouped into four equal-sized sets based on their statistical rank.

Quartile Deviation (Q) = (Q3 – Q1) / 2
Q1 – Lower (First) Quartile
Q3 – Upper (Third) Quartile

For a continuous distribution,
Q1 = l1 + ((N/4 – cf1) / f1) × c
Q3 = l3 + ((3N/4 – cf3) / f3) × c
l1 – Actual lower limit of Q1 Class
(Q1 Class – Class in which the (N/4)th observation falls)
N – Total Frequency
cf1 – Cumulative frequency up to, but not including, the Q1 Class
f1 – frequency of Q1 Class
c – Class interval
l3, cf3 and f3 are the corresponding values for the Q3 Class, in which the (3N/4)th observation falls.

Calculation of Quartile Deviation

1. Discrete Data:

Eg: Find Quartile Deviation: 2, 13, 17, 20, 25, 28, 30, 33, 37, 40, 41

Answer

2   13   17   20   25   28   30   33   37   40   41
         Q1             Q2             Q3

Here N = 11
Q1 = ((N + 1)/4)th observation = 3rd observation = 17
Q2 (Median) = ((N + 1)/2)th observation = 6th observation = 28
Q3 = (3(N + 1)/4)th observation = 9th observation = 37
Quartile Deviation (Q) = (Q3 – Q1) / 2
= (37 – 17) / 2
= 10


2. Continuous Distribution

Class     Frequency   <CF
30 – 35   10          10
35 – 40   16          26     (Q1 Class)
40 – 45   18          44
45 – 50   27          71
50 – 55   18          89     (Q3 Class)
55 – 60   8           97
60 – 65   3           100

Answer
For Q1: N/4 = 25, l1 = 35, N = 100, cf1 = 10, f1 = 16, c = 5
Q1 = l1 + ((N/4 – cf1) / f1) × c
= 35 + ((25 – 10) / 16) × 5
= 39.69
For Q3: 3N/4 = 75, l3 = 50, N = 100, cf3 = 71, f3 = 18, c = 5
Q3 = l3 + ((3N/4 – cf3) / f3) × c
= 50 + ((75 – 71) / 18) × 5
= 51.11
Quartile Deviation (Q) = (Q3 – Q1) / 2
= (51.11 – 39.69) / 2
= 5.71
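
The quartile calculation for a continuous distribution can be sketched with one helper that serves Q1, Q2 and Q3 alike. The function name `quartile` is hypothetical; it finds the class in which the (kN/4)th observation falls and interpolates within it, using the classes and frequencies of the example above.

```python
def quartile(lower_limits, freq, width, k):
    """k-th quartile (k = 1, 2 or 3) of a grouped distribution."""
    n = sum(freq)
    target = k * n / 4   # position of the required observation
    cf = 0               # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freq):
        if cf + f >= target:
            return lower + (target - cf) / f * width
        cf += f

lower_limits = [30, 35, 40, 45, 50, 55, 60]
freq = [10, 16, 18, 27, 18, 8, 3]

q1 = quartile(lower_limits, freq, 5, 1)
q3 = quartile(lower_limits, freq, 5, 3)
qd = (q3 - q1) / 2   # quartile deviation

print(q1, q3, qd)
```

Calling the same helper with k = 2 gives the median, since the second quartile and the median are the same point.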

3. MEAN DEVIATION (AVERAGE DEVIATION)
• Mean Deviation is the average of the absolute deviations of the scores taken from the Mean
• It may be calculated by taking the deviation of each of the scores from the mean and
finding the average of these deviations.
• Deviations may be –ve or +ve, so take the absolute value of the deviations.

Discrete Data
Mean Deviation = ∑|x – X̄| / N
x – Scores
X̄ – Arithmetic Mean
N – Total Number of scores

Discrete Distribution
Mean Deviation = ∑f|x – X̄| / N
x – Scores
X̄ – Arithmetic Mean
f – Frequency
N – Total frequency

Continuous Distribution
Mean Deviation = ∑f|x – X̄| / N
x – Mid-value
X̄ – Arithmetic Mean
f – Frequency
N – Total frequency

Calculation of Mean Deviation

1. Discrete Series
Calculate Mean Deviation: 8, 10, 12, 14, 16, 18, 20, 22

Answer:

Score (x)    |x – X̄|
8            7
10           5
12           3
14           1
16           1
18           3
20           5
22           7
∑x = 120     ∑|x – X̄| = 32

X̄ = ∑x / N
= 120 / 8
= 15
Mean Deviation = ∑|x – X̄| / N
= 32 / 8
= 4


2. Discrete Distribution

Eg:

Score (x)   f
22          5
27          10
32          25
37          30
42          20
47          10

Answer:

Score (x)   f         fx           |x – X̄|   f|x – X̄|
22          5         110          14         70
27          10        270          9          90
32          25        800          4          100
37          30        1110         1          30
42          20        840          6          120
47          10        470          11         110
            N = 100   ∑fx = 3600              ∑f|x – X̄| = 520

X̄ = ∑fx / N = 3600 / 100 = 36
Mean Deviation = ∑f|x – X̄| / N
= 520 / 100
= 5.2

3. Continuous Distribution

Eg:

Class     f
20 – 24   5
25 – 29   10
30 – 34   25
35 – 39   30
40 – 44   20
45 – 49   10

Answer:

Class     Mid-Value (x)   f         fx           |x – X̄|   f|x – X̄|
20 – 24   22              5         110          14         70
25 – 29   27              10        270          9          90
30 – 34   32              25        800          4          100
35 – 39   37              30        1110         1          30
40 – 44   42              20        840          6          120
45 – 49   47              10        470          11         110
                          N = 100   ∑fx = 3600              ∑f|x – X̄| = 520

X̄ = ∑fx / N = 3600 / 100 = 36
Mean Deviation = ∑f|x – X̄| / N
= 520 / 100
= 5.2
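
The mean deviation of a frequency distribution takes two passes over the data, one for the mean and one for the absolute deviations. A minimal sketch, using the scores 22 to 47 and their frequencies from the example above (for a continuous distribution the same code applies with the mid-values in place of the scores); the function name `mean_deviation` is illustrative.

```python
def mean_deviation(values, freq):
    """Mean Deviation = sum(f * |x - mean|) / N."""
    n = sum(freq)
    mean = sum(f * x for x, f in zip(values, freq)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freq)) / n

scores = [22, 27, 32, 37, 42, 47]
freq = [5, 10, 25, 30, 20, 10]

md = mean_deviation(scores, freq)
print(md)
```

Taking absolute values is essential: without them the positive and negative deviations from the mean cancel and the sum is always zero.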
4. STANDARD DEVIATION

• Standard Deviation is the square root of the average of the squares of the
deviations of the scores taken from the mean. SD is denoted by the symbol σ
(sigma).
• The Arithmetic Mean (Average) of the squares of deviations is known as Variance.
• Standard Deviation is the square root of the Variance.
Calculation of Standard Deviation – Steps
1. Find the Arithmetic Mean of the given data.
2. Find the deviations from Arithmetic Mean of scores.
3. Find the average of squares of deviations taken from the Mean.
4. Find the square root of the average of squares of deviations.

Discrete Data
Standard Deviation = √(∑(x – X̄)² / N)
x – Scores
X̄ – Arithmetic Mean
N – Total Number of scores

Discrete Distribution
Standard Deviation = √(∑f(x – X̄)² / N)
x – Scores
X̄ – Arithmetic Mean
f – Frequency
N – Total frequency

Continuous Distribution
Standard Deviation = √(∑f(x – X̄)² / N)
x – Mid-value
X̄ – Arithmetic Mean
f – Frequency
N – Total frequency

Calculation of Standard Deviation

1. Discrete Series

Find Standard Deviation: 35, 49, 32, 45, 39

Answer

Score (x)   x – X̄   (x – X̄)²
35          -5       25
49          9        81
32          -8       64
45          5        25
39          -1       1
∑x = 200             ∑(x – X̄)² = 196

X̄ = ∑x / N
= 200 / 5
= 40
Standard Deviation = √(∑(x – X̄)² / N)
= √(196 / 5)
= 6.26

2. Discrete Frequency Distribution (Ungrouped Distribution)

Find Standard Deviation

Score   Frequency
22      5
27      10
32      25
37      30
42      20
47      10
        N = 100

Answer

Score (x)   Frequency (f)   fx           d = (x – X̄)   (x – X̄)²   f(x – X̄)²
22          5               110          -14            196         980
27          10              270          -9             81          810
32          25              800          -4             16          400
37          30              1110         1              1           30
42          20              840          6              36          720
47          10              470          11             121         1210
            N = 100         ∑fx = 3600                              ∑fd² = 4150

A.M (X̄) = ∑fx / N
= 3600 / 100
= 36
Standard Deviation = √(∑fd² / N)
= √(4150 / 100)
= 6.44
3. Continuous Frequency Distribution (Grouped Distribution)

Calculate Standard Deviation

Class     Frequency
20 – 24   5
25 – 29   10
30 – 34   25
35 – 39   30
40 – 44   20
45 – 49   10
          N = 100

Answer

Class     x    Frequency (f)   fx           (x – X̄)   (x – X̄)²   f(x – X̄)²
20 – 24   22   5               110          -14        196         980
25 – 29   27   10              270          -9         81          810
30 – 34   32   25              800          -4         16          400
35 – 39   37   30              1110         1          1           30
40 – 44   42   20              840          6          36          720
45 – 49   47   10              470          11         121         1210
               N = 100         ∑fx = 3600                          ∑fd² = 4150

A.M (X̄) = ∑fx / N = 3600 / 100 = 36
Standard Deviation = √(∑fd² / N)
= √(4150 / 100)
= 6.44
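
The four steps for the standard deviation translate directly into code. The sketch below uses a hypothetical helper `std_dev` written for this note; it divides by N, as the notes do, so for raw scores it should agree with the standard-library function `statistics.pstdev` (the population standard deviation).

```python
import math
import statistics

def std_dev(values, freq=None):
    """SD = sqrt(sum(f * (x - mean)^2) / N), dividing by N as in the notes."""
    if freq is None:
        freq = [1] * len(values)   # raw scores: every frequency is 1
    n = sum(freq)
    mean = sum(f * x for x, f in zip(values, freq)) / n          # step 1
    squares = sum(f * (x - mean) ** 2 for x, f in zip(values, freq))  # steps 2, 3
    return math.sqrt(squares / n)                                 # step 4

# Discrete series from the first example.
sd_series = std_dev([35, 49, 32, 45, 39])

# Frequency distribution (scores or mid-values) from the later examples.
sd_dist = std_dev([22, 27, 32, 37, 42, 47], [5, 10, 25, 30, 20, 10])

print(round(sd_series, 2), round(sd_dist, 2))
print(statistics.pstdev([35, 49, 32, 45, 39]))  # cross-check for raw scores
```

Note that `statistics.stdev` divides by N – 1 instead and would give a slightly larger value; that is a different convention from the one used in these notes.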

For a large distribution, the Short-cut method (Assumed Mean Method) can be used to calculate
Standard Deviation

SD = c × √(∑fd² / N – (∑fd / N)²)

c – Class interval
f – Frequency
x – Mid-point
N – Total Frequency
A – Assumed Mean
d – Deviations, d = (x – A) / c

Calculate Standard Deviation Using Assumed Mean Method

Answer

class     f   x    fx    d    d²   fd   fd²
45 – 49   2   47   94    5    25   10   50
40 – 44   3   42   126   4    16   12   48
35 – 39   2   37   74    3    9    6    18
30 – 34   6   32   192   2    4    12   24
25 – 29   8   27   216   1    1    8    8
20 – 24   8   22   176   0    0    0    0
15 – 19   7   17   119   -1   1    -7   7

SD = c × √(∑fd² / N – (∑fd / N)²), with c = 5

= 11.31
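
That the short-cut formula agrees with the direct formula can be checked numerically. The sketch below is an illustration only: it reuses the earlier distribution with mid-values 22 to 47 (not the table immediately above) and takes A = 37 as an arbitrary, hypothetical choice of assumed mean.

```python
import math

mids = [22, 27, 32, 37, 42, 47]
freq = [5, 10, 25, 30, 20, 10]
A, c = 37, 5          # assumed mean (arbitrary choice) and class interval

n = sum(freq)
d = [(x - A) / c for x in mids]                        # step deviations
sum_fd = sum(f * di for f, di in zip(freq, d))
sum_fd2 = sum(f * di * di for f, di in zip(freq, d))

# Short-cut method: SD = c * sqrt(sum(fd^2)/N - (sum(fd)/N)^2)
sd_shortcut = c * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Direct method for comparison.
mean = sum(f * x for f, x in zip(freq, mids)) / n
sd_direct = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freq, mids)) / n)

print(round(sd_shortcut, 2), round(sd_direct, 2))
```

Any other choice of A gives the same result; the correction term (∑fd / N)² compensates for the assumed mean being different from the true mean.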

Standard Deviation – Advantages
• Rigidly defined
• Based on all the observations
• It is capable of further algebraic treatment
• SD is used in many advanced statistical studies
• It is less affected by fluctuations in sampling
Standard Deviation – Limitations
• Statistical interpretation is comparatively difficult
• It gives more weight to extreme scores and less to those which are near the mean,
because the squares of the deviations are taken. These squares will become very large as
the deviations increase
CORRELATION
• Correlation may be defined as the relationship between two variables.
• There are three types of correlation
o Positive correlation
o Negative correlation
o Zero correlation
Positive correlation: When the first variable increases or decreases and the other variable also
increases or decreases respectively, the two variables are said
to be in Positive correlation.
Eg: Intelligence and Achievement
Negative correlation: When the first variable increases or decreases and the other variable
decreases or increases respectively, the two variables are said
to be in Negative correlation.
Eg: Time spent on practice and number of typing errors
Zero correlation: If there is no relationship between two variables, the
variables are said to be in Zero correlation.
Eg: Body weight and Intelligence
COEFFICIENT OF CORRELATION
• The ratio indicating the degree of relationship between two related variables is called the
coefficient of correlation.
• For a perfect positive correlation, the Coefficient of Correlation is +1 and for a perfect
Negative correlation, the Coefficient of Correlation will be -1.
• Perfect positive or Negative correlation is possible only in Physical Science.
• In a Social Science like Education, the correlation between two variables will lie between
the limits +1 and -1.
• Positive correlation varies from 0 to +1 and Negative correlation varies from 0 to -1
• Zero correlation indicates that there is no consistent relationship between two variables.
Use of Coefficient of Correlation
• It helps to determine the validity of a test.
• It helps to determine the reliability of a test.
• It can be used to ascertain the degree of the objectivity of a test.
• It can answer the validity of arguments for or against a statement.
• It indicates the nature of the relationship between two variables.
• It predicts the value of one variable given the value of another related variable.
• It helps to ascertain the traits and capacities of pupils.
Calculation of Correlation Coefficient
• There are two important techniques for calculating the Correlation coefficient
o Rank Correlation
o Product Moment Correlation
Rank Correlation
• Spearman was the first to measure the extent of correlation between two sets of scores
by the method of Rank Difference.

Rank Correlation Coefficient: $\rho = 1 - \dfrac{6\sum D^2}{N(N^2 - 1)}$

D – Rank Difference
N – Total number of scores
Find the Rank Correlation Coefficient

Name of Student   Score in Maths   Score in Physics
Nikhil            45               68
Santhosh          53               76
John              67               70
Jenna             40               64
Gopal             35               54
Mohammed          50               66

Answer

Name of Student   Score in Maths   Score in Physics   Rank in Maths (R1)   Rank in Physics (R2)   D = |R1 - R2|   D²
Nikhil            45               68                 4                    3                      1               1
Santhosh          53               76                 2                    1                      1               1
John              67               70                 1                    2                      1               1
Jenna             40               64                 5                    5                      0               0
Gopal             35               54                 6                    6                      0               0
Mohammed          50               66                 3                    4                      1               1

∑D² = 4

Rank Correlation Coefficient = $1 - \dfrac{6 \times 4}{6(6^2 - 1)} = 1 - \dfrac{24}{210} = 0.89$

Here the correlation is found to be Positive and High.
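The rank-difference calculation can also be checked with a few lines of code. The following is a minimal sketch, assuming Spearman's formula ρ = 1 - 6∑D²/(N(N² - 1)) and scores without ties; the function names `ranks_descending` and `rank_correlation` are illustrative and not part of the original notes.

```python
def ranks_descending(scores):
    """Give rank 1 to the highest score (assumes there are no tied scores)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def rank_correlation(first, second):
    """Spearman's rank-difference coefficient: 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(first)
    r1 = ranks_descending(first)
    r2 = ranks_descending(second)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Scores of Nikhil, Santhosh, John, Jenna, Gopal and Mohammed
maths = [45, 53, 67, 40, 35, 50]
physics = [68, 76, 70, 64, 54, 66]

rho = rank_correlation(maths, physics)
print(round(rho, 2))  # 0.89
```

With tied scores the ranks would have to be averaged, which this sketch does not attempt.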
Product Moment Correlation
Karl Pearson devised the formula for the calculation of the Product Moment Correlation
coefficient.

Product Moment Correlation Coefficient: $r = \dfrac{\sum xy}{N \sigma_x \sigma_y}$

x, y : the deviations of the first set of scores and the second set of scores from their
respective Means
N : Number of scores in a set
$\sigma_x, \sigma_y$ : Standard deviations of the first set of scores and second set of scores
respectively

Eg: Find the Product Moment Correlation coefficient

Height of Father (h1)   Height of Son (h2)
65                      67
66                      68
67                      65
67                      68
68                      72
69                      72
70                      69
72                      71

Answer

Height of Father (h1)   Height of Son (h2)   Deviation from Mean (x)   Deviation from Mean (y)   x²   y²   xy
65                      67                   -3                        -2                        9    4    6
66                      68                   -2                        -1                        4    1    2
67                      65                   -1                        -4                        1    16   4
67                      68                   -1                        -1                        1    1    1
68                      72                   0                         3                         0    9    0
69                      72                   1                         3                         1    9    3
70                      69                   2                         0                         4    0    0
72                      71                   4                         2                         16   4    8

∑h1 = 544, ∑h2 = 552, ∑x² = 36, ∑y² = 44, ∑xy = 24

AM of h1 = 544 / 8 = 68
AM of h2 = 552 / 8 = 69

SD of h1 ($\sigma_x$) = $\sqrt{36/8}$ = 2.12
SD of h2 ($\sigma_y$) = $\sqrt{44/8}$ = 2.35

$r = \dfrac{24}{8 \times 2.12 \times 2.35} = 0.6$

Positive correlation between Height of Father and Height of Son
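The deviation method can be sketched in code as follows. This is only an illustration, assuming Pearson's formula r = ∑xy / (N σx σy) with the standard deviations computed by dividing by N; the function name `product_moment_r` is hypothetical.

```python
from math import sqrt


def product_moment_r(first, second):
    """Pearson's r from deviations: sum(xy) / (N * sd_x * sd_y)."""
    n = len(first)
    mean_x = sum(first) / n
    mean_y = sum(second) / n
    # Deviations of each score from the mean of its own set
    x = [v - mean_x for v in first]
    y = [v - mean_y for v in second]
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Standard deviations with N (not N - 1) in the denominator
    sd_x = sqrt(sum(a * a for a in x) / n)
    sd_y = sqrt(sum(b * b for b in y) / n)
    return sum_xy / (n * sd_x * sd_y)


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

r = product_moment_r(fathers, sons)
print(round(r, 2))  # 0.6
```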
Short-cut Method

Product Moment Correlation coefficient:
$r = \dfrac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - (\sum x)^2\right]\left[N\sum y^2 - (\sum y)^2\right]}}$

x, y : first set of scores and the second set of scores
N : Number of scores in a set

Find the Product Moment Correlation coefficient

Students   Mark Test1 (x)   Mark Test2 (y)
A          8                9
B          6                7
C          4                3
D          7                6
E          3                5
F          6                6
G          5                5
H          4                5
I          5                4
J          6                5

Answer

Students   Mark Test1 (x)   Mark Test2 (y)   x²   y²   xy
A          8                9                64   81   72
B          6                7                36   49   42
C          4                3                16   9    12
D          7                6                49   36   42
E          3                5                9    25   15
F          6                6                36   36   36
G          5                5                25   25   25
H          4                5                16   25   20
I          5                4                25   16   20
J          6                5                36   25   30

∑x = 54, ∑y = 55, ∑x² = 312, ∑y² = 327, ∑xy = 314

$r = \dfrac{10 \times 314 - 54 \times 55}{\sqrt{(10 \times 312 - 54^2)(10 \times 327 - 55^2)}} = \dfrac{170}{\sqrt{204 \times 245}} = \dfrac{170}{223.56} = 0.76$

Correlation is Positive and High
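The short-cut method works directly on the raw marks, so it is easy to verify by computer. The sketch below assumes the usual raw-score form r = (N∑xy - ∑x∑y) / √([N∑x² - (∑x)²][N∑y² - (∑y)²]); the function name `product_moment_r_raw` is illustrative. Note that the squares of the Test2 marks add up to 327.

```python
from math import sqrt


def product_moment_r_raw(xs, ys):
    """Pearson's r computed from raw scores, without finding deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Marks of students A to J
test1 = [8, 6, 4, 7, 3, 6, 5, 4, 5, 6]
test2 = [9, 7, 3, 6, 5, 6, 5, 5, 4, 5]

r = product_moment_r_raw(test1, test2)
print(round(r, 2))  # 0.76
```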
Normal Probability Curve

Meaning and importance of Normal Probability Curve

• The normal probability curve is the curve that graphically represents a Normal
Distribution.

• In a Normal Distribution, when the scores are arranged in order of magnitude,
those at the centre will have the maximum frequency.

• The frequencies will gradually go on decreasing towards the right and left of the score
at the centre. Because of this property, the curve representing a normal distribution
will show symmetry on either side of its central axis. Hence it will be ‘bell-shaped’.

Normal Probability Curve

• These special features of the Normal Distribution will be seen in the dispersion of scores
for natural phenomena such as intelligence, height, weight etc. in a population.
• This characteristic of the Normal Distribution is found to be true to a great extent with
regard to achievement scores of a well-conducted examination, if the number taking the
examination is sufficiently large.
• Hence the properties of the Normal Distribution and the Normal Distribution curve are of
great importance in the study of groups and their characteristics with respect to given
variables.
Properties of Normal Probability Curve

• It is bell-shaped. This means that its peak is in the middle.

• It is symmetrical. If a perpendicular is drawn from the peak to X-axis, this will divide the
whole area of the curve into two equal parts.
• The majority of scores will show a tendency to cluster around the centre. On either side of
the central axis the frequencies of scores will go on reducing, these being least at the two
ends.
• All three Measures of Central Tendency of a normal curve, viz. Mean, Median and Mode,
coincide, that is, they are all equal.
• The first and third quartiles are equidistant from the median.
• The ordinate at the mean is the highest. The heights of other ordinates at various sigma
distances from the mean are also in a fixed relationship with the height of the mean ordinate.
• The curve will gradually come nearer and nearer to the base line, but it will never meet the
base line. For practical purposes, the curve may be taken to end at points -3σ to +3σ distance
from the mean, because this region will cover almost 100% of the cases.
• If the total area enclosed by the normal probability curve is represented by N, the total
number of cases in the group considered, we can find out the area between any two points
with the help of mathematical formulae.
• The most important relationship in the Normal Probability Curve is the area relationship. In
a normal distribution 34.13% of cases will be distributed between M and a score at a distance
of 1σ from M. Thus 68.26% of cases are included within M ± 1σ. 99.73% or almost all the
cases are included within M ± 3σ.
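The area relationship can be verified numerically. The sketch below uses the error function from Python's standard library; the helper names `area_within` and `area_mean_to` are illustrative. The exact area within M ± 1σ rounds to 68.27%, while the figure 68.26% comes from doubling the rounded value 34.13%. The area within M ± 3σ comes to 99.73%.

```python
from math import erf, sqrt


def area_within(k):
    """Fraction of cases lying within k standard deviations either side of the mean."""
    return erf(k / sqrt(2))


def area_mean_to(k):
    """Fraction of cases lying between the mean and a point k standard deviations away."""
    return area_within(k) / 2


print(round(100 * area_mean_to(1), 2))  # 34.13
print(round(100 * area_within(1), 2))   # 68.27
print(round(100 * area_within(3), 2))   # 99.73
```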

Skewness and Kurtosis


Skewness

• If the distribution is not perfectly normal or symmetrical, or the frequencies on either side
are not even, then the frequency curve deviates from Normalcy. Such curves are said to be
skewed in nature.
• The lack of symmetry due to extended tails in a particular direction is known as Skewness.
• In a skewed distribution the Mean, Median and Mode will not be the same.
• There are two types of Skewness
o Negative Skewness
o Positive Skewness
Negative Skewness

• If the tail extends to the left (Negative direction of the curve), the distribution is said
to be Negatively Skewed.

Positive Skewness

• If the tail extends to the right (Positive direction of the curve), the distribution is said
to be Positively Skewed.

• The distance between the Mean and Median will indicate the extent of skewness.

• In a negatively skewed curve the Mean lies to the left of the Median.
• In a positively skewed curve the Mean will lie to the right of the Median.
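• The degree of Skewness of a frequency distribution may be calculated using the
formula $Sk = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$ when the Mean, Median and Standard Deviation are given.
• When the percentiles are available, the following formula is used to find out the
skewness: $Sk = \dfrac{P_{90} + P_{10}}{2} - P_{50}$ (Here P90 is the 90th percentile, P10 is the 10th
percentile and P50 is the 50th percentile, i.e. the Median)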
• For a Normal curve the skewness is Zero.
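The direction of skewness can be illustrated with a short sketch. It assumes the usual forms of the two formulas, Sk = 3(Mean - Median) / SD and Sk = (P90 + P10)/2 - P50; the function names and the sample scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, pstdev


def skewness_from_mean_median(scores):
    """Sk = 3 * (Mean - Median) / Standard Deviation (SD computed with N)."""
    return 3 * (mean(scores) - median(scores)) / pstdev(scores)


def skewness_from_percentiles(p10, p50, p90):
    """Sk = (P90 + P10) / 2 - P50."""
    return (p90 + p10) / 2 - p50


# Hypothetical score sets, not taken from the notes
symmetric = [1, 2, 3, 4, 5]
tail_to_right = [1, 2, 2, 3, 3, 3, 4, 10]
tail_to_left = [1, 7, 8, 8, 8, 9, 9, 10]

print(skewness_from_mean_median(symmetric))          # 0.0
print(skewness_from_mean_median(tail_to_right) > 0)  # True: Mean lies right of the Median
print(skewness_from_mean_median(tail_to_left) < 0)   # True: Mean lies left of the Median
print(skewness_from_percentiles(20, 50, 80))         # 0.0
```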
Kurtosis
• Kurtosis refers to the peakedness or flatness of the curve of a frequency distribution
compared to the Normal curve.
• The curve of a frequency distribution which is more peaked than the normal curve is said
to be Leptokurtic.
• If the peak is found to be flatter than a normal curve, the curve is said to be Platykurtic.
• The curve of a normal distribution is said to be Mesokurtic.

The formula for determining the extent of Kurtosis is:

$Ku = \dfrac{Q}{P_{90} - P_{10}}$ (Q – Quartile Deviation; P90 and P10 are the 90th and 10th percentiles)
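As a rough check, the percentile measure of kurtosis can be evaluated for a normal distribution. The sketch assumes the formula Ku = Q / (P90 - P10), with Q = (Q3 - Q1) / 2 as the quartile deviation; the function name `kurtosis_percentile` is illustrative. The percentile points are taken from `statistics.NormalDist` in Python's standard library.

```python
from statistics import NormalDist


def kurtosis_percentile(q1, q3, p10, p90):
    """Ku = Q / (P90 - P10), where Q is the quartile deviation (Q3 - Q1) / 2."""
    quartile_deviation = (q3 - q1) / 2
    return quartile_deviation / (p90 - p10)


# Percentile points of a normal distribution with mean 0 and standard deviation 1
normal = NormalDist(mu=0, sigma=1)
ku = kurtosis_percentile(
    normal.inv_cdf(0.25),
    normal.inv_cdf(0.75),
    normal.inv_cdf(0.10),
    normal.inv_cdf(0.90),
)
print(round(ku, 3))  # 0.263
```

The value of about 0.263 obtained for the normal curve serves as the reference point for a Mesokurtic distribution under this formula.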
Standard Scores
• The Mean is the most representative score for commenting on the position of other given
scores.
• The distance from the mean is usually expressed in terms of the Standard deviation of the
scores of the distribution concerned.
• Scores that indicate how many standard deviations a score lies away from the mean of a given
distribution are known as standard scores.
• Commonly used standard scores are the Z score and the T score.

Z Score

• The Z score indicates how many standard deviations away from the mean, and in which
direction, a given raw score of a distribution lies.
• $Z = \dfrac{X - M}{\sigma}$
where X - Raw score
      M - Mean
      σ - Standard Deviation

Example A               Example B
X = 76                  X = 67
M = 82                  M = 62
σ = 4                   σ = 5

z = (76 - 82) / 4       z = (67 - 62) / 5
  = -1.50                 = +1.00

• The raw score of 76 in Example A may be expressed as a z score of -1.50, indicating
that 76 is 1.5 standard deviations below the Mean.
• The raw score of 67 in Example B may be expressed as a z score of +1.00, indicating
that 67 is 1 standard deviation above the Mean.
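The two examples can be reproduced with a one-line function. This is a minimal sketch of z = (X - Mean) / σ; the function name `z_score` is illustrative.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations by which a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


print(z_score(76, 82, 4))  # -1.5 (Example A)
print(z_score(67, 62, 5))  # 1.0  (Example B)
```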

T Score
• The T score has been devised to avoid the confusion resulting from negative z scores
(below the mean) and also to eliminate decimal values.
• To find the T score, multiply the z score by 10 and add 50: T = 50 + 10z
• T scores are always rounded to the nearest whole number.
• For example, in Example A, T = 50 + 10(-1.50) = 50 + (-15.0) = 35
  In Example B, T = 50 + 10(1.00) = 50 + 10 = 60
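The conversion T = 50 + 10z can be sketched in the same way. The function name `t_score` is illustrative, and the rounding behaviour noted in the comment is a property of Python rather than of the T score itself.

```python
def t_score(z):
    """T = 50 + 10z, rounded to the nearest whole number.

    Python's round() sends exact halves (such as 52.5) to the nearest even
    number, which may differ from the convention used when rounding by hand.
    """
    return round(50 + 10 * z)


print(t_score(-1.50))  # 35 (Example A)
print(t_score(1.00))   # 60 (Example B)
```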
