
STAT

Introduction: Data means information or a set of given facts. Data are usually collected through a census or surveys. Statistics is defined as the collection, presentation, analysis and interpretation of numerical (statistical) data.

Variable (or variate): A variable (or variate) which is not capable of assuming all values in a given range is called a discrete variable. A variable which is capable of assuming all the numerical values in a given range is called a continuous variable.

Frequency Distribution: Let the data regarding the weights (in kg) of 20 students of a class be given as

50 55 48 60 54 50 49 48 60 57
54 62 61 49 55 50 48 52 49 54

This is called the raw data, also known as an individual series. We note that some of the weights (values of the quantitative variable) are repeated. Since 3 students have weight 50 kg, we say the frequency of 50 is 3. In general, the number of times a value is repeated is called the frequency of that value. The table containing the weights and the corresponding frequencies is given as

Weight (in kg)    Tally bars    No. of students (frequency)
50                |||           3
48                |||           3
54                |||           3
49                |||           3
60                ||            2
61                |             1
55                ||            2
57                |             1
62                |             1
52                |             1

Tally bars are used to count the number of times each value of the variable has occurred. In order of magnitude, the frequency distribution is written as follows:

Weight in kg (x)    No. of students (f)
48                  3
49                  3
50                  3
52                  1
54                  3
55                  2
57                  1
60                  2
61                  1
62                  1
Total               20
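The tallying step can be reproduced mechanically. The following is a minimal sketch in Python (the choice of language and the variable names are ours, not part of the notes); it counts the 20 raw weights listed above and prints the distribution in order of magnitude.

```python
from collections import Counter

# Raw weights (kg) of the 20 students, copied from the raw data above.
weights = [50, 55, 48, 60, 54, 50, 49, 48, 60, 57,
           54, 62, 61, 49, 55, 50, 48, 52, 49, 54]

# Counter plays the role of the tally bars: one count per distinct value.
frequency = Counter(weights)

# Print the distribution in order of magnitude.
for x in sorted(frequency):
    print(x, frequency[x])

# The total frequency must equal the number of students.
total = sum(frequency.values())
print("Total", total)
```

The total printed at the end is a quick check that no observation was dropped while tallying.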

We denote the total number of students, that is, the total frequency, by n, i.e. $n = \sum f$. Also, we denote the different values of the variable x by $x_i$ and the corresponding frequencies by $f_i$.

The classes are written in two forms.
(i) Inclusive form: In this case, the lower limit of a class is not equal to the upper limit of the previous class. For example, 45-49, 50-54, 55-59, 60-64 are in inclusive form. However, in the class 45-49, all items with values greater than or equal to 44.5 but less than 49.5 are to be taken. Thus the actual limits are 44.5-49.5, 49.5-54.5, 54.5-59.5, 59.5-64.5.
(ii) Exclusive form: In this case, the lower limit of a class is equal to the upper limit of the previous class. For example, we may have classes of the form 45-50, 50-55, 55-60, 60-65, etc. The value 50 is counted in the class "50 and under 55" and not in "45 and under 50".
In both forms, the length of the classes (upper limit - lower limit) is the same.

RELATIVE AND CUMULATIVE FREQUENCY
Relative Frequency: The relative frequency gives useful information about the data, particularly when the class frequencies are large and the total frequency is very large.

$$\text{Relative frequency} = \frac{\text{class frequency}}{\text{total frequency}} \times 100\,\%$$

Cumulative Frequency: The cumulative frequency of a value (or class of values) is obtained by adding the frequencies of all values (or classes of values) less than or equal to the one under consideration. Cumulative frequency is an important concept and is useful in determining the measures of location.

TYPES OF AVERAGES
(a) Mean
    (i) Arithmetic Mean        (ii) Weighted Arithmetic Mean
    (iii) Geometric Mean       (iv) Weighted Geometric Mean
    (v) Harmonic Mean          (vi) Weighted Harmonic Mean
(b) Median
(c) Mode

THE ARITHMETIC MEAN: The arithmetic mean of statistical data is defined as the ratio of the sum of all the values of the variable to the total number of items. It is denoted by A.M.

Calculation of Arithmetic Mean:
(i) For an individual series: Let $x_1, x_2, \dots, x_n$ be a set of n observed values. We denote the arithmetic mean, or simply the mean, by $\bar{x}$. For this individual data, the arithmetic mean is defined as

$$\bar{x} = \frac{1}{n}\left(x_1 + x_2 + \dots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

When the observations $x_i$, $i = 1, 2, \dots, n$, are very large, the arithmetic mean is calculated as follows:

$$\text{A.M.} = A + \frac{\sum_{i=1}^{n} d_i}{n}, \qquad d_i = x_i - A, \; i = 1, 2, \dots, n,$$

where A is the assumed mean.
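To see that the assumed-mean form gives the same answer as the direct definition, here is a small Python sketch. The function names are illustrative, and the assumed mean A = 50 is an arbitrary choice; the data are the 20 weights from the frequency distribution example.

```python
def mean_direct(values):
    """Arithmetic mean as (sum of values) / (number of values)."""
    return sum(values) / len(values)

def mean_assumed(values, assumed):
    """Arithmetic mean as A + (sum of deviations d_i = x_i - A) / n."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)

weights = [50, 55, 48, 60, 54, 50, 49, 48, 60, 57,
           54, 62, 61, 49, 55, 50, 48, 52, 49, 54]

print(mean_direct(weights))          # 53.25
print(mean_assumed(weights, 50))     # 53.25, same value from smaller numbers
```

Any value of A gives the same mean; a value near the centre of the data simply keeps the deviations small, which is the point of the method in hand calculation.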

(ii) For a Frequency Distribution
(a) Let us consider a frequency distribution. Let $x_i$ be the values of the variable and $f_i$ the corresponding frequencies, that is, the grouped data is $(x_i, f_i)$, $i = 1, 2, \dots, n$. If the values of the variable are given as intervals, the mid-values of the classes are taken as $x_i$. Then the arithmetic mean of the frequency distribution is defined as

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$$

(b) Short-cut Method: Let $y_i = x_i - a$, where a is an assumed mean. The mean of this shifted frequency distribution is

$$\bar{y} = \frac{\sum y_i f_i}{\sum f_i} = \frac{\sum (x_i - a) f_i}{\sum f_i}, \quad \text{or} \quad \bar{x} - a = \bar{y}.$$

Hence

$$\bar{x} = a + \frac{\sum d_i f_i}{\sum f_i},$$

where $d_i = x_i - a$ and a is the assumed mean.

(c) Step Deviation Method: In this case, define $d_i = \dfrac{x_i - a}{h}$, or $x_i = a + h d_i$, where h is the length of the class intervals and a is the assumed mean. Then $\sum x_i f_i = \sum (a + h d_i) f_i = a \sum f_i + h \sum d_i f_i$. Thus

$$\bar{x} = a + h\,\frac{\sum d_i f_i}{\sum f_i},$$

where a = assumed mean, h = length of the class interval, $f_i$ = frequency of each value $x_i$, and $d_i = \dfrac{x_i - a}{h}$.

Weighted Arithmetic Mean: If $w_1, w_2, w_3, \dots, w_n$ are the weights assigned to the values $x_1, x_2, x_3, \dots, x_n$ respectively, then the weighted average is defined as

$$\text{Weighted Arithmetic Mean} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n}$$
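The grouped-data mean is a weighted mean with the frequencies as weights, so one helper covers both. The sketch below is only an illustration: the function names are ours, and the class frequencies come from grouping the 20 weights of the earlier example into the exclusive classes 45-50, 50-55, 55-60, 60-65 (mid-values 47.5, 52.5, 57.5, 62.5).

```python
def weighted_mean(values, weights):
    """(w1*x1 + ... + wn*xn) / (w1 + ... + wn)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def step_deviation_mean(mid_values, freqs, a, h):
    """Mean as a + h * sum(d_i * f_i) / sum(f_i), with d_i = (x_i - a) / h."""
    d = [(x - a) / h for x in mid_values]
    return a + h * sum(di * fi for di, fi in zip(d, freqs)) / sum(freqs)

# Classes 45-50, 50-55, 55-60, 60-65 hold 6, 7, 3 and 4 of the 20 weights.
mid_values = [47.5, 52.5, 57.5, 62.5]
freqs = [6, 7, 3, 4]

direct = weighted_mean(mid_values, freqs)                   # direct method
step = step_deviation_mean(mid_values, freqs, a=52.5, h=5)  # step deviation
print(direct, step)  # both 53.75
```

The grouped mean (53.75) differs slightly from the mean of the raw weights (53.25) because every observation is replaced by the mid-value of its class.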

Geometric Mean: If $x_1, x_2, \dots, x_n$ are n values of a variable x, none of them being zero, then the geometric mean G is defined as $G = (x_1 x_2 x_3 \cdots x_n)^{1/n}$.

Geometric mean for frequency distribution: The geometric mean of n values $x_1, x_2, x_3, \dots, x_n$ of a variable x, occurring with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively, is given by

$$G = \left(x_1^{f_1}\, x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \qquad N = \sum_{i=1}^{n} f_i,$$

or

$$G = \operatorname{antilog}\left(\frac{\sum_{i=1}^{n} f_i \log x_i}{N}\right)$$

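Harmonic Mean: The harmonic mean of n items $x_1, x_2, x_3, \dots, x_n$ is defined as

$$\text{Harmonic Mean} = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dots + \dfrac{1}{x_n}}$$

Harmonic Mean of Frequency Distribution: Let $x_1, x_2, x_3, \dots, x_n$ be n items which occur with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively. Then their harmonic mean is given by

$$\text{Harmonic Mean} = \frac{f_1 + f_2 + f_3 + \dots + f_n}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \dfrac{f_3}{x_3} + \dots + \dfrac{f_n}{x_n}} = \frac{\sum f_i}{\sum f_i \cdot \dfrac{1}{x_i}}$$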
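A short Python sketch of the geometric and harmonic means, in both the plain and the frequency-weighted form. The function names and the sample values 2, 4, 8 are hypothetical, chosen only so the results are easy to verify by hand; natural logarithms are used, so the "antilog" step is `math.exp`.

```python
import math

def geometric_mean(values, freqs=None):
    """G = antilog( sum(f_i * log x_i) / N ); requires positive values."""
    freqs = freqs or [1] * len(values)
    N = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / N)

def harmonic_mean(values, freqs=None):
    """H = sum(f_i) / sum(f_i / x_i); requires non-zero values."""
    freqs = freqs or [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

values = [2, 4, 8]                 # hypothetical sample
am = sum(values) / len(values)     # 14/3
gm = geometric_mean(values)        # cube root of 64, i.e. 4 up to rounding
hm = harmonic_mean(values)         # 3 / (1/2 + 1/4 + 1/8) = 24/7
print(am, gm, hm)

# Frequency form: 2 once, 4 twice, 8 once.
print(geometric_mean(values, [1, 2, 1]), harmonic_mean(values, [1, 2, 1]))
```

For positive data such as these, the three averages come out in the order arithmetic, geometric, harmonic, from largest to smallest.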

Relation between Arithmetic Mean, Geometric Mean and Harmonic Mean: The arithmetic mean (A.M.), geometric mean (G.M.) and harmonic mean (H.M.) of a given set of positive observations are related as under:

$$\text{A.M.} \ge \text{G.M.} \ge \text{H.M.}$$

Median: The median is defined as the middle-most or central value of the variable in a set of observations, when the observations are arranged in either ascending or descending order of magnitude. It divides the arranged series into two equal parts. The median is a positional average, whereas the arithmetic mean is a calculated average. When a series consists of an even number of terms, the median is the arithmetic mean of the two central items. It is generally denoted by M.

Case I: When n is odd.
In this case the $\left(\frac{n+1}{2}\right)$th value is the median, i.e.

$$M = \left(\frac{n+1}{2}\right)\text{th term}.$$

Case II: When n is even.
In this case there are two middle terms, the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2} + 1\right)$th. The median is the average of these two terms, i.e.

$$M = \frac{\left(\frac{n}{2}\right)\text{th term} + \left(\frac{n}{2} + 1\right)\text{th term}}{2}$$

Case III: When the series is continuous.

In this case the data are given in the form of a frequency table with class intervals. We prepare the cumulative frequency table and determine the median class, i.e. the class in which the $\left(\frac{n}{2}\right)$th observation lies, and the following formula is used to calculate the median:

$$M = L + \frac{\frac{n}{2} - C}{f} \times i,$$

where
L = lower limit of the class in which the median lies
n = total frequency, i.e. $n = \sum f$
f = frequency of the class in which the median lies
C = cumulative frequency of the class preceding the median class
i = width of the class interval of the class in which the median lies.

Find the median of the wage distribution.

Mode: The mode is defined as that value in a series which occurs most frequently. In a frequency distribution, the mode is that variate which has the maximum frequency. This measure is used when it is important to know which value occurs most frequently.

Continuous Frequency Distribution:
(i) Modal Class: It is that class in a grouped frequency distribution in which the mode lies.

$$\text{Mode} = L + \frac{f_m - f_1}{2 f_m - f_1 - f_2} \times i,$$

where
L = the lower limit of the modal class
i = the width of the modal class
$f_1$ = the frequency of the class preceding the modal class
$f_m$ = the frequency of the modal class
$f_2$ = the frequency of the class succeeding the modal class.

If the above formula fails, then

$$\text{Mode} = L + \frac{f_2}{f_1 + f_2} \times i,$$

where L, $f_1$, $f_2$ and i have their usual meanings.

Relation between Mean, Median and Mode:

Symmetrical distribution: A distribution in which the same number of frequencies is found at the same linear distance on either side of the mode. In this case, mean, median and mode coincide. Thus Mean = Median = Mode, i.e.

$A = M = M_0$

where A, M and $M_0$ denote the mean, the median and the mode respectively.

Asymmetrical distribution: In this distribution, the variations do not have symmetry. If the distribution is moderately asymmetrical, then mean, median and mode are connected by the formula Mode = 3 Median - 2 Mean.
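As a quick numerical illustration of the empirical relation, the sketch below plugs in the mean (53.25) and median (53) of the 20 weights from the frequency distribution example. The function name is ours, and the result is only an estimate that presumes the distribution is moderately asymmetrical.

```python
def empirical_mode(mean, median):
    """Mode estimated as 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean

mode_estimate = empirical_mode(mean=53.25, median=53.0)
print(mode_estimate)  # 52.5
```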

Measure of Dispersion: Dispersion is defined as the scatter or spread of the observed values of a quantitative variable from a central value. Normally, the following measures of dispersion are used:
(a) Range
(b) Mean Deviation
(c) Standard Deviation

(a) Range: It is the simplest measure of variation. The range of a set of values is the difference between the largest and the smallest values in the set. The range gives very limited information. It tells the difference between the extreme values but nothing about the variation among the other values.

(b) Mean Deviation: The mean deviation is defined as the arithmetic mean of the absolute values of the deviations of the observed values from the mean or the median.

Method for Calculation of Mean Deviation
Case I: For ungrouped data
Let $x_1, x_2, x_3, \dots, x_n$ be n observations. Then

$$\text{Mean deviation from mean} = \frac{1}{n}\sum_{i=1}^{n} \left|x_i - \bar{x}\right|,$$

where $\bar{x}$ = mean value of the given observations and n = total number of observations or items.

$$\text{Mean deviation from median} = \frac{1}{n}\sum_{i=1}^{n} \left|x_i - M\right|,$$

where M = median of the given observations.

Case II: For grouped data


Let $x_1, x_2, x_3, \dots, x_n$ occur with frequencies $f_1, f_2, f_3, \dots, f_n$ respectively and let $\sum_{i=1}^{n} f_i = N$. Then

$$\text{Mean deviation from mean} = \frac{\sum_{i=1}^{n} f_i \left|x_i - \bar{x}\right|}{\sum_{i=1}^{n} f_i},$$

where $\bar{x}$ = mean.

$$\text{Mean deviation from median} = \frac{\sum_{i=1}^{n} f_i \left|x_i - M\right|}{\sum_{i=1}^{n} f_i},$$

where M = median.

(c) Standard Deviation: The standard deviation of a given set of observations is defined as the positive square root of the average of the squared deviations of all observations taken from their arithmetic mean. It is generally denoted by the Greek letter $\sigma$ or by s.

Variance: The square of the standard deviation is called the variance and is denoted by $\sigma^2$.
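The measures of dispersion defined so far (range, mean deviation, variance and standard deviation) can be computed straight from their definitions. The Python sketch below does this for the 20 raw weights of the earlier example; the variable names are ours, and the variance divides by n, as in the definition above.

```python
import math

weights = [50, 55, 48, 60, 54, 50, 49, 48, 60, 57,
           54, 62, 61, 49, 55, 50, 48, 52, 49, 54]

n = len(weights)
mean = sum(weights) / n
ordered = sorted(weights)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2   # n = 20 is even

data_range = max(weights) - min(weights)                # largest - smallest
md_mean = sum(abs(x - mean) for x in weights) / n       # mean deviation from mean
md_median = sum(abs(x - median) for x in weights) / n   # mean deviation from median
variance = sum((x - mean) ** 2 for x in weights) / n    # average squared deviation
sd = math.sqrt(variance)                                # positive square root

print(data_range, md_mean, md_median, variance, sd)
```

For this particular data set the two mean deviations coincide, because the mean (53.25) falls between the two middle observations, 52 and 54.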

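Method of Calculating Standard Deviation:
(a) For ungrouped data
Direct Method: Let us consider n observations $x_1, x_2, \dots, x_n$. Let the arithmetic mean of these observations be $\bar{x}$. Then the standard deviation is given by

$$\sigma = \sqrt{\frac{1}{n}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right]} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2}$$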

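Short-cut Method: This method is applied to calculate the standard deviation when the mean of the data comes out to be a fraction. In this case, we shift $x_i$ by A (the assumed mean), i.e. define $d_i = x_i - A$, and then find the standard deviation. We have $\bar{d} = \bar{x} - A$. Hence

$$\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum \left[(d_i + A) - (\bar{d} + A)\right]^2 = \frac{1}{n}\sum (d_i - \bar{d})^2,$$

so that

$$\sigma = \sqrt{\frac{1}{n}\sum (d_i - \bar{d})^2} = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2},$$

where $d_i = x_i - A$, A = assumed mean, n = total number of observations.

(b) For grouped data
If a variate x takes values $x_1, x_2, \dots, x_n$ with respective frequencies $f_1, f_2, \dots, f_n$, then the standard deviation is given by

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{n} f_i}}, \quad \text{where} \quad \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}.$$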

If class intervals are given, then the mid-values of the class intervals give the values of the variate x. But when the mean has a fractional value, the following formula is applied to calculate the standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{\sum_{i=1}^{n} f_i} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}\right)^2},$$

where $d_i = x_i - A$ and A is the assumed mean.
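The direct and assumed-mean formulas for a frequency distribution should agree whatever value of A is chosen. A minimal Python check, using the frequency table of the 20 weights from the earlier example (the function names and the choice A = 50 are ours):

```python
import math

def sd_direct(values, freqs):
    """sigma = sqrt( sum(f_i * (x_i - mean)^2) / sum(f_i) )."""
    N = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / N
    return math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, values)) / N)

def sd_assumed_mean(values, freqs, A):
    """sigma = sqrt( sum(f d^2)/N - (sum(f d)/N)^2 ), with d_i = x_i - A."""
    N = sum(freqs)
    d = [x - A for x in values]
    mean_d2 = sum(f * di ** 2 for f, di in zip(freqs, d)) / N
    mean_d = sum(f * di for f, di in zip(freqs, d)) / N
    return math.sqrt(mean_d2 - mean_d ** 2)

values = [48, 49, 50, 52, 54, 55, 57, 60, 61, 62]
freqs = [3, 3, 3, 1, 3, 2, 1, 2, 1, 1]

sigma_direct = sd_direct(values, freqs)
sigma_shortcut = sd_assumed_mean(values, freqs, A=50)
print(sigma_direct, sigma_shortcut)   # equal up to rounding; variance is 20.9875
```

With A = 50 the deviations are small whole numbers ($\sum f_i d_i = 65$, $\sum f_i d_i^2 = 631$), which is why the short-cut form is convenient when the mean (53.25 here) is fractional.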

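Combined Standard Deviation: Let $\sigma_1$ and $\sigma_2$ be the standard deviations of two groups containing $n_1$ and $n_2$ items respectively. Let $\bar{x}_1$ and $\bar{x}_2$ be their respective A.M. Let $\bar{x}$ and $\sigma$ be the A.M. and S.D. of the combined group respectively. Then

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}, \qquad \sigma = \sqrt{\frac{n_1 \sigma_1^2 + n_2 \sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}},$$

where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.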
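One way to gain confidence in the combined formula is to split a data set in two, combine the group summaries, and compare with the figures computed from the pooled data. The sketch below does this in Python; the function name is ours, the split of the 20 weights into two groups of 10 is arbitrary, and `statistics.pstdev` is used because it divides by n, matching the definition of standard deviation in these notes.

```python
import math
from statistics import fmean, pstdev

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and S.D. of two groups taken together, from group summaries only."""
    mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean, mean2 - mean
    var = (n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2) / (n1 + n2)
    return mean, math.sqrt(var)

# First ten and last ten of the 20 weights from the earlier example.
group1 = [50, 55, 48, 60, 54, 50, 49, 48, 60, 57]
group2 = [54, 62, 61, 49, 55, 50, 48, 52, 49, 54]

mean, sd = combined_mean_sd(len(group1), fmean(group1), pstdev(group1),
                            len(group2), fmean(group2), pstdev(group2))

pooled = group1 + group2
print(mean, fmean(pooled))    # both 53.25
print(sd, pstdev(pooled))     # agree up to rounding
```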

Coefficient of Variation: For comparing two or more series for variability, we calculate the coefficient of standard deviation and the coefficient of variation. The coefficient of standard deviation is defined as

$$\text{coefficient of standard deviation} = \frac{\sigma}{\bar{x}}$$

The coefficient of variation is defined as

$$\text{coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100$$

Coefficient of variation gives us a measure of scattering (dispersion). Scattering is less if the coefficient of variation is small.
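To illustrate the comparison, the Python sketch below computes the coefficient of variation for two series. Series A is the set of 20 weights from the earlier example; series B is a hypothetical set of values invented for contrast, and the function name is ours.

```python
from statistics import fmean, pstdev

def coefficient_of_variation(values):
    """(standard deviation / mean) * 100, using the divide-by-n S.D."""
    return pstdev(values) / fmean(values) * 100

series_a = [50, 55, 48, 60, 54, 50, 49, 48, 60, 57,
            54, 62, 61, 49, 55, 50, 48, 52, 49, 54]
series_b = [40, 70, 45, 65, 50, 60]   # hypothetical second series

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(cv_a, cv_b)

# The series with the smaller coefficient is the less scattered one.
less_scattered = "A" if cv_a < cv_b else "B"
print(less_scattered)   # A
```

Because the coefficient is a ratio to the mean, it allows series measured on different scales or with different means to be compared.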
