
MEASURES OF CENTRAL TENDENCY

An average is a single expression selected to represent the whole group, conveying a fairly adequate picture of the group as a whole.
Averages lie at the central part of a distribution; therefore they are called measures of central tendency.
Different measures of central tendency include:
1. Arithmetic Mean (A.M.)
2. Geometric Mean (G.M.)
3. Harmonic Mean (H.M.)
4. Mode
5. Median
Of these five, the median, A.M. and G.M. have been used to analyze the data.
1. Arithmetic Mean
The arithmetic mean is the "standard" average, often simply called the "mean". For N observations X1, X2, …, XN it is

X̄ = ΣX / N

The combined mean of two or more series can be obtained with the help of the following formula:

X̄12 = (N1·X̄1 + N2·X̄2) / (N1 + N2)

where X̄1 = mean of the 1st series
X̄2 = mean of the 2nd series
X̄12 = combined mean of series 1 and series 2
N1 = no. of items in series 1
N2 = no. of items in series 2
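The combined-mean formula above can be sketched in a few lines of Python (the function name is illustrative, not from the notes):

```python
# Combined mean of two series: X12 = (N1*X1 + N2*X2) / (N1 + N2)

def combined_mean(mean1, n1, mean2, n2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Series 1: mean 10 over 4 items; series 2: mean 20 over 6 items.
print(combined_mean(10, 4, 20, 6))   # 16.0
```

Note that the combined mean is a weighted average of the two series means, weighted by their sizes, not a simple average of the two means.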



2. Geometric Mean
The geometric mean, in mathematics, is a type of mean or average, which indicates the central
tendency or typical value of a set of numbers. It is similar to the arithmetic mean, except that the
numbers are multiplied and then the nth root (where n is the count of numbers in the set) of the
resulting product is taken.
More generally, if the numbers are x1, x2, …, xn, the geometric mean G satisfies

G = (x1 · x2 · … · xn)^(1/n)


Combined Geometric mean
)
log log
(
2 1
2 2 1 1
12
N N
GM N GM N
AL GM
+
+
=
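Both geometric-mean formulas above can be sketched via logarithms, which also avoids overflow from large products (function names are illustrative):

```python
import math

# Geometric mean as the antilog of the mean of the logs, and the
# combined G.M. of two series per the formula above.

def geometric_mean(values):
    # equivalent to the nth root of the product of the values
    return math.exp(sum(math.log(v) for v in values) / len(values))

def combined_gm(gm1, n1, gm2, n2):
    # antilog of the size-weighted average of the log-means
    return math.exp((n1 * math.log(gm1) + n2 * math.log(gm2)) / (n1 + n2))

print(geometric_mean([2, 8]))        # ≈ 4.0
print(combined_gm(4.0, 2, 4.0, 3))   # ≈ 4.0
```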
3. Median
The middle score of a distribution when all the scores have been ranked in ascending/descending
order.
In an ordered array median is the middle number.
If N is odd, the median is the middle number; when N is even, the median is the mean of the two middle numbers.


Median from Ungrouped Data:
If n is odd, Median = value of the ((n + 1)/2)th item.
If n is even, the median is the mean of the (n/2)th and ((n/2) + 1)th observations.
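The odd/even rule above can be sketched as follows (the function name is illustrative):

```python
# Median of an ungrouped data set.

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                      # odd n: the middle item
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2    # even n: mean of the two middle items

print(median([7, 1, 3]))       # 3
print(median([7, 1, 3, 9]))    # 5.0
```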



MEASURES OF DISPERSION
In statistics, statistical dispersion (also called statistical variability or variation) is variability or spread in
a variable or a probability distribution. Common examples of measures of statistical dispersion are
the variance, standard deviation and inter-quartile range.
Dispersion is contrasted with location or central tendency; together they are the most commonly used properties of distributions.
A measure of statistical dispersion is a real number that is zero if all the data are identical, and increases as
the data become more diverse. It cannot be less than zero.
Absolute measures of dispersion:
1. Based on selected items
   Range
   Inter-quartile range
2. Based on all items
   Mean deviation
   Quartile deviation

Relative measures of dispersion:
1. Based on selected items
   Coefficient of range
   Coefficient of Q.D.
2. Based on all items
   Coefficient of mean deviation
   Coefficient of S.D. and coefficient of variation

1. RANGE
For ungrouped data, the range is the difference between the highest and lowest observations in a set of data.
Range = H − L
Coefficient of Range = (H − L) / (H + L)
where H = highest observation, L = lowest observation.
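The range and its coefficient can be sketched directly from those two formulas (function names are illustrative):

```python
# Range = H - L; coefficient of range = (H - L) / (H + L).

def data_range(values):
    return max(values) - min(values)

def coeff_of_range(values):
    h, l = max(values), min(values)
    return (h - l) / (h + l)

data = [4, 10, 6, 16]
print(data_range(data))       # 12
print(coeff_of_range(data))   # 0.6
```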

2. Quartiles
In descriptive statistics, the quartiles of a set of values are the three points that divide the data set into four equal groups, each representing a fourth of the population being sampled.
Q1 = size of the ((N + 1)/4)th item
Q2 = size of the ((N + 1)/2)th item
Q3 = size of the (3(N + 1)/4)th item
Quartile Deviation = (Q3 − Q1) / 2
Inter-Quartile Range = Q3 − Q1
The inter-quartile range measures the range of the middle 50% of values only.
COEFFICIENT OF QUARTILE DEVIATION
In statistics, the quartile coefficient of dispersion is a descriptive statistic which measures dispersion and which is used to make comparisons within and between data sets. The statistic is easily computed using the first (Q1) and third (Q3) quartiles of each data set.
Coefficient of Quartile Deviation = (Q3 − Q1) / (Q3 + Q1)



3. MEAN DEVIATION
It takes into consideration all the values. It is the arithmetic mean of the absolute deviations of all values from a measure of central tendency (mean or median).

M.D. = Σ|X − X̄| / N (about the mean)
M.D. = Σ|X − M| / N (about the median)

where X = value of each observation
X̄ = mean
M = median
N = no. of observations

Coefficient of Mean Deviation = M.D. / Mean (if deviations are taken around the mean)
Coefficient of Mean Deviation = M.D. / Median (if deviations are taken around the median)
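The mean deviation about the mean can be sketched as follows (the function name is illustrative):

```python
# M.D. about the mean: average of the absolute deviations.

def mean_deviation(values):
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 6, 8]                # mean = 5
print(mean_deviation(data))        # (3 + 1 + 1 + 3) / 4 = 2.0
print(mean_deviation(data) / 5)    # coefficient of M.D. about the mean: 0.4
```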


4. STANDARD DEVIATION
Standard deviation is the most commonly used measure of dispersion. It takes into account all the observations.

σ = √( Σ(X − X̄)² / N )

It is independent of change of origin but not independent of change of scale.
In a normal distribution there is a fixed relationship among the measures of dispersion:
QD = (2/3)σ
MD = (4/5)σ
Thus the standard deviation is always greater than the M.D. and Q.D.

Combined Standard Deviation:

σ12 = √( (N1σ1² + N2σ2² + N1d1² + N2d2²) / (N1 + N2) )

where d1 = X̄1 − X̄12 and d2 = X̄2 − X̄12
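The combined-S.D. formula above can be checked numerically: pooling two series through the formula gives the same value as computing the standard deviation of all the data at once (a sketch; function names are illustrative):

```python
import math

# Population standard deviation, and the combined S.D. of two series.

def std_dev(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

def combined_sd(m1, s1, n1, m2, s2, n2):
    m12 = (n1 * m1 + n2 * m2) / (n1 + n2)    # combined mean
    d1, d2 = m1 - m12, m2 - m12
    return math.sqrt((n1 * s1**2 + n2 * s2**2 + n1 * d1**2 + n2 * d2**2)
                     / (n1 + n2))

a, b = [2, 4, 6, 8], [1, 3, 5]
whole = std_dev(a + b)
parts = combined_sd(sum(a) / len(a), std_dev(a), len(a),
                    sum(b) / len(b), std_dev(b), len(b))
print(abs(whole - parts) < 1e-9)   # True: both routes agree
```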
5. VARIANCE
In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out from each other. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions. While other such approaches have been developed, those based on moments are advantageous in terms of mathematical and computational simplicity.
Variance is the arithmetic mean of the squares of all deviations from the arithmetic mean. In other words,

Variance = σ²


COEFFICIENT OF VARIATION
It is a relative measure used to measure changes that have taken place over time. It is used to compare the variability of populations that are expressed in different units of measurement. It is expressed as a percentage.

CV = (σ / X̄) × 100
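The coefficient of variation can be sketched directly from that formula (the function name is illustrative):

```python
import math

# CV = (sigma / mean) * 100, using the population standard deviation.

def coeff_of_variation(values):
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sigma / mean * 100

print(coeff_of_variation([4, 4, 4, 4]))   # 0.0 (no spread at all)
print(coeff_of_variation([2, 4, 6, 8]))   # mean 5, sigma ≈ 2.236 → ≈ 44.7
```

Because it is unit-free, the CV lets you compare, say, variability in heights (cm) against variability in weights (kg).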

CORRELATION
Correlation is a measure of the relation between two or more variables.
Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect
negative correlation while a value of +1.00 represents a perfect positive correlation. A value of 0.00
represents a lack of correlation.
The most familiar measure of dependence between two quantities is the Pearson product-moment
correlation coefficient, or "Pearson's correlation." It is obtained by dividing the covariance of the two
variables by the product of their standard deviations. Karl Pearson developed the coefficient from a similar
but slightly different idea by Francis Galton.

r = Σ(X − X̄)(Y − Ȳ) / (N σx σy)


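Pearson's coefficient, covariance divided by the product of the standard deviations, can be sketched as follows (the function name is illustrative):

```python
import math

# r = cov(X, Y) / (sigma_x * sigma_y), population form.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

print(pearson_r([1, 2, 3], [2, 4, 6]))    # ≈ 1.0 (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))    # ≈ -1.0 (perfect negative)
```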
COEFFICIENT OF DETERMINATION
In statistics, the coefficient of determination R² is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model, and it provides a measure of how well future outcomes are likely to be predicted by the model.

Relation to unexplained variance
In a general form, R² can be seen to be related to the unexplained variance, since the second term compares the unexplained variance (variance of the model's errors) with the total variance (of the data):

R² = 1 − SSres / SStot

As explained variance
In some cases the total sum of squares equals the sum of the two other sums of squares,

SStot = SSreg + SSres

When this relation holds, the above definition of R² is equivalent to

R² = SSreg / SStot

In this form R² is given directly in terms of the explained variance: it compares the explained variance (variance of the model's predictions) with the total variance (of the data).
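The unexplained-variance form of R² can be sketched as follows (the function name is illustrative):

```python
# R^2 = 1 - SS_res / SS_tot for observed values and model predictions.

def r_squared(observed, predicted):
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot

ys = [1, 2, 3, 4]
print(r_squared(ys, [1, 2, 3, 4]))   # 1.0 (perfect predictions)
print(r_squared(ys, [2, 2, 3, 3]))   # 0.6 (imperfect predictions)
```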
SPEARMAN'S RANK CORRELATION COEFFICIENT
In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as rs, is a non-parametric measure of statistical dependence between two variables.
The Spearman rank correlation coefficient is defined by

ρ = 1 − 6Σdᵢ² / (N(N² − 1))    (1)

where dᵢ is the difference in statistical rank of corresponding variables; it is an approximation to the exact correlation coefficient.
The formula to use when there are tied ranks is:

ρ = 1 − 6[ Σdᵢ² + Σᵢ₌₁ᵏ (mᵢ³ − mᵢ)/12 ] / (N(N² − 1))

where mᵢ = no. of observations sharing the ith tied rank
k = no. of groups of tied observations
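Formula (1) can be sketched for data without ties (function names are illustrative; the tie-correction term above is omitted here for brevity):

```python
# Spearman's rho via rank differences, assuming no tied values.

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))   # 1.0
print(spearman_rho([1, 2, 3, 4], [40, 30, 20, 10]))   # -1.0
```

Because it works on ranks, rho is 1.0 for any monotonically increasing relationship, even a non-linear one.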
PROBABLE ERROR
In statistics, the probable error of a quantity is a value describing the probability distribution of that quantity. It defines the half-range of an interval about a central point for the distribution, such that half of the values from the distribution will lie within the interval and half outside. Thus it is equivalent to half the inter-quartile range. The term also has an older, now deprecated, meaning that specifies the probable error as a fixed multiple of the standard deviation, where the multiplying factor 0.6745 derives from the normal distribution.
For the correlation coefficient r,

P.E. = 0.6745 × (1 − r²) / √N
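The probable error of a correlation coefficient can be sketched directly from that formula (the function name is illustrative):

```python
import math

# P.E. of r: 0.6745 * (1 - r^2) / sqrt(N).

def probable_error(r, n):
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

print(probable_error(0.8, 25))   # 0.6745 * 0.36 / 5 ≈ 0.04856
```

A common rule of thumb treats r as significant when it exceeds six times its probable error.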
REGRESSION ANALYSIS
In statistics, regression analysis includes any techniques for modeling and analyzing several variables,
when the focus is on the relationship between a dependent variable and one or more independent
variables. More specifically, regression analysis helps one understand how the typical value of the
dependent variable changes when any one of the independent variables is varied, while the other
independent variables are held fixed. Most commonly, regression analysis estimates the conditional
expectation of the dependent variable given the independent variables; that is, the average value of the
dependent variable when the independent variables are held fixed. Less commonly, the focus is on
a quantile, or other location parameter of the conditional distribution of the dependent variable given the
independent variables. In all cases, the estimation target is a function of the independent variables called
the regression function. In regression analysis, it is also of interest to characterize the variation of the
dependent variable around the regression function, which can be described by a probability distribution.
Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap
with the field of machine learning. Regression analysis is also used to understand which among the
independent variables are related to the dependent variable, and to explore the forms of these
relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables.


Regression equation of Y on X:
Y = aX + b, where a = slope and b = intercept.

Normal equations for Y on X (solved for a and b):
ΣXY = aΣX² + bΣX
ΣY = aΣX + nb

Regression equation of X on Y:
X = aY + b

Normal equations for X on Y:
ΣXY = aΣY² + bΣY
ΣX = aΣY + nb

In deviation form:
(Y − Ȳ) = byx(X − X̄)
(X − X̄) = bxy(Y − Ȳ)

byx = Σxy / Σx² and bxy = Σxy / Σy², where x and y are deviations from the actual means.

In terms of the correlation coefficient and standard deviations:
byx = r·σy/σx
bxy = r·σx/σy

From raw data:
byx = (ΣXY/N − X̄·Ȳ) / σx²
bxy = (ΣXY/N − X̄·Ȳ) / σy²

The two regression coefficients together give the correlation coefficient:

r² = byx · bxy
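The regression of Y on X can be sketched from the raw-data form of byx, with the intercept chosen so the line passes through the point of means (a sketch; names are illustrative):

```python
# b_yx = (sum(XY)/N - mean_x*mean_y) / sigma_x^2, then Y = b_yx*X + a.

def regression_y_on_x(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    b_yx = (sum(x * y for x, y in zip(xs, ys)) / n - mx * my) / var_x
    intercept = my - b_yx * mx        # line passes through (mean_x, mean_y)
    return b_yx, intercept

slope, intercept = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)   # 2.0 1.0  (Y = 2X + 1 fits this data exactly)
```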

STANDARD ERROR OF ESTIMATE
The standard error of estimate measures the dispersion about the regression line.

Sxy = √( Σ(X − Xc)² / N ) = σx·√(1 − r²)
Syx = √( Σ(Y − Yc)² / N ) = σy·√(1 − r²)

where Xc and Yc are the values estimated from the respective regression equations.
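The two forms of Syx above can be checked against each other numerically (a sketch; variable names are illustrative):

```python
import math

# S_yx from the direct definition sqrt(sum((Y - Yc)^2) / N) versus the
# shortcut sigma_y * sqrt(1 - r^2), using the least-squares line of Y on X.

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
r = cov / (math.sqrt(var_x) * sy)

b = cov / var_x                      # slope of Y on X
a = my - b * mx                      # intercept
yc = [a + b * x for x in xs]         # estimated (computed) Y values

direct = math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, yc)) / n)
shortcut = sy * math.sqrt(1 - r ** 2)
print(abs(direct - shortcut) < 1e-9)   # True: both definitions agree
```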


SKEWNESS, KURTOSIS and MOMENTS

In probability theory and statistics, skewness is a measure of the asymmetry of the probability
distribution of a real-valued random variable. The skewness value can be positive or negative, or even
undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density
function is longer than the right side and the bulk of the values (possibly including the median) lie to the
right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the
bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly
distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution.
Moments about the Mean or Central Moments

μ1 = Σ(X − X̄) / N = 0
μ2 = Σ(X − X̄)² / N
μ3 = Σ(X − X̄)³ / N
μ4 = Σ(X − X̄)⁴ / N

SKEWNESS:
β1 = μ3² / μ2³ or γ1 = √β1

KURTOSIS:
β2 = μ4 / μ2² or γ2 = β2 − 3
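The central moments and the β coefficients above can be sketched as follows (the function name is illustrative):

```python
# kth central moment: mean of the kth powers of deviations from the mean.

def central_moment(values, k):
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

data = [1, 2, 4, 5]              # symmetric about its mean of 3
mu2 = central_moment(data, 2)    # 2.5
mu3 = central_moment(data, 3)    # 0.0, as expected for symmetric data
mu4 = central_moment(data, 4)    # 8.5
print(central_moment(data, 1))   # 0.0 (the first central moment is always 0)
print(mu3 ** 2 / mu2 ** 3)       # beta1 (skewness): 0.0
print(mu4 / mu2 ** 2)            # beta2 (kurtosis): 1.36
```

A symmetric data set gives β1 = 0; a normal distribution would give β2 = 3, so γ2 = β2 − 3 measures excess kurtosis relative to the normal.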
