Data Analysis
Short Questions:
Question 1: What is data?
Answer: Data is the substrate for the decision-making process. Data is a measure of some observable characteristic of a set of objects of interest. Statistics is a vast area of applied mathematics wherein data are collected, classified, presented and analyzed for a specific purpose.
Question 2: What is the role of statistics in business decisions?
Answer: Statistics plays an important role in business because it provides the quantitative basis for arriving at decisions in all matters connected with the operations of a business. Statistics helps a business plan production according to the tastes of consumers.
Statistics in business can also serve as a tool of management to evaluate the performance of machines and personnel. It also enables the businessman to judge the efficiency of new production methods by studying the relationship between costs and methods of production.
Question 3: Define Frequency Table.
Answer: Frequency is the number of occurrences of a data item. A table that summarizes the number of cases against each value (or class) of a column of interest is called a frequency table.
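A frequency table can be built simply by counting how often each value occurs. The minimal Python sketch below uses `collections.Counter`; the observations are borrowed from Question15 later in these notes purely as sample input.

```python
from collections import Counter

# Sample observations (the data of Question15 below), used only as input.
data = [6, 7, 10, 12, 13, 4, 8, 12]

# Counter maps each distinct value to the number of times it occurs.
freq = Counter(data)

# Print the frequency table in ascending order of value.
print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
```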
Question 4: What is Central Tendency?
Answer: In a series of statistical data, the parameter that reflects the central value of the series is called the central tendency. Central tendency refers to a single value that represents the whole set of data.
Question 5: Define Average and discuss various types of averages.
Answer: An average can be defined as a central value around which the other values of a series tend to cluster. An average is computed to give a concise picture of a large group: by using an average, complex groups of large numbers are presented in a few significant words or figures. Averages also help in obtaining a picture of the universe (population) with the help of a sample. Although the sample and the universe differ in size, their averages may be very nearly identical.
Averages may be classified into three broad types:
1) Mathematical Averages:
a) Arithmetical mean
b) Geometric mean
c) Harmonic average
Santoshsahni833@gmail.com
2) Positional Averages:
a) Mode
b) Median
3) Commercial Averages:
a) Moving average
b) Progressive average
c) Quadratic average
Question 6: What do you understand by the term Range in statistics?
Answer: Range: The range of a data set is the difference between the largest value and the smallest value.
For example, for the runs scored by two batsmen A and B, we had some idea of the variability in the scores on the basis of the minimum and maximum runs in each series.
To obtain a single number for this, we find the difference between the maximum and minimum values of each series. This difference is called the range of the data. In the case of batsman A, Range = 117 - 0 = 117, and for batsman B, Range = 60 - 46 = 14. Clearly, Range of A > Range of B. Therefore, the scores are scattered or dispersed in the case of A, while for B they are close to each other.
Thus, Range of a series = Maximum value - Minimum value.
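The range needs only the two extreme values of a series. A minimal Python sketch (the function name `data_range` is hypothetical); since only the extremes matter, the batsmen example is checked from the minimum and maximum scores quoted above.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

# Only the extremes matter for the range, so the batsmen example can be
# checked from the minimum and maximum scores quoted in the text.
range_a = data_range([0, 117])   # batsman A
range_b = data_range([46, 60])   # batsman B
print(range_a, range_b)          # 117 14

# Observations from Question15 below.
print(data_range([6, 7, 10, 12, 13, 4, 8, 12]))  # 9
```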
Question 7: Define Mean Deviation.
Answer: Mean deviation, also known as average deviation, is the mean of the absolute amounts by which the individual items deviate from the mean. The following procedure is usually applied:
1) Calculate the absolute deviation of each item from the mean, ignoring any negative signs.
2) Add all the absolute deviations.
3) Divide the sum of the deviations by the total number of items.
Symbolically, for a sample of size n, the mean deviation is defined by
$$MD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
where $\bar{x}$ is the arithmetic mean of the variable x.
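The three steps above translate directly into code. A minimal Python sketch (the function name `mean_deviation` is hypothetical), checked against the data of Question15 below, whose M.D. works out to 2.75:

```python
def mean_deviation(values):
    """Mean deviation about the arithmetic mean: sum(|x - mean|) / n."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: absolute deviations from the mean (negative signs removed).
    abs_devs = [abs(x - mean) for x in values]
    # Steps 2 and 3: add them and divide by the number of items.
    return sum(abs_devs) / n

print(mean_deviation([6, 7, 10, 12, 13, 4, 8, 12]))  # 2.75
```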
Question 8: What is Skewness?
Answer:
Skewness: Skewness is a measure of the lack of symmetry, i.e., the degree of departure from the symmetry exhibited by a normal distribution.
Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. It has a few relatively low values. The distribution is said to be left-skewed. In such a distribution, the mean is usually lower than the median, which in turn is lower than the mode (i.e., mean < median < mode), and the skewness coefficient is less than zero. Example (observations): 1, 1000, 1001, 1002, 1003
Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. It has a few relatively high values. The distribution is said to be right-skewed. In such a distribution, the mean is usually greater than the median, which in turn is greater than the mode (i.e., mean > median > mode), and the skewness coefficient is greater than zero. Example (observations): 1, 2, 3, 4, 100
In a skewed (unbalanced, lopsided) distribution, the mean is farther out in the long tail than
is the median. If there is no skewness or the distribution is symmetric like the bell-shaped
normal curve then the mean = median = mode.
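The mean-versus-median rule can be checked on the two example data sets above with Python's standard `statistics` module; this is only a quick illustration of the rule of thumb, not a formal skewness measure.

```python
from statistics import mean, median

left_skewed = [1, 1000, 1001, 1002, 1003]   # negative-skew example from the text
right_skewed = [1, 2, 3, 4, 100]            # positive-skew example from the text

for name, data in [("left", left_skewed), ("right", right_skewed)]:
    # In the left-skewed set the mean is pulled down by the low value 1;
    # in the right-skewed set it is pulled up by the high value 100.
    print(name, mean(data), median(data))
```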
Serial no.   Height of student
1            14.4
2            15.2
3            15.0
4            15.8
5            15.5
$$\bar{X} = \frac{\sum X}{n} = \frac{75.9}{5} = 15.18$$
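Question11: Find the mean of the first n natural numbers.
Answer: Since $\bar{X} = \frac{\sum x_i}{n}$, and the sum of the first n natural numbers is
$$\sum x_i = 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$$
we get
$$\bar{X} = \frac{n(n+1)}{2n} = \frac{n+1}{2}$$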
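The closed form (n + 1)/2 can be checked numerically against a direct computation. A minimal Python sketch (the function name `mean_first_n` is hypothetical):

```python
def mean_first_n(n):
    """Arithmetic mean of 1, 2, ..., n computed directly."""
    return sum(range(1, n + 1)) / n

# Compare the direct computation with the closed form (n + 1) / 2.
for n in (1, 5, 10, 101):
    assert mean_first_n(n) == (n + 1) / 2

print(mean_first_n(10))  # 5.5
```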
xi    fi    fi*xi
4     7     28
7     10    70
10    15    150
13    20    260
16    25    400
19    30    570
Total Σfi = 107    Σfi*xi = 1478
$$A.M. = \frac{\sum f_i x_i}{\sum f_i} = \frac{1478}{107} = 13.81$$
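The arithmetic mean of a frequency distribution weights each value by its frequency. A minimal Python sketch reproducing the table above:

```python
xs = [4, 7, 10, 13, 16, 19]     # values xi
fs = [7, 10, 15, 20, 25, 30]    # frequencies fi

sum_f = sum(fs)                                  # total frequency
sum_fx = sum(f * x for x, f in zip(xs, fs))      # sum of fi * xi
am = sum_fx / sum_f                              # weighted arithmetic mean

print(sum_f, sum_fx, round(am, 2))  # 107 1478 13.81
```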
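Question13: Find the mode of the given data.
Family size   No. of families
1-3           7
3-5           8
5-7           2
7-9           2
9-11          1
Answer: The highest frequency (8) is in the class 3-5, so this is the modal class. Here
l = 3, h = 2, f1 = 8, f0 = 7, f2 = 2
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h = 3 + \left[\frac{8 - 7}{16 - 7 - 2}\right] \times 2 = 3 + \frac{2}{7} = 3.29$$
(Answer)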
Question14: Find the mode of the given data.
Age (x)   No. of people (f)   c.f.
5-15      6                   6
15-25     11                  17
25-35     21                  38
35-45     23                  61
45-55     14                  75
55-65     5                   80
Answer: The highest frequency (23) is in the class 35-45, so this is the modal class. Here
l = 35, h = 10, f0 = 21, f1 = 23, f2 = 14
$$\text{Mode} = 35 + \frac{23 - 21}{2 \times 23 - 21 - 14} \times 10 = 35 + \frac{20}{11} = 35 + 1.82 = 36.82$$
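The grouped-data mode formula used in Questions 13 and 14 can be sketched in Python. The helper names `grouped_mode` and `mode_from_table` are hypothetical; the sketch assumes equal class widths and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(l, h, f0, f1, f2):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * h

def mode_from_table(lower_limits, width, freqs):
    """Locate the modal class (highest frequency) and apply grouped_mode."""
    i = freqs.index(max(freqs))
    f0 = freqs[i - 1] if i > 0 else 0                # preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # succeeding class
    return grouped_mode(lower_limits[i], width, f0, freqs[i], f2)

# Question13: family sizes
print(round(mode_from_table([1, 3, 5, 7, 9], 2, [7, 8, 2, 2, 1]), 2))  # 3.29
# Question14: ages
print(round(mode_from_table([5, 15, 25, 35, 45, 55], 10, [6, 11, 21, 23, 14, 5]), 2))  # 36.82
```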
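Question15: Find the M.D. about the mean for the given data.
6, 7, 10, 12, 13, 4, 8, 12
Answer:
$$\bar{X} = \frac{\sum x_i}{n} = \frac{72}{8} = 9$$
xi - 9: 6 - 9, 7 - 9, 10 - 9, 12 - 9, 13 - 9, 4 - 9, 8 - 9, 12 - 9 = -3, -2, 1, 3, 4, -5, -1, 3
|xi - 9|: 3, 2, 1, 3, 4, 5, 1, 3 (sum = 22)
$$M.D. = \frac{\sum |x_i - \bar{X}|}{n} = \frac{22}{8} = 2.75$$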
Question16: Why Study Dispersion?
Answer:
A measure of location, such as the mean or the median, only describes the center of the
data, but it does not tell us anything about the spread of the data.
For example, if your nature guide told you that the river ahead averaged 3 feet in depth,
would you want to wade across on foot without additional information? Probably not.
You would want to know something about the variation in the depth.
A second reason for studying the dispersion in a set of data is to compare the spread in
two or more distributions.
Long Questions:
Question1: Write short notes on the following.
a. Arithmetical Mean
b. Weighted Average
c. Geometric Mean
d. Harmonic Mean
Answer:
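c. Geometric Mean: The geometric mean of n observations is the nth root of their product:
$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
d. Harmonic Mean: The harmonic mean is based on the reciprocals of the numbers averaged. It is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations:
$$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$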
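The two definitions can be computed by hand and compared with the standard library (`statistics.geometric_mean` and `math.prod` need Python 3.8+). The values 2 and 8 are hypothetical sample data chosen so the results are easy to check.

```python
import math
from statistics import geometric_mean, harmonic_mean

values = [2, 8]  # hypothetical sample values

# nth root of the product of the observations
gm_manual = math.prod(values) ** (1 / len(values))
# reciprocal of the arithmetic mean of the reciprocals
hm_manual = len(values) / sum(1 / x for x in values)

print(gm_manual, geometric_mean(values))  # both approximately 4.0
print(hm_manual, harmonic_mean(values))   # both approximately 3.2
```

For positive data the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean (here 5 >= 4 >= 3.2).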
Question2: Write short notes on the following.
a. Mean
b. Median
c. Mode
Answer:
a. Mean: The arithmetic mean or simple mean (represented by putting a bar above the variable name) is the quantity obtained by dividing the sum of the values of the items (X) of a variable by their number (n), i.e., the number of items:
$$\bar{X} = \frac{\sum X}{n}$$
b. Median: The median is the value of the item which divides the data set, arranged in order of magnitude, into two equal parts, one part consisting of all values less than it and the other of all values greater than it.
Defined in another way, the median is the value that divides the total frequency into two halves.
When n is odd,
$$\text{Middle position number} = \frac{n+1}{2}$$
When n is even, the median is the mean of the items in the middle positions
$$\frac{n}{2} \quad \text{and} \quad \frac{n}{2} + 1$$
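The odd and even position rules can be sketched in Python (the function name `median_of` is hypothetical; positions in the text are 1-based, Python indices are 0-based). The inputs are the heights table and the Question15 data from these notes.

```python
def median_of(values):
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        # Odd n: the item in position (n + 1) / 2 (1-based) is the median.
        return s[(n + 1) // 2 - 1]
    # Even n: average of the items in positions n/2 and n/2 + 1 (1-based).
    return (s[n // 2 - 1] + s[n // 2]) / 2

print(median_of([14.4, 15.2, 15.0, 15.8, 15.5]))  # 15.2
print(median_of([6, 7, 10, 12, 13, 4, 8, 12]))    # 9.0
```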
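c. Mode: A third type of central value, or centre of the distribution, is the value of greatest frequency or, more precisely, of greatest frequency density. Graphically, it is the value on the X-axis below the peak, or highest point, of the frequency curve. This is called the mode. For grouped data,
$$\text{Mode} = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where $L_1$ is the lower limit of the modal class, $f_1$ is the frequency of the modal class, $f_0$ and $f_2$ are the frequencies of the classes preceding and following it, and h is the class width.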
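b. Deciles: In a manner similar to the median and quartiles, the data set can be divided into 10 equal parts when arranged in either ascending or descending order. Each point of division is called a decile. Thus, there are nine deciles, represented as D1, D2, D3, ..., D9. The interpretation of a decile is similar to that of the median and quartiles.
c. Percentiles: The data set can also be divided into 100 equal parts, each point of division being called a percentile. The 99 percentiles are represented by P1, P2, P3, ..., P99.
A general formula for all the positional measures (partition values) of a grouped frequency distribution is given by:
$$T_i = L_{T_i} + \frac{\frac{iN}{k} - C}{f} \times h$$
where $T_i$ is the i-th partition value when the data are divided into k equal parts (k = 4 for quartiles, 10 for deciles, 100 for percentiles), $L_{T_i}$ is the lower boundary of the class containing $T_i$, C is the cumulative frequency of the preceding class, f is the frequency of that class, and h is the class width.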
d. Moving Average: The moving average is an arithmetic average of data over a period that is updated regularly by dropping the first item in the average and adding the new item as it comes in. It is useful for smoothing out the irregularities of a time series and is generally computed to study the trend.
Example: Suppose the prices of 12 months are given and a three-monthly moving average is to be computed. Then the first item in the 3-month moving average would be the average [(a1+a2+a3)/3], the second item would be the average of the next three months [(a2+a3+a4)/3], and so on. The last item would be the average [(a10+a11+a12)/3]. When the next month's price a13 comes in, a10 is dropped and a13 is added, giving [(a11+a12+a13)/3], and so on.
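The sliding-window idea can be sketched in a few lines of Python. The function name `moving_average` and the price figures are hypothetical; the window length k = 3 matches the three-monthly example.

```python
def moving_average(series, k=3):
    """Simple k-period moving average: mean of each window of k consecutive items."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

prices = [10, 12, 14, 13, 15, 17]   # hypothetical monthly prices
print(moving_average(prices))       # [12.0, 13.0, 14.0, 15.0]
```

With 12 monthly prices this yields 10 averages, the last one being (a10 + a11 + a12)/3, as described above.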
e. Quadratic Average: The quadratic mean is obtained by taking the square root of the average of the squares of the items of a series:
$$Q_m = \sqrt{\frac{a^2 + b^2 + c^2 + \cdots}{n}}$$
where $Q_m$ = quadratic mean, $a^2, b^2, c^2, \ldots$ = squares of the different values, and n = number of items.
The quadratic average is useful when some items have negative values and others positive values, because in such cases the arithmetic mean is not very representative. It is also used in averaging deviations, rather than original values, when the standard deviation is computed.
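A minimal Python sketch (the function name `quadratic_mean` and the values are hypothetical) showing why the quadratic mean is preferred when signs are mixed: the arithmetic mean of -3 and 3 is 0, while the quadratic mean still reflects their size.

```python
import math

def quadratic_mean(values):
    """Square root of the average of the squares of the items."""
    return math.sqrt(sum(x * x for x in values) / len(values))

vals = [-3, 3]                   # hypothetical items of mixed sign
print(sum(vals) / len(vals))     # 0.0: the arithmetic mean hides their magnitude
print(quadratic_mean(vals))      # 3.0
```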
Question4: Write short notes on the following.
a. Standard Deviation
b. Variance
c. Coefficient of Variation
d. Quartile Deviation
Answer:
a. Standard deviation: The standard deviation of a sample (SD) is similar to the mean deviation in that it considers the deviation of each X value from the mean. However, instead of using the absolute values of the deviations, it uses the squares of the deviations. These are added, divided by n, and the square root is extracted.
The formula for the standard deviation is
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
b. Variance: The variance is the square of the SD and is represented by:
$$\text{Variance} = V = \frac{\sum (X - \bar{X})^2}{n}$$
c. Coefficient of variation: To get an indication of the variation relative to the mean, we divide the standard deviation by the mean to get the coefficient of variation. This enables us to compare more easily two groups which have different standard deviations and means.
$$\text{Coefficient of variation} = \frac{SD}{\bar{X}} \times 100$$
d. Quartile deviation: Half of the interquartile range is called the quartile deviation or semi-interquartile range. Symbolically,
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
The value of Q.D. gives the average magnitude by which the two quartiles deviate from the median.
If the distribution is approximately symmetrical, then Md ± Q.D. covers approximately 50% of the observations and, thus, we can write Q1 = Md - Q.D. and Q3 = Md + Q.D.
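These four measures can be sketched in Python. The sketch divides by n, as in the formulas above (not by n - 1); the data are the observations of Question15, and the quartiles passed to `quartile_deviation` (a hypothetical helper name) are the values 137.5 and 155.3 computed in Question6 of the long questions.

```python
import math

def variance(values):
    """Population-style variance: mean of squared deviations (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def quartile_deviation(q1, q3):
    """Semi-interquartile range."""
    return (q3 - q1) / 2

data = [6, 7, 10, 12, 13, 4, 8, 12]        # observations from Question15
v = variance(data)                          # 9.25
sd = math.sqrt(v)                           # approximately 3.04
cv = sd / (sum(data) / len(data)) * 100     # approximately 33.8 (%)
qd = quartile_deviation(137.5, 155.3)       # approximately 8.9

print(v, round(sd, 2), round(cv, 1), round(qd, 1))
```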
Question5: Write short notes on measures of skewness and kurtosis, and kurtosis vs. skewness.
Answer: Definition of skewness: For univariate data Y1, Y2, ..., YN, the formula for skewness is:
$$\text{skewness} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3}$$
where $\bar{Y}$ is the mean, s is the standard deviation (computed with N rather than N - 1 in the denominator), and N is the number of data points. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.
Definition of kurtosis: For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:
$$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4}$$
where $\bar{Y}$ is the mean, s is the standard deviation, and N is the number of data points.
The kurtosis for a standard normal distribution is three. For this reason, some sources use the following definition of kurtosis (often called excess kurtosis):
$$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4} - 3$$
This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with the second definition, positive kurtosis indicates a "peaked" distribution and negative kurtosis indicates a "flat" distribution.
Which definition of kurtosis is used is a matter of convention. When using software to compute the sample kurtosis, you need to be aware of which convention is being followed.
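The moment formulas above can be sketched directly in Python (the function name `shape_measures` is hypothetical). As in the definitions, s uses N in the denominator; the function returns the skewness, the kurtosis and the excess kurtosis (kurtosis - 3).

```python
import math

def shape_measures(data):
    """Return (skewness, kurtosis, excess kurtosis) using N-denominator moments."""
    n = len(data)
    mean = sum(data) / n
    # Standard deviation with N in the denominator, matching the formulas above.
    s = math.sqrt(sum((y - mean) ** 2 for y in data) / n)
    skew = sum((y - mean) ** 3 for y in data) / n / s ** 3
    kurt = sum((y - mean) ** 4 for y in data) / n / s ** 4
    return skew, kurt, kurt - 3

print(shape_measures([1, 2, 3, 4, 100]))          # positive skewness
print(shape_measures([1, 1000, 1001, 1002, 1003]))  # negative skewness
```

A symmetric set such as 1, 2, 3, 4, 5 gives skewness 0 and kurtosis 1.7 (excess kurtosis -1.3), i.e. flatter than a normal curve.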
Examples
The following example shows histograms for 10,000
random numbers generated from a normal, a double
exponential, a Cauchy, and a Weibull distribution.
X      f     D = X - A   f*D     u = D/20   f*u
800    7     -100        -700    -5         -35
820    14    -80         -1120   -4         -56
860    19    -40         -760    -2         -38
900    25    0           0       0          0
920    20    20          400     1          20
980    10    80          800     4          40
1000   5     100         500     5          25
Total  N = 100           Σf*D = -880        Σf*u = -44
Let A = 900.
Method (1)
$$A.M. = A + \frac{\sum fD}{N} = 900 + \frac{-880}{100} = 891.2$$
Method (2)
$$A.M. = A + \frac{\sum fu}{N} \times 20 = 900 + \frac{-44}{100} \times 20 = 891.2$$
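Both shortcut methods can be reproduced in Python to confirm they agree. A minimal sketch; A = 900 and the common factor c = 20 are the values chosen in the text.

```python
import math

xs = [800, 820, 860, 900, 920, 980, 1000]
fs = [7, 14, 19, 25, 20, 10, 5]
A, c = 900, 20   # assumed mean and common factor used in the text

N = sum(fs)
# Method (1): deviations D = X - A
mean1 = A + sum(f * (x - A) for x, f in zip(xs, fs)) / N
# Method (2): step deviations u = (X - A) / c, scaled back by c
mean2 = A + sum(f * (x - A) / c for x, f in zip(xs, fs)) / N * c

print(N, round(mean1, 1), round(mean2, 1))  # 100 891.2 891.2
```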
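Question6: Find the quartiles of the data given below.
Length (class)   Class boundaries   Number of leaves (f)   c.f.
118-126          117.5-126.5        3                      3
127-135          126.5-135.5        5                      8
136-144          135.5-144.5        9                      17
145-153          144.5-153.5        12                     29
154-162          153.5-162.5        5                      34
163-171          162.5-171.5        4                      38
172-180          171.5-180.5        2                      40
Here N = 40. In the formulas below, l1 and l2 are the lower and upper boundaries of the class containing the quartile, f is its frequency, and C is the cumulative frequency of the preceding class.
For Q2 (N/2 = 20, class 144.5-153.5):
$$Q_2 = l_1 + \frac{N/2 - C}{f} \times (l_2 - l_1)$$
l1 = 144.5, l2 = 153.5, f = 12, N = 40, C = 17
$$Q_2 = 144.5 + \frac{20 - 17}{12} \times 9 = 144.5 + 2.25 = 146.75$$
For Q1 (N/4 = 10, class 135.5-144.5):
$$Q_1 = l_1 + \frac{N/4 - C}{f} \times (l_2 - l_1)$$
l1 = 135.5, l2 = 144.5, f = 9, N = 40, C = 8
$$Q_1 = 135.5 + \frac{10 - 8}{9} \times 9 = 137.5$$
For Q3 (3N/4 = 30, class 153.5-162.5):
$$Q_3 = l_1 + \frac{3N/4 - C}{f} \times (l_2 - l_1)$$
l1 = 153.5, l2 = 162.5, f = 5, N = 40, C = 29
$$Q_3 = 153.5 + \frac{30 - 29}{5} \times 9 = 155.3$$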
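The same interpolation works for any partition value (quartile, decile or percentile). A minimal Python sketch; the function name `partition_value` is hypothetical, and p is the fraction of N below the required value (0.25 for Q1, 0.1 for D1, 0.01 for P1, and so on).

```python
import math

def partition_value(p, boundaries, freqs):
    """Interpolated value below which a fraction p of a grouped distribution lies."""
    N = sum(freqs)
    target = p * N
    cum = 0  # cumulative frequency of the classes before the current one (C)
    for (l1, l2), f in zip(boundaries, freqs):
        if cum + f >= target:
            return l1 + (target - cum) / f * (l2 - l1)
        cum += f
    raise ValueError("p must lie between 0 and 1")

bounds = [(117.5, 126.5), (126.5, 135.5), (135.5, 144.5), (144.5, 153.5),
          (153.5, 162.5), (162.5, 171.5), (171.5, 180.5)]
freqs = [3, 5, 9, 12, 5, 4, 2]

q1, q2, q3 = (partition_value(p, bounds, freqs) for p in (0.25, 0.5, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 137.5 146.75 155.3
```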
Question7: Find the M.D. about the mean for the given data.
xi    fi    fi*xi   |xi - 7.5|   fi*|xi - 7.5|
2     2     4       5.5          11
5     8     40      2.5          20
6     10    60      1.5          15
8     7     56      0.5          3.5
10    8     80      2.5          20
12    5     60      4.5          22.5
Total N = 40   Σfi*xi = 300      Σfi*|xi - 7.5| = 92
$$\bar{X} = \frac{\sum f_i x_i}{N} = \frac{300}{40} = 7.5$$
$$M.D.(\bar{X}) = \frac{\sum f_i |x_i - \bar{X}|}{N} = \frac{92}{40} = 2.3$$
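For a discrete frequency distribution, each absolute deviation is weighted by its frequency. A minimal Python sketch reproducing the table above:

```python
import math

xs = [2, 5, 6, 8, 10, 12]    # values xi
fs = [2, 8, 10, 7, 8, 5]     # frequencies fi

N = sum(fs)
mean = sum(f * x for x, f in zip(xs, fs)) / N              # 7.5
md = sum(f * abs(x - mean) for x, f in zip(xs, fs)) / N    # approximately 2.3

print(N, mean, round(md, 2))  # 40 7.5 2.3
```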
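Question8: Find the median of the given data.
Height in cm      Number of students (cumulative)
Less than 140     4
Less than 145     11
Less than 150     29
Less than 155     40
Less than 160     46
Less than 165     51
Answer: Converting to class frequencies:
Height (cm)    f     c.f.
Below 140      4     4
140-145        7     11
145-150        18    29
150-155        11    40
155-160        6     46
160-165        5     51
Since N = 51, N/2 = 25.5, which falls in the class 145-150.
$$\text{Median} = l + \frac{N/2 - cf}{f} \times h$$
where
f -> frequency of the median class
l -> lower limit of the median class
cf -> cumulative frequency of the preceding class
h -> class size
$$\text{Median} = 145 + \left[\frac{25.5 - 11}{18}\right] \times 5 = 145 + \frac{72.5}{18} = 145 + 4.03 = 149.03$$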
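The same steps (find N/2, locate the median class from the "less than" cumulative frequencies, interpolate) can be sketched in Python, assuming equal class widths of h = 5 as in this question:

```python
upper = [140, 145, 150, 155, 160, 165]   # "less than" upper limits
less_than = [4, 11, 29, 40, 46, 51]      # cumulative frequencies
h = 5                                    # class width

N = less_than[-1]
half = N / 2
# Median class: first class whose cumulative frequency reaches N/2.
i = next(k for k, cf in enumerate(less_than) if cf >= half)
l = upper[i] - h                              # lower limit of the median class
cf_prev = less_than[i - 1] if i > 0 else 0    # c.f. of the preceding class
f = less_than[i] - cf_prev                    # frequency of the median class
median = l + (half - cf_prev) / f * h

print(l, f, round(median, 2))  # 145 18 149.03
```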
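Question9: Find the M.D. about the median for the following data.
xi    fi   c.f.   |xi - 13|   fi*|xi - 13|
3     3    3      10          30
6     4    7      7           28
9     5    12     4           20
12    2    14     1           2
13    4    18     0           0
15    5    23     2           10
21    4    27     8           32
22    3    30     9           27
Here N = 30, so the median is the mean of the 15th and 16th items. From the c.f. column, both the 15th and 16th items are 13, so M = 13.
Σfi*|xi - M| = 149
$$M.D. = \frac{\sum f_i |x_i - M|}{N} = \frac{149}{30} = 4.97$$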
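A minimal Python sketch of the mean deviation about the median: the frequency table is expanded into individual observations so `statistics.median` can locate the median, and the weighted absolute deviations are then averaged.

```python
import statistics

xs = [3, 6, 9, 12, 13, 15, 21, 22]
fs = [3, 4, 5, 2, 4, 5, 4, 3]

# Expand the table into the 30 individual observations to find the median.
expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
median = statistics.median(expanded)    # mean of the 15th and 16th items

N = sum(fs)
md = sum(f * abs(x - median) for x, f in zip(xs, fs)) / N

print(median, round(md, 2))  # 13.0 4.97
```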
Question10: Find the M.D. about the mean for the following data.
Marks obtained   No. of students (fi)   Mid-value (xi)   fi*xi   |xi - 45|   fi*|xi - 45|
10-20            2                      15               30      30          60
20-30            3                      25               75      20          60
30-40            8                      35               280     10          80
40-50            14                     45               630     0           0
50-60            8                      55               440     10          80
60-70            3                      65               195     20          60
70-80            2                      75               150     30          60
Total            Σfi = 40                                1800                400
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1800}{40} = 45$$
$$M.D. = \frac{\sum f_i |x_i - \bar{X}|}{\sum f_i} = \frac{400}{40} = 10$$
(Answer)
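For grouped data the only extra step is replacing each class by its mid-value. A minimal Python sketch reproducing the calculation above:

```python
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
fs = [2, 3, 8, 14, 8, 3, 2]

# Represent each class by its mid-value.
mids = [(a + b) / 2 for a, b in classes]

N = sum(fs)
mean = sum(f * m for m, f in zip(mids, fs)) / N             # 45.0
md = sum(f * abs(m - mean) for m, f in zip(mids, fs)) / N   # 10.0

print(N, mean, md)  # 40 45.0 10.0
```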
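Question11: Calculate Karl Pearson's coefficient of skewness for the following distribution.
Monthly salary (in Rs.)     No. of salesmen (f)   Mid-point (m)   d = (m - 900)/200   fd    fd^2
400 but less than 600       4                     500             -2                  -8    16
600 but less than 800       10                    700             -1                  -10   10
800 but less than 1000      19                    900             0                   0     0
1000 but less than 1200     12                    1100            +1                  +12   12
1200 but less than 1400     4                     1300            +2                  +8    16
1400 but less than 1600     1                     1500            +3                  +3    9
Total                       N = 50                                                    Σfd = 5   Σfd^2 = 63
Mean: with A = 900, Σfd = 5, N = 50, i = 200,
$$\bar{X} = A + \frac{\sum fd}{N} \times i = 900 + \frac{5}{50} \times 200 = 920$$
Mode: the mode lies in the class 800-1000, so L = 800, f1 = 19, f0 = 10, f2 = 12.
$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 800 + \frac{19 - 10}{38 - 10 - 12} \times 200 = 800 + 112.5 = 912.5$$
Standard deviation:
$$S.D. = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{63}{50} - \left(\frac{5}{50}\right)^2} \times 200 = 223.61$$
$$\text{Coeff. of Sk.} = \frac{\text{Mean} - \text{Mode}}{S.D.} = \frac{920 - 912.5}{223.61} = +0.034$$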
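The whole calculation (step-deviation mean and S.D., grouped mode, then (Mean - Mode)/S.D.) can be sketched in Python. A = 900 and i = 200 are the values used in the text; the sketch assumes the modal class has a class on each side.

```python
import math

mids = [500, 700, 900, 1100, 1300, 1500]
fs = [4, 10, 19, 12, 4, 1]
A, i = 900, 200   # assumed mean and class width used in the text

N = sum(fs)
d = [(m - A) // i for m in mids]                    # step deviations (exact here)
sum_fd = sum(f * dd for f, dd in zip(fs, d))        # 5
sum_fd2 = sum(f * dd * dd for f, dd in zip(fs, d))  # 63

mean = A + sum_fd * i / N                                       # 920
sd = math.sqrt(sum_fd2 / N - (sum_fd / N) ** 2) * i             # approximately 223.61

k = fs.index(max(fs))                       # modal class index
L = mids[k] - i / 2                         # lower limit of the modal class
mode = L + (fs[k] - fs[k - 1]) / (2 * fs[k] - fs[k - 1] - fs[k + 1]) * i   # 912.5

sk = (mean - mode) / sd
print(mean, mode, round(sd, 2), round(sk, 3))  # 920.0 912.5 223.61 0.034
```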
Question12: The median of the following data is 525. Find the values of x and y if the total frequency is 100.
Class interval   Frequency
0-100            2
100-200          5
200-300          x
300-400          12
400-500          17
500-600          20
600-700          y
700-800          9
800-900          7
900-1000         4
Answer:
Class interval   f    c.f.
0-100            2    2
100-200          5    7
200-300          x    7+x
300-400          12   19+x
400-500          17   36+x
500-600          20   56+x
600-700          y    56+x+y
700-800          9    65+x+y
800-900          7    72+x+y
900-1000         4    76+x+y
Since N = 100, 76 + x + y = 100, so x + y = 24.
The median 525 lies in the class 500-600, so l = 500, h = 100, f = 20 and C = 36 + x. Using
$$\text{Median} = l + \frac{N/2 - C}{f} \times h$$
we get
$$525 = 500 + \frac{50 - (36 + x)}{20} \times 100$$
525 - 500 = (14 - x) × 5
25 = 70 - 5x
5x = 70 - 25 = 45
So, x = 9 and y = 24 - 9 = 15.
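The two unknowns can also be found by rearranging the median formula for x and then using the total frequency for y. A minimal Python sketch; the names `c_before` and `known_total` are hypothetical labels for the sums read off the table.

```python
N = 100
median, l, f, h = 525, 500, 20, 100
c_before = 2 + 5 + 12 + 17                  # c.f. before 500-600, excluding x
known_total = 2 + 5 + 12 + 17 + 20 + 9 + 7 + 4   # all frequencies except x and y

# median = l + (N/2 - (c_before + x)) / f * h, solved for x
x = N / 2 - c_before - (median - l) * f / h
y = N - known_total - x

# Check: substituting x back into the median formula should give 525.
check = l + (N / 2 - (c_before + x)) / f * h
print(x, y, check)  # 9.0 15.0 525.0
```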
Correlation Analysis
Short Questions:
Question1: What is correlation analysis?
Answer: Correlation is a measure of the degree of association between two (or more) variables in a data set. Thus, if it is known that two variables are highly correlated, then one can predict the value of one variable on the basis of the value of the other variable.
Two variables, say X and Y, are said to be correlated if:
a. Both increase or decrease together. In this case the variables are said to be positively correlated.
b. When one increases, the other decreases. In this case the variables are said to be negatively correlated.
Question2: What is a scatter diagram?
Answer: The simplest device for determining the relationship between two variables is a special type of dot chart called a scatter diagram. When this method is used, the given data are plotted on graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot, and thus obtain as many points as there are observations. By looking at the scatter of the various points, we can form an idea as to whether the variables are related or not. The more the plotted points scatter over the chart, the less relationship there is between the two variables. The more nearly the points fall on a line, the higher the degree of relationship. If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive. On the other hand, if all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner of the diagram, correlation is said to be perfectly negative.
Question3: State Karl Pearson's coefficient of linear correlation.
Answer: We observed that the greater the covariance, the greater the correlation between the two variables. Therefore, covariance can be treated as a measure of correlation between two variables. However, the magnitude of covariance depends on the units of measurement. The following expression, derived from covariance, does not depend on the units of measurement and hence is called Karl Pearson's coefficient of linear correlation, or simply the coefficient of correlation, denoted by r:
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
where $x = (X - \bar{X})$ and $y = (Y - \bar{Y})$.
regression. With the help of regression analysis, we are in a position to find out the
average probable change in one variable given a certain amount of change in another.
Question6: State Spearman's rank correlation.
Answer: This measure is especially useful when quantitative measures for certain factors (such as in the evaluation of leadership ability or the judgment of female beauty) cannot be fixed, but the individuals in the group can be arranged in order, thereby obtaining for each individual a number indicating his (her) rank in the group. In any event, the rank correlation coefficient is applied to a set of ordinal rank numbers, with 1 for the individual ranked first in quantity or quality, and so on, up to n for the individual ranked last in a group of n individuals (or n pairs of individuals). Spearman's rank correlation coefficient is defined as:
$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where R denotes the rank coefficient of correlation, D refers to the difference of ranks between paired items in the two series, and N is the number of pairs.
Question7: What is the difference between Regression and Correlation?
Answer: Following are the points of difference between correlation and regression:
1. Whereas the correlation coefficient is a measure of the degree of co-variability between X and Y, the objective of regression analysis is to study the nature of the relationship between the variables so that we may be able to predict the value of one on the basis of the other. The variable used for prediction is called the independent variable and the variable that is to be predicted is referred to as the dependent variable.
2. The cause and effect relation is more clearly indicated through regression analysis than by correlation. Correlation is merely a tool for ascertaining the degree of relationship between two variables and, therefore, we cannot say that one variable is the cause and the other the effect.
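Question8: What is the relationship between Regression and Correlation?
Answer: The two coefficients of regression are related to the coefficient of correlation in the following way:
$$b_{xy} \times b_{yx} = r\frac{\sigma_x}{\sigma_y} \times r\frac{\sigma_y}{\sigma_x} = r^2$$
or
$$r = \sqrt{b_{xy} \times b_{yx}}$$
Hence, the coefficient of correlation is the geometric mean of the two coefficients of regression (r takes the common sign of the two regression coefficients).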
Question9: What is Partial and Multiple Correlation?
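Answer: When three or more variables are studied, it is a problem of either multiple or partial correlation. In multiple correlation, three or more variables are studied simultaneously. For example, when we study the relationship between the yield of rice per acre and both the amount of rainfall and the amount of fertilizer used, it is a problem of multiple correlation. In partial correlation, we study the relationship between two of the variables while the effect of the other variable(s) is held constant, for example, the relationship between yield and rainfall when the amount of fertilizer used is kept constant.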
Long Questions:
Question1: Define the following:
a. Positive and negative correlation
b. Linear and non-linear correlation
Answer:
a. Positive and Negative Correlation: Whether correlation is positive (direct) or negative (inverse) depends upon the direction of change of the variables. If both variables vary in the same direction, i.e., if as one variable increases the other, on average, also increases, correlation is said to be positive. If, on the other hand, the variables vary in opposite directions, i.e., as one variable increases the other decreases or vice versa, correlation is said to be negative.
b. Linear and Non-linear Correlation: The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables. If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear.
Correlation is called non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
[Figure: linear correlation and non-linear correlation]
$x = (X - \bar{X})$, $y = (Y - \bar{Y})$
This method is to be applied only when the deviations of items are taken from the actual means and not from assumed means.
The value of the coefficient of correlation as obtained by the above formula always lies between -1 and +1. When r = +1, there is perfect positive correlation between the variables. When r = -1, there is perfect negative correlation between the variables. When r = 0, there is no linear relationship between the variables.
Question3: Two judges in a beauty competition rank the 12 entries as follows:
X: 1    2    3    4    5    6    7    8    9    10   11   12
Y: 12   9    6    10   3    5    4    7    8    2    11   1
What degree of agreement is there between the judgments of the two judges?
Answer: Calculation of the rank correlation coefficient
X rank (R1)   Y rank (R2)   D = R1 - R2   D^2
1             12            -11           121
2             9             -7            49
3             6             -3            9
4             10            -6            36
5             3             +2            4
6             5             +1            1
7             4             +3            9
8             7             +1            1
9             8             +1            1
10            2             +8            64
11            11            0             0
12            1             +11           121
                                          ΣD^2 = 416
With ΣD^2 = 416 and N = 12,
$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 416}{12(144 - 1)} = 1 - \frac{2496}{1716} = 1 - 1.4545 = -0.455$$
The negative value (about -0.45) shows a moderate degree of disagreement between the two judges.
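Spearman's formula can be checked in Python for the two judges' rankings. A minimal sketch that assumes there are no tied ranks (the formula as stated above does not include a correction for ties).

```python
r1 = list(range(1, 13))                         # ranks given by judge X
r2 = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]    # ranks given by judge Y
n = len(r1)

sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))   # sum of squared rank differences
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(R, 3))  # 416 -0.455
```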
Question4: Write a short note on regression lines.
Answer: If we take the case of two variables X and Y, we shall have two regression lines: the regression of X on Y and that of Y on X. The regression line of Y on X gives the most probable values of Y for given values of X, and the regression line of X on Y gives the most probable values of X for given values of Y. Thus we have two regression lines. However, when there is either perfect positive or perfect negative correlation between the two variables, the two regression lines coincide, i.e., we have only one line. The farther apart the two regression lines are, the lower the degree of correlation; the nearer they are to each other, the higher the degree of correlation. If the variables are independent, r is zero and the lines of regression are at right angles, i.e., parallel to OX and OY.
It should be noted that the regression lines intersect each other at the point of the means of X and Y, i.e., if from the point where the two regression lines intersect a perpendicular is drawn to the X-axis, we get the mean value of X, and if from that point a horizontal line is drawn to the Y-axis, we get the mean value of Y.
Regression equation of Y on X
The regression equation of Y on X is expressed as follows:
Yc = a + bX
To determine the values of a and b, the following two normal equations are to be solved simultaneously:
$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Regression equation of X on Y
The regression equation of X on Y is expressed as follows:
Xc = a + bY
To determine the values of a and b, the following two normal equations are to be solved simultaneously:
$$\sum X = Na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
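The pair of normal equations is a 2 x 2 linear system, so it can be solved in closed form. A minimal Python sketch (the function name `fit_line` is hypothetical), applied to the data of Question5 below; swapping the arguments gives the regression of X on Y.

```python
import math

def fit_line(xs, ys):
    """Solve sum(y) = N*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2) for Yc = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx           # determinant of the 2 x 2 system
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
a_yx, b_yx = fit_line(X, Y)   # Y on X: approximately (11.9, -0.65)
a_xy, b_xy = fit_line(Y, X)   # X on Y: approximately (16.4, -1.3)
print(round(a_yx, 2), round(b_yx, 2), round(a_xy, 2), round(b_xy, 2))
```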
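Question5: From the following data, obtain the regression equation of X on Y, and also that of Y on X.
X: 6    2    10   4    8
Y: 9    11   5    8    7
Answer: ΣX = 30, so $\bar{X}$ = 6; ΣY = 40, so $\bar{Y}$ = 8.
X    x = (X - 6)   x^2   Y    y = (Y - 8)   y^2   xy
6    0             0     9    +1            1     0
2    -4            16    11   +3            9     -12
10   +4            16    5    -3            9     -12
4    -2            4     8    0             0     0
8    +2            4     7    -1            1     -2
Σx = 0   Σx^2 = 40   ΣY = 40   Σy = 0   Σy^2 = 20   Σxy = -26
Regression equation of X on Y
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}), \qquad r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3$$
X - 6 = -1.3(Y - 8)
X - 6 = -1.3Y + 10.4, or X = 16.4 - 1.3Y
Regression equation of Y on X
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}), \qquad r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65$$
Y - 8 = -0.65(X - 6)
Y - 8 = -0.65X + 3.9, or Y = 11.9 - 0.65X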
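The deviation method, and the relation r = ±√(b_xy × b_yx) stated in the short questions, can be checked in Python on the same data. A minimal sketch; `math.copysign` gives r the common sign of the two regression coefficients.

```python
import math

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
mx, my = sum(X) / len(X), sum(Y) / len(Y)      # 6.0, 8.0

x = [v - mx for v in X]                        # deviations from the mean of X
y = [v - my for v in Y]                        # deviations from the mean of Y
sxy = sum(a * b for a, b in zip(x, y))         # -26
b_xy = sxy / sum(b * b for b in y)             # X on Y: -1.3
b_yx = sxy / sum(a * a for a in x)             # Y on X: -0.65

# r is the geometric mean of the two coefficients, with their common sign.
r = math.copysign(math.sqrt(b_xy * b_yx), b_xy)
r_direct = sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
print(round(b_xy, 2), round(b_yx, 2), round(r, 3), round(r_direct, 3))
```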
Question6: Calculate Karl Pearson's coefficient of correlation from the following data:
X: 6    8    12   15   18   20   24   28   31
Y: 10   12   15   15   18   25   22   26   28
Answer: ΣX = 162, so $\bar{X}$ = 18; ΣY = 171, so $\bar{Y}$ = 19.
X    x = (X - 18)   x^2   Y    y = (Y - 19)   y^2   xy
6    -12            144   10   -9             81    +108
8    -10            100   12   -7             49    +70
12   -6             36    15   -4             16    +24
15   -3             9     15   -4             16    +12
18   0              0     18   -1             1     0
20   +2             4     25   +6             36    +12
24   +6             36    22   +3             9     +18
28   +10            100   26   +7             49    +70
31   +13            169   28   +9             81    +117
Σx = 0   Σx^2 = 598   Σy = 0   Σy^2 = 338   Σxy = 431
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{431}{\sqrt{598 \times 338}} = \frac{431}{449.58} = +0.959$$
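Karl Pearson's formula with deviations from the actual means can be sketched in Python, reproducing the table above:

```python
import math

X = [6, 8, 12, 15, 18, 20, 24, 28, 31]
Y = [10, 12, 15, 15, 18, 25, 22, 26, 28]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n                 # 18.0, 19.0

x = [v - mx for v in X]                         # deviations from the actual means
y = [v - my for v in Y]
sum_x2 = sum(a * a for a in x)                  # 598
sum_y2 = sum(b * b for b in y)                  # 338
sum_xy = sum(a * b for a, b in zip(x, y))       # 431

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(round(r, 3))  # 0.959
```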
Question7: What is the utility of the study of correlation?
Answer: The study of correlation is of immense use in practical life for the following reasons:
1. Most variables show some kind of relationship. For example, there is a relationship between price and supply, income and expenditure, etc. With the help of correlation analysis we can measure in one figure the degree of relationship existing between the variables.
2. Once we know that two variables are closely related, we can estimate the value of one variable given the value of the other.
3. Correlation analysis contributes to the understanding of economic behavior, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and may suggest to him the paths through which stabilizing forces become effective.
In business, correlation analysis enables the executive to estimate costs, sales, prices and other variables on the basis of some other series with which these costs, sales or prices may be functionally related. Some guesswork can be removed from decisions when the relationships between a variable to be estimated and the one or more other variables on which it depends are close and reasonably invariant.
However, it should be noted that the coefficient of correlation is one of the most widely used and also one of the most widely abused statistical measures. It is abused in the sense that one sometimes overlooks the fact that r measures nothing but the strength of the linear relationship and that it does not necessarily imply a cause-effect relationship.
4. Progressive development in the methods of science and philosophy has been characterized by an increase in the knowledge of relationships or correlations. Nature has been found to be a multiplicity of interrelated forces.