
COVARIANCE

When two or more random variables are defined on a probability space, it is useful to describe how they vary together, i.e., to measure the relationship between the variables. A common measure of the relationship between two random variables is the covariance.

CORRELATION

A distribution involving two variables is known as a bivariate distribution. If these two variables vary such that a change in one variable affects a change in the other variable, the variables are said to be correlated.

Eg:
1. Relationship between the height and weight of a person
2. Price of a commodity and its demand
3. Rainfall and production of rice
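For reference, the standard definition of covariance (not written out in these notes) is

    Cov(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] E[Y]

A positive covariance indicates that the variables tend to increase together; a negative covariance indicates that one tends to decrease as the other increases.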

The measure of correlation is called the correlation co-efficient or correlation index.

Types of correlation

Important ways of classifying correlation are:
1. Positive and negative
2. Simple, partial and multiple
3. Linear and non-linear

Positive and negative correlation

If the two variables deviate in the same direction, i.e., if an increase in one variable results in a corresponding increase in the other, or a decrease in one variable results in a corresponding decrease in the other, then the correlation is said to be direct or positive. For example: height and weight; rainfall and production of rice.

If the two variables constantly deviate in opposite directions, i.e., an increase in one variable is accompanied by a decrease in the other, the correlation is said to be inverse or negative. For example: volume and pressure of a gas; price of a commodity and its demand.

Simple, partial and multiple correlation

When only two variables are considered, the correlation is called simple correlation. When three or more variables are studied, it is a problem of either multiple or partial correlation. In multiple correlation, three or more variables are studied simultaneously. For example, the study of the relation between the yield of rice per hectare and both the amount of rainfall and the usage of fertilizers is a multiple correlation problem.

When three or more variables are involved in correlation analysis, the correlation between the dependent variable and only one particular independent variable, excluding the influence of the other independent variables, is called partial correlation.

Linear and non-linear correlation

If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear.

Eg:
X:  1   2   3   4
Y:  5  10  18  23

A correlation is said to be non-linear or curvilinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if rainfall is doubled, the production of rice would not necessarily be doubled.

Methods of studying correlation:
i) Scatter diagram method
ii) Graphic method
iii) Karl Pearson's co-efficient of correlation
iv) Rank method
v) Concurrent deviation method
vi) Method of least squares

Karl Pearson's co-efficient of correlation

The co-efficient of correlation between two variables X and Y is

    r = Cov(X, Y) / (σx σy) = Σ(x - x̄)(y - ȳ) / √( Σ(x - x̄)² Σ(y - ȳ)² )

r always lies between -1 and +1. When r = ±1 there is a perfect correlation, and if r = 0 the variables are uncorrelated.
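As an illustrative sketch (added here, not part of the original notes), Pearson's r can be computed in Python; the data reuse the small linear-correlation table above:

    import numpy as np

    x = np.array([1, 2, 3, 4])
    y = np.array([5, 10, 18, 23])

    # Pearson's r from the definition: covariance / (sd_x * sd_y)
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean())**2) * np.sum((y - y.mean())**2))
    print(num / den)                 # ~0.995, a strong positive correlation
    print(np.corrcoef(x, y)[0, 1])   # numpy's built-in gives the same value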

Rank Correlation

Let us suppose that a group of n individuals is arranged in order of merit with respect to two characteristics A and B. These ranks in the two characteristics will in general be different. The rank correlation co-efficient is given by

    ρ = 1 - 6 Σd² / ( n (n² - 1) )

where d is the difference between the ranks of the same individual in the two characteristics.

Repeated ranks

If there is more than one item with the same value in the series, common ranks are given to the repeated items. This rank is the average of the ranks which these items would have assumed if they were slightly different from each other, and the next item gets the rank next to the ranks already assumed. In the correlation formula we add the correction factor

    CF = m (m² - 1) / 12

to Σd², where m is the number of times an item is repeated. This correction factor is to be added for each repeated value.
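A short sketch of rank correlation with ties in Python (the data are hypothetical; scipy's spearmanr applies the same average-rank convention for repeated values):

    from scipy.stats import spearmanr

    # Hypothetical marks from two judges; the two 75s in x are tied
    x = [60, 75, 75, 80, 90]
    y = [62, 70, 74, 85, 88]

    rho, p = spearmanr(x, y)   # tied values receive the average of their ranks
    print(rho)                 # ~0.97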

REGRESSION

Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

Line of regression of X on Y:

    x - x̄ = bxy (y - ȳ),   where bxy = r σx / σy

Line of regression of Y on X:

    y - ȳ = byx (x - x̄),   where byx = r σy / σx

bxy and byx are the regression co-efficients, and bxy · byx = r².
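A minimal numeric sketch of the two regression lines (illustrative, reusing the small data set from above):

    import numpy as np

    x = np.array([1, 2, 3, 4])
    y = np.array([5, 10, 18, 23])

    r = np.corrcoef(x, y)[0, 1]
    byx = r * y.std() / x.std()    # regression co-efficient of Y on X
    bxy = r * x.std() / y.std()    # regression co-efficient of X on Y

    # Line of Y on X: y - ybar = byx (x - xbar)
    print(f"y = {y.mean():.1f} + {byx:.2f}(x - {x.mean():.1f})")
    print(np.isclose(bxy * byx, r**2))   # product of the co-efficients equals r^2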

Distinguish between correlation and regression analysis.

1. Correlation measures the relationship between two variables; regression is a mathematical measure expressing the average relationship between the two variables.
2. Correlation need not imply cause and effect; regression clearly indicates the cause-and-effect relationship between the variables.
3. Correlation is symmetric, i.e. r_xy = r_yx; regression is not symmetric, i.e. bxy ≠ byx.
4. Correlation measures the direction and degree of the linear relationship between two variables; with regression we can predict the value of the dependent variable for any given value of the independent variable.

F test

Let X1, ..., Xn and Y1, ..., Ym be samples from two populations which each have a normal distribution. The expected values of the two populations can be different, and the hypothesis to be tested is that the variances are equal. Let

    X̄ = (1/n) Σ Xi   and   Ȳ = (1/m) Σ Yi

be the sample means. Let

    Sx² = 1/(n - 1) Σ (Xi - X̄)²   and   Sy² = 1/(m - 1) Σ (Yi - Ȳ)²

be the sample variances. Then the test statistic

    F = Sx² / Sy²

has an F-distribution with n - 1 and m - 1 degrees of freedom if the null hypothesis of equality of variances is true. Otherwise it has a non-central F-distribution. The null hypothesis is rejected if F is either too large or too small.

Null hypothesis H0: σ1² = σ2² (the population variances are equal)
Alternative hypothesis H1: σ1² ≠ σ2²

Eg:

In one sample of 8 observations the sum of the squares of deviations of the sample values from the sample mean was 84.4, and in another sample of 10 observations it was 102.6. Test whether this difference is significant at the 5% level.
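A worked sketch of this example in Python (the arithmetic follows the formulas above; the table value is looked up with scipy):

    from scipy.stats import f

    n1, ss1 = 8, 84.4     # sample size and sum of squared deviations, sample 1
    n2, ss2 = 10, 102.6   # sample 2

    s1_sq = ss1 / (n1 - 1)   # 84.4 / 7  ≈ 12.06
    s2_sq = ss2 / (n2 - 1)   # 102.6 / 9 = 11.40

    F = s1_sq / s2_sq                    # larger variance on top: ≈ 1.06
    crit = f.ppf(0.95, n1 - 1, n2 - 1)   # 5% table value F(7, 9) ≈ 3.29
    print(F, crit, F > crit)             # F < table value: not significant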

Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others, versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others: if the F-test is performed at level α, we cannot state that the treatment pair with the greatest mean difference is significantly different at level α.

The formula for the one-way ANOVA F-test statistic is

    F = explained variance / unexplained variance

or

    F = between-group variability / within-group variability
      = [ Σ ni (Ȳi - Ȳ)² / (K - 1) ] / [ Σij (Yij - Ȳi)² / (N - K) ]

where Ȳi is the sample mean of the i-th group, ni is its number of observations, Ȳ is the overall mean, K is the number of groups and N is the overall sample size.

ONE WAY CLASSIFICATION

Source of variation   Sum of squares   Degrees of freedom   Mean square          Variance ratio (F)
(S.V)                 (S.S)            (d.f)                (M.S)
Between classes       Q1               h - 1                (1) Q1 / (h - 1)     (1)/(2) or (2)/(1),
Within classes        Q2               N - h                (2) Q2 / (N - h)     whichever is greater
Total                 Q                N - 1

Eg: A completely randomized design experiment with 10 plots and 3 treatments gave the following results:

Plot No.:    1  2  3  4  5  6  7  8  9  10
Treatment:   A  B  C  A  C  C  A  B  A  B
Yield:       5  4  3  7  5  1  3  4  1  7

Solution

Treat   Yield from plots   Ti   Ti²   ni   Ti²/ni
A       5, 7, 3, 1         16   256    4     64
B       4, 4, 7            15   225    3     75
C       3, 5, 1             9    81    3     27
Total                      40   562   10    166

Correction factor T²/N = 40²/10 = 160, and ΣΣ yij² = 200. Hence

Q  = 200 - 160 = 40
Q1 = 166 - 160 = 6
Q2 = Q - Q1 = 40 - 6 = 34
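The same analysis can be cross-checked in Python (illustrative; scipy computes the variance ratio directly from the grouped yields):

    from scipy.stats import f_oneway

    # Yields grouped by treatment, as in the table above
    A = [5, 7, 3, 1]
    B = [4, 4, 7]
    C = [3, 5, 1]

    F, p = f_oneway(A, B, C)
    # F = (Q1/(h-1)) / (Q2/(N-h)) = (6/2) / (34/7) ≈ 0.62
    print(F, p)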

TWO WAY CLASSIFICATION

Source of variation   Sum of squares   Degrees of freedom   Mean square                 Variance ratio (F)
(S.V)                 (S.S)            (d.f)                (M.S)
Between rows          Q1               h - 1                (1) Q1 / (h - 1)            (1)/(3)
Between columns       Q2               k - 1                (2) Q2 / (k - 1)            (2)/(3)
Residual              Q3               (h - 1)(k - 1)       (3) Q3 / ((h - 1)(k - 1))
Total                 Q                hk - 1

Eg: Four doctors each test four treatments for a certain disease and observe the number of days each patient takes to recover. The results are as follows (recovery time in days):

                  Doctor
Treatment    A    B    C    D
1           10   11    9    8
2           14   15   15   13
3           19   17   16   17
4           20   21   19   20

Discuss the difference between (1) doctors (2) treatments.
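A sketch of the two-way computation in Python (variable names are mine; rows are treatments and columns are doctors, following the Q1/Q2/Q3 decomposition above):

    import numpy as np
    from scipy.stats import f

    # Recovery times: rows = treatments 1-4, columns = doctors A-D
    y = np.array([[10, 11,  9,  8],
                  [14, 15, 15, 13],
                  [19, 17, 16, 17],
                  [20, 21, 19, 20]])
    h, k = y.shape
    N = y.size
    cf = y.sum()**2 / N                      # correction factor T²/N

    Q  = (y**2).sum() - cf                   # total sum of squares
    Q1 = (y.sum(axis=1)**2 / k).sum() - cf   # between rows (treatments)
    Q2 = (y.sum(axis=0)**2 / h).sum() - cf   # between columns (doctors)
    Q3 = Q - Q1 - Q2                         # residual

    ms3 = Q3 / ((h - 1) * (k - 1))
    print(Q1 / (h - 1) / ms3, f.ppf(0.95, h - 1, (h - 1) * (k - 1)))  # treatments
    print(Q2 / (k - 1) / ms3, f.ppf(0.95, k - 1, (h - 1) * (k - 1)))  # doctors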

Normal distribution

The normal distribution is a data distribution that can be used to describe many types of measurements in engineering. Basically, a normal distribution is a bell-shaped curve; it is often pictured superimposed over a histogram of data (for example, PCC compressive strength data). Such a distribution is very convenient to use because it is completely characterized by the mean (μ) and standard deviation (σ). The theoretical normal distribution extends out infinitely in both directions, never quite reaches the horizontal axis, and has a total area under the curve of 1.00 (i.e., 100 percent of the data values are represented by the distribution). Since it extends indefinitely in either direction (minus infinity to plus infinity), it encompasses all of the results that can occur, and the area under the curve between these two limits must therefore be equal to unity (i.e., 1.000 or 100 percent). For practical purposes, however, most of the data values (99.73 percent) occur within ±3σ of the mean. A more important consequence of the area under the curve being equal to 100 percent is that the probability of finding a data value between any two values of x is equal to the area under the normal distribution between those values.

The Normal Distribution Equation

The height of a normal distribution (y) can be defined for its corresponding value of x by the following equation:

    y = 1 / (σ √(2π)) · e^( -(x - μ)² / (2σ²) ),   -∞ < x < ∞

Mean     = μ
Variance = σ²
SD       = σ

Properties of the Normal distribution
1) Bell shaped
2) Mean = median = mode
3) Symmetrical about the line x = μ
4) Only one mode (unimodal)
5) Points of inflection are at x = μ ± σ

Let Z be a standard normal variable. Calculate the following:

    P(0 ≤ Z ≤ 1.2),  P(-1.3 ≤ Z ≤ 0),  P(-1.5 ≤ Z ≤ 2),  P(0.5 ≤ Z ≤ 1.8)

X is normally distributed with mean 12 and SD 4. Find the probability of the following:
1) X ≥ 20   2) X ≤ 20   3) 0 ≤ X ≤ 12

The weekly wages of 1000 workmen are normally distributed around a mean of Rs 70 with an SD of Rs 5. Estimate the number of workers whose weekly wages will be
i) between Rs 69 and Rs 72
ii) less than Rs 69
iii) more than Rs 72
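These can be checked against normal tables in Python (illustrative; the inequality directions follow the reconstruction above):

    from scipy.stats import norm

    # Standard normal probabilities
    print(norm.cdf(1.2) - norm.cdf(0))     # P(0 <= Z <= 1.2)  ≈ 0.3849
    print(norm.cdf(0) - norm.cdf(-1.3))    # P(-1.3 <= Z <= 0) ≈ 0.4032

    # X ~ N(12, 4): P(X >= 20)
    print(1 - norm.cdf(20, loc=12, scale=4))        # ≈ 0.0228

    # Wages ~ N(70, 5): expected number out of 1000 between Rs 69 and Rs 72
    p = norm.cdf(72, 70, 5) - norm.cdf(69, 70, 5)
    print(round(1000 * p))                          # ≈ 235 workers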
