Learning goals
Understanding bivariate data
Understanding the idea of correlation
Understanding linear regression
Bivariate Data
Prediction
If a new value of one variable is observed, can we predict the corresponding value of the other variable?
Applied Statistics and Computing Lab
Data type I: a list of paired observations, one (X, Y) pair per individual
Data type II: a two-way table of relative frequencies, with the values of X as rows, the values of Y as columns, and row and column totals (used when several observations share the same values of X and Y)
E(X), E(Y), V(X) and V(Y) are calculated as per the univariate mean and variance formulae
Covariance (contd.)
Covariance is independent of change of origin but affected by change of scale
For $U = \frac{X - a}{c}$ and $V = \frac{Y - b}{d}$: $\mathrm{cov}(U, V) = \frac{1}{cd}\,\mathrm{cov}(X, Y)$
The absolute value of the covariance of two variables is always less than or equal to the product of the standard deviations of those two variables
$|\mathrm{cov}(X, Y)| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}$
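The two covariance properties above can be checked numerically. The following is a minimal R sketch: the data and the constants a, b, c_scale and d_scale are made up for illustration, and U and V are assumed to be formed by subtracting the origin and dividing by the scale. R's cov() divides by n - 1, which does not affect either property.

```r
# Hypothetical data, invented purely for illustration
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Assumed changes of origin (a, b) and of scale (c_scale, d_scale)
a <- 3
b <- 1
c_scale <- 2
d_scale <- 5

u <- (x - a) / c_scale
v <- (y - b) / d_scale

cov_xy <- cov(x, y)
cov_uv <- cov(u, v)

# Change of origin alone leaves the covariance unchanged
origin_invariant <- isTRUE(all.equal(cov(x - a, y - b), cov_xy))

# Change of scale divides the covariance by c_scale * d_scale
scale_rule_holds <- isTRUE(all.equal(cov_uv, cov_xy / (c_scale * d_scale)))

# |cov(X, Y)| never exceeds the product of the standard deviations
bound_holds <- abs(cov_xy) <= sd(x) * sd(y)

print(c(origin_invariant, scale_rule_holds, bound_holds))
```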
Covariance (contd.)
cov(Waist circumference, adipose tissue area) = 643.39
Can we compare this with another covariance?
For the Body measurement data, consider both the Weight and the Height of all the individuals
What is the covariance between Height and Weight for each gender? 27.13 kg·cm for one gender and 40.38 kg·cm for the other
What information do we obtain by comparing these two covariance values?
Standardization
If we standardize both variables, the covariance becomes independent of the units of measurement
This makes the covariances of the two categories comparable
The standardized covariance lies in the interval [-1, 1]
A value close to 0 => the variables do not covary much
A value close to 1 or -1 => the variables covary strongly
For Height and Weight, the standardized covariances for the two genders are 0.43 and 0.53
Height and weight are moderately related to each other for both genders
We will see that this standardized covariance is the same as the measure we study next!
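A short R sketch of the standardization idea. The Body measurement data is not available here, so R's built-in women dataset (height in inches, weight in pounds) is used as a stand-in; that substitution is an assumption made only for illustration.

```r
# Stand-in data: R's built-in 'women' dataset, not the deck's Body measurement data
h <- women$height   # inches
w <- women$weight   # pounds

# Standardize each variable: subtract the mean, divide by the standard deviation
z_h <- (h - mean(h)) / sd(h)
z_w <- (w - mean(w)) / sd(w)

cov_raw <- cov(h, w)       # depends on the units of h and w
cov_std <- cov(z_h, z_w)   # unit-free
r <- cor(h, w)             # Pearson correlation coefficient

# Converting to cm and kg changes the raw covariance but not the correlation
h_cm <- h * 2.54
w_kg <- w * 0.45359237
cov_metric <- cov(h_cm, w_kg)
r_metric <- cor(h_cm, w_kg)

print(c(cov_raw = cov_raw, cov_metric = cov_metric))
print(c(cov_std = cov_std, r = r, r_metric = r_metric))
```

The standardized covariance and the two correlation values agree, while the raw covariance changes with the units.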
Correlation coefficient
Denoted by ρ (called rho)
Defined as the measure of the degree of linear association between the two variables X and Y*
Indicates the strength of the relationship and the direction in which the two variables move in relation to each other
Calculated as the ratio of the covariance between X and Y to the product of the standard deviations of X and Y
$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$
The correlation coefficient is also termed the Pearson product-moment correlation coefficient
Example values: ρ = 0.77; for Height and Weight, ρ = 0.43 and ρ = 0.53 for the two genders
*Aczel A., Sounderpandian J. Complete business statistics
Perfect positive correlation. If one of X or Y increases, the other must increase as per an exact linear relation. Similarly, if one decreases, the other decreases by the same rule.
No linear relationship.
Perfect negative correlation. If one of X or Y increases, the other must decrease as per an exact linear relation. Similarly, if one decreases, the other increases by the same rule.
Moderate negative correlation. If one of X or Y increases, the other tends to decrease as per a moderately strong linear relation. Similarly, if one decreases, the other tends to increase.
Strong negative correlation. If one of X or Y increases, the other tends to decrease as per a very strong linear relation. Similarly, if one decreases, the other tends to increase.
Moderate positive correlation. If one of X or Y increases, the other tends to increase as per a moderately strong linear relation. Similarly, if one decreases, the other tends to decrease.
Weak positive correlation. If one of X or Y increases, the other tends to increase as per a weak linear relation. Similarly, if one decreases, the other tends to decrease.
No linear relationship.
Correlation coefficient = 0, yet there exists a perfect quadratic relation between X and Y
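A small R sketch of this situation, using made-up values of X that are symmetric about zero with Y = X^2. The data are hypothetical; the point is only that a zero correlation coefficient rules out a linear relation, not every relation.

```r
# Hypothetical data: x symmetric about 0, y an exact quadratic function of x
x <- -5:5
y <- x^2

# Pearson correlation between x and y: no linear association
r_linear <- cor(x, y)

# Correlation between x^2 and y: the quadratic relation is exact
r_quadratic <- cor(x^2, y)

print(c(r_linear = r_linear, r_quadratic = r_quadratic))

# A scatter plot makes the curved pattern visible
plot(x, y)
```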
Other measures
Rank correlation
Measures the degree of correlation between two ordinal variables or rankings
Example: company rankings given by two different publications
Example: ranks of universities published on two websites
Consider two groups of women, grouped by whether or not they use a particular brand of shampoo (say Shampoo A). For each group, responses are collated to rank five characteristics of their shampoo by importance.
Characteristics     Group 1 rankings   Group 2 rankings   D = (rank 2 - rank 1)
Characteristic 1    1                  5                   4
Characteristic 2    3                  3                   0
Characteristic 3    2                  4                   2
Characteristic 4    5                  1                  -4
Characteristic 5    4                  2                  -2
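The rank differences D in the table feed into a rank correlation coefficient. The sketch below assumes the usual Spearman formula for rankings without ties, 1 - 6 * sum(D^2) / (n * (n^2 - 1)), and compares it with R's cor() using method = "spearman". The variable names are illustrative.

```r
# Rankings from the shampoo example
group1 <- c(1, 3, 2, 5, 4)
group2 <- c(5, 3, 4, 1, 2)

d <- group2 - group1   # D = (rank 2 - rank 1)
n <- length(d)

# Spearman's formula for rankings without ties (assumed to be the one intended)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Built-in equivalent
rho_builtin <- cor(group1, group2, method = "spearman")

print(c(rho_formula = rho_formula, rho_builtin = rho_builtin))
```

Both values come out as -1: each Group 2 rank is exactly 6 minus the Group 1 rank, so the two groups rank the characteristics in opposite order.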
Spearman's rank correlation coefficient is equal to the Pearson product-moment correlation applied to the ranks
Lies in the interval [-1, 1]
The closer the coefficient is to +1, the greater the degree of agreement between the two rankings
The closer the coefficient is to -1, the greater the degree of disagreement between the two rankings
A coefficient of 0 indicates no monotonic association between the two rankings given to the same objects
Kendall's tau: for n objects with ranks $(x_i, y_i)$, $i = 1, 2, \dots, n$, a pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ is said to be
concordant if the ranks of both elements agree, i.e. both $x_i > x_j$ and $y_i > y_j$, OR both $x_i < x_j$ and $y_i < y_j$
discordant if $x_i > x_j$ and $y_i < y_j$, OR $x_i < x_j$ and $y_i > y_j$
neither concordant nor discordant if $x_i = x_j$ or $y_i = y_j$
Lies in the interval [-1, 1]
If the agreement between the two rankings is perfect, coefficient = 1
If the disagreement between the two rankings is perfect, coefficient = -1
If the rankings are independent, the coefficient would be close to 0
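Counting concordant and discordant pairs can be sketched in R as follows. The coefficient is assumed here to be (concordant - discordant) divided by the total number of pairs n(n - 1)/2, the form that applies when there are no ties; the rankings reuse the shampoo example and the variable names are illustrative.

```r
# Rankings from the shampoo example (no ties)
group1 <- c(1, 3, 2, 5, 4)
group2 <- c(5, 3, 4, 1, 2)
n <- length(group1)

concordant <- 0
discordant <- 0
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    # Positive product: both rankings order the pair the same way
    s <- sign(group1[i] - group1[j]) * sign(group2[i] - group2[j])
    if (s > 0) concordant <- concordant + 1
    if (s < 0) discordant <- discordant + 1
  }
}

# Assumed form of the coefficient for rankings without ties
tau_manual <- (concordant - discordant) / (n * (n - 1) / 2)

# Built-in equivalent
tau_builtin <- cor(group1, group2, method = "kendall")

print(c(concordant = concordant, discordant = discordant))
print(c(tau_manual = tau_manual, tau_builtin = tau_builtin))
```

All 10 pairs are discordant for these rankings, so the coefficient is -1, matching perfect disagreement.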
Linear Regression
Suppose now that the variation in one variable (X) influences the variation in the other variable (Y)
Is the adipose tissue area influenced by waist circumference?
Are ice-cream sales affected by the temperature in the city?
The variable X, i.e. the variable that influences, is also referred to as the predictor variable, the independent variable or the explanatory variable
The variable Y, i.e. the variable that is being influenced, is also referred to as the outcome variable, the dependent variable or the response variable
Can we draw one line such that the equation of that line explains the relation between X and Y?
Which line describes the relationship in a reasonable way?
If we can safely assume a linear relationship between X and Y, the slope of the linear regression model gives the average amount by which Y changes for a one-unit change in X
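In symbols, the simple linear regression model is commonly written as below. The notation (β0 for the intercept, β1 for the slope, ε for the error term) is an assumption, since the deck's own symbols are not shown here.

```latex
\[
Y = \beta_0 + \beta_1 X + \varepsilon, \qquad E(\varepsilon) = 0
\]
\[
E(Y \mid X = x + 1) - E(Y \mid X = x)
  = \bigl(\beta_0 + \beta_1 (x + 1)\bigr) - \bigl(\beta_0 + \beta_1 x\bigr)
  = \beta_1
\]
```

The second line shows why the slope is read as the average change in Y per one-unit change in X.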
The model is estimated using the method of least squares
This method minimizes the sum of squared errors
There are other methods of estimation
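A minimal R sketch of least squares for one predictor. It uses R's built-in women dataset as stand-in data (an assumption; it is not the deck's data) and compares the closed-form estimates, slope = cov(X, Y) / var(X) and intercept = mean(Y) - slope * mean(X), with the coefficients returned by lm(). The helper sse() and the height of 66 inches are hypothetical, introduced only for illustration.

```r
# Stand-in data: built-in 'women' dataset (height in inches, weight in pounds)
x <- women$height
y <- women$weight

# Closed-form least squares estimates for a single predictor
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# The same model fitted with lm()
fit <- lm(y ~ x)
print(coef(fit))
print(c(intercept = b0, slope = b1))

# Hypothetical helper: sum of squared errors for a candidate line
sse <- function(intercept, slope) sum((y - (intercept + slope * x))^2)

# Moving the slope away from the estimate increases the sum of squared errors
sse_fit <- sse(b0, b1)
sse_steeper <- sse(b0, b1 + 0.1)
sse_flatter <- sse(b0, b1 - 0.1)

# Predicted average weight for a new height of 66 inches
pred <- predict(fit, newdata = data.frame(x = 66))

# Scatter plot with the fitted regression line
plot(x, y)
abline(fit)
```

The last two lines draw the scatter plot with the fitted line, and predict() addresses the earlier prediction question: given a new value of X, what is the expected value of Y?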
Visuals from Aczel A., Sounderpandian J. Complete business statistics
R-codes
Function                      R-code
Dotplot                       install.packages("TeachingDemos")
                              library(TeachingDemos)
                              dots(variable)
Scatter plot                  plot(variable1, variable2)
Covariance                    cov(variable1, variable2)
Correlation                   cor(variable1, variable2)
Spearman's rank correlation   cor(variable1, variable2, method = "spearman")
Kendall's tau                 cor(variable1, variable2, method = "kendall")
Linear regression             lm(response ~ explanatory)
Regression line               abline(lm(response ~ explanatory))
Thank you