
Quiz 2 : Section II (A)

Name: Roll no:


Q1. A sample of 10 pairs of data gave the following regression equations: 3X + 2Y = 26 and 6X + Y = 31.
a) Find the correlation between the two variables.
b) Test for the significance of the correlation coefficient at 5% level of significance.
Solution:
First find out which equation represents Y on X and which represents X on Y.
Take 3X + 2Y = 26 as Y on X: 2Y = 26 - 3X, so Y = 13 - (3/2)X and b(yx) = -3/2.
Take 6X + Y = 31 as X on Y: 6X = 31 - Y, so X = 31/6 - (1/6)Y and b(xy) = -1/6.
This assignment is possible because b(yx) x b(xy) = (-3/2) x (-1/6) = 0.25 = R square, which does not exceed 1.
Trying the other way gives b(xy) = -2/3 and b(yx) = -6, so R square = 4, which is greater than 1 and hence impossible.
Both regression coefficients are negative, so R takes the negative root: R = -0.5.
Now test this. Null hypothesis: ρ = 0 ; alt hypothesis: ρ ≠ 0
Standard error = sqroot( (1 - r square) / (n - 2) ) = sqroot(0.75 / 8) = 0.3062
T cal = -0.5 / 0.3062 = -1.633 ; t table value (8 df, two-tailed) = 2.306. Since |T cal| < 2.306, accept the null hypothesis.
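A minimal Python sketch of the identification rule used in this solution: try each equation as Y on X, keep the assignment whose slope product lies between 0 and 1, and give r the common sign of the two slopes. The function names `slopes` and `correlation_from_lines` are illustrative only, and each equation is assumed to be written as a tuple (a, b, c) meaning aX + bY = c.

```python
import math

def slopes(eq_yx, eq_xy):
    """Return (b_yx, b_xy), reading eq_yx as Y on X and eq_xy as X on Y.

    Each equation is a tuple (a, b, c) meaning a*X + b*Y = c.
    """
    a1, b1, _ = eq_yx
    a2, b2, _ = eq_xy
    b_yx = -a1 / b1   # Y = c/b - (a/b) X
    b_xy = -b2 / a2   # X = c/a - (b/a) Y
    return b_yx, b_xy

def correlation_from_lines(eq1, eq2):
    """Pick the assignment with 0 <= b_yx * b_xy <= 1 and return (r, b_yx, b_xy)."""
    for eq_yx, eq_xy in ((eq1, eq2), (eq2, eq1)):
        b_yx, b_xy = slopes(eq_yx, eq_xy)
        r_square = b_yx * b_xy
        if 0 <= r_square <= 1:
            # r carries the same sign as the regression coefficients
            return math.copysign(math.sqrt(r_square), b_yx), b_yx, b_xy
    raise ValueError("neither assignment gives a valid R square")

n = 10
r, b_yx, b_xy = correlation_from_lines((3, 2, 26), (6, 1, 31))
t_cal = r / math.sqrt((1 - r**2) / (n - 2))
print(round(r, 3), round(t_cal, 3))
```

The computed |t| is then compared with the two-tailed table value for n - 2 = 8 degrees of freedom (2.306).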
Quiz 2 : Section II (B)
Name: Roll no:
Q1. A survey was conducted to study the linear relationship between expenditure on
accommodation(x) and expenditure on entertainment (y) . The results were as follows.
Expenditure on     Mean    Standard deviation
Accommodation      173     66
Entertainment      47.8    22
Correlation coefficient = 0.57
a) Find the regression equation to estimate the expenditure on entertainment based on
expenditure of accommodation.
b) If the sample size considered for the above is 15, test the correlation coefficient at 5% level
of significance.
Solution :
I) b(yx) = r x sd(y) / sd(x) = 0.57 x 22/66 = 0.19
Now (y - ybar) = b(yx) x (x - xbar)
y - 47.8 = 0.19 (x - 173), or y = 14.93 + 0.19x
II) Correlation = 0.57 given. Null hypothesis: ρ = 0 ; alt: ρ ≠ 0
T cal = 0.57 / sqroot( (1 - 0.57^2) / 13 ) = 0.57 / 0.2279 = 2.501 ; t table value (13 df) = 2.160, hence reject null hypothesis.
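The two steps of this solution can be checked with a short Python sketch. It assumes only the summary statistics given in the question; the variable names are illustrative, and the table value 2.160 is taken from the solution rather than computed.

```python
import math

# Summary statistics from the question
mean_x, sd_x = 173.0, 66.0    # accommodation
mean_y, sd_y = 47.8, 22.0     # entertainment
r, n = 0.57, 15

# Regression of y on x: slope from r and the standard deviations,
# intercept from the fact that the line passes through the means
b_yx = r * sd_y / sd_x
intercept = mean_y - b_yx * mean_x

# t test for the correlation coefficient with n - 2 degrees of freedom
t_cal = r / math.sqrt((1 - r**2) / (n - 2))
t_table = 2.160               # two-tailed 5% value for 13 df, from the solution
reject_null = abs(t_cal) > t_table

print(f"y = {intercept:.2f} + {b_yx:.2f}x, t = {t_cal:.3f}, reject = {reject_null}")
```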

Quiz 2 : Section II (C)
Name: Roll no:
Q1. Consider the following regression equations: y + 5x - 2 = 0 and y + 0.2x + 4 = 0. Identify which
equation represents the regression of Y on X and which one represents X on Y.
Solution: First identify which of the two equations gives b(y on x) and which gives b(x on y).
Take the first one as y on x: y = 2 - 5x, so b(yx) = -5.
Then the second equation is x on y: 0.2x = -4 - y, so x = -20 - 5y and b(xy) = -5. Multiplying the two regression coefficients gives
(-5) x (-5) = 25, which cannot be R square. Hence the equations are the other way around.
The first equation is x on y: x = 2/5 - (1/5)y, hence b(xy) = -1/5.
The second equation is y on x: y = -4 - 0.2x, hence b(yx) = -0.2.
This is possible as R square = (-1/5) x (-0.2) = 0.04, which is less than or equal to 1.

Q2. Two judges gave the following rankings for the 10 different participants in a music competition.
Find the correlation between the two judges.
Judge 1            3  1  4  2  6  7  8  9  5  10
Judge 2            2  1  5  3  7  8  6  9  4  10
Difference (1-2)   1  0 -1 -1 -1 -1  2  0  1   0
Difference ^2      1  0  1  1  1  1  4  0  1   0
This is rank correlation:
Sum of squared differences (d^2) = 10
Hence r = 1 - { 6 x 10 / [10 x (100 - 1)] } = 1 - 60/990 = 0.9394
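The rank correlation formula can be verified with a few lines of Python. This is a sketch that assumes there are no tied ranks, as in this question; the helper name `spearman_no_ties` is illustrative.

```python
judge1 = [3, 1, 4, 2, 6, 7, 8, 9, 5, 10]
judge2 = [2, 1, 5, 3, 7, 8, 6, 9, 4, 10]

def spearman_no_ties(rank_a, rank_b):
    """Rank correlation r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

rank_r = spearman_no_ties(judge1, judge2)
print(round(rank_r, 4))
```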



Quiz 2 : Section II (D)
Name: Roll no:
Q1. In parts (a) to (d) find the missing entries.
     SSR   SSE   TSS   R^2
a    30    20    50    0.6
b    8     32    40    0.2
c    25    0     25    1.0
d    30    0     30    1.0

Q2. List the different assumptions while deriving the OLS estimates.
For OLS estimates, the assumptions required are:
a) The error term is a random variable.
b) The expected value of the error for a given xi is 0.
c) The variance of the error is constant for each xi.
d) The correlation between ei and ej is zero (no autocorrelation).
e) X is fixed and deterministic.
f) The model is linear.
g) The number of sample points is greater than the number of independent variables.
h) Cov(xi, ej) = 0
The assumption that the error is normally distributed for each xi is not required for estimating the OLS
coefficients, but it is required for testing hypotheses.

Quiz 2 : Section II (E)
Name: Roll no:
Q1. An automotive engineer, interested in the relationship between miles per gallon (y) and
horsepower(x) sampled 30 cars and obtained the following information.
Σx = 2444 ; Σx^2 = 204232. Assume correlation = -0.777, y(est) = 56.90 - 0.31x and SSE = 315.11
a) Is there sufficient evidence to indicate a linear relationship between the (x) and (y) values for the
population? Use 5% level of significance.
b) Find 95% prediction interval for miles per gallon for a car with 90 horsepower.
Solution: a) Test the correlation coefficient, because correlation measures only linear association.
Null hypothesis: ρ = 0 ; alt: ρ ≠ 0
T cal = -0.777 / sqroot( (1 - 0.777^2) / 28 ) = -6.53
T table value (28 df) = 2.048. Since |T cal| > 2.048, reject null hypothesis.
b) Since the correlation is significant, the model is also significant, because there is a single independent variable.
Hence we can safely assume that the model y(est) = 56.9 - 0.31x is a significant model.
Hence at 90 HP, y(est) = 56.9 - 0.31 x 90 = 29
Standard error for y(est) = Se x sqroot( 1/30 + (90 - xbar)^2 / Σ(xi - xbar)^2 )
Se = sqroot(315.11 / 28) = 3.35
xbar = 2444/30 = 81.47 and Σ(xi - xbar)^2 = 204232 - 2444^2/30 = 5127.47
The value under the root sign is 1/30 + (90 - 81.47)^2 / 5127.47 = 0.0475
Hence S.E.(y est) = 3.35 x sqroot(0.0475) = 0.730
Hence the 95% interval estimate for E(Y | X = 90 hp) is 29 ± (2.048 x 0.730) = 29 ± 1.495
For an individual car with 90 hp, the prediction interval adds 1 under the root sign: S.E. = 3.35 x sqroot(1 + 0.0475) = 3.43, giving 29 ± (2.048 x 3.43) = 29 ± 7.02
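A short Python sketch of the interval calculation at 90 hp, using only the sums given in the question. It computes both the margin for the mean response E(Y | X = 90) and the wider margin for an individual car, which adds 1 under the root sign. Variable names are illustrative, and the table value 2.048 is taken from the solution. Because nothing is rounded along the way, the margins differ from hand-rounded figures in the second decimal place.

```python
import math

n = 30
sum_x, sum_x2 = 2444.0, 204232.0
sse = 315.11
b0, b1 = 56.90, -0.31         # fitted model given in the question
x0 = 90.0
t_table = 2.048               # two-tailed 5% value for 28 df, from the solution

x_bar = sum_x / n
sxx = sum_x2 - sum_x**2 / n   # sum of squared deviations of x
s_e = math.sqrt(sse / (n - 2))
y_hat = b0 + b1 * x0

# Quantity under the root sign for the mean response
under_root = 1 / n + (x0 - x_bar) ** 2 / sxx

margin_mean = t_table * s_e * math.sqrt(under_root)        # for E(Y | X = x0)
margin_pred = t_table * s_e * math.sqrt(1 + under_root)    # for one new car

print(f"y_hat = {y_hat:.2f}")
print(f"mean response: +/- {margin_mean:.3f}")
print(f"prediction:    +/- {margin_pred:.3f}")
```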



Quiz 2 - Section I (A)
Name: Roll no:
Q1. The following is the data on the inflation rate and prime lending rate.
Inflation rate 3.3 6.2 11.0 9.1 5.8 6.5 7.6
Prime lending rate 5.2 8.0 10.8 7.9 6.8 6.9 9.0
a) Find the linear relationship to determine the prime lending rate for an inflation rate of 8.2%.
b) Test the correlation coefficient to determine if it is different from zero.
Solution:
a) Put the data in the standard least squares formula to get y (prime lending rate) = 3.175 + 0.654x (inflation rate)
For x = 8.2% the value of y = 8.54
b) Null hypothesis: ρ = 0 ; alt: ρ ≠ 0
Find the correlation from the formula, which is 0.908.
T cal = 4.85 ; t table value (n - 2 = 5 df) = 2.571, hence reject null hypothesis.
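The "standard formula" step can be reproduced from the raw data with plain Python. This is a sketch of ordinary least squares for one predictor; the helper name `fit_line` is illustrative.

```python
import math

inflation = [3.3, 6.2, 11.0, 9.1, 5.8, 6.5, 7.6]
prime = [5.2, 8.0, 10.8, 7.9, 6.8, 6.9, 9.0]

def fit_line(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

intercept, slope, r = fit_line(inflation, prime)
y_at_8_2 = intercept + slope * 8.2

# t statistic for the correlation, with n - 2 degrees of freedom
n = len(inflation)
t_cal = r * math.sqrt((n - 2) / (1 - r**2))

print(f"y = {intercept:.3f} + {slope:.3f}x, y(8.2) = {y_at_8_2:.2f}")
print(f"r = {r:.3f}, t = {t_cal:.2f}")
```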



Quiz 2 - Section I (B)
Name: Roll no:
Q1.
Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .424   .180       .142                **                           2.493
Predictors: debt to capital ratio % ; dependent variable: return on capital %

ANOVA
Model          Sum of Squares   df   Mean Square   F    F critical
1 Regression   **               **   **            **   **
  Residual     **               **   **
  Total        2332.580         23

Coefficients
                            Unstandardized Coefficients   Standardized Coefficients
Model                       B        Std. Error           Beta                        t       Sig.
1 (Constant)                29.358   3.044                                            9.646   .000
  debt to capital ratio %   -.262    .119                 -.424                       **      **
Dependent variable: return on capital %



a) Verify as many OLS assumptions as you can from the data provided.
b) Fill in the missing values marked (**). Test the model and the regression coefficient at 5%
level of significance.
Solution: In this problem R square is given as 0.180 = SSR / SST. Since SST is given as 2332.58,
SSR = 0.180 x 2332.58 = 419.86. Since SSR + SSE = SST, SSE = 2332.58 - 419.86 = 1912.72
The degrees of freedom are 1 for SSR and 23 (given) for SST, hence 22 for SSE. The MSR and MSE can then be
calculated: MSR = 419.86, MSE = 1912.72 / 22 = 86.94, and hence F cal = 4.83. F table value (1, 22 df) = 4.30, so the model is significant.
Standard error of estimate is sqroot(MSE) = 9.32
The t cal value = -0.262 / 0.119 = -2.20 ; t table value (22 df) = 2.074. Since |t cal| > 2.074, reject null hypothesis.
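The missing ANOVA entries follow mechanically from R square and SST, as this Python sketch shows. It uses only the values printed in the output tables; variable names are illustrative.

```python
import math

r_square = 0.180
sst, df_total = 2332.58, 23
df_reg = 1                      # one independent variable

ssr = r_square * sst            # explained sum of squares
sse = sst - ssr                 # residual sum of squares
df_res = df_total - df_reg

msr = ssr / df_reg
mse = sse / df_res
f_cal = msr / mse
std_error_estimate = math.sqrt(mse)

# t statistic for the slope: coefficient divided by its standard error
t_slope = -0.262 / 0.119

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, MSE = {mse:.2f}")
print(f"F = {f_cal:.2f}, Se = {std_error_estimate:.2f}, t = {t_slope:.2f}")
```

With one predictor, F should equal the square of the slope's t statistic; here 2.20^2 = 4.85 against F = 4.83, and the small gap comes from the rounded inputs.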
For the assumptions:
From the residual plot we note that
a) The number of points above and below the zero standardized value are approximately equal,
indicating that E(error for given X) = 0 is valid.
b) There are two outliers in the residuals, one above zero and one below zero, which
cancel, keeping the expected value of the error at 0.
c) No fanning effect is noticed, so there is no indication of heteroscedasticity.
d) The residuals do not show any non-linear tendency, hence the assumption that the model is
linear can be accepted.
e) For a single independent variable and a sample size of 24 (total df = 23), the lower and upper limits for the DW
statistic are about 1.27 and 1.45, which makes 4 - lower limit = 2.73 and 4 - upper limit = 2.55. The
value of DW is given as 2.493, which is between 2 and 2.55, hence there is no
autocorrelation.
f) The assumption of normality seems OK since the residuals at the extreme ends
are very few compared to those near zero.


Quiz 2 - Section I (C)
Name: Roll no:
Q1. Data on average cholesterol intake (x) and death rate (y) from heart disease were recorded for
different countries. The data as well as some of the SPSS output are given below.
Cholesterol intake 59 31 30 19 15 12 8
Death rate 72 65 30 21 9 9 12

Model Summary
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .893   .797       .757                **                           2.642

ANOVA
Model          Sum of Squares   df   Mean Square   F    F critical
1 Regression   **               **   **            **   **
  Residual     **               **   **
  Total        4266.857         **


Q1. A) Fill up the missing values shown in (**) and comment on the significance of the
model developed.
B) Check as many assumptions as possible based on the data given and comment on the
use of the model to estimate death rate.
Solution: A) Since R^2 = 0.797 = SSR / SST = SSR / 4266.857, SSR = 3400.68
Since SSR + SSE = SST, SSE = 866.17
The degrees of freedom are 1 for SSR and 6 for SST, since the total sample size is 7.
Hence df for SSE = 5, MSR = 3400.68 and MSE = 173.23
Std. error of the estimate = sqroot(MSE) = 13.16
F cal = 19.63 and F critical (1, 5 df) = 6.61. Since F cal > F critical, the model is significant.
B) Assumptions:
i) From the scatter plot between x and y it can be seen that the linear assumption is OK.
ii) It cannot be said that the error term is random, since five of the seven residuals are
negative and only two are positive.
iii) Expected value of error = 0 is also doubtful for the same reason: most of the residuals are
negative and only two are positive.
iv) Constant variance cannot be concluded, since the number of sample points is not
sufficient to come to any conclusion.
v) As the sample size reduces, the upper and lower limits of the DW statistic decrease, thus the
region of no autocorrelation above 2 widens with smaller sample size. Thus even up to
4 - 1.3 = 2.7 we could conclude that there is no autocorrelation; hence
2.642 is just about OK, but there could be some autocorrelation.
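The residual-based checks in this solution can be reproduced from the seven data points. The Python sketch below fits the least squares line, counts the positive residuals, and computes the Durbin-Watson statistic as the sum of squared successive differences of the residuals divided by the sum of squared residuals. It assumes the observations are taken in the order listed in the question; variable names are illustrative.

```python
cholesterol = [59, 31, 30, 19, 15, 12, 8]
death_rate = [72, 65, 30, 21, 9, 9, 12]

n = len(cholesterol)
x_bar = sum(cholesterol) / n
y_bar = sum(death_rate) / n
sxx = sum((x - x_bar) ** 2 for x in cholesterol)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cholesterol, death_rate))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals in the order the observations are listed
residuals = [y - (intercept + slope * x) for x, y in zip(cholesterol, death_rate)]
positive_count = sum(e > 0 for e in residuals)

# Durbin-Watson statistic
dw = (sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
      / sum(e**2 for e in residuals))

print([round(e, 2) for e in residuals])
print(f"positive residuals: {positive_count}, DW = {dw:.3f}")
```

The residuals of a least squares line with an intercept always sum to zero, so the sign pattern here says more about how the errors are distributed than about their average.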

Quiz -2 - Section I (D)
Name: Roll no:
The following is the summary data.
Σx = 49.5 ; Σy = 54.6 ; Σx^2 = 386.79 ; Σy^2 = 444.94 ; Σxy = 410.14 ;
Y(est) = 3.175 + 0.654x ; number of pairs of data = 7.

a) Is there sufficient evidence to indicate a positive linear relationship between the variables (x)
and (y) . Use 5% level of significance.
b) Develop the ANOVA table for the model and check its significance at 5% level.
Solution: Calculate the correlation coefficient from the sums: r = 0.908. Then test it.
a) Null hypothesis: ρ = 0 ; alt: ρ > 0 (one-tailed, since the question asks about a positive relationship)
T cal = 0.908 / sqroot( (1 - 0.908^2) / 5 ) = 4.85 ; t table value (5 df, one-tailed) = 2.015, hence reject null hypothesis.
b) Develop the ANOVA table as follows:
SST = Σ(Y - Ybar)^2 = Σy^2 - (Σy)^2 / n = 444.94 - (54.6 x 54.6 / 7) = 19.06
Now R^2 = SSR / SST, hence SSR = 0.908^2 x 19.06 = 15.71 ; SSR + SSE = SST, so SSE = 3.35
Sample size is 7, hence df for total = 6. SSR df = 1, hence df for SSE = 5. MSR = 15.71, MSE = 0.669
F cal = 23.48. F table value (1, 5 df) = 6.61 is much smaller. Hence reject the null hypothesis that the model is
insignificant. Hence the model is significant.
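When only the sums are given, the correlation and the ANOVA entries can be built from the corrected sums of squares. The Python sketch below uses the shortcut forms Sxx = Σx^2 - (Σx)^2/n, Syy = Σy^2 - (Σy)^2/n and Sxy = Σxy - ΣxΣy/n; variable names are illustrative. It keeps full precision, so SSR and F come out slightly different from values obtained by first rounding r to 0.908.

```python
import math

n = 7
sum_x, sum_y = 49.5, 54.6
sum_x2, sum_y2, sum_xy = 386.79, 444.94, 410.14

sxx = sum_x2 - sum_x**2 / n
syy = sum_y2 - sum_y**2 / n        # this is SST
sxy = sum_xy - sum_x * sum_y / n

r = sxy / math.sqrt(sxx * syy)

sst = syy
ssr = sxy**2 / sxx                 # equals r^2 * SST
sse = sst - ssr
mse = sse / (n - 2)
f_cal = ssr / mse                  # MSR = SSR because the regression df is 1

print(f"r = {r:.3f}, SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"MSE = {mse:.3f}, F = {f_cal:.2f}")
```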

Quiz 2 - Section I (E)
Name: Roll no:
In a study of the cancer incidence(y) to the exposure of contaminants (x) the following data was
obtained.
Σx = 41.56 ; Σy = 1416.1 ; Σx^2 = 289.422 ; Σy^2 = 232498.97 ; Σxy = 7439.37 ;
Y(est) = 114.72 + 9.23x ; number of pairs of data = 9.
a) Find the ANOVA table for the above model and test its significance.
b) Find the interval estimate for estimating (y) at x=4.37
Solution:
a) Find R^2 from the data given.
SST = Σ(Y - Ybar)^2 = Σy^2 - (Σy)^2 / n = 232498.97 - (1416.1 x 1416.1 / 9) = 9683.50
Σ(x - xbar)^2 = 289.422 - (41.56 x 41.56 / 9) = 97.507 ; Σ(x - xbar)(y - ybar) = 7439.37 - (41.56 x 1416.1 / 9) = 900.13
r = 900.13 / sqroot(97.507 x 9683.50) = 0.926, so R^2 = 0.858
Now R^2 = SSR / SST, hence SSR = 8309.57 (using the unrounded R^2) ; SSR + SSE = SST, so SSE = 1373.93
Df for total = 8, df for SSR = 1, df for SSE = 7 ; MSR = 8309.57, MSE = 196.28
F cal = 42.34 ; F table value (1, 7 df) = 5.59, hence reject model insignificance.
b) For the interval estimate we need y(est) as well as its standard error.
At x = 4.37, put the value directly in the equation: y(est) = 114.72 + 9.23 x 4.37 = 155.05
Standard error of the estimate is Se = sqroot(MSE) = sqroot(196.28) = 14.01
S.E.(y est) = Se x sqroot( 1/9 + (4.37 - xbar)^2 / Σ(xi - xbar)^2 )
With xbar = 41.56 / 9 = 4.618: S.E.(y est) = 14.01 x sqroot( 1/9 + (4.37 - 4.618)^2 / 97.507 ) = 4.683
Hence the 95% interval estimate for E(Y | X = 4.37) is 155.05 ± (2.365 x 4.683)
= 155.05 ± 11.08
