
Week 7 Homework Problems

Section 10.1: Homework 2


Table #10.1.6 contains the value of the house and the amount of rental income in
a year that the house brings in ("Capital and rental," 2013). Create a scatter plot
and find a regression equation between house value and rental income. Then use
the regression equation to find the rental income for a house worth $230,000 and for
a house worth $400,000. Which rental income that you calculated do you think is
closer to the true rental income? Why?

State random variables


x = value of the house
y = the amount of rental income in a year

[Scatter plot: Rental vs Value. Horizontal axis: Value; vertical axis: Rental.]

Now let's find a regression equation between house value and rental income.
Using Excel we find:

\[ \bar{x} = 174375, \qquad \bar{y} = 9611.33 \]
\[ SS_x = 4727828125, \qquad SS_y = 4796820.89, \qquad SS_{xy} = 115161583.3 \]

Slope:
\[ b = \frac{SS_{xy}}{SS_x} = \frac{115161583.3}{4727828125} = 0.02435824 \]

Y-intercept:
\[ a = \bar{y} - b\bar{x} = 9611.33 - 0.02435824(174375) = 5363.865 \]

Regression equation:
\[ \hat{y} = 5363.865 + 0.024358x \]
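The slope and intercept can be re-checked outside Excel. The following is a minimal sketch, assuming only the summary statistics quoted above; the function name `fit_line` is illustrative and not part of the assignment.

```python
def fit_line(x_mean, y_mean, ss_x, ss_xy):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    b = ss_xy / ss_x          # slope: SSxy / SSx
    a = y_mean - b * x_mean   # intercept: y-bar minus b times x-bar
    return a, b

# Summary statistics quoted in the solution for table #10.1.6.
a, b = fit_line(x_mean=174375, y_mean=9611.33,
                ss_x=4727828125, ss_xy=115161583.3)
print(f"slope b = {b:.8f}")
print(f"intercept a = {a:.3f}")
```

Because the intercept is computed here with the unrounded slope, it may differ from the hand calculation in the last decimal places.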

Using the regression equation, we find the rental income for a house worth $230,000:
\[ \hat{y} = 5363.865 + 0.024358(230000) = 10966.21 \]
The predicted rental income is about $10,966.21.

Now we do the same for $400,000:
\[ \hat{y} = 5363.865 + 0.024358(400000) = 15107.07 \]
The predicted rental income is about $15,107.07.

The prediction for the $230,000 house is probably closer to the true rental income.
A regression line is most reliable near the mean of the x-values, and $230,000 is much
closer to the mean house value of $174,375 than $400,000 is. The farther x is from the
mean, the more a small error in the slope shifts the prediction, and if $400,000 lies
outside the range of house values in the table, that prediction is an extrapolation.
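As a quick check of the two predictions, here is a small sketch that plugs the house values into the regression equation. It assumes the rounded coefficients 5363.865 and 0.024358 from the regression equation, and that house values are entered in dollars (230000, not 230); the names `predict`, `A`, `B` and `X_MEAN` are illustrative.

```python
def predict(a, b, x):
    """Predicted y on the line y = a + b*x."""
    return a + b * x

A, B = 5363.865, 0.024358   # rounded coefficients from the regression equation
X_MEAN = 174375             # mean house value from the summary statistics

for value in (230_000, 400_000):
    rent = predict(A, B, value)
    distance = abs(value - X_MEAN)
    print(f"value ${value:,}: predicted rent ${rent:,.2f} "
          f"(${distance:,} away from the mean house value)")
```

The printed distance from the mean house value gives a rough sense of which prediction leans more on the fitted slope.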

Section 10.1: Homework 4


The World Bank collected data on the percentage of GDP that a country spends
on health expenditures ("Health expenditure," 2013) and also the percentage of
women receiving prenatal care ("Pregnant woman receiving," 2013). The data for
the countries where this information is available for the year 2011 are in table
#10.1.8. Create a scatter plot of the data and find a regression equation between
percentage spent on health expenditure and the percentage of women receiving
prenatal care. Then use the regression equation to find the percent of women
receiving prenatal care for a country that spends 5.0% of GDP on health
expenditure and for a country that spends 12.0% of GDP. Which prenatal care
percentage that you calculated do you think is closer to the true percentage?
Why?
State random variables
x = Health Expenditure
y = Prenatal care

[Scatter plot: Health Expenditure vs Prenatal Care.]
Now let's find a regression equation between health expenditure and prenatal care.
Using Excel we find:

\[ \bar{x} = 6.1267\%, \qquad \bar{y} = 79.9133\% \]
\[ SS_x = 3.7820, \qquad SS_y = 354.5612, \qquad SS_{xy} = 6.2803 \]

Slope:
\[ b = \frac{SS_{xy}}{SS_x} = \frac{6.2803}{3.7820} = 1.6606 \]

Y-intercept:
\[ a = \bar{y} - b\bar{x} = 79.9133 - 1.6606(6.1267) = 69.7393 \]

Regression equation:
\[ \hat{y} = 69.7393 + 1.6606x \]

From the regression equation we find the percentage of women who receive prenatal
care for a country that spends 5.0% of GDP on health expenditure:
\[ \hat{y} = 69.7393 + 1.6606(5.0) = 78.0423\% \]

Now we do the same for 12.0%:
\[ \hat{y} = 69.7393 + 1.6606(12.0) = 89.6665\% \]

The prediction for 5.0% is probably closer to the true percentage. The value 5.0% is
close to the mean health expenditure of 6.1267%, where the regression line is most
reliable, while 12.0% is far from the mean and may lie outside the range of the data,
which would make that prediction an extrapolation.
Section 10.2: Homework 2
Table #10.1.6 contains the value of the house and the amount of rental income in
a year that the house brings in ("Capital and rental," 2013). Find the correlation
coefficient and coefficient of determination and then interpret both.
Now we will find the correlation coefficient as follows:

\[ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{115161583.3}{\sqrt{(4727828125)(4796820.89)}} = 0.765 \]

The correlation coefficient r = 0.765 indicates a fairly strong positive linear
association between house value and rental income: houses with higher values tend to
bring in higher rental income.

Coefficient of determination:

\[ r^2 = (0.765)^2 = 0.5852 \]

This means that about 58.5% of the variability in rental income is explained by the
linear relationship with house value. The remaining 41.5% is due to other factors or
random variation.
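Both coefficients follow directly from the sums of squares. This is a minimal sketch, assuming the summary statistics for table #10.1.6 quoted in this solution (SSx = 4727828125, SSy = 4796820.89, SSxy = 115161583.3); the function name `correlation` is illustrative.

```python
import math

def correlation(ss_x, ss_y, ss_xy):
    """Pearson correlation coefficient from sums of squares."""
    return ss_xy / math.sqrt(ss_x * ss_y)

r = correlation(ss_x=4727828125, ss_y=4796820.89, ss_xy=115161583.3)
r_squared = r ** 2  # coefficient of determination

print(f"r   = {r:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

Squaring the unrounded r can give a value slightly different from squaring 0.765, so small differences in the last decimals are expected.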

Section 10.2: Homework 4


The World Bank collected data on the percentage of GDP that a country spends
on health expenditures ("Health expenditure," 2013) and also the percentage of
women receiving prenatal care ("Pregnant woman receiving," 2013). The data for
the countries where this information is available for the year 2011 are in table
#10.1.8. Find the correlation coefficient and coefficient of determination and then
interpret both.

Now we will find the correlation coefficient as follows:

\[ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{6.2803}{\sqrt{(3.7820)(354.5612)}} = 0.1715 \]

The correlation coefficient r = 0.1715 indicates a weak positive linear relationship
between health expenditure and the percentage of women receiving prenatal care.

Coefficient of determination:

\[ r^2 = (0.1715)^2 = 0.0294 \]

This means that only 2.94% of the variability in the percentage of women receiving
prenatal care is explained by the linear relationship with health expenditure. The other
97.06% is due to other variables or random variation.

Section 10.3 : Homework 2


Table #10.1.6 contains the value of the house and the amount of rental income in
a year that the house brings in ("Capital and rental," 2013).

a.) Test at the 5% level for a positive correlation between house value and rental
amount.

We indicate the random variables.


x = value of the house
y = the amount of rental income in a year

We state the null and alternative hypotheses and the level of significance:
\[ H_0: \rho = 0 \]
\[ H_A: \rho > 0 \]
\[ \alpha = 0.05 \]

Now we find the value of the test statistic and the p-value:
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]

Substituting the values of r and r^2 found previously, with n = 48:

\[ t = \frac{0.765}{\sqrt{\dfrac{1 - 0.5852}{48 - 2}}} = 8.0560 \]

Now we enter the value of t and df = 46 into the TI-89 calculator to obtain the
p-value:
tcdf(8.0560, 1E99, 46)
\[ p\text{-value} = 1.222 \times 10^{-10} \]
Since the p-value is less than 0.05, we reject H0. There is enough evidence to show a
positive correlation between house value and rental income.

b.) Find the standard error of the estimate.

\[ s_e = \sqrt{\frac{SS_y - b \cdot SS_{xy}}{n - 2}} = \sqrt{\frac{4796820.89 - (0.02436)(115161583.3)}{48 - 2}} \]
\[ s_e = 208.0701 \]
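The standard error of the estimate can be checked with a few lines of code. This is a minimal sketch, assuming the summary statistics for table #10.1.6 and the slope rounded to 0.02436 as in the hand calculation; the function name is illustrative.

```python
import math

def standard_error_of_estimate(ss_y, ss_xy, b, n):
    """s_e = sqrt((SSy - b * SSxy) / (n - 2))."""
    return math.sqrt((ss_y - b * ss_xy) / (n - 2))

s_e = standard_error_of_estimate(
    ss_y=4796820.89, ss_xy=115161583.3, b=0.02436, n=48
)
print(f"s_e = {s_e:.4f}")
```

The result is sensitive to how the slope is rounded, because b multiplies the very large SSxy.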
c.) Compute a 95% prediction interval for the rental income on a house worth
$230,000.
Given the fixed value x_0, the prediction interval for an individual y is
\[ \hat{y} - E < y < \hat{y} + E \]
with
\[ SS_x = 4727828125, \quad x_0 = 230000, \quad s_e = 208.0701, \quad \bar{x} = 174375, \quad n = 48 \]
\[ \hat{y} = 5363.865 + 0.024358(230000) = 10966.21 \]
For a 95% prediction interval with df = n - 2 = 46, the critical value is t_c = 2.013.
Now:
\[ E = t_c \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}} \]

Now we substitute the previously obtained values:

\[ E = (2.013)(208.0701)\sqrt{1 + \frac{1}{48} + \frac{(230000 - 174375)^2}{4727828125}} = 542.12 \]
\[ 10966.21 - 542.12 < y < 10966.21 + 542.12 \]
\[ 10424.09 < y < 11508.33 \]

Statistical interpretation:
We are 95% confident that the rental income for an individual house worth $230,000 is
between $10,424.09 and $11,508.33.
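The prediction interval formula can be sketched as a small function. This assumes the regression coefficients and summary statistics quoted in this solution, a house value entered in dollars, and a critical value of 2.013 read from a t-table for 95% and df = 46 (an assumption supplied by hand, not computed by the code). The function name `prediction_interval` is illustrative.

```python
import math

def prediction_interval(x0, a, b, x_mean, n, ss_x, s_e, t_crit):
    """Return (lower, upper) prediction limits for an individual y at x0."""
    y_hat = a + b * x0
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / ss_x)
    return y_hat - margin, y_hat + margin

low, high = prediction_interval(
    x0=230_000, a=5363.865, b=0.024358, x_mean=174375, n=48,
    ss_x=4727828125, s_e=208.0701,
    t_crit=2.013,  # assumed t-table value for 95% and df = 46
)
print(f"{low:.2f} < y < {high:.2f}")
```

Passing the critical value as a parameter keeps the sketch free of external libraries; a different confidence level only needs a different table value.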

Section 10.3 : Homework 4


The World Bank collected data on the percentage of GDP that a country spends
on health expenditures ("Health expenditure," 2013) and also the percentage of
women receiving prenatal care ("Pregnant woman receiving," 2013). The data for
the countries where this information is available for the year 2011 are in table
#10.1.8.

a.) Test at the 5% level for a correlation between percentage spent on health
expenditure and the percentage of women receiving prenatal care.

We indicate the random variables.


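x = percentage of GDP spent on health expenditure
y = percentage of women receiving prenatal care
We state the null and alternative hypotheses and the level of significance. The problem
asks for a correlation in either direction, so the test is two-tailed:
\[ H_0: \rho = 0 \]
\[ H_A: \rho \neq 0 \]
\[ \alpha = 0.05 \]

Now we find the value of the test statistic and the p-value:
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]

Substituting the values of r and r^2 found previously, with n = 15:

\[ t = \frac{0.1715}{\sqrt{\dfrac{1 - 0.0294}{15 - 2}}} = 0.6276 \]

Now we enter the value of t and df = 13 into the TI-89 calculator to obtain the
one-tailed area, and double it for the two-tailed p-value:
tcdf(0.6276, 1E99, 13) = 0.2705
\[ p\text{-value} = 2(0.2705) = 0.5410 \]

Since the p-value is greater than 0.05, we fail to reject H0. There is not enough evidence
to show a correlation between health expenditure and the percentage of women receiving
prenatal care.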
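The test statistic and the tail area from the calculator can be approximated without a TI-89. The sketch below is an illustration only: it assumes r = 0.1715 and n = 15 from this solution, and it replaces the calculator's tcdf with a Simpson's-rule integration of the t density, which is adequate for moderate t values but not for extremely small p-values. The helper names are illustrative.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

r, n = 0.1715, 15
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))
p_one_tailed = t_upper_tail(t_stat, n - 2)
p_two_tailed = 2 * p_one_tailed

print(f"t = {t_stat:.4f}")
print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```

Whichever tail convention is used, the p-value here is well above 0.05.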

b.) Find the standard error of the estimate.


\[ s_e = \sqrt{\frac{SS_y - b \cdot SS_{xy}}{n - 2}} = \sqrt{\frac{354.5612 - (1.6606)(6.2803)}{15 - 2}} \]
\[ s_e = 5.1451 \]

c.) Compute a 95% prediction interval for the percentage of woman receiving
prenatal care for a country that spends 5.0 % of GDP on health expenditure.

Given the fixed value x_0, the prediction interval for an individual y is
\[ \hat{y} - E < y < \hat{y} + E \]
with
\[ SS_x = 3.7820, \quad x_0 = 5.0\%, \quad s_e = 5.1451, \quad \bar{x} = 6.1267, \quad n = 15 \]
\[ \hat{y} = 69.7393 + 1.6606(5.0) = 78.0423\% \]
For a 95% prediction interval with df = n - 2 = 13, the critical value is t_c = 2.160.
Now:
\[ E = t_c \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}} \]

Now we substitute the previously obtained values:

\[ E = (2.160)(5.1451)\sqrt{1 + \frac{1}{15} + \frac{(5.0 - 6.1267)^2}{3.7820}} = 13.1605 \]
\[ 78.0423 - 13.1605 < y < 78.0423 + 13.1605 \]
\[ 64.8818 < y < 91.2028 \]
Statistical interpretation: we are 95% confident that the percentage of women who receive
prenatal care in an individual country that spends 5.0% of GDP on health expenditure is
between 64.88% and 91.20%.

Section 11.1 : Homework 2


Researchers watched groups of dolphins off the coast of Ireland in 1998 to
determine what activities the dolphins partake in at certain times of the day
("Activities of dolphin," 2013). The numbers in table #11.1.6 represent the
number of groups of dolphins that were partaking in an activity at certain times of
days. Is there enough evidence to show that the activity and the time period are
independent for dolphins? Test at the 1% level.

State the null and alternative hypotheses and the level of significance
H0: time period and activity are independent
HA: time period and activity are dependent
\[ \alpha = 0.01 \]

Now we find the value of the test statistic and the p-value:
Test statistic:
First find the expected frequencies for each cell.
Using Excel we find the value of the chi-square statistic:
\[ \chi^2 = 68.4646 \]
Now we find the degrees of freedom for this exercise:
\[ df = (\text{rows} - 1)(\text{columns} - 1) = (3 - 1)(4 - 1) = 6 \]
With the previously obtained data we calculate the p-value:
χ²cdf(68.4646, 1E99, 6)
\[ p\text{-value} = 8.4392 \times 10^{-13} \approx 0 \]
Conclusion
Reject H0, since the p-value is less than 0.01. There is enough evidence to show that the
activity and the time period are dependent for dolphins, that is, they are not independent.
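The Excel steps (expected frequencies, chi-square statistic, degrees of freedom, p-value) can be sketched in code. Table #11.1.6 is not reproduced in this solution, so the 2x2 counts below are hypothetical and serve only to show the mechanics; the p-value line uses the statistic 68.4646 and df = 6 reported here. The closed-form tail formula used is valid only for even degrees of freedom, and the function names are illustrative.

```python
import math

def chi_square_independence(table):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical 2x2 counts, NOT the dolphin data from table #11.1.6.
demo_stat, demo_df = chi_square_independence([[10, 20], [20, 10]])
print(f"demo statistic = {demo_stat:.4f}, df = {demo_df}")

# p-value for the statistic reported in the solution (df = 6).
p_value = chi2_upper_tail_even_df(68.4646, 6)
print(f"p-value = {p_value:.4e}")
```

A p-value this small is below any conventional significance level, including the 1% level used in this problem.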

Section 11.1: Homework 4


A person’s educational attainment and age group was collected by the U.S.
Census Bureau in 1984 to see if age group and educational attainment are related.
The counts in thousands are in table #11.1.8 ("Education by age," 2013). Do the
data show that educational attainment and age are independent? Test at the 5%
level.
State the null and alternative hypotheses and the level of significance

H0: age group and educational attainment are independent
HA: age group and educational attainment are dependent
\[ \alpha = 0.05 \]

Now we find the value of the test statistic and the p-value:
Test statistic:
First find the expected frequencies for each cell.

Using Excel we find the value of the chi-square statistic:
\[ \chi^2 = 22373.5656 \]
Now we find the degrees of freedom for this exercise:
\[ df = (\text{rows} - 1)(\text{columns} - 1) = (4 - 1)(5 - 1) = 12 \]
With the previously obtained data we calculate the p-value:
χ²cdf(22373.5656, 1E99, 12)
\[ p\text{-value} \approx 0 \]
Conclusion
Reject H0, since the p-value is less than 0.05. There is enough evidence to show that
educational attainment and age group are dependent, that is, they are not independent.
Section 11.2: Homework 4
In Africa in 2011, the number of deaths of a female from cardiovascular disease
for different age groups are in table #11.2.6 ("Global health observatory," 2013).
In addition, the proportion of deaths of females from all causes for the same age
groups are also in table #11.2.6. Do the data show that the death from
cardiovascular disease are in the same proportion as all deaths for the different
age groups? Test at the 5% level.

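State the null and alternative hypotheses and the level of significance
H0: deaths from cardiovascular disease occur in the same proportions as all deaths across
the age groups
HA: deaths from cardiovascular disease do not occur in the same proportions as all deaths
\[ \alpha = 0.05 \]

Using Excel we find the value of the test statistic:
\[ \chi^2 = 1.0340 \]
Now we find the degrees of freedom for this goodness-of-fit test, where k is the
number of age groups:
\[ df = k - 1 = 4 - 1 = 3 \]
With the previously obtained data we calculate the p-value:
χ²cdf(1.0340, 1E99, 3)
\[ p\text{-value} = 0.793026 \]
Conclusion
Since the p-value is greater than 0.05, we fail to reject H0. There is not enough evidence
to show that deaths from cardiovascular disease are in different proportions from all
deaths for the different age groups.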
Section 11.2: Homework 6
A project conducted by the Australian Federal Office of Road Safety asked people
many questions about their cars. One question was the reason that a person
chooses a given car, and that data is in table #11.2.8 ("Car preferences," 2013).

Do the data show that the frequencies observed substantiate the claim that the
reasons for choosing a car are equally likely? Test at the 5% level.

We state the null and alternative hypotheses and the level of significance

H0: the six reasons are equally likely,
\[ p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = \frac{1}{6} \]
HA: at least one of the proportions is different
\[ \alpha = 0.05 \]

Now we find the expected frequency for each reason. Since all the probabilities are
the same, each expected frequency is the same:
\[ \text{Expected frequency} = E = n \cdot p = 300 \cdot \frac{1}{6} = 50 \]

Using Excel we find the value of the test statistic:
\[ \chi^2 = 42.2 \]
Now we find the degrees of freedom for this exercise:
\[ df = k - 1 = 6 - 1 = 5 \]
With the previously obtained data we calculate the p-value:
χ²cdf(42.2, 1E99, 5)
\[ p\text{-value} = 5.36621 \times 10^{-8} \approx 0 \]
Conclusion
Reject H0, since the p-value is less than 0.05.
Interpretation:
There is enough evidence to show that the reasons for choosing a car are not equally
likely, so the observed frequencies do not substantiate the claim.
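The goodness-of-fit calculation can be sketched in the same way. Table #11.2.8 is not reproduced in this solution, so the six counts below are hypothetical (chosen only so that they sum to 300); the final lines use the statistics reported here for Section 11.2 (42.2 with df = 5, and 1.0340 with df = 3). The tail formula used is valid only for odd degrees of freedom, and the function names are illustrative.

```python
import math

def goodness_of_fit_uniform(observed):
    """Chi-square statistic and df when all categories are equally likely."""
    n = sum(observed)
    expected = n / len(observed)  # same expected count for every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, len(observed) - 1

def chi2_upper_tail_odd_df(x, df):
    """P(chi-square > x) for odd df, via erfc plus a finite series."""
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # first series term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)  # next term of the series
    return total

# Hypothetical counts for six reasons (sum = 300), NOT the data in table #11.2.8.
demo_counts = [60, 40, 55, 45, 50, 50]
demo_stat, demo_df = goodness_of_fit_uniform(demo_counts)
demo_p = chi2_upper_tail_odd_df(demo_stat, demo_df)
print(f"demo: chi-square = {demo_stat:.2f}, df = {demo_df}, p = {demo_p:.4f}")

# p-values for the statistics reported in Section 11.2 of this solution.
p_cars = chi2_upper_tail_odd_df(42.2, 5)
p_deaths = chi2_upper_tail_odd_df(1.0340, 3)
print(f"car reasons: p = {p_cars:.4e}; cardiovascular deaths: p = {p_deaths:.4f}")
```

A small p-value counts against the claimed proportions, while a large one means the data are consistent with them.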
