SQQS2013
SQQS2013 / APPLIED STATISTICS, 1st MAY 2010, 2.30 PM - 5.00 PM (2 HOUR): DMS, TE, KYM, IKIP, PMI, NEGERI, KIA, KTB
PLEASE DO NOT OPEN THIS QUESTION BOOKLET UNTIL FURTHER INSTRUCTION IS GIVEN
CONFIDENTIAL
i) A numerical quantity computed from the data of a sample and used in reaching a decision on whether or not to reject the null hypothesis is referred to as:
significance level
critical value
test statistic
parameter (1 mark)
ii) In developing an 87.4% confidence interval estimate for a population mean, the value of z to use is
1.15
0.32
1.53
0.16 (1 mark)
iii) Given a significance level of 4.4%, the critical value for testing that the proportion in population A is different from that in population B is
2.12
1.82
2.00
1.96 (1 mark)
iv) When the p-value is found to be equal to 0.076, the result at 0.05 significance level is
reject H0
v) In Levene's test of equality of variance, we conclude with the assumption that the variances of the two populations are equal when we reject H0
vi) The manager of a cyber café claims that the mean daily revenue was $700 with a standard deviation of $70. A sample of 32 days reveals a mean daily revenue of $620. The test we would use is
z-test
t-test. (1 mark)
vii) To determine if the mean test score of English students, $\mu_E$, is higher than the mean test score of American students, $\mu_A$, the alternative hypothesis is
H1: $\mu_E$ [?] $\mu_A$
H1: $\mu_E < \mu_A$
H1: $\mu_E$ [?] $\mu_A$
H1: $\mu_E > \mu_A$
(1 mark)
viii) For a left-tailed test of the difference between two means of independent populations, the alternative hypothesis for Levene's test of equality of variance is:
H1: $\sigma_1^2 = \sigma_2^2$
H1: $\sigma_1^2$ [?] $\sigma_2^2$
H1: $\sigma_1^2 < \sigma_2^2$
H1: $\sigma_1^2$ [?] $\sigma_2^2$
(1 mark)
x) If there are two unbiased estimators, the one whose variance is smaller is said to be relatively efficient.
True
False (1 mark)
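The z value asked for in part (a)(ii) can be checked numerically. The following is a minimal sketch, assuming a two-sided interval so that the area in each tail is (1 - 0.874)/2; the function name `z_for_confidence` is a hypothetical helper, not part of the exam paper.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Return the two-sided critical z value for a confidence level such as 0.874."""
    alpha = 1.0 - level
    # The upper critical value leaves alpha/2 in the right tail.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z = z_for_confidence(0.874)
print(round(z, 2))
```

Rounded to two decimals, the result agrees with the option 1.53.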
b) Mary, the owner of two laundry shops (Perfect Laundry and Best Laundry), would like to determine the number of complaints arising from dissatisfaction with her laundry services. Customer satisfaction is the key to success. She has therefore decided that if the number of complaints is at most 5 per week, the service provided by a shop is a success. The number of complaints over 45 weeks for her two laundry shops is given in Table 1.

Table 1

Perfect Laundry
Week  Number of complaints    Week  Number of complaints    Week  Number of complaints
1     4                       16    0                       31    0
2     2                       17    3                       32    0
3     0                       18    2                       33    4
4     7                       19    6                       34    5
5     6                       20    5                       35    5
6     0                       21    1                       36    7
7     6                       22    0                       37    5
8     4                       23    1                       38    0
9     2                       24    0                       39    5
10    6                       25    4                       40    5
11    5                       26    1                       41    4
12    2                       27    6                       42    5
13    5                       28    0                       43    6
14    1                       29    4                       44    2
15    0                       30    6                       45    1

Best Laundry
Week  Number of complaints    Week  Number of complaints    Week  Number of complaints
1     8                       16    0                       31    0
2     2                       17    2                       32    2
3     1                       18    2                       33    8
4     4                       19    6                       34    6
5     6                       20    5                       35    6
6     7                       21    6                       36    7
7     6                       22    4                       37    5
8     0                       23    6                       38    3
9     7                       24    6                       39    7
10    2                       25    2                       40    8
11    1                       26    4                       41    0
12    3                       27    6                       42    5
13    5                       28    5                       43    6
14    2                       29    3                       44    3
15    0                       30    3                       45    7
i) Construct a 90% confidence interval for the difference in the proportion of weeks with failure status between Perfect Laundry and Best Laundry. (5 marks)
-1m -1m -1m for 1.6449 -1m
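The interval in (i) can be reproduced from Table 1. The sketch below is one possible working, not the official marking scheme: it assumes a week counts as a failure when it has more than 5 complaints, and it uses the usual large-sample interval for a difference of two proportions. The names `failure_proportion` and `diff_proportion_ci` are hypothetical helpers.

```python
from math import sqrt
from statistics import NormalDist

# Weekly complaints for weeks 1 to 45, read from Table 1.
perfect = [4, 2, 0, 7, 6, 0, 6, 4, 2, 6, 5, 2, 5, 1, 0,
           0, 3, 2, 6, 5, 1, 0, 1, 0, 4, 1, 6, 0, 4, 6,
           0, 0, 4, 5, 5, 7, 5, 0, 5, 5, 4, 5, 6, 2, 1]
best = [8, 2, 1, 4, 6, 7, 6, 0, 7, 2, 1, 3, 5, 2, 0,
        0, 2, 2, 6, 5, 6, 4, 6, 6, 2, 4, 6, 5, 3, 3,
        0, 2, 8, 6, 6, 7, 5, 3, 7, 8, 0, 5, 6, 3, 7]


def failure_proportion(complaints, limit=5):
    """Proportion of weeks whose complaint count exceeds the limit (assumed failure rule)."""
    return sum(c > limit for c in complaints) / len(complaints)


def diff_proportion_ci(p1, n1, p2, n2, level=0.90):
    """Large-sample confidence interval for p1 - p2."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


p_perfect = failure_proportion(perfect)
p_best = failure_proportion(best)
lower, upper = diff_proportion_ci(p_perfect, len(perfect), p_best, len(best))
print(p_perfect, p_best, round(lower, 4), round(upper, 4))
```

Under these assumptions both limits are negative, which is consistent with the marking note that 0 is not in the interval.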
ii) Based on your answer in (i), does Mary have any significant evidence to conclude that there is no difference in the proportion of weeks with failure status between her two laundry shops? Give your reason. (2 marks)
0 is not in the interval. -1m
At the 90% confidence level, there is not enough evidence to conclude that there is no difference in the proportion of weeks with failure status between the two laundries. -1m
iii) For nearly 10 years, Perfect Laundry had a good record in its services, with at most 10% failure for every 45 weeks the laundry operated. Can we conclude that Perfect Laundry's performance remains the same now? Carry out an appropriate test at the 5% significance level. (6 marks)
-1m
-1m
iv) If the significance level changes to 1%, is there any change in your conclusion in (iii)? Give your reason. (2 marks)
Fail to reject H0. -1m
The conclusion in (iii) changes at the 1% level of significance. -1m
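Parts (iii) and (iv) can be illustrated with a one-sample z test for a proportion. This is a sketch under stated assumptions: the hypotheses are taken as H0: p <= 0.10 against H1: p > 0.10 (upper-tailed), and the 9 failure weeks out of 45 come from counting weeks with more than 5 complaints in the Perfect Laundry data of Table 1. The function name `one_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(failures: int, n: int, p0: float) -> float:
    """z statistic for testing a single proportion against the hypothesised value p0."""
    p_hat = failures / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = one_proportion_z(9, 45, 0.10)  # 9 of 45 weeks exceed 5 complaints (assumed count)

for alpha in (0.05, 0.01):
    critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    decision = "reject H0" if z > critical else "fail to reject H0"
    print(f"alpha={alpha}: z={z:.4f}, critical={critical:.4f}, {decision}")
```

With these inputs H0 is rejected at 5% but not at 1%, matching the marking note that the conclusion changes. A two-sided version of the test leads to the same pair of decisions.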
QUESTION 2 (25 MARKS)
a) Choose the correct answer.
i) Analysis of variance is used to
compare nominal data.
compute t test.
compare population proportion.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
H 0 : x1 x 2 x3 x 4
(1 mark)
iii) If an ANOVA test is conducted and the null hypothesis is rejected, what does this indicate?
Too many degrees of freedom
No difference between the population means
Difference between at least one pair of population means
None of the above (1 mark)
b) A study was conducted to compare the final scores obtained by students from 5 different schools in four different subjects. The researcher wanted to show that schools have an effect on the scores. He believed that the subjects have an effect on the scores too. The following data represent the final scores obtained by randomly selected students from 5 different schools in Mathematics, English, Science and Biology.

School   Mathematics   English   Science   Biology
1        68            57        73        61
2        83            94        91        86
3        72            81        63        59
4        55            73        77        66
5        92            68        75        87

i) Based on the data, complete the analysis of variance table. (10 marks)

Source of Variation   Sum of Squares   Degrees of freedom   Mean Square     F
Treatment             1618.7 (2m)      4                    404.675 (1m)    4.3666 (1m)
Block                 42.15 (2m)       3                    14.05 (1m)
Error                 1112.1 (1m)      12                   92.675 (1m)
Total                 2772.95          19 (1m)

Totals $\sum_j x_{ij}$ and $n_i$ for each school:
$\sum_j x_{1j} = 259$, $n_1 = 4$
$\sum_j x_{2j} = 354$, $n_2 = 4$
$\sum_j x_{3j} = 275$, $n_3 = 4$
$\sum_j x_{4j} = 271$, $n_4 = 4$
$\sum_j x_{5j} = 322$, $n_5 = 4$

Totals $\sum_i x_{ij}$ and $n_j$ for each subject:
$\sum_i x_{i1} = 370$, $n_1 = 5$
$\sum_i x_{i2} = 373$, $n_2 = 5$
$\sum_i x_{i3} = 379$, $n_3 = 5$
$\sum_i x_{i4} = 359$, $n_4 = 5$

$\sum\sum x_{ij} = 1481$, $n = 20$
$\sum\sum x_{ij}^2 = 112441$

$SSA = \frac{259^2}{4} + \frac{354^2}{4} + \frac{275^2}{4} + \frac{271^2}{4} + \frac{322^2}{4} - \frac{1481^2}{20} = 1618.7$
$SSB = \frac{370^2}{5} + \frac{373^2}{5} + \frac{379^2}{5} + \frac{359^2}{5} - \frac{1481^2}{20} = 42.15$
$SSE = 2772.95 - 1618.7 - 42.15 = 1112.1$
$MSA = \frac{1618.7}{4} = 404.675$, $MSB = \frac{42.15}{3} = 14.05$, $MSE = \frac{1112.1}{12} = 92.675$
$F = \frac{404.675}{92.675} = 4.3666$
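The sums of squares above can be verified with a short script. This is a sketch, assuming schools are the treatments and subjects are the blocks (as the totals for $n_i$ and $n_j$ suggest), with each school's scores ordered Mathematics, English, Science, Biology. The function name `randomized_block_anova` is a hypothetical helper.

```python
# Scores by school; each row is ordered Mathematics, English, Science, Biology.
scores = [
    [68, 57, 73, 61],  # school 1
    [83, 94, 91, 86],  # school 2
    [72, 81, 63, 59],  # school 3
    [55, 73, 77, 66],  # school 4
    [92, 68, 75, 87],  # school 5
]


def randomized_block_anova(rows):
    """Sums of squares, mean squares and treatment F for a randomized block design."""
    a, b = len(rows), len(rows[0])          # number of treatments and blocks
    n = a * b
    correction = sum(map(sum, rows)) ** 2 / n
    sst = sum(x * x for row in rows for x in row) - correction
    ssa = sum(sum(row) ** 2 for row in rows) / b - correction        # treatments
    ssb = sum(sum(col) ** 2 for col in zip(*rows)) / a - correction  # blocks
    sse = sst - ssa - ssb
    msa, msb, mse = ssa / (a - 1), ssb / (b - 1), sse / ((a - 1) * (b - 1))
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSE": sse,
            "MSA": msa, "MSB": msb, "MSE": mse, "F": msa / mse}


table = randomized_block_anova(scores)
for name, value in table.items():
    print(f"{name} = {value:.4f}")
```

The printed values agree with the completed table (SSA = 1618.7, SSB = 42.15, SSE = 1112.1, F = 4.3666).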
ii) Use a 0.05 level of significance to test the researcher's claim that schools have an effect on the scores. (4 marks)
1m
1m
We conclude that the schools have an effect on the scores. -1m
c) A study on rental rates in four cities has been done. Based on OUTPUT 2.1, what is your conclusion on the rental rates between the four cities at $\alpha = 0.05$?
OUTPUT 2.1
Rental per month for two-bedded apartments
df 3 96 99
F 3.802
Sig. .013
(4 marks)
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
1m
1m
We conclude that the rental rates differ between the cities. -1m
d) A researcher in a manufacturing company has studied the effect of incentives given by the company on the workers' productivity. To reduce the experimental error, the workers' commitment was also considered in the study. The collected data were analyzed using SPSS and the output is shown below.
OUTPUT 2.2
Tests of Between-Subjects Effects Dependent Variable: productivity
Type III Sum of Squares df Mean Square F 543.222(a) 5 108.644 115.035 27.556 2 13.772 B D 2 C 24.482 3.776 A .944 547.000 9 A R Squared = .993 (Adjusted R Squared = .984)
B=
$C = 24.482 \times 0.944 = 23.111$
$D = 23.111 \times 2 = 46.222$
QUESTION 3 (25 MARKS)
a) Answer the following questions. (4 marks)
i) Give one of the measurement scales that can be analyzed using the Chi-square test. (1 mark)
Nominal or Ordinal
ii) One guideline to ensure a good approximation to the Chi-square distribution is that expected [...] 5). If this were not possible, what would be a [...] (1 mark)
Combine rows or columns, or increase sample sizes
b) A social worker believes that the age distribution of regular users of marijuana in a certain population is as follows: below 21, 30%; 21 - 30, 60%; 31 - 40, 8%; and over 40, 2% of the total population. A random sample of 300 drawn from the population yielded the age breakdown shown in Table 3.1. Do these data provide sufficient evidence to support the social worker's belief at the 5% level of significance?

Table 3.1
Age, years   Number
Below 21     96
21 - 30      171
31 - 40      22
Over 40      11
(7 marks)

H0: The age distribution of regular users of marijuana follows the social worker's belief.
H1: The age distribution of regular users of marijuana is different from the social worker's belief. -1m

The observed and expected frequencies are shown in the table below, where E = np.

           Below 21   21 - 30   31 - 40   Over 40   Total
Observed   96         171       22        11        300
Expected   90         180       24        6         300

Critical value: $\chi^2_{0.05,\,3} = 7.8147$ -1m
Fail to reject H0. -1m
There is not sufficient evidence at the 0.05 level of significance to show that the age distribution of regular users of marijuana is different from the social worker's belief. -1m
We assume that the social worker's belief is true.
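The test statistic behind this decision can be computed from the observed and expected frequencies. A minimal sketch follows; the critical value 7.8147 is taken from the working above, and the function name `chi_square_gof` is a hypothetical helper.

```python
observed = [96, 171, 22, 11]             # Below 21, 21 - 30, 31 - 40, Over 40
proportions = [0.30, 0.60, 0.08, 0.02]   # the social worker's claimed distribution


def chi_square_gof(observed, proportions):
    """Chi-square goodness-of-fit statistic with expected counts E = n * p."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


statistic, expected = chi_square_gof(observed, proportions)
critical = 7.8147  # chi-square critical value at 0.05 with 3 degrees of freedom (from the text)
print(round(statistic, 4), "reject H0" if statistic > critical else "fail to reject H0")
```

The statistic is about 5.18, below the critical value, so the decision not to reject H0 follows.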
for a charity organization stationed in a shopping mall during the Christmas season. The numbers of people contributing during five-minute time intervals were counted. The results are shown in Table 3.2.

Table 3.2
Number of contributors   0    1    2    3    4    5    6
Number of intervals      15   30   36   33   22   12   6

ii) Test the hypothesis that the number of people contributing during five-minute time intervals follows a Poisson distribution at the 1% significance level. (8 marks)

Number of contributors   Observed   Probability   Expected   (O - E)^2 / E
0                        15         0.0821        12.6434    0.4392
1                        30         0.2052        31.6008    0.0811
2                        36         0.2565        39.501     0.3103
3                        33         0.2138        32.9252    0.0002
4                        22         0.1336        20.5744    0.0988
5                        12         0.0668        10.2872    0.2852

H0: The number of people contributing during five-minute time intervals follows a Poisson distribution.
H1: The number of people contributing during five-minute time intervals does not follow a Poisson distribution. -1m
$\chi^2 = 1.2487$ -2m -1m
We do not have enough evidence that the number of people contributing during five-minute time intervals does not follow a Poisson distribution. We assume that the number of people contributing during five-minute time intervals follows a Poisson distribution. -1m
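The Poisson fit can be reproduced from Table 3.2. This sketch rests on two assumptions that the surviving text does not state explicitly: the Poisson mean is estimated by the sample mean, and the last class is treated as "6 or more" so that the probabilities sum to 1. Because it uses unrounded probabilities, the statistic differs slightly from the hand-calculated 1.2487. The function name `poisson_gof` is hypothetical.

```python
from math import exp, factorial

# Number of contributors -> number of five-minute intervals (Table 3.2).
counts = {0: 15, 1: 30, 2: 36, 3: 33, 4: 22, 5: 12, 6: 6}


def poisson_gof(counts):
    """Chi-square goodness-of-fit statistic for a Poisson model with estimated mean."""
    n = sum(counts.values())
    mean = sum(k * f for k, f in counts.items()) / n
    classes = sorted(counts)
    probs = [exp(-mean) * mean ** k / factorial(k) for k in classes[:-1]]
    probs.append(1 - sum(probs))  # last class absorbs the upper tail (assumption)
    expected = [n * p for p in probs]
    statistic = sum((counts[k] - e) ** 2 / e for k, e in zip(classes, expected))
    return mean, expected, statistic


mean, expected, statistic = poisson_gof(counts)
print(mean, [round(e, 4) for e in expected], round(statistic, 4))
```

The estimated mean is 2.5 contributors per interval, and the statistic is close to the value in the working above.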
c) A study is conducted to see if there is any association between the colour of cars involved in accidents and the time the accidents occur. The result of the analysis is displayed in Output 3.1. Output 3.1
Car_Colour * Time Crosstabulation Time Car_Colour Bright Count Expected Count Count Expected Count Count Expected Count Morning 30 Noon 50 Night 35 120 Total 355
A
80 -
Dark
B
60.8 -
Total
Chi-Square Tests Value 30.179(a) 29.010 1.611 355 Df Asymp. Sig. (2-sided) .000 .000 .204
C
2 1
A = 35.6338
B = 40
$C = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$
ii) At 2.5% significance level test whether there is an association between the colour of cars involved in accidents and the time the accidents occur. (4 marks)
H0: There is no association between the colour of cars involved in accidents and the time the accidents occur.
H1: There is an association between the colour of cars involved in accidents and the time the accidents occur. -1m
P-value: 0.000 < $\alpha = 0.025$ -1m
Reject H0. -1m
There is an association between the colour of cars involved in accidents and the time the accidents occur. -1m

QUESTION 4 (25 MARKS)
a) A biologist assumes that there is a linear relationship between the amount of fertilizer supplied to tomato plants and the subsequent yield of tomatoes obtained. Eight tomato plants, of the same variety, were selected at random and treated weekly with a solution in which fertilizer (in grams) was dissolved in a fixed quantity of water. The yield of tomatoes (in kilograms) was recorded.

Tomato plant   Fertilizer (in grams)   Yield of tomatoes (in kilograms)
A              1.0                     3.9
B              1.5                     4.4
C              2.0                     5.8
D              2.5                     6.6
E              3.0                     7.0
F              3.5                     7.1
G              4.0                     7.3
H              4.5                     7.7
i) Identify the dependent and independent variables. (2 marks)
Dependent variable: yield of tomato plant -- 1M
Independent variable: amount of fertilizer -- 1M
ii) Find the Pearson correlation coefficient and interpret it. (2 marks)
r = 0.9444 -- 1M
The correlation coefficient suggests a strong positive linear relationship between the yield of the tomato plant and the amount of fertilizer. -- 1M
iii) Test the relationship between yield of tomato plant and amount of fertilizer based on Pearson
(6 marks)
$t_{\text{test}} = (0.9444)\sqrt{\dfrac{8 - 2}{1 - (0.9444)^2}} = 7.0356$ -- 2M -- 1M -- 1M
There is enough evidence to conclude that the positive relationship between y and x is significant. --1M
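The correlation coefficient and the t statistic can be checked against the fertilizer data. This is a sketch using the textbook formulas $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ and $t = r\sqrt{(n-2)/(1-r^2)}$; the function names are hypothetical helpers. The t value in the working uses r rounded to 0.9444, so an unrounded r gives a slightly different third decimal.

```python
from math import sqrt

fertilizer = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # grams
yield_kg = [3.9, 4.4, 5.8, 6.6, 7.0, 7.1, 7.3, 7.7]    # kilograms


def pearson_r(x, y):
    """Pearson correlation coefficient from centred sums of squares and products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for testing a zero population correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


r = pearson_r(fertilizer, yield_kg)
print(round(r, 4), round(correlation_t(round(r, 4), len(fertilizer)), 4))
```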
iv) Fit a least squares line. (3 marks)
a = 3.2524 -- 1M
b = 1.0810 -- 1M
y = 3.2524 + 1.0810x -- 1M
v) Interpret the slope. (1 mark)
For each increase of 1 gram in the amount of fertilizer (x), the yield of the tomato plant (y) increases by 1.0810 kilograms. -- 1M
vi) Estimate the yield of plant treated, weekly, with 3.2 grams of fertilizer. (2 marks)
y = 3.2524 + 1.0810(3.2 )
-- 1M
y = 6.7116
-- 1M
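The fitted line and the estimate at 3.2 grams can be reproduced as follows. This is a sketch of ordinary least squares for one predictor; `least_squares` is a hypothetical helper. Because the script keeps unrounded coefficients, its prediction differs from 6.7116 in the fourth decimal.

```python
fertilizer = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # grams
yield_kg = [3.9, 4.4, 5.8, 6.6, 7.0, 7.1, 7.3, 7.7]    # kilograms


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


a, b = least_squares(fertilizer, yield_kg)
estimate = a + b * 3.2  # estimated yield for 3.2 grams of fertilizer
print(round(a, 4), round(b, 4), round(estimate, 4))
```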
b) A researcher wants to find out the relationship between the concentration of cholesterol in blood serum (y), age (x1), and body mass index (x2). He ran a regression analysis using a computer and the output obtained is shown in OUTPUT 4.1.
OUTPUT 4.1
ANOVA(b)
df 2 27 29
Sig. .000(a)
49.703
a Predictors: (Constant), x1, x2
b Dependent V

Coefficients(a)
Model 1      Unstandardized Coefficients B
(Constant)   -.740
x1           .041
x2           .201
a Dependent Variable: y
(1 mark)
-- 1M
ii) Briefly explain the significance of the estimated regression model and its coefficients. (Use $\alpha = 5\%$.) (4 marks)
The ANOVA table shows that the model is significant since the p-value = 0.000 is less than $\alpha = 5\%$. -- 2M
The coefficient table shows that both x1 and x2 make a significant contribution to the model since the p-value of x1 = 0.006 and the p-value of x2 = 0.031 are both less than $\alpha = 5\%$. -- 2M
iii) How many variables have a positive significant effect on the model at $\alpha = 1\%$?
(1 mark)
One
-- 1M
(1 mark) $b_2 = 0.201$ indicates that, holding the other variable constant, an increase of 1 unit in body mass index increases the average concentration of cholesterol in blood serum by 0.201. -- 1M
v) What is the estimated value of concentration of cholesterol in blood serum if a person is 47 years old and his body mass index is 23.1? (2 marks)
-- 1M -- 1M
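The estimate in (v) follows from substituting into the fitted model. The sketch below assumes the unstandardized coefficients shown in OUTPUT 4.1 (constant -0.740, x1 0.041, x2 0.201); the function name `predict_cholesterol` is a hypothetical helper.

```python
def predict_cholesterol(age: float, bmi: float,
                        b0: float = -0.740, b1: float = 0.041, b2: float = 0.201) -> float:
    """Estimated cholesterol concentration from y = b0 + b1 * x1 + b2 * x2."""
    return b0 + b1 * age + b2 * bmi


estimate = predict_cholesterol(age=47, bmi=23.1)
print(round(estimate, 4))
```

With these coefficients, the estimate is -0.740 + 0.041(47) + 0.201(23.1) = 5.8301.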
~ END OF QUESTIONS~