
FACULTY OF SCIENCE AND TECHNOLOGY

FAKULTI SAINS DAN TEKNOLOGI

SBST 3103 INTRODUCTORY DATA ANALYSIS


PENGENALAN ANALISIS DATA

ASSIGNMENT (30%)
TUGASAN (30%)

JANUARY 2011 SEMESTER


SEMESTER JANUARY 2011

Name: Wan Norzila bt Wan Mohamed Rashdi
Matric number: 651001085712001
NRIC: 651001-08-5712
Telephone number: 0195227613
E-mail address: w_norz5712@yahoo.com
Tutor's name: Encik Hamdan Thosam
Learning Centre: Greenhill Learning Centre

Question 1

a) Analysis of frequency data is performed to determine two things: whether the experiment is a multinomial experiment, which involves identifying the probabilities and the type of distribution; and whether two factors are dependent or homogeneous.

The chi-square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (Note: chi-square tests can only be used on actual counts, not on percentages, proportions, means, etc.)

The chi-square test statistic approximately follows the chi-square distribution with:

i. $k - 1$ degrees of freedom in the Goodness-of-fit Test, and

ii. $(r - 1)(k - 1)$ degrees of freedom in the Test of Independence and the Test of Homogeneity.

In each case, the approximation to chi-square distribution is accurate if all expected frequencies are at least 1 and at most 20% of the expected frequencies are less than 5.
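The accuracy rule above can be checked mechanically before running any of the three tests. The following is a minimal sketch; the function name `expected_counts_ok` and the sample values are hypothetical and serve only as an illustration.

```python
def expected_counts_ok(expected):
    """Return True when the chi-square approximation rule holds:
    every expected frequency is at least 1, and at most 20% of the
    expected frequencies are less than 5."""
    # Part 1 of the rule: no expected frequency below 1.
    if any(e < 1 for e in expected):
        return False
    # Part 2 of the rule: count the cells with expected frequency below 5.
    small = sum(1 for e in expected if e < 5)
    # "At most 20%" written with integers to avoid floating-point issues.
    return small * 5 <= len(expected)


# Hypothetical expected frequencies, for illustration only.
print(expected_counts_ok([12.0, 8.5, 6.0, 4.2, 9.3]))  # one of five cells is below 5
print(expected_counts_ok([12.0, 3.5, 6.0, 4.2, 9.3]))  # two of five cells are below 5
print(expected_counts_ok([12.0, 0.8, 6.0, 7.2, 9.3]))  # one cell is below 1
```

When the check fails, a common remedy is to merge neighbouring categories until the rule is satisfied.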
1. GOODNESS-OF-FIT TEST (Fitting to a given probability)

This test is used to determine whether a population of interest follows a specified probability distribution of a random variable. To understand the principles of this test, we need to understand the concept of a multinomial experiment.

Multinomial Experiment
(a) The experiment consists of n identical trials.
(b) The outcome of each trial falls into one of k categories or cells.
(c) The probability that the outcome of a single trial falls in a particular cell, say cell i, is $p_i$, where i = 1, 2, ..., k. It remains the same from trial to trial, and $p_1 + p_2 + ... + p_k = 1$.
(d) The experimenter counts the observed number of outcomes in each category, written as $O_1, O_2, ..., O_k$, that is $O_j$ for j = 1, 2, ..., k, with $n = O_1 + O_2 + ... + O_k$.
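To make the multinomial setup concrete, the sketch below turns observed cell counts into a goodness-of-fit statistic: it computes the expected count for each cell as n times the hypothesized cell probability, then sums (O - E)^2 / E over the cells. The die-roll counts and the function name `goodness_of_fit` are hypothetical, chosen only for this illustration.

```python
def goodness_of_fit(observed, probs):
    """Return (chi_square, degrees_of_freedom) for observed cell counts
    tested against hypothesized cell probabilities."""
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    n = sum(observed)                      # n = O_1 + ... + O_k
    expected = [n * p for p in probs]      # E_j = n * p_j under H0
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, len(observed) - 1   # k - 1 degrees of freedom


# Hypothetical data: 60 rolls of a die that is assumed fair under H0.
observed = [8, 12, 9, 11, 10, 10]
chi_square, df = goodness_of_fit(observed, [1 / 6] * 6)
print(round(chi_square, 4), df)
```

The resulting statistic would then be compared with the chi-square table value for k - 1 degrees of freedom at the chosen significance level.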

When performing a goodness-of-fit test, two assumptions are required:
(i) the experiment satisfies the properties of a multinomial experiment;
(ii) all expected frequencies are at least 1 and at most 20% of the expected frequencies are less than 5.

The required alternative hypothesis:

H1: at least one of the multinomial probabilities differs from its hypothesized value.

In n trials, the expected number of outcomes that fall into the j-th category under the null hypothesis is $E_j = n p_j$.

2. TEST OF INDEPENDENCE

The main objective of the Test of Independence is to determine whether two criteria (variables) that are associated with the subjects in a population are independent of each other or not. For example, subjects chosen at random from a population may be classified according to their views on certain issues and their political affiliation.

The question of independence of the two methods of classification (variables) can be investigated using a test of hypothesis based on the chi-square statistic. The steps are:

Step 1: State the null and alternative hypotheses
H0: The two methods of classification (both variables) are independent
H1: Both variables are dependent

Step 2: Determine the significance level and rejection region
The significance level is usually set at $\alpha = 0.01$ or $\alpha = 0.05$. Check the number of rows (r) and columns (c) in the related table. Calculate the degrees of freedom for the test, v = (r - 1)(c - 1). Next, determine the critical value (based on the Chi-Square Table), that is $\chi^2_{\alpha, v}$. Reject H0 when the test statistic value $\chi^2 > \chi^2_{\alpha, v}$.

Step 3: Calculate the test statistic
Calculate the expected frequency values for each cell. Next, calculate the value of the chi-square statistic, that is $\chi^2 = \sum \frac{(O - E)^2}{E}$.

Step 4: Determine the test result
The result of the test depends on the information in Steps 2 and 3.

Step 5: State the conclusion
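Steps 2 to 4 can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the 2 x 2 table is hypothetical, the function name `chi_square_independence` is invented for this sketch, and the critical value 3.841 is the chi-square table value for a significance level of 0.05 with 1 degree of freedom.

```python
def chi_square_independence(table):
    """Return (chi_square, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency: (row total x column total) / grand total.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)   # v = (r - 1)(c - 1)
    return chi_square, df


# Hypothetical 2 x 2 contingency table, for illustration only.
table = [[10, 20],
         [20, 10]]
chi_square, df = chi_square_independence(table)
critical_value = 3.841            # table value for alpha = 0.05, v = 1
reject_h0 = chi_square > critical_value
print(round(chi_square, 4), df, reject_h0)
```

The function returns the statistic and the degrees of freedom separately so that the critical value can be looked up for whichever significance level is chosen in Step 2.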

3. TEST OF HOMOGENEITY

In the Test of Homogeneity, we test the hypothesis that the population proportions within each category are the same (homogeneous). This applies when either the row or column totals are predetermined. Data are given in the form of a two-way contingency table, in which one classification is by variable and the other is by population. It is important to stress that the assumptions and the statements of the null and alternative hypotheses differ from those of the Test of Independence, but the analysis techniques are the same.

The chi-square statistic is used, that is,

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where O is the observed frequency in each cell/category and E is the expected frequency, calculated using the formula

$E_{ij} = \frac{R_i \times C_j}{N}$

where,

$E_{ij}$ = expected frequency for the ij-th cell
$N$ = total number of observations
$C_j$ = total of the jth column, where j = 1, ..., $n_j$
$R_i$ = total of the ith row, where i = 1, ..., $n_i$

b) i)

H0: The employees' level of satisfaction towards job conditions is independent of age.
H1: The employees' level of satisfaction towards job conditions depends on age.

ii) Reject H0 at significance level $\alpha$ when the test statistic $\chi^2 > \chi^2_{\alpha, v}$, where $\alpha = 0.05$ and v (degrees of freedom) is calculated as:

v = (R - 1)(C - 1) = (4 - 1)(2 - 1) = 3 x 1 = 3

Therefore the critical value is $\chi^2_{0.05, 3} = 7.81473$.

[Figure: chi-square density f(x) with the critical region to the right of 7.81473]

iii) Number of employees by level of satisfaction and age group

Less than 35 years

Level of satisfaction   Observed, O   Expected, E                O - E   (O - E)^2 / E
Very Satisfied          50            (120 x 500)/1000 = 60      -10     100/60
Satisfied               90            (200 x 500)/1000 = 100     -10     100/100
Not satisfied           160           (290 x 500)/1000 = 145     15      225/145
Very satisfied          200           (390 x 500)/1000 = 195     5       25/195
Total                   500                                              4.346596

35 years and above

Level of satisfaction   Observed, O   Expected, E                O - E   (O - E)^2 / E
Very Satisfied          70            (120 x 500)/1000 = 60      10      100/60
Satisfied               110           (200 x 500)/1000 = 100     10      100/100
Not satisfied           130           (290 x 500)/1000 = 145     -15     225/145
Very satisfied          190           (390 x 500)/1000 = 195     -5      25/195
Total                   500                                              4.346596

Total $\chi^2$ for less than 35 years and 35 years and above = 4.346596 + 4.346596 = 8.693192

iv)

[Figure: chi-square density with the critical value 7.81473 and the test statistic 8.6932 inside the critical region]

The graph shows that the test statistic falls in the critical region (8.6932 > 7.81473), which means we reject H0.

v) Conclusion
At the 0.05 significance level, there is enough evidence that the employees' level of satisfaction towards job conditions depends on age.
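The arithmetic in part b) can be re-checked with a short script. This is a sketch rather than part of the original answer: it takes the observed counts and the critical value 7.81473 from the working above, and the variable and function names are chosen for this illustration.

```python
# Observed counts from the worked example, listed in the same row order
# as the level-of-satisfaction table.
under_35 = [50, 90, 160, 200]
over_35 = [70, 110, 130, 190]

n = sum(under_35) + sum(over_35)                          # 1000 employees
row_totals = [a + b for a, b in zip(under_35, over_35)]   # 120, 200, 290, 390


def column_contribution(column):
    """Sum of (O - E)^2 / E over the cells of one age-group column."""
    col_total = sum(column)
    total = 0.0
    for observed, row_total in zip(column, row_totals):
        expected = row_total * col_total / n   # (row total x column total) / N
        total += (observed - expected) ** 2 / expected
    return total


chi_under = column_contribution(under_35)
chi_over = column_contribution(over_35)
chi_square = chi_under + chi_over
critical_value = 7.81473                       # alpha = 0.05, v = 3
reject_h0 = chi_square > critical_value

print(f"{chi_under:.6f} {chi_over:.6f} {chi_square:.6f} {reject_h0}")
# prints: 4.346596 4.346596 8.693192 True
```

Both age groups contribute the same amount because their column totals are equal, so the deviations O - E differ only in sign.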

Question 2

a) i) The Pearson correlation coefficient ($r_p$) is used for quantitative data, in both discrete and continuous forms. It is computed from the Pearson product moment for n pairs of variables (X, Y). The formula is:

$r_p = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2][n \sum Y^2 - (\sum Y)^2]}}$

Since the body weight of individuals and the frequency of doing exercise can both be measured numerically, this method of computing the correlation coefficient suits this study.

ii) The following table explains the r values for each of the four conditions on the relationship between two variables.
Table 1: r Values and Relationship between Two Variables

No.    r Value               Relationship between Two Variables
1(a)   r = -1.00             There exists a perfect negative linear relationship
1(b)   r = +1.00             There exists a perfect positive linear relationship
2(a)   -1.00 < r < -0.50     There exists a strong negative linear relationship
2(b)   +0.50 < r < +1.00     There exists a strong positive linear relationship
3(a)   -0.50 < r < 0         There exists a weak negative linear relationship
3(b)   0 < r < +0.50         There exists a weak positive linear relationship
4      r = 0                 No linear relationship exists between the two variables

According to the table above, r = -0.862 lies in the range -1.00 < r < -0.50, which indicates a strong negative linear relationship between the body weight of individuals and their frequency of doing exercise. We can say that body weight tends to decrease as the frequency of doing exercise increases.
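For readers who want to reproduce this kind of interpretation, the sketch below computes the Pearson coefficient with the usual computational form of the product-moment formula and maps the result to the wording of Table 1. The exercise and weight values are hypothetical, the function names are invented for this illustration, and treating |r| = 0.50 as "strong" is an assumption, since the table leaves that boundary undefined.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for n paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / math.sqrt(spread)


def describe(r):
    """Map r to the wording of Table 1.
    Assumption: |r| = 0.50 is reported as 'strong'."""
    if r == 0:
        return "no linear relationship"
    sign = "positive" if r > 0 else "negative"
    if math.isclose(abs(r), 1.0):
        strength = "perfect"
    elif abs(r) >= 0.5:
        strength = "strong"
    else:
        strength = "weak"
    return f"{strength} {sign} linear relationship"


print(describe(-0.862))   # prints: strong negative linear relationship

# Hypothetical data: weekly exercise sessions and body weight in kg.
exercise = [1, 2, 3, 4, 5]
weight = [80, 78, 75, 71, 70]
r = pearson_r(exercise, weight)
print(round(r, 3), describe(r))
```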

b)

Experiment results to compare the lifespan of four different brands of spark plug (k = 4). Five spark plugs of each brand were used (N = 5 x 4 = 20). The number of miles used until failure was recorded as an indicator of lifespan.

Source of variation   Sum of squares                      Degrees of freedom, df         Mean square            F-value
Treatments            SS(Tr) = 176.5                      B = k - 1 = 4 - 1 = 3          176.5/3 = 58.83333     58.83333/3.71875 = 15.8207
Error                 A = SS(Er) = 236.0 - 176.5 = 59.5   C = N - k = 5 x 4 - 4 = 16     59.5/16 = 3.71875
Total                 SST = 236.0                         D = N - 1 = 20 - 1 = 19        236/19 = 12.42

Note: k = number of samples (brands), N = total number of observations in all samples.
Mean square for treatments = SS(Tr) / (k - 1)
Mean square error = SS(Er) / (N - k)

c) i)
H0: $\mu_1 = \mu_2 = ... = \mu_k$ (no difference in means across the k factor levels, k = number of populations)
H1: not all populations have equal means.

ii) Significance level, $\alpha = 0.05$
The critical value is $F_{v_1, v_2, \alpha}$, where
$v_1$ = k - 1 = 4 - 1 = 3
$v_2$ = N - k = 20 - 4 = 16
$F_{3, 16, 0.05} = 3.239$

iii) Test statistic: F = 58.83333/3.71875 = 15.8207

iv) Since the test statistic value F = 15.8207 > 3.239, reject H0.

[Figure: F density f(x) with the critical value 3.239 and the test statistic 15.8207 inside the critical region]

v) At the 0.05 significance level, there is enough evidence to reject the claim that the mean lifespans of the four brands of spark plug are the same.
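The ANOVA table and the decision in part c) follow directly from the two given sums of squares, so they can be re-derived in code. The sketch below assumes only the values stated above (SS(Tr) = 176.5, SST = 236.0, k = 4, N = 20, critical value 3.239); the function name `one_way_anova_table` is hypothetical.

```python
def one_way_anova_table(ss_treatment, ss_total, k, n_total):
    """Fill in a one-way ANOVA table from the treatment and total sums of squares."""
    ss_error = ss_total - ss_treatment        # SS(Er) = SST - SS(Tr)
    df_treatment = k - 1
    df_error = n_total - k
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "ss_error": ss_error,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "df_total": n_total - 1,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "f": ms_treatment / ms_error,
    }


table = one_way_anova_table(176.5, 236.0, k=4, n_total=20)
critical_value = 3.239                        # F table value for (3, 16) at alpha = 0.05
reject_h0 = table["f"] > critical_value

print(table["ss_error"], table["df_treatment"], table["df_error"])
print(round(table["ms_treatment"], 5), table["ms_error"])
print(round(table["f"], 4), reject_h0)        # prints: 15.8207 True
```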
