
FINAL PROJECT

Stat 500
Abstract
Using a given data set, we carried out tests and drew conclusions in response to the questions provided. This report was completed by Arman Tabatabaei.

Robert Berus, David Kleppang, Arman Tabatabaei

1. Cleaning up the data required a consistent unit of measurement for every participant, such as inches. I therefore converted the values that appeared to have been entered in another unit (inches to centimeters, or feet to inches) wherever appropriate.

Descriptive Statistics: Height
Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Height    44  0   64.21  2.05     13.57  5.30     63.00  67.00   69.00  77.00

Descriptive Statistics: Ideal-Hight

Variable     N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Ideal-Hight  42  2   66.96  2.20     14.28  5.50     66.75  70.00   72.00  80.00

Descriptive Statistics: Weight


Variable  N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Weight    44  0   143.95  4.34     28.79  101.00   118.50  140.00  165.00  218.00

Descriptive Statistics: Ideal-weight


Variable      N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Ideal-weight  43  1   136.35  4.27     27.98  99.00    110.00  135.00  155.00  216.00

Descriptive Statistics: Pairs of jeans


Variable        N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Pairs of jeans  44  0   6.159  0.905    6.000  0.000    4.000  5.000   7.000  40.000

Descriptive Statistics: Left wrist


Variable    N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Left wrist  44  0   15.284  0.374    2.480  5.500    14.500  15.500  16.875  19.000

Descriptive Statistics: Right wrist


Variable     N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Right wrist  44  0   15.475  0.373    2.472  5.500    15.000  15.850  16.925  18.500

Descriptive Statistics: Left arm


Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Left arm  44  0   66.86  1.58     10.46  24.00    64.13  67.75   72.88  83.00

Descriptive Statistics: Right arm


Variable   N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Right arm  44  0   66.15  1.84     12.22  24.00    66.08  67.75   71.00  83.00

Descriptive Statistics: Head 1


Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Head 1    44  0   54.54  1.35     8.94   22.00    55.00  57.00   58.00  62.00

Descriptive Statistics: Head 2


Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Head 2    43  1   55.56  1.18     7.73   22.00    55.50  57.50   58.00  62.50

Descriptive Statistics: GPA


Variable  N   N*  Mean  SE Mean  StDev  Minimum  Q1    Median  Q3    Maximum
GPA       33  11  5.31  1.61     9.26   3.00     3.50  3.70    3.82  56.80

Descriptive Statistics: Hours st>>


Variable    N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Hours st>>  43  1   5.756  0.587    3.850  0.000    3.000  5.000   7.000  16.000

Descriptive Statistics: Hours TV


Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3      Maximum
Hours TV  42  2   6.750  0.730    4.728  0.000    4.000  6.500   10.000  24.000

After running a simple multiple-Y boxplot for all columns, the outliers were indicated by asterisks on the boxplots.
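The asterisks follow the usual boxplot convention of flagging values that lie more than 1.5 interquartile ranges beyond the quartiles. The following is a minimal Python sketch of that rule and of the unit conversions described in step 1, using the Height quartiles reported above (Q1 = 63, Q3 = 69). The function names are hypothetical, and the 1.5 multiplier is assumed to match the software's default.

```python
CM_PER_INCH = 2.54
INCHES_PER_FOOT = 12


def boxplot_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def feet_to_inches(feet):
    """Convert a height entered in decimal feet to inches."""
    return feet * INCHES_PER_FOOT


def inches_to_cm(inches):
    """Convert a measurement entered in inches to centimeters."""
    return inches * CM_PER_INCH


lower, upper = boxplot_fences(63.0, 69.0)
print(lower, upper)                # 54.0 78.0
print(5.30 < lower)                # True: the minimum height of 5.30 is flagged
print(round(feet_to_inches(5.3)))  # 64
print(round(inches_to_cm(6), 2))   # 15.24
```

With these limits, any height below 54 or above 78 would be marked, which is why a value entered in feet stands out immediately.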

Outliers:
Height: rows 5 (from 5.3 ft to 64 in) and 29 (from 5.3 ft to 64 in).
Ideal height: rows 5 (from 5.8 ft to 70 in) and 29 (from 5.5 ft to 66 in). Row 16 is fine because 6'8" could be an ideal height, so it will not be deleted.
Pairs of jeans: rows 21, 25 and 2. It is acceptable for someone to have 40, 12 or 14 pairs of jeans, so these will not be deleted.
Left wrist: rows 33 (from 5.5 in to 14 cm) and 5 (from 6 in to 15.24 cm).
Right wrist: rows 33 (from 5.5 in to 14 cm) and 5 (from 6 in to 15.24 cm).
Left arm: rows 33 (from 26.5 in to 67.3 cm) and 5 (from 24 in to 61 cm).
Right arm: rows 18 (from 24 in to 61 cm), 33 (from 26.5 in to 67.3 cm) and 5 (from 24 in to 61 cm). Rows 15 and 23 are marked as outliers but will not be deleted because the values are reasonable.
Head 1: rows 18 (from 24 in to 61 cm), 33 (from 22 in to 55.9 cm) and 5 (from 22 in to 55.9 cm).
Head 2: rows 5 (from 22 in to 55.9 cm) and 33 (from 22 in to 55.9 cm). Row 24 is marked as an outlier but will not be deleted because the value is acceptable.
GPA: rows 18 and 27. Row 18, with a value of 6.0, does not seem correct, and neither does row 27, with 56.8; both values will be deleted and replaced by an asterisk. Row 25, with a GPA of 3.0, is an outlier but will not be deleted since it is an acceptable value.
Hours studied: rows 4, 22, 34 and 37. They will not be deleted since the values are acceptable.
Hours TV: row 36. It will not be deleted since it is an acceptable value.

[Figure: boxplot after trimming and adjustments]

2. We test the means of two independent samples. For this problem we chose the hours spent watching TV, comparing males and females (testing whether females watched more TV than males).

Descriptive Statistics: Hr TV females, Hr TV males
Variable       N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3      Maximum
Hr TV females  24  0   6.98   1.12     5.50   0.00     3.25   5.00    10.00   24.00
Hr TV males    17  0   6.294  0.882    3.636  0.000    4.500  7.000   10.000  10.000

Both n1 and n2 are less than 30, so we need to check the normality assumption.

The standard deviations are quite different: 5.5 for females and 3.636 for males, a ratio of about 0.66. The two samples are approximately normally distributed, but because of the difference in standard deviations we use a non-pooled t-test.

The hypothesis: females (population 1) watch more TV than males (population 2).

H0: μ1 - μ2 = 0
Ha: μ1 - μ2 > 0
One-sided, right-tailed test with α = 0.025.

Two-Sample T-Test and CI: Hr TV females, Hr TV males
Two-sample T for Hr TV females vs Hr TV males

               N   Mean  StDev  SE Mean
Hr TV females  24  6.98  5.50   1.1
Hr TV males    17  6.29  3.64   0.88

Difference = mu (Hr TV females) - mu (Hr TV males)
Estimate for difference: 0.69
97.5% lower bound for difference: -2.20
T-Test of difference = 0 (vs >): T-Value = 0.48  P-Value = 0.317  DF = 38

The p-value is 0.317. Since the p-value is larger than α = 0.025, we cannot reject the null hypothesis. At the 2.5% level of significance, the data do not provide sufficient evidence that the mean TV hours watched by females is greater than that of males.
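The T-Value and DF in the output can be checked by hand from the summary statistics. The sketch below assumes the non-pooled (Welch) form of the test with the Welch-Satterthwaite degrees of freedom truncated to an integer; the function name is hypothetical, and the p-value itself is not recomputed.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for the non-pooled two-sample t-test of mean1 - mean2 = 0."""
    v1 = sd1 ** 2 / n1  # squared standard error of sample 1
    v2 = sd2 ** 2 / n2  # squared standard error of sample 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Females are sample 1, males are sample 2 (values from the output above)
t, df = welch_t(6.98, 5.50, 24, 6.29, 3.64, 17)
print(round(t, 2), math.floor(df))  # 0.48 38
```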

3. Test of two means, paired samples. We tested whether the two head circumference measurements (Head 1 and Head 2) produced different means for the class. Head 1 has n = 44 and Head 2 has n = 43, so the paired test uses the 43 complete pairs; since this is more than 30, there is no need to check normality. Setting up the hypotheses:

H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
α = 0.01

Paired T-Test and CI: Head 1, Head 2
Paired T for Head 1 - Head 2

            N   Mean    StDev  SE Mean
Head 1      43  56.823  1.990  0.303
Head 2      43  57.140  1.901  0.290
Difference  43  -0.316  0.944  0.144

99% CI for mean difference: (-0.705, 0.072)
T-Test of mean difference = 0 (vs not = 0): T-Value = -2.20  P-Value = 0.034

The p-value is 0.034. Since the p-value is larger than α = 0.01, we cannot reject the null hypothesis. At the 1% level of significance, the data do not provide sufficient evidence that the mean of Head 1 is different from the mean of Head 2 in the class.

CI: the 99% confidence interval for μ1 - μ2 is (-0.705, 0.072). We are 99% confident that the difference between the mean Head 1 and Head 2 circumference measurements is between -0.705 and 0.072. The interval contains 0, which agrees with the test result.
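The paired test statistic and the interval can be reproduced from the Difference row of the output. In this sketch the critical value 2.698 (two-sided 99%, 42 degrees of freedom) is an assumption taken from a t table rather than computed, and the function name is hypothetical.

```python
import math

T_CRIT_99_DF42 = 2.698  # assumed t-table value for a two-sided 99% interval, df = 42


def paired_t(mean_diff, sd_diff, n, t_crit):
    """Return (t, (lower, upper)) for a paired t-test of mean difference = 0."""
    se = sd_diff / math.sqrt(n)  # standard error of the mean difference
    t = mean_diff / se
    margin = t_crit * se
    return t, (mean_diff - margin, mean_diff + margin)


t, ci = paired_t(-0.316, 0.944, 43, T_CRIT_99_DF42)
print(t)
print(ci)
```

The results land close to the reported T-Value of -2.20 and the interval (-0.705, 0.072); small differences in the last digit come from the rounding of the summary statistics.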

4. Descriptive statistics for the height and ideal height columns:

Descriptive Statistics: Height, Ideal-Hight
Variable     N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Height       44  0   66.881  0.591    3.917  60.000   63.625  67.000  69.000  77.000
Ideal-Hight  42  2   69.929  0.532    3.446  63.000   67.000  70.000  72.000  80.000

Since the students want to be taller than they are, I set up the hypotheses as follows. The sample sizes are larger than 30 (the paired test uses 42 pairs), so there is no need to check normality. The test is set up as a paired test, where height is population 1 and ideal height is population 2.

H0: μ1 - μ2 = 0
Ha: μ1 - μ2 < 0 (one-tailed, left-sided test), because we are checking whether students want to be taller than they are now, in which case the difference between their current height and their ideal height would be less than 0.
α = 0.01

Paired T-Test and CI: Height, Ideal-Hight
Paired T for Height - Ideal-Hight

             N   Mean    StDev  SE Mean
Height       42  67.101  3.872  0.598
Ideal-Hight  42  69.929  3.446  0.532
Difference   42  -2.827  3.171  0.489

99% upper bound for mean difference: -1.643
T-Test of mean difference = 0 (vs < 0): T-Value = -5.78  P-Value = 0.000

The standard deviation of the differences is 3.171.

The p-value is very small (reported as 0.000). Since the p-value is less than α = 0.01, we reject the null hypothesis and conclude the alternative: on average, the students want to be taller than they are now.

For the sample size calculation we use a standard deviation of the differences of 3.171, power = 0.85, difference = 1.5 and α = 0.01.

Power and Sample Size


Paired t Test

Testing mean paired difference = 0 (versus > 0)
Calculating power for mean paired difference = difference
Alpha = 0.01  Assumed standard deviation of paired differences = 3.171

            Sample  Target
Difference    Size   Power  Actual Power
       1.5      54    0.85      0.855497

The sample size needed is 54.
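As a rough cross-check on this figure, the sample size can be approximated with the normal-theory formula n = ((z_alpha + z_power) * sd / difference)^2. The sketch below is not Minitab's method, which works with the t distribution directly; it adds a commonly used small-sample correction of z_alpha^2 / 2 as an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(sd_diff, difference, alpha, power):
    """Approximate n for a one-sided paired t-test.

    Returns (n from the plain normal approximation, n with the
    z_alpha^2 / 2 small-sample correction), both rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    n_normal = ((z_alpha + z_power) * sd_diff / difference) ** 2
    n_corrected = n_normal + z_alpha ** 2 / 2
    return math.ceil(n_normal), math.ceil(n_corrected)


print(paired_sample_size(3.171, 1.5, 0.01, 0.85))  # (51, 54)
```

The plain normal approximation understates the requirement slightly, while the corrected value agrees with the 54 reported by the software.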

5. We set up the hypotheses:

H0: gender and hair dyed are independent (the two categorical variables are independent)
Ha: gender and hair dyed are related (the two categorical variables are dependent)

Tabulated statistics: Gender, Hair dyed?


Rows: Gender   Columns: Hair dyed?

      no  yes  All
*      1    0    1
F     13   12   25
M     15    3   18
All   29   15   44

Cell Contents: Count

Chi-Square Test: no, yes


Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

          no    yes  Total
1         13     12     25
       16.28   8.72
       0.660  1.233

2         15      3     18
       11.72   6.28
       0.917  1.712

Total     28     15     43

Chi-Sq = 4.523, DF = 1, P-Value = 0.033

χ² = Σ [ (n_ij - E_ij)² / E_ij ] = 4.523 with df = (r - 1)(c - 1) = 1 and p-value = 0.033.

Since the p-value = 0.033 < 0.05, we reject the null hypothesis that gender and hair dyed are independent and conclude that gender and hair dyed are dependent.

From online notes: Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.

However, in this case no cell had an expected count below five (the smallest is 6.28), and only one cell had an observed count as low as 3, so we proceeded with the chi-square test of independence.
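The expected counts and the chi-square statistic can be recomputed from the observed 2 x 2 table. This is a minimal sketch with a hypothetical function name; the rows are F and M, the columns are no and yes, and the one respondent with missing data is left out, as in the test output (total of 43).

```python
def chi_square_independence(observed):
    """Return (chi_sq, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df, expected


observed = [[13, 12],  # F: no, yes
            [15, 3]]   # M: no, yes
chi_sq, df, expected = chi_square_independence(observed)
print(round(chi_sq, 3), df)                            # 4.523 1
print(round(min(min(row) for row in expected), 2))     # 6.28
```

The smallest expected count is the quantity that the caution about small expected counts refers to.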
