Stat 500
Abstract
Using the given data set, tests were run and conclusions drawn in response to the questions provided. This report was completed by Arman Tabatabaei.
1. In cleaning up the data, every participant's measurements needed to be in one consistent unit, such as inches. I therefore converted the entries that had apparently been recorded in another unit, for example inches to centimeters or feet to inches, wherever appropriate.
Descriptive Statistics: Height
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Height    44   0  64.21     2.05  13.57     5.30  63.00   67.00  69.00    77.00

Only fragments of the rows for the remaining variables survive: one variable has N 42 and N* 2, and the remaining Maximum values are 80.00, 218.00, 216.00, 40.000, 19.000, 18.500, 83.00, 16.000 and 24.000.
After running a simple multiple-Y boxplot for all columns, the outliers were indicated by asterisks on the boxplots.
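The asterisk rule can be sketched in a few lines. This is a minimal illustration, assuming the usual boxplot convention that a point is flagged when it lies more than 1.5 times the interquartile range beyond Q1 or Q3. The helper name `boxplot_outliers` is hypothetical, and the sample column is made up apart from the two suspicious GPA values (6.0 and 56.8) mentioned in this report.

```python
from statistics import quantiles


def boxplot_outliers(values):
    """Return (row, value) pairs lying outside the 1.5 * IQR whiskers.

    Rows are numbered from 1; missing entries (None) are skipped.
    """
    data = [(row, v) for row, v in enumerate(values, start=1) if v is not None]
    # Quartiles with the (n + 1)-based "exclusive" interpolation.
    q1, _, q3 = quantiles([v for _, v in data], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(row, v) for row, v in data if v < low or v > high]


# Hypothetical GPA-like column; only 6.0 and 56.8 come from the report.
sample = [3.4, 3.1, 3.8, 6.0, 3.5, 3.2, 3.9, 56.8, 3.3, 3.6, 4.0, 3.7]
print(boxplot_outliers(sample))
```

A flagged row is only a candidate for correction. Whether the value is kept, converted or deleted is still a judgment call made row by row.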
Outliers:
Height: rows 5 (from 53 to 64 in) and 29 (from 53 to 64 in)
Ideal height: rows 5 (from 58 to 70 in), 16 (acceptable, because 6'8" could be an ideal height; not deleted) and 29 (from 55 to 66 in)
Pairs of jeans: rows 21, 25 and 2 (it is acceptable for someone to have 40, 12 or 14 pairs of jeans; not deleted)
Left wrist: rows 33 (from 5.5 in to 14 cm) and 5 (from 6 in to 15.24 cm)
Right wrist: rows 33 (from 5.5 in to 14 cm) and 5 (from 6 in to 15.24 cm)
Left arm: rows 33 (from 26.5 in to 67.3 cm) and 5 (from 24 in to 61 cm)
Right arm: rows 18 (from 24 in to 61 cm), 33 (from 26.5 in to 67.3 cm) and 5 (from 24 in to 61 cm). Rows 15 and 23 are marked as outliers but were not deleted because the numbers are reasonable.
Head 1: rows 18 (from 24 in to 61 cm), 33 (from 22 in to 55.9 cm) and 5 (from 22 in to 55.9 cm)
Head 2: rows 5 (from 22 in to 55.9 cm) and 33 (from 22 in to 55.9 cm). Row 24 is marked as an outlier but was not deleted because the value is acceptable.
GPA: rows 18 and 27 (row 18, with a value of 6.0, does not seem to be correct, and neither does row 27 with 56.8; both were deleted and replaced by an asterisk). Row 25, with a GPA of 3.0, is an outlier but was not deleted since it is an acceptable value.
Hours studied: rows 4, 22, 34 and 37 (not deleted, since the values are acceptable)
Hours TV: row 36 (not deleted, since it is an acceptable value)
After trimming and adjustments, a new boxplot was drawn.
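The unit corrections listed in the outlier notes can be checked with a short sketch. The helper names are hypothetical, and reading the row 16 ideal height as 6 feet 8 inches is an assumption. The inch values are the ones quoted in the notes.

```python
IN_TO_CM = 2.54  # exact definition of the inch in centimeters


def inches_to_cm(inches):
    """Convert a length in inches to centimeters, rounded to 2 decimals."""
    return round(inches * IN_TO_CM, 2)


def feet_inches_to_inches(feet, inches=0):
    """Convert a feet-and-inches height to inches."""
    return feet * 12 + inches


# Inch values quoted in the outlier notes (wrist, head and arm entries).
for inches in (5.5, 6, 22, 24, 26.5):
    print(f"{inches} in = {inches_to_cm(inches)} cm")

# Assumed reading of the row 16 ideal height: 6 feet 8 inches.
print(feet_inches_to_inches(6, 8), "in")
```

The converted values agree with the rounded figures in the notes (for example 6 in to 15.24 cm and 22 in to 55.9 cm).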
2. We test the means of two independent samples. For this problem we chose the hours spent watching TV, comparing males and females (testing whether females watched more TV than males).
Descriptive Statistics: Hr TV females, Hr TV males
Variable        N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
Hr TV females  24   0   6.98     1.12   5.50     0.00   3.25    5.00   10.00
Hr TV males    17   0  6.294    0.882  3.636    0.000  4.500   7.000  10.000
N1 and N2 are both less than 30, so we need to check the normality assumption.
The standard deviations are quite different: 5.5 for females and 3.636 for males, a ratio of about 0.66. The two samples are approximately normally distributed, but because of the difference in standard deviations we use a non-pooled t-test.
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 > 0
One-sided, right-tailed test, α = 0.025
Two-Sample T-Test and CI: Hr TV females, Hr TV males
Two-sample T for Hr TV females vs Hr TV males

                N  Mean  StDev  SE Mean
Hr TV females  24  6.98   5.50      1.1
Hr TV males    17  6.29   3.64     0.88

Difference = mu (Hr TV females) - mu (Hr TV males)
Estimate for difference: 0.69
97.5% lower bound for difference: -2.20
T-Test of difference = 0 (vs >): T-Value = 0.48  P-Value = 0.317  DF = 38
The p-value is 0.317. Since the p-value is larger than α = 0.025, we cannot reject the null hypothesis. At the 2.5% level of significance, the data do not provide sufficient evidence that the mean number of TV hours watched by females is greater than that of males.
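The reported T-value and DF can be reproduced from the summary statistics alone. The sketch below assumes the non-pooled (Welch) form of the two-sample t statistic with the Welch-Satterthwaite degrees of freedom rounded down. The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Non-pooled two-sample t statistic and Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from the descriptive table: females first, then males.
t, df = welch_t(6.98, 5.50, 24, 6.294, 3.636, 17)
print(f"t = {t:.2f}, df = {math.floor(df)}")
```

With these inputs the sketch agrees with the output above (T-Value = 0.48, DF = 38).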
3. Test of two means, paired samples. We tested whether the two head circumference measurements (Head 1 and Head 2) produced different means for the class. For Head 1, N1 is 44 and for Head 2, N2 is 43; since both are larger than 30, there is no need to check for normality.
Setting up the hypotheses:
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 ≠ 0
α = 0.01
Paired T-Test and CI: Head 1, Head 2
Paired T for Head 1 - Head 2

             N    Mean  StDev  SE Mean
Head 1      43  56.823  1.990    0.303
Head 2      43  57.140  1.901    0.290
Difference  43  -0.316  0.944    0.144

P-Value = 0.034
The p-value is 0.034. Since the p-value is larger than α = 0.01, we cannot reject the null hypothesis. At the 1% level of significance, the data do not provide sufficient evidence that the mean of Head 1 is different from the mean of Head 2 in the class.
CI: the 99% confidence interval for μ1 - μ2 is (-0.705, 0.072). We are 99% confident that the difference between the mean Head 1 and mean Head 2 circumference measurements is between -0.705 and 0.072. The interval contains 0, which is consistent with not rejecting the null hypothesis.
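As a rough check on the paired result, the t statistic and a two-sided p-value can be recomputed from the Difference row of the output. This is a sketch only: the helper names are hypothetical, and the tail area is obtained by numerically integrating the t density with Simpson's rule rather than by any statistical package, so small rounding differences from the reported value are expected.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Difference row of the paired output: N, mean and StDev of the differences.
n, mean_diff, sd_diff = 43, -0.316, 0.944
se = sd_diff / math.sqrt(n)
t_stat = mean_diff / se
p_two_sided = 2 * t_upper_tail(t_stat, n - 1)
print(f"SE = {se:.3f}, t = {t_stat:.2f}, two-sided p = {p_two_sided:.3f}")
```

Because the inputs are the rounded summary values, the result should land close to, but not necessarily exactly on, the reported P-Value of 0.034.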
4. Descriptive statistics for both columns, Height and Ideal-Hight, are given below:
Descriptive Statistics: Height, Ideal-Hight
Variable      N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3
Height       44   0  66.881    0.591  3.917   60.000  63.625  67.000  69.000
Ideal-Hight  42   2  69.929    0.532  3.446   63.000  67.000  70.000  72.000
Since the students want to be taller than they are, I set up the hypotheses as follows. Both sample sizes are larger than 30 (the paired sample has 42 observations), so there is no need to check for normality. The test is set up as a paired test, where height is 1 and ideal height is 2:
H0: μ1 - μ2 = 0
Ha: μ1 - μ2 < 0 (one-tailed, left-sided test), because we are checking whether students want to be taller than they are now, in which case the difference between their current height and their ideal height would be less than 0
α = 0.01
Paired T-Test and CI: Height, Ideal-Hight
Paired T for Height - Ideal-Hight

              N    Mean  StDev  SE Mean
Height       42  67.101  3.872    0.598
Ideal-Hight  42  69.929  3.446    0.532
Difference   42  -2.827  3.171    0.489

99% upper bound for mean difference: -1.643
T-Test of mean difference = 0 (vs < 0): T-Value = -5.78  P-Value = 0.000
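The paired output above uses 42 rows even though Height has 44 values, because a pair is only usable when both measurements are present. A minimal sketch of that pairwise handling follows. The function name and the six-row sample are hypothetical, and only the last line uses numbers from the report.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic using only rows where both values are present."""
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return n, mean(diffs), mean(diffs) / se


# Hypothetical rows; None marks a missing ideal height (Minitab's asterisk).
height = [64.0, 67.0, 70.0, 62.5, 69.0, 66.0]
ideal = [66.0, 70.0, None, 65.0, 72.0, 67.0]
n, mean_diff, t_stat = paired_t(height, ideal)
print(n, round(mean_diff, 2), round(t_stat, 2))

# The reported T-Value is the mean difference divided by its SE.
reported_t = -2.827 / 0.489
print(round(reported_t, 2))
```

This also explains why the Height mean in the paired output (67.101, N = 42) differs from the one in the descriptive statistics (66.881, N = 44).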
The p-value is very small (reported as 0.000). Since the p-value is less than α = 0.01, we reject the null hypothesis and conclude in favor of the alternative hypothesis: on average, the students want to be taller than they are now.
For the power calculation we have: standard deviation of the differences = 3.171, power = 0.85, difference = 1.5 and α = 0.01.
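For orientation only, the number of pairs implied by these inputs can be approximated with the normal-approximation formula n = ((z_alpha + z_power) * sd / difference)^2. This sketch is not the Minitab calculation: it assumes a one-sided test (matching the left-tailed test above) and uses normal rather than t quantiles, so an exact t-based answer would be somewhat larger. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_pairs_needed(sd_diff, diff, alpha, power):
    """Normal-approximation sample size for a one-sided paired t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # critical value for the one-sided test
    z_power = z.inv_cdf(power)       # quantile matching the target power
    return math.ceil(((z_alpha + z_power) * sd_diff / diff) ** 2)


# Inputs quoted in the report: sd of differences, difference, alpha, power.
pairs = approx_pairs_needed(sd_diff=3.171, diff=1.5, alpha=0.01, power=0.85)
print(pairs)
```

The approximation makes the trade-off visible: halving the detectable difference roughly quadruples the number of pairs required.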
5. We set up the hypotheses:
H0: gender and hair dyed are independent (the two categorical variables are independent)
Ha: gender and hair dyed are related (the two categorical variables are dependent)
Cell Contents: [only fragments of the tabulated statistics survive: 18; 0.917; Total 28; 1.712; 15; 43]
χ² = Σ [(nij - Eij)² / Eij] = 4.523 with df = (r - 1)(c - 1) = 1 (a 2 × 2 table) and p-value = 0.033.
Since the p-value = 0.033 < 0.05, we reject the null hypothesis that gender and hair dyed are independent. We therefore conclude that gender and hair dyed are dependent.
From online notes: Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.
However, in this case there was only one cell with a value of less than 3, so we still proceeded with the chi-square test of independence.
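The chi-square computation can be sketched as follows. The function names are hypothetical. Because the observed counts did not survive in this report, the sketch only recomputes the p-value from the reported statistic, using the fact that with 1 degree of freedom the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)).

```python
import math


def chi_square_independence(observed):
    """Chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / total  # expected count
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def chi_square_p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2))


# Statistic reported above for the gender by hair dyed table.
p_value = chi_square_p_value_df1(4.523)
print(round(p_value, 3))
```

Calling `chi_square_independence` on the 2 × 2 table of observed counts would return the statistic together with df = 1, and the expected counts computed inside it are the ones the small-count caution refers to.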