
user_type Gender hand_length hand_width height

student female 6.5 7.5 62


student female 6.5 8 60
student female 7 7.25 62
student male 7 8.5 72
student female 5 6 67.5
student female 5.8 6 62
student female 7 8.5 68
student female 6.5 7 70
student female 6.5 7.5 67.2
student female 7 5 62.5
student male 8.25 9.5 75
student male 7 7.75 67
student female 6.5 8 65
student female 7.5 8 62.5
student female 6.5 7 62
student male 8.2 9.3 74
student female 6.5 7 68
student male 7.5 9.5 71.5
student male 7.63 8.75 71
student male 8.25 8.5 73
student female 6.5 7.5 62
student male 7.4 8.5 64
student male 7.5 8.5 70
student male 7.75 9 69
student female 7.5 8 67
student male 7.75 9.5 72
student female 7 8 66
student female 6.5 7.5 69
student female 7.5 8 68
student male 7 7.5 69
instructor male 7.4 8.5 64
student male 7 8.5 66
student female 6.5 7.75 63.6
student male 7.5 8.5 69
student female 7 8.25 62
student female 6.5 7 66
student male 7.4 7 67
student female 6.7 7.4 59
student female 6.7 7.5 67
student male 8 8.5 67
student male 8 9 73
student female 7 8 64
student female 6.75 7.5 65
student male 8 9.5 76
student male 7 8.125 69.5
student male 6.5 8 73.5
student male 8 4 73
instructor male 6.8 7.4 65
student female 6.75 7.5 66
student male 7.5 10 71
Histogram Height
Bin Frequency
62 8
65 9
68 14
71 9
74 8
77 2
More 0
[Figure: Histogram Height (Frequency vs Bin)]

Box Plot - Height
Variable Height
Minimum 59
Q1 64
Median 67
Q3 70
Maximum 76

Variable Height
Median - Min 8
Q1 64
Median - Q1 3
Q3 - Median 3
Max - Median 9
[Figure: Box Plot - Height]

Part A Using the boxplot, are there any outliers present? Explain.
1.5 IQR 9
Q3+1.5*IQR 79
Q1-1.5*IQR 55
No, there are no outliers. IQR = Q3 - Q1 = 6, so the fences are 55 and 79, and every height lies between them (minimum 59, maximum 76).

Part B Looking at the histogram, comment on the distribution shape of height: (left skewed, symmetric, right skewed?)
The shape of the histogram is symmetric.
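The outlier check in Part A can be reproduced in a few lines. This is a minimal sketch that takes the five-number summary reported in the worksheet as given; the variable names are illustrative and not part of the original workbook.

```python
# Five-number summary for height, as reported in the worksheet.
q1, q3 = 64.0, 70.0
minimum, maximum = 59.0, 76.0

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # values below this would be flagged
upper_fence = q3 + 1.5 * iqr      # values above this would be flagged

# If the extremes sit inside the fences, no observation can lie outside them.
has_outliers = minimum < lower_fence or maximum > upper_fence

print(iqr, lower_fence, upper_fence, has_outliers)
```

Because only the minimum and maximum are compared against the fences, the check does not need the full data column.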


Variables N Mean Median Range Variance Std Dev
Hand Length 50 7.081 7 3.25 0.424 0.651
Hand Width 50 7.890 8 6 1.232 1.110
Height 50 67.296 67 17 17.219 4.150
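As a cross-check of the Height row, the same statistics can be recomputed with Python's standard `statistics` module. This is a sketch, assuming the 50 heights are transcribed from the data table in their original order and that the worksheet uses the sample (n - 1) variance.

```python
import statistics

# Heights of the 50 observations, transcribed from the data table.
heights = [
    62, 60, 62, 72, 67.5, 62, 68, 70, 67.2, 62.5,
    75, 67, 65, 62.5, 62, 74, 68, 71.5, 71, 73,
    62, 64, 70, 69, 67, 72, 66, 69, 68, 69,
    64, 66, 63.6, 69, 62, 66, 67, 59, 67, 67,
    73, 64, 65, 76, 69.5, 73.5, 73, 65, 66, 71,
]

n = len(heights)
mean = statistics.mean(heights)
median = statistics.median(heights)
value_range = max(heights) - min(heights)
variance = statistics.variance(heights)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(heights)       # sample standard deviation

print(n, round(mean, 3), median, value_range,
      round(variance, 3), round(std_dev, 3))
```

The same calls can be applied to the hand_length and hand_width columns to check the other two rows.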
[Figure: Height vs Hand Width (scatter plot, x-axis Hand Width, y-axis Height)]
Height vs Hand Width
Correlation 0.395 Medium Positive

[Figure: Height vs Hand Length (scatter plot, x-axis Hand Length, y-axis Height)]
Height vs Hand Length
Correlation 0.547 Large Positive
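The correlations above are Pearson coefficients. The sketch below shows one way to compute such a coefficient and attach a label like the ones used here. The function names are hypothetical, and the cut-offs (0.1 small, 0.3 medium, 0.5 large) are an assumption: they are consistent with the "Medium" and "Large" labels in the worksheet, which does not state its thresholds.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r):
    """Label a correlation using ASSUMED cut-offs of 0.1, 0.3 and 0.5."""
    size = abs(r)
    if size >= 0.5:
        strength = "Large"
    elif size >= 0.3:
        strength = "Medium"
    elif size >= 0.1:
        strength = "Small"
    else:
        return "Negligible"
    direction = "Positive" if r > 0 else "Negative"
    return f"{strength} {direction}"

# Labels for the two coefficients reported in the worksheet.
print(describe(0.395))
print(describe(0.547))

# pearson_r on only the first five data rows (hand_width, height);
# this subset will not reproduce the full-sample value of 0.395.
print(pearson_r([7.5, 8, 7.25, 8.5, 6], [62, 60, 62, 72, 67.5]))
```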
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.394536101
R Square 0.155658735
Adjusted R Square 0.138068292
Standard Error 3.852453052
Observations 50

ANOVA
df SS MS F Significance F
Regression 1 131.3323 131.3323 8.849051 0.004579
Residual 48 712.3869 14.84139
Total 49 843.7192

Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%
Intercept 55.66017377 3.94931 14.09364 1.11E-18 47.71955 63.60079
hand_width 1.474849639 0.495792 2.974736 0.004579 0.477993 2.471706

RESIDUAL OUTPUT and PROBABILITY OUTPUT
Observation, Predicted height, Residuals, Standard Residuals, Percentile, height
1 66.72154607 -4.72155 -1.2383 1 59
2 67.45897089 -7.45897 -1.95623 3 60
3 66.35283366 -4.35283 -1.1416 5 62
4 68.1963957 3.803604 0.997552 7 62
5 64.50927161 2.990728 0.784363 9 62
6 64.50927161 -2.50927 -0.65809 11 62
7 68.1963957 -0.1964 -0.05151 13 62
8 65.98412125 4.015879 1.053224 15 62
9 66.72154607 0.478454 0.125482 17 62.5
10 63.03442197 -0.53442 -0.14016 19 62.5
11 69.67124534 5.328755 1.397545 21 63.6
12 67.09025848 -0.09026 -0.02367 23 64
13 67.45897089 -2.45897 -0.6449 25 64
14 67.45897089 -4.95897 -1.30056 27 64
15 65.98412125 -3.98412 -1.04489 29 65
16 69.37627542 4.623725 1.21264 31 65
17 65.98412125 2.015879 0.528694 33 65
18 69.67124534 1.828755 0.479618 35 66
19 68.56510811 2.434892 0.638587 37 66
20 68.1963957 4.803604 1.259816 39 66
21 66.72154607 -4.72155 -1.2383 41 66
22 68.1963957 -4.1964 -1.10057 43 67
23 68.1963957 1.803604 0.473022 45 67
24 68.93382052 0.066179 0.017357 47 67
25 67.45897089 -0.45897 -0.12037 49 67
26 69.67124534 2.328755 0.61075 51 67
27 67.45897089 -1.45897 -0.38264 53 67.2
28 66.72154607 2.278454 0.597558 55 67.5
29 67.45897089 0.541029 0.141893 57 68
30 66.72154607 2.278454 0.597558 59 68
31 68.1963957 -4.1964 -1.10057 61 68
32 68.1963957 -2.1964 -0.57604 63 69
33 67.09025848 -3.49026 -0.91537 65 69
34 68.1963957 0.803604 0.210757 67 69
35 67.82768329 -5.82768 -1.5284 69 69
36 65.98412125 0.015879 0.004164 71 69.5
37 65.98412125 1.015879 0.266429 73 70
38 66.5740611 -7.57406 -1.98641 75 70
39 66.72154607 0.278454 0.073029 77 71
40 68.1963957 -1.1964 -0.31377 79 71
41 68.93382052 4.066179 1.066416 81 71.5
42 67.45897089 -3.45897 -0.90717 83 72
43 66.72154607 -1.72155 -0.4515 85 72
44 69.67124534 6.328755 1.65981 87 73
45 67.64332709 1.856673 0.48694 89 73
46 67.45897089 6.041029 1.58435 91 73
47 61.55957233 11.44043 3.000422 93 73.5
48 66.5740611 -1.57406 -0.41282 95 74
49 66.72154607 -0.72155 -0.18924 97 75
50 70.40867016 0.59133 0.155085 99 76

Part 4A
Y = 55.66017 + 1.47485*X

Part 4B
X 8.5
Y = 55.66017 + 1.47485*8.5
68.1964

Part 4C

Although the equation will return a predicted height for a hand width of 20", such a hand width is not possible in a realistic scenario (the observed hand widths range from 4" to 10"), and hence the value generated will be absurd.
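Parts 4B and 4C both evaluate the fitted line. The sketch below uses the unrounded coefficients from the SUMMARY OUTPUT; the function names and the range check are illustrative additions rather than part of the worksheet.

```python
# Coefficients of the first model (height on hand_width, n = 50).
INTERCEPT = 55.66017377
SLOPE = 1.474849639

# Smallest and largest hand widths in the data table.
OBSERVED_MIN, OBSERVED_MAX = 4.0, 10.0

def predict_height(hand_width):
    """Predicted height for a given hand width."""
    return INTERCEPT + SLOPE * hand_width

def is_extrapolation(hand_width):
    """True when the input lies outside the observed hand widths."""
    return not (OBSERVED_MIN <= hand_width <= OBSERVED_MAX)

for width in (8.5, 20):
    print(width, round(predict_height(width), 4), is_extrapolation(width))
```

For a hand width of 20 the line still returns a number, about 85.16, but the input is flagged as an extrapolation, which is the point made in Part 4C.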
Part 4D
X 9
Y = 55.66017 + 1.47485*9
68.93382
Actual Value 72
Residual 3.066179

Part 4E ANOVA
df SS MS F Significance F
Regression 1 131.3323 131.3323 8.849051 0.004579
Residual 48 712.3869 14.84139
Total 49 843.7192

Part 4F R^2 0.155659
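The figures in Parts 4E and 4F are tied together by standard identities for simple linear regression. The following sketch recomputes them from the two sums of squares in the ANOVA table; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression = 131.3323
ss_residual = 712.3869
df_regression = 1
df_residual = 48

ss_total = ss_regression + ss_residual
n = df_regression + df_residual + 1            # number of observations

ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual
f_stat = ms_regression / ms_residual

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
standard_error = math.sqrt(ms_residual)        # standard error of the regression

print(round(f_stat, 4), round(r_squared, 6),
      round(adjusted_r_squared, 6), round(standard_error, 4))
```

The results agree with the worksheet to the precision of the rounded inputs: F is about 8.849, R^2 about 0.1557, adjusted R^2 about 0.1381 and the standard error about 3.852.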

Part 5
X 4
Y = 55.66017 + 1.47485*4
61.55957
Actual Value 73
Residual 11.44043
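The residuals in Parts 4D and 5 are the actual height minus the predicted height. The sketch below also rescales the Part 5 residual into a standard residual. Dividing by sqrt(SS_residual / (n - 1)) is an assumption about how the Standard Residuals column was computed; it reproduces the value listed for observation 47.

```python
import math

# Coefficients of the first model (height on hand_width, n = 50).
INTERCEPT = 55.66017377
SLOPE = 1.474849639

def residual(hand_width, actual_height):
    """Actual height minus the height predicted by the fitted line."""
    return actual_height - (INTERCEPT + SLOPE * hand_width)

# ASSUMED scaling for the "Standard Residuals" column.
ss_residual = 712.3869
n = 50
residual_sd = math.sqrt(ss_residual / (n - 1))

part_4d = residual(9, 72)   # Part 4D
part_5 = residual(4, 73)    # Part 5 (observation 47)
part_5_standardized = part_5 / residual_sd

print(round(part_4d, 6), round(part_5, 5), round(part_5_standardized, 3))
```

A standard residual of about 3 stands well apart from the other 49 observations, which is consistent with hand_width = 4 being dropped before the second model is fitted.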
[Figure: hand_width Residual Plot (Residuals vs hand_width)]
[Figure: hand_width Line Fit Plot (height and Predicted height vs hand_width)]
[Figure: Q-Q Plot (height vs Sample Percentile)]
[Figure: Height vs Hand Width (Height vs Hand Width)]

Histogram - Residuals
Bin Frequency
-3 12
1 19
5 15
9 3
13 1
More 0
[Figure: Histogram - Residuals (Frequency vs Bin)]

[Figure: Predicted vs Residual Plot (Residuals vs Predicted Height)]
hand_width height hand_length
7.5 62 6.5
8 60 6.5
7.25 62 7
8.5 72 7
6 67.5 5
6 62 5.8
8.5 68 7
7 70 6.5
7.5 67.2 6.5
9.5 75 8.25
7.75 67 7
8 65 6.5
8 62.5 7.5
7 62 6.5
9.3 74 8.2
7 68 6.5
9.5 71.5 7.5
8.75 71 7.63
8.5 73 8.25
7.5 62 6.5
8.5 64 7.4
8.5 70 7.5
9 69 7.75
8 67 7.5
9.5 72 7.75
8 66 7
7.5 69 6.5
8 68 7.5
7.5 69 7
8.5 64 7.4
8.5 66 7
7.75 63.6 6.5
8.5 69 7.5
8.25 62 7
7 66 6.5
7.4 59 6.7
7.5 67 6.7
8.5 67 8
9 73 8
8 64 7
7.5 65 6.75
9.5 76 8
8.125 69.5 7
8 73.5 6.5
7.4 65 6.8
7.5 66 6.75
10 71 7.5

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.58652104
R Square 0.344006931
Adjusted R Square 0.329429307
Standard Error 3.389458184
Observations 47

ANOVA
df SS MS F Significance F
Regression 1 271.1072 271.1072 23.59829 1.47E-05
Residual 45 516.9792 11.48843
Total 46 788.0864

Coefficients, Standard Error, t Stat, P-value, Lower 95%
Intercept 44.82555634 4.649317 9.641321 1.61E-12 35.46135
hand_width 2.788820534 0.57409 4.857807 1.47E-05 1.632543

RESIDUAL OUTPUT and PROBABILITY OUTPUT
Observation, Predicted height, Residuals, Standard Residuals, Percentile
1 65.74171035 -3.74171 -1.11612 1.06383
2 67.13612062 -7.13612 -2.12865 3.191489
3 65.04450522 -3.04451 -0.90815 5.319149
4 68.53053089 3.469469 1.034917 7.446809
5 61.55847955 5.94152 1.772311 9.574468
6 61.55847955 0.44152 0.131702 11.70213
7 68.53053089 -0.53053 -0.15825 13.82979
8 64.34730009 5.6527 1.686158 15.95745
9 65.74171035 1.45829 0.434997 18.08511
10 71.31935142 3.680649 1.09791 20.21277
11 66.43891549 0.561085 0.167367 22.34043
12 67.13612062 -2.13612 -0.63719 24.46809
13 67.13612062 -4.63612 -1.38292 26.59574
14 64.34730009 -2.3473 -0.70018 28.7234
15 70.76158732 3.238413 0.965994 30.85106
16 64.34730009 3.6527 1.089573 32.97872
17 71.31935142 0.180649 0.053886 35.10638
18 69.22773602 1.772264 0.528653 37.23404
19 68.53053089 4.469469 1.333209 39.3617
20 65.74171035 -3.74171 -1.11612 41.48936
21 68.53053089 -4.53053 -1.35142 43.61702
22 68.53053089 1.469469 0.438332 45.74468
23 69.92494116 -0.92494 -0.2759 47.87234
24 67.13612062 -0.13612 -0.0406 50
25 71.31935142 0.680649 0.203032 52.12766
26 67.13612062 -1.13612 -0.3389 54.25532
27 65.74171035 3.25829 0.971923 56.38298
28 67.13612062 0.863879 0.257689 58.51064
29 65.74171035 3.25829 0.971923 60.6383
30 68.53053089 -4.53053 -1.35142 62.76596
31 68.53053089 -2.53053 -0.75484 64.89362
32 66.43891549 -2.83892 -0.84683 67.02128
33 68.53053089 0.469469 0.140039 69.14894
34 67.83332575 -5.83333 -1.74004 71.2766
35 64.34730009 1.6527 0.492988 73.40426
36 65.4628283 -6.46283 -1.92781 75.53191
37 65.74171035 1.25829 0.375338 77.65957
38 68.53053089 -1.53053 -0.45655 79.78723
39 69.92494116 3.075059 0.917267 81.91489
40 67.13612062 -3.13612 -0.93548 84.04255
41 65.74171035 -0.74171 -0.22125 86.17021
42 71.31935142 4.680649 1.396203 88.29787
43 67.48472319 2.015277 0.601142 90.42553
44 67.13612062 6.363879 1.898298 92.55319
45 65.4628283 -0.46283 -0.13806 94.68085
46 65.74171035 0.25829 0.077046 96.80851
47 72.71376169 -1.71376 -0.5112 98.93617

Part A
Y = 44.8255 + 2.7888*X

Part B
X 8.5
Y = 44.8255 + 2.7888*8.5
68.53053

Part C
X 9
Y = 44.8255 + 2.7888*9
69.92494
Actual Value 72
Residual 2.075059

Part D ANOVA
df SS MS F Significance F
Regression 1 271.1072 271.1072 23.59829 1.47E-05
Residual 45 516.9792 11.48843
Total 46 788.0864

Part E R^2 0.344007

Part F
R^2 Old 0.155659
Ratio 2.210007
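The ratio in Part F compares the coefficients of determination of the two models. A minimal sketch, using the unrounded R Square values from the two SUMMARY OUTPUT blocks (variable names are illustrative):

```python
# R Square of the first model (n = 50) and of the refitted model (n = 47).
r2_old = 0.155658735
r2_new = 0.344006931

ratio = r2_new / r2_old

# The new R Square can also be recovered from the Part D ANOVA table.
r2_new_from_anova = 271.1072 / 788.0864

print(round(ratio, 6), round(r2_new_from_anova, 6))
```

The refitted model explains roughly 2.2 times as much of the variance in height, although its R^2 is still only about 0.344.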

Part G
Yes, the plot of residuals vs predicted Y shows a random pattern.

Yes, the plot of residuals vs hand width shows a random pattern.

No, there are no outliers.

Yes, the histogram of residuals is mound-shaped.

Yes, the new model is better than the previous model. However, it can be made better, since the coefficient of determination is still 0.344, showing that only 34.4% of the variance in the dependent variable can be explained by variation in the independent variable.

Outliers: http://emp.byui.edu/BrownD/Stats-intro/dscrptv/graphs/qq-plot_egs.htm

[Figure: hand_width Residual Plot (Residuals vs hand_width)]
[Figure: hand_width Line Fit Plot (height and Predicted height vs hand_width)]
[Figure: Normal Probability Plot (height vs Sample Percentile)]
[Figure: Height vs Hand Width (Height vs Hand Width)]

Coefficients (continued): Upper 95%, Lower 95.0%, Upper 95.0%
Intercept 54.18976 35.46135 54.18976
hand_width 3.945098 1.632543 3.945098

PROBABILITY OUTPUT (continued)
height
59, 60, 62, 62, 62, 62, 62, 62, 62.5, 63.6,
64, 64, 64, 65, 65, 65, 66, 66, 66, 66,
67, 67, 67, 67, 67.2, 67.5, 68, 68, 68, 69,
69, 69, 69, 69.5, 70, 70, 71, 71, 71.5, 72,
72, 73, 73, 73.5, 74, 75, 76

Histogram - Residuals
Bin Frequency
-4 6
-1 11
2 17
5 10
8 3
More 0
[Figure: Histogram - Residuals (Frequency vs Bin)]

[Figure: Predicted vs Residual Plot (Residuals vs Predicted Height)]