Experiments
Observational Studies
Samples of Convenience
Math 321?
Upper-division English?
2)
X = Sex
T = Hair Color
W = Zip Code
Y = Age
U = Rainfall
Z = Volume of milk
Math 321 - Dr. Minnotte
A subscript of i (occasionally j or k)
indicates a specific, but arbitrary,
observation.
1)
2)
Example: Stocks:
Outliers
Measures of Variability
s² = ?
s = ?
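A minimal Python sketch of the two quantities, assuming the usual definitions s² = Σ(xᵢ − x̄)²/(n − 1) and s = √s²; the data list is hypothetical.

```python
import math


def sample_variance(data):
    """Sample variance s^2, using the n - 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


data = [1, 2, 3, 4, 5]  # hypothetical observations
s2 = sample_variance(data)
s = math.sqrt(s2)  # sample standard deviation
print(s2, round(s, 4))  # 2.5 1.5811
```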
, is the
Example: Stocks:
=?
Quartiles
Q1 = ?
Q3 = ?
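Conventions for quartiles differ between texts and software. The sketch below assumes the (n + 1)p position rule with linear interpolation, which is also the default ("exclusive") method of Python's `statistics.quantiles`; the data are hypothetical.

```python
import statistics


def percentile(data, p):
    """p-th quantile (0 < p < 1): (n + 1)p position in sorted data, linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p  # 1-based position in the sorted data
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lo = int(pos)       # whole part of the position
    frac = pos - lo     # fractional part used for interpolation
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


data = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical, n = 8
q1, q3 = percentile(data, 0.25), percentile(data, 0.75)
print(q1, q3, q3 - q1)                  # 4.0 9.75 5.75
print(statistics.quantiles(data, n=4))  # [4.0, 6.0, 9.75]
```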
Percentiles
Variable          Mean   StDev  Variance  Minimum  Q1    Median  Q3     Maximum  IQR
Stock Returns 19  15.37  13.66  186.49    -7.20    5.48  17.60   28.90  37.40    23.43
Bar Charts
Category      Count
Perfect       64
Good          47
Satisfactory  33
Fail          6
Total         150
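Bar-chart heights can be counts or relative frequencies (count / total), and a pie chart gives each category an angle of 360° × its relative frequency. A short sketch using the counts for this example (64, 47, 33, 6; total 150):

```python
counts = {"Perfect": 64, "Good": 47, "Satisfactory": 33, "Fail": 6}
total = sum(counts.values())                          # 150
rel_freq = {k: v / total for k, v in counts.items()}  # bar heights on a proportion scale
angles = {k: 360 * f for k, f in rel_freq.items()}    # pie-slice angles in degrees

for k in counts:
    print(f"{k:<12} {counts[k]:>3}  {rel_freq[k]:.3f}  {angles[k]:5.1f} deg")
```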
[Figures: bar charts of counts (axis 0 to 1000) of Survived and Died for 1st Class, 2nd Class, 3rd Class, and Crew]
Pie Charts
Minitab:
Dotplots
66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70, 67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31
Histograms
Constructing a Histogram
1) Find the minimum and maximum of the data.
2)
3)
4)
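Only step 1 survives on the slide. The sketch below fills the remaining steps with one common recipe, which is an assumption rather than the slide's exact procedure: choose the number of classes, use equal class widths from the minimum to the maximum, and count the observations in each class (left-closed intervals, with the maximum placed in the last class). The data are hypothetical.

```python
def histogram_counts(data, num_classes):
    """Equal-width classes [a, b); the last class is closed so it includes the maximum."""
    lo, hi = min(data), max(data)       # step 1: minimum and maximum
    width = (hi - lo) / num_classes     # equal class widths (assumed)
    edges = [lo + k * width for k in range(num_classes + 1)]
    counts = [0] * num_classes
    for x in data:
        k = min(int((x - lo) / width), num_classes - 1)  # maximum goes in the last class
        counts[k] += 1
    return edges, counts


data = [1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.0, 4.4, 5.9, 6.0]  # hypothetical
edges, counts = histogram_counts(data, 5)
for a, b, c in zip(edges, edges[1:], counts):
    print(f"[{a:.2f}, {b:.2f})  {'*' * c}")
```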
Multimodal?
Boxplots
Q3 + 1.5 IQR.
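The upper fence Q3 + 1.5 IQR has a matching lower fence Q1 − 1.5 IQR, and points beyond either fence are flagged as outliers. A sketch using the 25 values listed with the dotplot, assuming quartiles are computed with the (n + 1)p rule (the default method of `statistics.quantiles`):

```python
import statistics

# The 25 observations listed with the dotplot slide
data = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
        67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]

q1, median, q3 = statistics.quantiles(data, n=4)  # (n + 1)p positions by default
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, median, q3, iqr, lower_fence, upper_fence, outliers)
```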
Examples:
Scatterplots (2.1)
Minitab Scatterplot:
Correlation
Then
is a good, unitless
measure of the linear relationship between x
and y called the correlation coefficient.
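The slide's formula did not survive extraction. Assuming the standard definition r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²), a short sketch follows; rescaling x (for example, changing its units) leaves r unchanged, which is what "unitless" means here. The data are hypothetical.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient r."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(correlation(x, [2 * a + 3 for a in x]))  # exact increasing line: r = 1
print(correlation(x, [10 - a for a in x]))     # exact decreasing line: r = -1
print(correlation(x, [1, 4, 9, 16, 25]))       # curved but increasing: r near 1
print(correlation([100 * a for a in x], [1, 4, 9, 16, 25]))  # same r after rescaling x
```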
What is r?
Properties of r
1.
2.
3.
4.
Properties of r (continued)
5.
Note that strength is often context- and discipline-dependent. An engineer might find any correlation less
than .95 to be weak, while a social scientist might find a
correlation of .3 to be very strong.
Properties of r (continued)
6.
7.
x determines y
y determines x
Some third variable, z (called a confounding
factor), determines both x and y.
Note: r = 0.933.
S = 0.330519   R-Sq = 87.1%   R-Sq(adj) = 87.0%

Analysis of Variance
Source      DF   SS       MS       F       P
Regression  1    83.3794  83.3794  763.25  0.000
Error       113  12.3444  0.1092
Total       114  95.7238
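The quantities in this output are linked: R-Sq = SS(Regression)/SS(Total), S = √MS(Error), F = MS(Regression)/MS(Error), and in simple linear regression R-Sq = r². A quick check with the printed numbers (the regression DF of 1 is the difference between the Total and Error DF):

```python
import math

ss_reg, ss_err, ss_total = 83.3794, 12.3444, 95.7238
df_reg, df_err = 1, 113

r_sq = ss_reg / ss_total            # coefficient of determination
ms_err = ss_err / df_err            # mean squared error
f_stat = (ss_reg / df_reg) / ms_err
s = math.sqrt(ms_err)               # residual standard deviation

print(f"R-Sq = {100 * r_sq:.1f}%  S = {s:.4f}  F = {f_stat:.2f}  |r| = {math.sqrt(r_sq):.3f}")
```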
Chapter 3: Probability
Examples:
Examples:
Die: S = {1, 2, 3, 4, 5, 6}
Card: S = ?
Component: S = ?
Events
Definition: Set A is a subset of set B
(A ⊆ B) if every element of A is also in B.
Example: S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5} ⊆ S
B = {1, 2, 6, 7} ⊄ S
Examples:
Combining Events
1)
Keyword: or
Example: S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5} S
B = {1, 2, 3} S
A ∪ B = ?
2)
Example: S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {1, 2, 3}
A ∩ B = ?
3)
Keyword: not
Example: S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
Aᶜ = ?
4)
Example: S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
C = {4, 6}
A ∩ C = ∅, so A and C are mutually
exclusive.
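Python's built-in `set` type mirrors these operations directly, which makes it easy to check answers to the examples above:

```python
S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}
B = {1, 2, 3}
C = {4, 6}

print(A <= S, {1, 2, 6, 7} <= S)  # subset tests: True False
print(A | B)                      # union ("or")
print(A & B)                      # intersection ("and")
print(S - A)                      # complement of A relative to S ("not")
print(A.isdisjoint(C))            # mutually exclusive: A and C share no outcomes
```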
A or B?
Not A?
1) P(S) = 1.
2) 0 ≤ P(A) ≤ 1 for all events A.
3) If A and B are mutually exclusive,
P(A ∪ B) = P(A) + P(B).
Note:
1)
Show:
Note: Since Sᶜ = ∅, P(∅) = 0.
2)
Show:
Note: if A and B are mutually exclusive,
P(A ∩ B) = P(∅) = 0, so this is the same
as axiom 3.
P(A B) = 1/6.
Then:
P(T 60) = ?
P(lifetime no more than 80) = ?
A = {1, 3, 5}
P(A) = 3/6 = 1/2
B = {1, 2, 3}
P(B) = 3/6 = 1/2
P(A ∩ B) = P({1, 3}) = 2/6 = 1/3
If I roll the die and, without showing you, tell
you event B has occurred (I rolled no greater
than 3), now what is the probability of event
A?
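With equally likely outcomes, P(E) = |E|/|S|, and the conditional probability is P(A | B) = P(A ∩ B)/P(B). Exact fractions keep the arithmetic readable; the same sketch also checks the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))


A = {1, 3, 5}
B = {1, 2, 3}
p_a_given_b = prob(A & B) / prob(B)          # (1/3) / (1/2)
p_a_or_b = prob(A) + prob(B) - prob(A & B)   # addition rule
print(p_a_given_b, p_a_or_b, prob(A | B))    # 2/3 2/3 2/3
```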
Die:
Independence
A = {draw a club}
B = {draw an ace}
C = {draw a red card}
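Events A and B are independent when P(A ∩ B) = P(A)P(B). A sketch that enumerates a standard 52-card deck (an assumption about the example's deck) and checks the definition for these events:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) cards


def prob(event):
    return Fraction(len(event), len(deck))


A = {c for c in deck if c[1] == "clubs"}                  # draw a club
B = {c for c in deck if c[0] == "A"}                      # draw an ace
C = {c for c in deck if c[1] in ("diamonds", "hearts")}   # draw a red card

print(prob(A & B) == prob(A) * prob(B))  # club and ace: independent
print(prob(A & C) == prob(A) * prob(C))  # club and red: not independent
```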
A = {red 6}
B = {black 6}
P(A) = 1/6
P(B) = 1/6
(fair dice)
Recall,
X = #H
Y = #H − #T
Z = #H before first T
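The number of tosses in this example is not recoverable, so the sketch below assumes three fair tosses and enumerates all 8 outcomes to get the p.m.f. of each random variable. It also assumes that Z counts every toss as a head when no tail appears.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_TOSSES = 3  # assumption: the slide's number of tosses is not recoverable


def heads_before_first_tail(seq):
    count = 0
    for toss in seq:
        if toss == "T":
            break
        count += 1
    return count  # equals N_TOSSES when no tail appears (assumed convention)


X, Y, Z = Counter(), Counter(), Counter()
weight = Fraction(1, 2 ** N_TOSSES)  # each outcome is equally likely
for seq in product("HT", repeat=N_TOSSES):
    h = seq.count("H")
    t = N_TOSSES - h
    X[h] += weight          # number of heads
    Y[h - t] += weight      # heads minus tails
    Z[heads_before_first_tail(seq)] += weight

print(dict(X), dict(Y), dict(Z), sep="\n")
```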
Notation:
S = {1, 2, 3, 4, 5, 6}
p(1) = P(X = 1) = 1/6
p(2) = P(X = 2) = 1/6
and so on.
We might write
p(x) = 1/6, x ∈ {1, 2, 3, 4, 5, 6}
1)
? ≤ p(x) ≤ ?
2)
Σ_{x ∈ S} p(x) = ?
If f(x) is a p.d.f.:
f(x) ?
P(2.5 ≤ X ≤ 3.0) = ?
P(0.2 ≤ X ≤ 0.2) = ?
If X is continuous,
1) lim_{x→−∞} F(x) = 0
2) lim_{x→∞} F(x) = 1
3)
4)
P(0.5 ≤ X ≤ 1.0) = ?
μX = Σ x p(x) (sum over all possible x) if X is discrete, and
μX = ∫ x f(x) dx (integral from −∞ to ∞) if X is continuous.
It can be thought of as the long-term average
of X, or the mean of a sample that follows the
distribution of X perfectly.
Example: Machines
x      0     1     2     3
p(x)   0.12  0.27  0.46  0.15
μX = ?
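For a discrete random variable the mean is μX = Σ x p(x). A minimal sketch for the Machines distribution, which first checks that the probabilities sum to 1:

```python
xs = [0, 1, 2, 3]
ps = [0.12, 0.27, 0.46, 0.15]
assert abs(sum(ps) - 1) < 1e-12  # a valid p.m.f. sums to 1

mu = sum(x * p for x, p in zip(xs, ps))  # expected value
print(round(mu, 2))  # 1.64
```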
Example:
=?
Example:
=?
Expectations of Functions of
Random Variables
Example: X ~ p(x) = , x = 1, 2.
What is E(X²)?
Is E(X²) = [E(X)]²?
=?
E(X²) = ?
V(X) = ?
=?
Example: Machines
x      0     1     2     3
p(x)   0.12  0.27  0.46  0.15
μX = ?
E(X²) = ?
V(X) = ?
σX = ?
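The variance can be computed from its definition, V(X) = Σ (x − μ)² p(x), or from the shortcut V(X) = E(X²) − μ². A sketch for the Machines distribution showing that both agree:

```python
import math

xs = [0, 1, 2, 3]
ps = [0.12, 0.27, 0.46, 0.15]

mu = sum(x * p for x, p in zip(xs, ps))           # E(X)
ex2 = sum(x ** 2 * p for x, p in zip(xs, ps))     # E(X^2)
var = ex2 - mu ** 2                               # shortcut formula
var_def = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # definition
sd = math.sqrt(var)
print(round(mu, 2), round(ex2, 2), round(var, 4), round(sd, 4))
```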
Example:
=?
E(X²) = ?
V(X) = ?
=?
Note that
Xis.
Proof:
p(0) = P(X = 0)
p(1) = P(X = 1)
and so on.
X ~ Binomial(n, p) or X ~ Bin(n, p).
What is p(2)?
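The values of n and p for this question are not recoverable, so the sketch below uses hypothetical n = 10 and p = 0.3 with the binomial p.m.f. p(x) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.3  # hypothetical parameters
print(round(binom_pmf(2, n, p), 4))                    # p(2)
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))   # total probability, 1 up to rounding
```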
If X ~ N(μ, σ²),
Its width is determined by σ²: large values of σ²
imply a wide, low curve, while small values imply a
narrow, tall one.
V(X) = σ².
Examples:
P(Z ≤ 1.00) = ?
P(−2.00 ≤ Z ≤ 0.75) = ?
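Instead of a Z table, Python's `statistics.NormalDist` gives the normal c.d.f. directly. This sketch assumes the lost inequality symbols above are ≤; the N(5, 2²) variable is hypothetical and only illustrates standardizing with Z = (X − μ)/σ.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
print(round(Z.cdf(1.00), 4))                  # P(Z <= 1.00)
print(round(Z.cdf(0.75) - Z.cdf(-2.00), 4))   # P(-2.00 <= Z <= 0.75)

X = NormalDist(mu=5, sigma=2)  # hypothetical parameters
print(round(X.cdf(6.00), 4), round(Z.cdf((6.00 - 5) / 2), 4))  # same value both ways
```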
P(X 6.00) = ?
Normal Percentiles
Note that
Is centered on 50 (μ);
Is narrower than the solid normal curve for the
individual Xs, since the variance and standard
deviation of X̄ are smaller than those of X;
Remains bell-shaped and (roughly?) normal.
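A small simulation makes these points concrete. It assumes a normal population with mean μ = 50 and a hypothetical σ = 10, draws many samples of hypothetical size n = 25, and shows that the sample means center near μ with standard deviation near σ/√n.

```python
import random
import statistics

random.seed(321)  # fixed seed so the run is reproducible
MU, SIGMA, N, REPS = 50, 10, 25, 20_000  # sigma, n, and REPS are hypothetical

means = [statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
         for _ in range(REPS)]

print(round(statistics.fmean(means), 2))   # close to MU
print(round(statistics.stdev(means), 2))   # close to SIGMA / sqrt(N)
print(SIGMA / N ** 0.5)                    # theoretical standard error: 2.0
```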
is
Sample
Inferential
Statistics
If X ~ Binomial(n, p) (n known, p
unknown), estimate p with
p̂ = X/n.
Properties of Estimates
so
is an
Also,
and
(proof:)
Let
Find:
Find:
Is it correct to say
P(458.70 ≤ μ ≤ 486.02) = 0.95?
s: If s is bigger, X̄ is less accurate, and the
interval must be wider.
Confidence level: To be more confident of
including the true value, we must make the
interval wider.
n: As n gets bigger, the standard error of X̄
gets smaller, and the interval gets narrower.
n = 50, X̄ = 2.0727, s = 0.0711
Find a 95% confidence interval for μ.
w=?
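A sketch of the large-sample interval X̄ ± z_{α/2} · s/√n for these numbers, assuming that is the interval intended here (n = 50 is large enough for the normal approximation):

```python
import math
from statistics import NormalDist

n, xbar, s = 50, 2.0727, 0.0711
conf = 0.95
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96
se = s / math.sqrt(n)                          # standard error of the sample mean
half_width = z * se
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 4), round(upper, 4))
```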
Confidence Bounds
Here:
H0: μ = 2.04 (= μ0)
H1:
2.04
Here: z = ?
If H0 is true,
1)
2)
H0 is wrong.
1)
2)
3)
4)
5)
1)
2)
H0 = ?
H1 = ?
3)
z=?
4)
> 0
< 0
P(z* z)
z=?
P=?
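A sketch of the large-sample z-test, z = (X̄ − μ0)/(s/√n), with the P-value taken from the tail(s) that the alternative hypothesis points to. The illustration assumes this example reuses n = 50, X̄ = 2.0727, s = 0.0711 with H0: μ = 2.04 and a two-sided alternative; those pairings are assumptions, not recovered from the slide.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, s, n, alternative="two-sided"):
    """Large-sample z-test for a population mean; returns (z, P-value)."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    phi = NormalDist().cdf
    if alternative == "greater":    # H1: mu > mu0
        p = 1 - phi(z)
    elif alternative == "less":     # H1: mu < mu0
        p = phi(z)
    else:                           # H1: mu != mu0
        p = 2 * (1 - phi(abs(z)))
    return z, p


z, p = z_test(2.0727, 2.04, 0.0711, 50)
print(round(z, 2), round(p, 4))
```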
5)
= 49.9865
The t-statistic
Table A.3 contains important percentiles (critical values). Each row represents a different t distribution.
Example: T ~ t12
Example: T ~ t9
P(T 1.833) = ?
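Table A.3 lists critical values, but a tail probability can also be computed numerically from the t density f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^(−(ν+1)/2). This sketch assumes the lost symbol above is ≥ and integrates with Simpson's rule; for T ~ t9, 1.833 is the tabled value with upper-tail area 0.05.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: Simpson's rule on [0, t], then symmetry about 0."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


print(round(t_upper_tail(1.833, 9), 4))  # about 0.05
```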
t-Tests (6.4)
1)
2)
3)
4)
H1: μ > μ0: P = P(t* ≥ t)
H1: μ < μ0: P = P(t* ≤ t)
5)
H1?
t=?
d.f. = ?
P=?
Conclusion?
Variable          N   Mean     StDev   SE Mean  95% CI              T     P
Cholesterol in m  20  205.800  48.392  10.821   (183.152, 228.448)  2.85  0.010
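The summary statistics alone reproduce the rest of this output: SE Mean = s/√n and the 95% CI is X̄ ± t_{19,.025} · SE with t_{19,.025} = 2.093. The null value is not shown; μ0 = 175 below is an assumption inferred from T = 2.85, not stated on the slide.

```python
import math

n, xbar, s = 20, 205.800, 48.392
t_crit = 2.093            # t_{19, .025} from a t table
se = s / math.sqrt(n)     # standard error of the mean
ci = (xbar - t_crit * se, xbar + t_crit * se)

mu0 = 175                 # assumption: inferred from T = 2.85, not stated on the slide
t_stat = (xbar - mu0) / se
print(round(se, 3), tuple(round(v, 2) for v in ci), round(t_stat, 2))
```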
                   H0 True           H1 True
Reject H0          Type I Error      Correct Decision
Fail to Reject H0  Correct Decision  Type II Error
Decision                    H0 True               H1 True
                            (Defendant Innocent)  (Defendant Guilty)
Reject H0 (Convict)         Type I Error          Correct Decision
Fail to Reject H0 (Acquit)  Correct Decision      Type II Error
Power (6.7)
3.
Sample
Difference
Size
Power
0.786845
Formulation A: P = 0.49
Formulation B: P = 0.24
Formulation C: P = 0.17
Formulation D: P = 0.003
Formulation E: P = 0.53
1)
2)
3)
X̄ − Ȳ is an unbiased estimator of μX − μY.
1)
Show:
2)
is
Show:
3)
4)
5)
1)
2)
Two-sided:
H0: μX − μY = Δ0 vs. H1: μX − μY ≠ Δ0.
One-sided:
H0: μX − μY ≤ Δ0 vs. H1: μX − μY > Δ0.
or:
H0: μX − μY ≥ Δ0 vs. H1: μX − μY < Δ0.
3)
is
Example: Students:
z=?
4)
X Y
X Y >
X Y <
P(z* z)
5)
Example: Students:
Other examples:
         N   Mean   StDev  SE Mean
With     25  44.18  3.99   0.80
Without  20  38.56  3.63   0.81

Estimate for difference: 5.62500
95% lower bound for difference: 3.71000
P-Value = 0.000   DF = 42
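Minitab's two-sample t without pooling uses the Welch-Satterthwaite degrees of freedom, rounded down. A sketch from the summary statistics (With: n = 25, mean 44.18, s = 3.99; Without: n = 20, mean 38.56, s = 3.63) reproduces DF = 42; the t value uses the rounded means, so it is approximate.

```python
import math


def welch(m1, s1, n1, m2, s2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


t, df, se = welch(44.18, 3.99, 25, 38.56, 3.63, 20)
print(round(t, 2), math.floor(df), round(se, 3))
```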
            N   Mean      StDev    SE Mean
StdDrug     40  31.1825   4.8318   0.7640
NewDrug     40  33.8375   4.9379   0.7808
Difference  40  -2.65500  3.73012  0.58978
P-Value = 0.000
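A paired test works with the differences only: SE = s_D/√n and t = D̄/SE with n − 1 degrees of freedom (testing a mean difference of 0 is assumed here). A sketch using the Difference row (n = 40, mean −2.655, StDev 3.73012):

```python
import math

n, dbar, sd = 40, -2.65500, 3.73012
se = sd / math.sqrt(n)   # standard error of the mean difference
t = dbar / se            # tests H0: mean difference = 0
df = n - 1
print(round(se, 5), round(t, 2), df)
```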
If X ~ Fν1,ν2
(X has an F distribution with ν1 and
ν2 degrees of freedom),
Example: x ~ F5,7
We wish to test
H0: μ1 = μ2 = ⋯ = μI vs.
H1: Two or more of the μi are different.
Flowrate (ml/s) at collapse
      Level 1  Level 2  Level 3
      10.6     11.7     19.6
      9.7      12.7     15.1
      8.3      17.6     16.6
Ji    11       14       10
Mean  11.209   15.086   17.330
N = 11 + 14 + 10 = 35
Note that
F=?
P-value?
Conclusion?
Source  DF  SS      MS      F      P
Factor  2   204.02  102.01  23.57  0.000
Error   32  138.47  4.33
Total   34  342.49
R-Sq = 59.57%   R-Sq(adj) = 57.04%

Level    N   Mean    StDev
level 1  11  11.209  1.899
level 2  14  15.086  2.150
level 3  10  17.330  2.168
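The one-way ANOVA sums of squares can be rebuilt from each level's size, mean, and standard deviation: SSTr = Σ Jᵢ(X̄ᵢ − X̄)² and SSE = Σ (Jᵢ − 1)sᵢ², with F = [SSTr/(I − 1)]/[SSE/(N − I)]. Because the inputs are rounded, the results match the printed table only to about two decimals.

```python
ns = [11, 14, 10]
means = [11.209, 15.086, 17.330]
sds = [1.899, 2.150, 2.168]

N, I = sum(ns), len(ns)
grand = sum(n * m for n, m in zip(ns, means)) / N               # grand mean
sstr = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))     # treatment SS
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))            # error SS
mstr, mse = sstr / (I - 1), sse / (N - I)
f = mstr / mse
print(round(sstr, 2), round(sse, 2), round(f, 2), round(sstr / (sstr + sse), 4))
```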
We use q_{I, N−I, α}.
q_{3, 32, .05} ≈ 3.49
μ2 − μ1:
μ3 − μ1:
μ3 − μ2:
Comparison         Lower  Center  Upper
level 2 − level 1  1.814  3.877   5.939
level 3 − level 1  3.884  6.121   8.357
level 3 − level 2  0.125  2.244   4.364
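A sketch of the Tukey-Kramer intervals X̄ᵢ − X̄ₖ ± (q/√2)·√(MSE(1/Jᵢ + 1/Jₖ)), using MSE = 4.33 from the ANOVA table and the table value q = 3.49. Minitab uses a more precise q, so its bounds differ from these by about 0.01.

```python
import math

ns = {1: 11, 2: 14, 3: 10}
means = {1: 11.209, 2: 15.086, 3: 17.330}
mse, q = 4.33, 3.49  # MSE from the ANOVA table; q_{3,32,.05} from a studentized range table

intervals = {}
for i, j in [(2, 1), (3, 1), (3, 2)]:
    diff = means[i] - means[j]
    half = (q / math.sqrt(2)) * math.sqrt(mse * (1 / ns[i] + 1 / ns[j]))
    intervals[(i, j)] = (diff - half, diff, diff + half)
    print(f"mu{i} - mu{j}: ({diff - half:.3f}, {diff + half:.3f})")
```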
Model Assumptions
1)
2)
3)
=?
z=?
4)
> 0
< 0
P(z* z)
5)
Sample  X    N    Sample p  95% CI                P-Value
1       176  400  0.440000  (0.390707, 0.490187)  0.019

Sample  X    N    Sample p  95% CI                Z-Value  P-Value
1       176  400  0.440000  (0.391355, 0.488645)  -2.40    0.016
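The second output uses the normal approximation: p̂ ± z_{.025}·√(p̂(1 − p̂)/n) for the interval and z = (p̂ − p0)/√(p0(1 − p0)/n) for the test. The null value is not shown; p0 = 0.5 is an assumption that reproduces Z = −2.40.

```python
import math
from statistics import NormalDist

x, n = 176, 400
p_hat = x / n
z_crit = NormalDist().inv_cdf(0.975)
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z_crit * se, p_hat + z_crit * se)

p0 = 0.5  # assumption: null value consistent with Z = -2.40
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided
print(tuple(round(v, 6) for v in ci), round(z, 2), round(p_value, 3))
```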
Alternative Proportion  Sample Size  Target Power  Actual Power
0.55                    783          0.8           0.800239
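A common normal-approximation sample-size formula for a one-proportion test is n = [(z_{α/2}√(p0(1 − p0)) + z_β√(p1(1 − p1)))/(p1 − p0)]². Assuming a two-sided test at α = 0.05 with null value p0 = 0.5 (neither is shown in the output), it reproduces the sample size of 783.

```python
import math
from statistics import NormalDist

p0, p1 = 0.5, 0.55      # p0 = 0.5 is an assumption; p1 is the alternative from the output
alpha, power = 0.05, 0.80
nd = NormalDist()
z_a = nd.inv_cdf(1 - alpha / 2)   # two-sided critical value (assumed)
z_b = nd.inv_cdf(power)
n = ((z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2
print(round(n, 1), math.ceil(n))  # round up to the next whole observation
```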
We have:
Rural: nX = 160, X = 64
Urban: nY = 261, Y = 89
Find a 95% confidence interval for pX − pY.
Rural: nX = 160, X = 64
Urban: nY = 261, Y = 89
Can we conclude that rural households are
more likely to use a natural Christmas tree
than urban ones?
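A sketch of both questions for the Christmas-tree data (rural 64 of 160, urban 89 of 261). The interval is the basic Wald form p̂X − p̂Y ± z·√(p̂X(1 − p̂X)/nX + p̂Y(1 − p̂Y)/nY); some courses use an adjusted version that adds one success and one failure to each sample, which gives slightly different numbers. The one-sided test uses the pooled proportion.

```python
import math
from statistics import NormalDist

x1, n1 = 64, 160    # rural
x2, n2 = 89, 261    # urban
p1, p2 = x1 / n1, x2 / n2
nd = NormalDist()

# 95% Wald confidence interval for pX - pY
se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_crit = nd.inv_cdf(0.975)
ci = (p1 - p2 - z_crit * se_ci, p1 - p2 + z_crit * se_ci)

# Test H0: pX - pY <= 0 vs. H1: pX - pY > 0, using the pooled proportion
p_pool = (x1 + x2) / (n1 + n2)
se_test = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_test
p_value = 1 - nd.cdf(z)
print(tuple(round(v, 4) for v in ci), round(z, 2), round(p_value, 3))
```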
Minitab:
Sample  X     N     Sample p
1       1684  9916  0.169827
2       1630  7180  0.227019

Estimate for difference: -0.0571930
95% upper bound for difference: -0.0469659
Z = -9.34   P-Value = 0.000
If Y ~
Table A.5 contains important critical values.
If
then
Example: X~
Example: X~
P(X 9.236) = ?
Roll   1   2   3   4   5   6   Total
Count  14  20  17  10  8   21  n = 90
Th
Count  65  43  48  41  73  Total n = 270
Color  Red  Pink  White  Total
Count  57   89    54     n = 200
i = 1, …, 6.
i = 1, …, 5.
We will compute P-values using the chi-square table, so these are referred to as
chi-square statistics (and tests).
Example (Die):
Roll   Oi      Ei
1      14      15
2      20      15
3      17      15
4      10      15
5      8       15
6      21      15
Total  N = 90  N = 90
d.f. = 5, 0.05 < P < 0.10
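The chi-square goodness-of-fit statistic is X² = Σ (Oᵢ − Eᵢ)²/Eᵢ with k − 1 degrees of freedom. A sketch for the die counts (14, 20, 17, 10, 8, 21), comparing X² with the chi-square critical values for 5 d.f. (9.236 for upper tail 0.10, 11.070 for upper tail 0.05):

```python
observed = [14, 20, 17, 10, 8, 21]
n = sum(observed)
expected = [n / 6] * 6  # fair die: each face has probability 1/6
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
print(round(x2, 3), df)

# Critical values for 5 d.f.: upper 0.10 -> 9.236, upper 0.05 -> 11.070
print(9.236 < x2 < 11.070)  # True, so 0.05 < P < 0.10
```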
Example (Snapdragons):
Color  Oi       Ei
Red    57
Pink   89
White  54
Total  N = 200
X² = ?
P?
              Men   Women  Total
Right-handed  934   1070   2004
Left-handed   113   92     205
Ambidextrous  20    8      28
Total         1067  1170   2237
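A sketch of the chi-square test of independence for the handedness table, with expected counts Eᵢⱼ = (row total × column total)/N. The women/ambidextrous count is taken as 8, which follows from the women's column total (1170 − 1070 − 92 = 8) and matches the Minitab expected counts and contributions. For 2 d.f. the chi-square upper tail has the closed form e^(−x/2), so no table is needed in that special case.

```python
import math

table = [[934, 1070],   # right-handed: men, women
         [113, 92],     # left-handed
         [20, 8]]       # ambidextrous
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
N = sum(row_tot)
expected = [[r * c / N for c in col_tot] for r in row_tot]
x2 = sum((o - e) ** 2 / e
         for orow, erow in zip(table, expected)
         for o, e in zip(orow, erow))
df = (len(table) - 1) * (len(col_tot) - 1)
p_value = math.exp(-x2 / 2)  # exact upper tail, valid only because df = 2
print(round(x2, 2), df, round(p_value, 4))
```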
               Column 1  Column 2  …  Column j  …  Column J  Row Totals
Row 1          O11       O12       …  O1j       …  O1J       O1.
Row 2          O21       O22       …  O2j       …  O2J       O2.
⋮
Row i          Oi1       Oi2       …  Oij       …  OiJ       Oi.
⋮
Row I          OI1       OI2       …  OIj       …  OIJ       OI.
Column Totals  O.1       O.2       …  O.j       …  O.J       O.. = N
X²?
d.f.?
P?
Minitab:
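              Men     Women    Total
Right-handed  934     1070     2004
              955.86  1048.14
              0.500   0.456
Left-handed   113     92       205
              97.78   107.22
              2.369   2.160
Ambidextrous  20      8        28
              13.36   14.64
              3.306   3.015
Total         1067    1170     2237
(Each cell: observed count, expected count, chi-square contribution.)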
Example:
Formally:
yi = β0 + β1xi + εi
β0 and β1 are (unknown) constants
ε1, …, εn are assumed to be independent draws
from a N(0, σ²) distribution.
yi ~ N(β0 + β1xi, σ²)
E(yi) = β0 + β1xi
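The least-squares estimates of β0 and β1 are β̂1 = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)² and β̂0 = ȳ − β̂1x̄. A minimal sketch on hypothetical data:

```python
def least_squares(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))  # 0.05 1.99
```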
has a
Most commonly, β10 = 0.
n = 115
β̂1 = .836
s = .331
Σ(xi − x̄)² = 119.25
=?
t=?
=?
P?
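The standard error of the slope is s_β̂1 = s/√Σ(xᵢ − x̄)², and t = β̂1/s_β̂1 with n − 2 d.f. tests H0: β1 = 0. This sketch assumes the 119.25 above is Σ(xᵢ − x̄)², which is consistent with SE Coef = 0.03027 in the Minitab output; the rounded inputs give t near the printed 27.63.

```python
import math

n, b1, s, sxx = 115, 0.836, 0.331, 119.25
se_b1 = s / math.sqrt(sxx)  # standard error of the slope
t = b1 / se_b1              # tests H0: beta1 = 0
df = n - 2
print(round(se_b1, 4), round(t, 1), df)
```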
Predictor  Coef     SE Coef  T      P
Constant   -0.4698  0.1257   -3.74  0.000
January    0.83617  0.03027  27.63  0.000
Inference in Correlation
Then
.
r = -.488
U=?
d.f. = ?
P?
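To test H0: ρ = 0, the statistic U = r√(n − 2)/√(1 − r²) has a t distribution with n − 2 d.f. The slide does not give n; this sketch assumes n = 50 (one observation per U.S. state, as the variables Illiteracy, HS Grad, and Murder suggest).

```python
import math

r, n = -0.488, 50  # n = 50 is an assumption
U = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
df = n - 2
print(round(U, 2), df)
```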
         Illiteracy  HS Grad
HS Grad  -0.657
         0.000
Murder   0.703       -0.488
         0.000       0.000
(Each cell: Pearson correlation, P-value.)
Power Transformations
Predictor  Coef      SE Coef   T      P
Constant   412.0     189.0     2.18   0.030
WATER80    0.48885   0.02638   18.53  0.000
INCOME     0.019271  0.003368  5.72   0.000
EDUCATION  -43.65    13.23     -3.30  0.001
PEOPLE81   234.71    28.00     8.38   0.000
CHPEOPLE   96.56     80.76     1.20   0.232
S = 851.914   R-Sq = 67.5%   R-Sq(adj) = 67.1%

Analysis of Variance
Source          DF   SS          MS         F       P
Regression      5    737617962   147523592  203.27  0.000
Residual Error  490  355620748   725757
Total           495  1093238710
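In multiple regression, R-Sq = SSR/SST, adjusted R-Sq = 1 − [SSE/(n − p − 1)]/[SST/(n − 1)], F = MSR/MSE, and S = √MSE. A check against this output (the regression DF of 5 is Total DF minus Residual DF):

```python
import math

ss_reg, ss_res, ss_total = 737617962, 355620748, 1093238710
df_reg, df_res = 5, 490
df_total = df_reg + df_res

r_sq = ss_reg / ss_total
r_sq_adj = 1 - (ss_res / df_res) / (ss_total / df_total)
f_stat = (ss_reg / df_reg) / (ss_res / df_res)
s = math.sqrt(ss_res / df_res)
print(f"R-Sq = {100 * r_sq:.1f}%  R-Sq(adj) = {100 * r_sq_adj:.1f}%  F = {f_stat:.2f}  S = {s:.3f}")
```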
and
and combine them with the coefficient of
multiple determination
Predictor          Coef       SE Coef   T      P
Constant           -768.9     313.4     -2.45  0.014
WATER80            0.9742     0.1090    8.93   0.000
INCOME             0.021263   0.003310  6.42   0.000
EDUCATION          39.55      22.25     1.78   0.076
PEOPLE81           216.57     27.52     7.87   0.000
WATER80*EDUCATION  -0.033617  0.007275  -4.62  0.000

S = 835.152   R-Sq = 68.7%   R-Sq(adj) = 68.4%
12 years?
16 years?
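With an interaction term, the effect of WATER80 depends on EDUCATION: the slope in WATER80 is 0.9742 − 0.033617 × EDUCATION, holding the other predictors fixed. This sketch assumes the questions ask about EDUCATION = 12 and 16 years and uses the coefficients from the interaction model output.

```python
b_water, b_interaction = 0.9742, -0.033617  # WATER80 and WATER80*EDUCATION coefficients


def water80_slope(education_years):
    """Change in the response per unit of WATER80 at a fixed EDUCATION level."""
    return b_water + b_interaction * education_years


for years in (12, 16):
    print(years, round(water80_slope(years), 4))
```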
Example: Yield (kg/ha) vs. time to harvest (days after flowering) for paddy, a grain from India.
R-Sq = 79.4%   R-Sq(adj) = 76.2%

Analysis of Variance
Source      DF  SS       MS       F      P
Regression  2   2084779  1042390  25.08  0.000
Error       13  540388   41568
Total       15  2625168
Even most statisticians deal with pre-generated statistics in journals and the
news far more frequently than they are
called upon to compute them themselves.
This means:
77% extremely (58%) + quite (19%), 22% not that (7%) + not at all (15%)
From Republican congressman John Culberson's web page:
Dangers of Inference
Garbage In, Garbage Out
Data snooping