[Figure: data spectrum, with "No Data", "Some Data", "Much Data", and "Way Too Much Data" along the top, and Probability, Inferential Statistics, and Descriptive Statistics beneath]
• comparing proportions
• paired vs two-sample (again)
• sample size calculations
• hypothesis testing
• power transformations
• the other two-sample test
• the k-sample problem
Pr{“Y”(0)|“Y”(-1)} = this year's retention rate
Pr{“Y”(0)|“N”(-1)} = this year's re-enlistment rate (re-enlistment after a “no”)
Pr{“N”(0)|“Y”(-1)} = this year's de-enlistment rate
                   = 1 – Pr{“Y”(0)|“Y”(-1)} = 1 – retention rate
The conversion rates Pr{“Y”(0)|“N”(-1)} and Pr{“N”(0)|“Y”(-1)} drive the equilibrium ratio RI/(1–RI).
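The equilibrium mix implied by the two conversion rates can be checked numerically: treat "Y"/"N" as a two-state Markov chain and solve for the long-run share of "Y". A minimal sketch; the 0.2 and 0.1 rates below are illustrative values, not figures from the slides:

```python
def steady_state_yes(p_yes_given_no, p_no_given_yes):
    """Long-run Pr{"Y"} of the two-state chain: balance flow N->Y against Y->N."""
    return p_yes_given_no / (p_yes_given_no + p_no_given_yes)

# illustrative rates: re-enlistment Pr{"Y"|"N"} = 0.2, de-enlistment Pr{"N"|"Y"} = 0.1
share = steady_state_yes(0.2, 0.1)   # equilibrium share saying "Y" is 2/3
```

The balance equation pi_Y * Pr{"N"|"Y"} = pi_N * Pr{"Y"|"N"} gives the closed form directly, which is the same information as the slide's RI/(1–RI) odds form.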
Stat 110 bheavlin@stat.stanford.edu
Shredder-Shredder chess match

                         Intel=L (AMD=W)   Draw   Intel=W (AMD=L)
AMD white, Intel black   16                44     11
AMD black, Intel white   11                40     19
[Figure: standard normal density, z from –6 to 6]
…p-value version
Null hypothesis Ho : Δ = 0.
Alternative HA : Δ = ΔA.
Test statistic z = d / (σ/√n) = d×√n/σ
One-sided p-value:
p1-value = P( d/(σ/√n) > zpv | Δ=0 )
         = 1 – Φ(d/(σ/√n))
         = Φ(–d×√n/σ)
[Figure: standard normal density, z from –6 to 6]
…two-tailed p-value
Null hypothesis Ho : Δ = 0.
Alternative HA : Δ = ΔA.
Test statistic z = d / (σ/√n) = d×√n/σ
Two-sided p-value:
p2-value = P( |zobs| > zpv | Ho )
         = 1 – [Φ(|d|/(σ/√n)) – Φ(–|d|/(σ/√n))]
         = 2×Φ(–|d|×√n/σ)
         = 2×p1-value
[Figure: standard normal density, z from –6 to 6]
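These two formulas can be checked directly with the standard library's normal distribution. A minimal sketch; d = 2, σ = 4, n = 16 are illustrative values, not data from the slides:

```python
from math import sqrt
from statistics import NormalDist

def z_pvalues(d, sigma, n):
    """One- and two-sided p-values for the one-sample z test of Ho: delta = 0."""
    z = d * sqrt(n) / sigma                # test statistic d*sqrt(n)/sigma
    p1 = NormalDist().cdf(-z)              # one-sided: Phi(-d*sqrt(n)/sigma)
    p2 = 2 * NormalDist().cdf(-abs(z))     # two-sided: 2*Phi(-|d|*sqrt(n)/sigma)
    return z, p1, p2

# illustrative: observed mean shift d = 2 with sigma = 4, n = 16 gives z = 2.0
z, p1, p2 = z_pvalues(2, 4, 16)
```

For a positive shift, p2 is exactly 2×p1, matching the last line of the slide.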
one-sample mean, σ unknown, p-value
Null hypothesis Ho : Δ = 0.
Alternative HA : Δ = ΔA.
Test statistic t = d / (s/√n) = d×√n/s
Rejection region: |t| > t(df=n–1, α/2)
Two-sided p-value:
p2-value = P( |tobs| > tpv | Ho )
         = P( |d|/(s/√n) > tpv | Δ=0 )
         = 2×T(–|d|×√n/s, df=n–1)
[Figure: t density, df=4, t from –6 to 6]
Example: mean = –7.20, stdev = 11.52, n = 5:
t = d√n/s = –7.2×√5/11.52 = –1.40
t(df=4, 0.975) = 2.776; one-sided p-value = 0.117 (two-sided: 0.235)
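The example's p-value can be reproduced without statistical libraries by integrating the t density numerically. A stdlib-only sketch; the integration bound and step count are arbitrary numerical choices, and the data summaries (–7.20, 11.52, n = 5) come from the slide:

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Student-t density with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, hi=60.0, steps=20000):
    """P(T > t) by Simpson's rule; hi and steps chosen so truncation error is tiny."""
    h = (hi - t) / steps
    s = t_pdf(t, df) + t_pdf(hi, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return s * h / 3

# the slide's example: d = -7.20, s = 11.52, n = 5
t = -7.20 * sqrt(5) / 11.52            # about -1.40
p1 = t_upper_tail(abs(t), df=4)        # one-sided p, about 0.117
p2 = 2 * p1                            # two-sided p
```

This matches the hand calculation: |t| = 1.40 is well inside the 2.776 rejection cutoff, so the test does not reject at α = 0.05.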
Wafer position sequence effects
How to adjust for this effect in the 1st-12 vs 2nd-12 split experiment?
[Figure: response vs wafer processing sequence (0–25), annotated with a zig-zag even-vs-odd pattern, an early effect, and the overall wafer effect]
20 concurrent unsplit lots

lot  split      01-12  13-24     lot  split      01-12  13-24
 6   no splits  62.5   58.8      17   no splits  66.7   72.9
 7   no splits  50.5   30.6      18   no splits  75.4   68.6
 8   no splits  72.5   71.6      19   no splits  78.3   81.4
 9   no splits  86.0   73.8      20   no splits  75.5   77.8
10   no splits  68.6   59.3      21   no splits  84.0   73.6
11   no splits  76.6   78.2      22   no splits  79.6   78.5
12   no splits  55.6   44.3      23   no splits  77.9   74.7
13   no splits  64.6   71.3      24   no splits  64.8   61.7
14   no splits  73.5   77.5      25   no splits  69.6   70.2
15   no splits  81.3   77.7      26   no splits  70.7   71.4
16   no splits  66.0   3.1
Adjusting for the wafer position bias
[Figure: dot plot of the 01-12 minus 13-24 deltas for the no-splits lots, delta from –30 to 20, df = 24]

sample size calculations
z1–β = zα – ΔA×√n/σ, or
–zβ = zα – ΔA×√n/σ, or
ΔA×√n/σ = zα + zβ, or
√n = ( zα + zβ )×σ/ΔA, so
n = ( zα + zβ )²×σ²/ΔA²
[Figure: standard normal densities for Δ√n/σ = 1 and Δ√n/σ = 2]
Guenther’s refinement:
n = 2(zα/2 + zβ)²σ²/(μ1–μ0)² + (zα/2)²/4

                       power=0.5                      power=0.9
alpha  Δ/σ    n*      Guenther  noncentral    n*      Guenther  noncentral
0.05   0.25   122.93  123.89    123.88        336.24  337.2     337.2
0.05   0.5    30.73   31.69     31.71         84.06   85.02     85.03
0.05   0.75   13.66   14.62     14.67         37.36   38.32     38.34
0.05   1      7.68    8.64      8.73          21.01   21.98     22.02
0.05   1.25   4.92    5.88      6.02          13.45   14.41     14.48
0.05   1.5    3.41    4.37      4.57          9.34    10.3      10.4
0.05   2      1.92    2.88      3.17          5.25    6.21      6.39
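The n* and Guenther columns follow directly from the two formulas above, so the table can be regenerated from the standard library. A minimal sketch (per-group n for the two-sample case, with Δ/σ standing in for (μ1–μ0)/σ):

```python
from statistics import NormalDist

def n_two_sample(alpha, power, delta_over_sigma):
    """Normal-theory two-sample n per group (n*), plus Guenther's t correction."""
    z_a2 = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided alpha/2 critical value
    z_b = NormalDist().inv_cdf(power)            # z_beta; zero when power = 0.5
    n_star = 2 * (z_a2 + z_b) ** 2 / delta_over_sigma ** 2
    guenther = n_star + z_a2 ** 2 / 4            # Guenther's refinement term
    return n_star, guenther

# reproduces the alpha = 0.05, power = 0.9, delta/sigma = 0.5 row: 84.06, 85.02
n_star, g = n_two_sample(0.05, 0.9, 0.5)
```

The noncentral column needs the noncentral t distribution, which is outside the standard library, but the table shows Guenther's simple additive term lands within a fraction of an observation of it.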
one-sided p-value:
p1-value = P( χ²(df=ν) > νs²/σo² | Ho )
         = 1 – χ²CDF(νs²/σo², df=ν)
[Figure: chi-square density, 0 to 8]
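This upper-tail chi-square probability can also be computed with stdlib numerical integration. A sketch; the values ν = 4 and s²/σo² = 2 (so the statistic is 8) are illustrative, not from the slides:

```python
from math import gamma, exp

def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom."""
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def chi2_upper_tail(x, df, hi=200.0, steps=20000):
    """P(chi2_df > x) by Simpson's rule; hi is far enough out to ignore the rest."""
    h = (hi - x) / steps
    s = chi2_pdf(x, df) + chi2_pdf(hi, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * chi2_pdf(x + i * h, df)
    return s * h / 3

# illustrative: nu = 4 and s^2/sigma_o^2 = 2, so nu*s^2/sigma_o^2 = 8
p1 = chi2_upper_tail(8.0, df=4)
```

For even df there is a closed form to check against: with df = 4, P(χ² > x) = e^(–x/2)(1 + x/2), so P(χ² > 8) = 5e⁻⁴ ≈ 0.0916.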
Two-sample variance problem
Null hypothesis Ho : σ1 = σ2
Alternative HA : σ1 ≠ σ2
Test statistic s1²/s2², with rejection region
    < F(ν1, ν2, 0.025) or
    > F(ν1, ν2, 0.975)
Two-sided p-value (label so that s1 > s2):
p2-value = 2×P( F(ν1, ν2) > s1²/s2² | Ho )
         = 2×[1 – FCDF(s1²/s2²; ν1, ν2)]
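The same Simpson-rule approach handles the F distribution. A stdlib-only sketch; the variance ratio 4.0 with ν1 = ν2 = 10 is an illustrative example, not slide data:

```python
from math import gamma

def f_pdf(x, d1, d2):
    """F density with (d1, d2) degrees of freedom."""
    c = gamma((d1 + d2) / 2) / (gamma(d1 / 2) * gamma(d2 / 2))
    return c * (d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1) \
           * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

def f_upper_tail(x, d1, d2, hi=500.0, steps=40000):
    """P(F(d1,d2) > x) by Simpson's rule over [x, hi]."""
    h = (hi - x) / steps
    s = f_pdf(x, d1, d2) + f_pdf(hi, d1, d2)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_pdf(x + i * h, d1, d2)
    return s * h / 3

# illustrative: s1^2/s2^2 = 4.0 (labeled so s1 > s2), nu1 = nu2 = 10
p2 = 2 * f_upper_tail(4.0, 10, 10)   # two-sided p, about 0.039
```

The doubling is exactly the slide's rule: because the larger variance is put on top, only the upper tail is integrated, then multiplied by 2.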
CPU times (reprise)
[Figure: dot plots of the CPU-time data on three scales]

raw data:
1.46  0.58  4.31  1.02
1.30  8.24  3.51  6.87
5.92  1.86  1.41  1.70
0.17  2.92  0.91  0.43
1.43  1.44  4.49  4.21
2.02  1.65  1.40  0.40

group  mean  stdev
1      2.89  2.62
2      3.56  4.10
3      3.08  1.50
4      3.20  3.20
5      1.21  0.95
6      2.00  0.80
7      2.27  1.94
8      2.01  0.96
a few power transformations
[Figure: scatterplots of group stdev vs group mean on the linear, square-root, and log scales]
Why power transformations?
Theoretical reasons
• align physical relationships to (linear) statistical models.
Empirical reasons
• reduce the correlation of group variances with group means.
• reduce the influence of large values without making them into outliers.
• reduce the skewness in right-skewed data (λ < 1).
• resolve an ambiguity in scale (e.g., a rate vs. its reciprocal).
Preference order:
• λ = 0 (logs), 1/2 (square roots), –1 (inverses),
  1/3 (cube roots: like logs, but tolerate zeros)
Box-Cox transformations
What are they? Response y → yλ.
Note: y → (yλ – 1)/λ equals 0 at y = 1, with slope 1 there.
Box-Cox plots log(σ(μ)) vs log(μ). If log(σ(μ)) = 1×log(μ) + log(σo), the slope r is 1, so 1–r = 0: transform by taking logs of the raw data.

“poor man’s” Box-Cox procedure:
2. For each group, calculate the mean and the standard deviation.
3. Plot log(stdev) vs log(mean).
4. Estimate the slope, say r.
5. The recommended power for transforming the raw data is 1–r (suitably rounded).
[Figure: log stdev vs log mean; fitted slope = 1.24]
Here the slope is 1.24, so 1–r = –0.24, which suggests logs or reciprocal square roots.
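The "poor man's" procedure is a one-line regression, so it sketches easily with the standard library. The synthetic groups below are constructed so stdev is exactly proportional to mean (r = 1), the case where the recommended power is 0, i.e. take logs:

```python
from math import log
from statistics import mean, stdev

def poor_mans_box_cox(groups):
    """Slope r of log(stdev) vs log(mean) across groups; suggested power is 1 - r."""
    xs = [log(mean(g)) for g in groups]
    ys = [log(stdev(g)) for g in groups]
    xbar, ybar = mean(xs), mean(ys)
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    return r, 1 - r

# each group is the pattern (1, 2, 3) rescaled, so stdev/mean is constant: r = 1
groups = [[c * v for v in (1.0, 2.0, 3.0)] for c in (1.0, 2.0, 5.0, 10.0)]
r, power = poor_mans_box_cox(groups)   # r = 1, power = 0: take logs
```

With real data the fitted slope will not be exact, which is why step 5 says "suitably rounded": a slope of 1.24, as above, rounds toward λ = 0 (logs) or λ = –1/2.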
linear vs log: plots of transformed data
[Figure: dot plots of Ra226 by group (A–H), on the linear scale (0–9) and the log2 scale (–3 to 3)]
Mis-calibration:
Target thickness is β×to; actual thickness is b×to.
The thickness deviation (β – b)×to is proportional to the mean.
So multiplicative relationships tend to promote constant coefficients of variation, and log transforms.
[Figure: thickness vs time; lines with slopes β (target) and b (actual), evaluated at time to]
sums of small positive errors
actual thickness = Σi bi Δti,
with variance = Σi Δti² Var(bi) = (Δt Var(b)) Σi Δti = σb²×Δt×to,
so the variance is proportional to the mean (and the stdev to the square root of the mean).
Examples:
• Poisson: mean = λ, variance = λ; sums of independent Poissons are Poisson.
• Chi-square (gamma): mean = ν, variance = 2ν; sums of independent chi-squares are still chi-square.
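When variance is proportional to the mean, as in the Poisson example, the square root is the variance-stabilizing transform: Var(√X) is roughly 1/4 no matter the mean. A stdlib-only simulation sketch (the standard library has no Poisson sampler, so Knuth's product-of-uniforms method stands in):

```python
import random
from math import exp, sqrt
from statistics import variance

def poisson(lam, rng):
    """Knuth's product-of-uniforms Poisson sampler (fine for moderate lambda)."""
    limit, k, p = exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

# variance of raw draws grows with lambda; variance of sqrt(draws) stays near 1/4
rng = random.Random(42)
results = {}
for lam in (25, 100):
    draws = [poisson(lam, rng) for _ in range(20000)]
    results[lam] = (variance(draws), variance([sqrt(x) for x in draws]))
```

Raw variances come out near 25 and 100, tracking the means; the square-root variances sit near 0.25 for both, which is the point of the transform.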
Why Box-Cox works:
Background theory:
g(X) ≈ g(μ) + g′(μ)(X – μ), or
g(X) – g(μ) ≈ g′(μ)(X – μ), so
E(g(X) – g(μ))² ≈ g′(μ)² E(X – μ)², so
Var(g(X)) ≈ g′(μ)² Var(X)

Setup:
log(σ(μ)) = k + r log(μ), or
log(σ²(μ)) = 2k + 2r log(μ), or
σ²(μ) = c×μ^(2r)

Suppose g(x) = x^(1–r); then
g′(x) = (1–r) x^(–r), or
g′(x)² = (1–r)² x^(–2r), so
Var(g(X)) ≈ g′(μ)² Var(X) = (1–r)² μ^(–2r) × c×μ^(2r),
which is constant with respect to μ.
The other two-sample t-test
issues: each group keeps its own df, νg = ng – 1 (here 4 and 20), rather than the pooled (n1–1) + (n2–1)
t = (–7.2 – 3.49) / [28.902]^(1/2) = –1.988, treated as a t with df = 4.735
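This is Welch's statistic with the Satterthwaite df. A sketch from group summaries; since the slide's second-group summaries are only partially recoverable, the check below uses synthetic equal-variance groups, where the Satterthwaite df collapses back to the pooled n1 + n2 – 2:

```python
from math import sqrt

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Satterthwaite df from group means, stdevs, sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # per-group variance of the mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# equal sizes and spreads: Satterthwaite df equals the pooled 5 + 5 - 2 = 8
t, df = welch_t(1.0, 2.0, 5, 0.0, 2.0, 5)
```

With unequal variances and sizes, df drops below the pooled value, which is how the slide arrives at the fractional df = 4.735.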
[ 28.902 ]1/2
meas-to-meas: σmeas
(repeatability) +m
total variation:
2 2
σtotal = [σday +σmeas]1/2
d+m
(reproducibility)
Estimating these two variances