bheavlin@stat.stanford.edu
Statistics
Numerical Graphical
Stat 110 bheavlin@stat.stanford.edu
Numerical measures scale better as #groups grow
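[Table residue: graphical displays by # points/group and # groups. Recoverable entries: stem & leaf; histogram; # points/group moderate-to-high; # groups 1-2. Axis ticks (0.0 to 1.0) from an accompanying plot omitted.]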
Bin width
$h_n \in \{1, 2, 5\} \times 10^k$
$h_n = 2 \times \mathrm{IQR} / n^{1/3}$ or
$h_n = 1.66 \times \mathrm{stdev} \times [\log_e(n) / n]^{1/3}$
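The bin-width rules can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names are hypothetical, the quartiles use the default convention of `statistics.quantiles`, and "round to 1, 2, or 5 × 10^k" is read here as picking the nearest such value on a log scale, which is an assumption.

```python
import math
import statistics


def bin_width_iqr(data):
    """h_n = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2.0 * (q3 - q1) / len(data) ** (1.0 / 3.0)


def bin_width_stdev(data):
    """h_n = 1.66 * stdev * (ln(n) / n)^(1/3)."""
    n = len(data)
    return 1.66 * statistics.stdev(data) * (math.log(n) / n) ** (1.0 / 3.0)


def nice_width(h):
    """Snap h to the nearest value of the form {1, 2, 5} x 10^k (log scale)."""
    k = math.floor(math.log10(h))
    candidates = [m * 10.0 ** e for e in (k, k + 1) for m in (1, 2, 5)]
    return min(candidates, key=lambda c: abs(math.log10(c) - math.log10(h)))


data = list(range(1, 101))          # illustrative data only
raw = bin_width_iqr(data)           # width from the IQR rule
width = nice_width(raw)             # rounded to a convenient value
```

Either formula gives a raw width; snapping it to 1, 2, or 5 times a power of ten keeps the bin edges readable.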
MAD = 0.8908
medAD = 0.63
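As a rough illustration of the two spread measures, the sketch below assumes MAD means the mean absolute deviation about the mean and medAD the median absolute deviation about the median; the slide does not spell out either definition, and the data are made up.

```python
import statistics


def mean_abs_dev(data):
    """Mean of |x - mean(x)| (assumed meaning of MAD)."""
    center = statistics.mean(data)
    return statistics.mean([abs(x - center) for x in data])


def median_abs_dev(data):
    """Median of |x - median(x)| (assumed meaning of medAD)."""
    center = statistics.median(data)
    return statistics.median([abs(x - center) for x in data])


sample = [1, 2, 3, 4, 100]            # one wild value
mad = mean_abs_dev(sample)            # pulled up by the wild value
medad = median_abs_dev(sample)        # barely affected by it
```

The contrast between the two results on a sample with one wild value shows why the median-based measure counts as the robust one.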
A statistical “trilemma”
simple
efficient robust
$z = \dfrac{y - \bar{y}}{s}$
“fence”
• IQR = 75%ile – 25%ile
• step = 1.5 × IQR
• lower inner fence = min s.t. > 25%ile − step
• upper inner fence = max s.t. < 75%ile + step
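The fence definitions can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, the data are made up, and the quartiles use the "inclusive" convention of `statistics.quantiles` (other percentile conventions give slightly different fences).

```python
import statistics


def inner_fences(data):
    """Return (Q1, Q3, step, lower inner fence, upper inner fence)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    step = 1.5 * (q3 - q1)                            # step = 1.5 x IQR
    lower = min(x for x in data if x > q1 - step)     # smallest value inside
    upper = max(x for x in data if x < q3 + step)     # largest value inside
    return q1, q3, step, lower, upper


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
q1, q3, step, lower, upper = inner_fences(values)
beyond = [x for x in values if x < lower or x > upper]   # candidates to inspect
```

With these definitions the fences are actual data values (the whisker ends), not the cutoffs themselves.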
Statistical rules for detecting outliers
Boxplot rule:
• beyond the inner fence, “suspect.” ~7000ppm
• beyond the outer fence, “highly suspect.” ~1ppm
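The quoted rates can be checked against a normal distribution. The sketch below assumes normally distributed data, cutoffs at exactly Q1 − k×IQR and Q3 + k×IQR, and an outer fence at two steps (k = 3); the slide does not define the outer fence, so that last choice is an assumption.

```python
from statistics import NormalDist


def beyond_fence_rate(k):
    """Two-sided normal probability of falling beyond Q1 - k*IQR or Q3 + k*IQR."""
    z = NormalDist()                 # standard normal
    q3 = z.inv_cdf(0.75)             # upper quartile, about 0.674
    iqr = 2.0 * q3                   # symmetric, so IQR = 2 * Q3
    cutoff = q3 + k * iqr
    return 2.0 * (1.0 - z.cdf(cutoff))


inner_ppm = beyond_fence_rate(1.5) * 1e6   # roughly 7000 ppm
outer_ppm = beyond_fence_rate(3.0) * 1e6   # a few ppm two-sided, about 1 ppm per side
```

Under these assumptions the inner-fence rate lands near 7000 ppm and the outer-fence rate is on the order of 1 ppm, consistent with the figures on the slide.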
Why boxplots?
• Visual presentation of a
Multiply nested label scale.
• Horizontal format.
• Boxplot by group.
• (Optional jitter).
[Figure: boxplots by group for Cu, Fe, and Pb, with nested group labels 1 to 3; axis ticks 0 to 20 omitted.]
Problem 2.28
[Figure: Pb, Cu, and Fe panels, each on its own scale; ■ marks an outlier. Axis ticks omitted.]
[Figure: Pb and Cu panels; axis ticks omitted.]
… quantiles of … another.
• Linear pattern … “shape.”
• #group 1 need not equal #group 2…
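Reading these fragments as a quantile-quantile comparison of two groups (an assumption, since the slide text is incomplete), matched quantiles can be paired even when the groups differ in size. The function name and data below are hypothetical.

```python
import statistics


def qq_pairs(group1, group2, n_points=9):
    """Pair matching quantiles of two groups that may differ in size."""
    q1 = statistics.quantiles(group1, n=n_points + 1, method="inclusive")
    q2 = statistics.quantiles(group2, n=n_points + 1, method="inclusive")
    return list(zip(q1, q2))


small = list(range(1, 12))                  # 11 points
large = [10 * x for x in range(1, 22)]      # 21 points, same shape, rescaled
pairs = qq_pairs(small, large)              # plot these pairs against each other
```

Because both groups are summarized at the same probability levels, the group sizes need not match; here the second group is a rescaled copy of the first, so the pairs fall on a straight line.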
[Figure: scatterplots of Pb and Cu; axis ticks omitted.]
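$$r = \frac{1}{n-1} \sum_i \left( \frac{y_i - \bar{y}}{s_y} \right) \left( \frac{x_i - \bar{x}}{s_x} \right), \qquad -1 \le r \le 1$$

      Pb    Cu    Fe
Pb   1.00  0.36  0.31
Cu   0.36  1.00  0.70
Fe   0.31  0.70  1.00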
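The correlation coefficient can be computed directly as an average product of z-scores. The sketch assumes s_x and s_y are sample standard deviations, so the sum is divided by n − 1; the function name and data are illustrative only.

```python
import statistics


def pearson_r(x, y):
    """Sum of products of z-scores, divided by n - 1."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)   # sample stdevs
    total = sum(((yi - ybar) / sy) * ((xi - xbar) / sx) for xi, yi in zip(x, y))
    return total / (n - 1)


x = [1, 2, 3, 4]
r_up = pearson_r(x, [2, 4, 6, 8])      # perfectly increasing line
r_down = pearson_r(x, [8, 6, 4, 2])    # perfectly decreasing line
```

The two exact straight lines give the extreme values of r, which is one way to see the bound −1 ≤ r ≤ 1.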
benchmarks
[Figure: scatterplot matrix of "Stream" (arch04), "Math" (arch03), "Games" (arch01), and log10 MHz; axis ticks omitted.]
notes:
• green = AMD
• blue = Intel