
Introduction to Error Analysis

Part 1: the Basics


Andrei Gritsan
based on lectures by Petar Maksimović
February 1, 2010
Overview
Definitions
Reporting results and rounding
Accuracy vs. precision, systematic vs. statistical errors
Parent distribution
Mean and standard deviation
Gaussian probability distribution
What a 1σ error means
Definitions

μ_true: true value of the quantity x we measure
x_i: observed value
error on x_i: difference between the observed and true value, x_i − μ_true
All measurements have errors, so the true value is unattainable
seek the best estimate of the true value, μ
seek the best estimate of the true error on x_i
One view on reporting measurements (from the book)

keep only one digit of precision on the error; everything else is noise
  Example: 410.5163819 → 4 × 10²
exception: when the first digit is 1, keep two:
  Example: 17538 → 1.7 × 10⁴
round off the final value of the measurement to the significant digits of the error
  Example: 87654 ± 345 kg → (876 ± 3) × 10² kg
rounding rules:
  6 and above: round up
  4 and below: round down
  5: if the digit to the right is even, round down; else round up
  (reason: reduces systematic errors in rounding)
A different view on rounding
From the Particle Data Group (authority in particle physics):
http://pdg.lbl.gov/2009/reviews/rpp2009-rev-rpp-intro.pdf
if the three highest-order digits of the error lie between 100 and 354, round to two significant digits
  Example: 87654 ± 345 kg → (876.5 ± 3.5) × 10² kg
if they lie between 355 and 949, round to one significant digit
  Example: 87654 ± 365 kg → (877 ± 4) × 10² kg
if they lie between 950 and 999, round up to 1000 and keep two significant digits
  Example: 87654 ± 950 kg → (87.7 ± 1.0) × 10³ kg
Bottom line:
Use a consistent approach to rounding which is sound and accepted in the field of study; use common sense after all.
Accuracy vs. precision
Accuracy: how close to the true value
Precision: how well the result is determined (regardless of the true value); a measure of reproducibility
Example: μ_true = 30
x = 23 ± 2: precise, but inaccurate
  uncorrected biases
  (large systematic error)
x = 28 ± 7: accurate, but imprecise
  subsequent measurements will scatter around μ_true = 30 but cover the true value in most cases
  (large statistical (random) error)
an experiment should be both accurate and precise
Statistical vs. systematic errors
Statistical (random) errors:
  describe by how much subsequent measurements scatter around the common average value
  if limited by instrumental error, use a better apparatus
  if limited by statistical fluctuations, make more measurements
Systematic errors:
  all measurements biased in a common way
  harder to detect:
    faulty calibrations
    wrong model
    bias by the observer
  also hard to determine (no unique recipe)
  estimated from analysis of experimental conditions and techniques
  may be correlated
Parent distribution
(assume no systematic errors for now)
parent distribution: the probability distribution of results if the number of measurements N → ∞
however, we only have a limited number of measurements: we observe only a sample of the parent dist., a sample distribution
the probability distribution of our measurements only approaches the parent dist. as N → ∞
use the observed distribution to infer the parameters of the parent distribution, e.g., x̄ → μ_true as N → ∞
Notation
Greek: parameters of the parent distribution
Roman: experimental estimates of the params of the parent dist.
Mean, median, mode
Mean of the experimental (sample) dist.:
  x̄ ≡ (1/N) Σ_{i=1}^{N} x_i
... of the parent dist.:
  μ ≡ lim_{N→∞} [ (1/N) Σ x_i ]
mean ≡ centroid ≡ average
Median: splits the sample into two equal parts
Mode: most likely value (highest prob. density)
Variance
Deviation: d_i ≡ x_i − μ, for a single measurement
Average deviation:
  ⟨x_i − μ⟩ = 0 by definition, so use ⟨|x_i − μ|⟩ instead
  but absolute values are hard to deal with analytically
Variance: instead, use the mean of the deviations squared:
  σ² ≡ ⟨(x_i − μ)²⟩ = ⟨x²⟩ − μ²
  σ² = lim_{N→∞} [ (1/N) Σ x_i² ] − μ²
(mean of the squares minus the square of the mean)
Standard deviation
Standard deviation: root mean square of the deviations:
  σ = √(σ²) = √(⟨x²⟩ − μ²)
associated with the 2nd moment of the x_i distribution
Sample variance: replace μ by x̄:
  s² ≡ (1/(N − 1)) Σ (x_i − x̄)²
N − 1 instead of N because x̄ is obtained from the same data sample and not independently
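
As an illustration, here is a minimal sketch of these estimators (assuming NumPy; the measurement values are made up). Note that ddof=1 gives the N − 1 denominator.

```python
import numpy as np

x = np.array([10.1, 9.8, 10.3, 10.0, 9.9])   # hypothetical measurements

mean = x.mean()                               # sample mean, best estimate of mu
s2 = ((x - mean)**2).sum() / (len(x) - 1)     # sample variance with N-1 denominator
s = np.sqrt(s2)                               # sample standard deviation

# np.var/np.std reproduce this when ddof=1 (the N-1 denominator)
assert np.isclose(s2, x.var(ddof=1))
print(mean, s)
```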
So what are we after?
We want μ.
The best estimate of μ is the sample mean: μ ≈ x̄
The best estimate of the error on x_i (and thus on x̄) is the square root of the sample variance: σ ≈ s = √(s²)
Weighted averages
P(x_i): discrete probability distribution
replace
  (1/N) Σ x_i  with  Σ P(x_i) x_i
and
  (1/N) Σ x_i²  with  Σ P(x_i) x_i²
by definition, the formulae for μ and σ² are unchanged
Gaussian probability distribution
unquestionably the most useful in statistical analysis
a limiting case of the Binomial and Poisson distributions (which are more fundamental; see next week)
seems to describe the distributions of random observations for a large number of physical measurements
so pervasive that all results of measurements are always classified as Gaussian or non-Gaussian (even on Wall Street)
Meet the Gaussian
probability density function:
  random variable x
  parameters: center μ and width σ:
  p_G(x; μ, σ) = 1/(σ√(2π)) · exp[ −(1/2) ((x − μ)/σ)² ]
Differential probability:
  the probability to observe a value in [x, x + dx] is
  dP_G(x; μ, σ) = p_G(x; μ, σ) dx
Standard Gaussian Distribution:
  replace (x − μ)/σ with a new variable z:
  p_G(z) dz = 1/√(2π) · exp(−z²/2) dz
  we get a Gaussian centered at 0 with a width of 1
All computers calculate the Standard Gaussian first, and then stretch it and shift it to make p_G(x; μ, σ)
Mean and standard deviation
By straight application of the definitions:
  mean = μ (the center)
  standard deviation = σ (the width)
This makes the Gaussian so convenient!
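
A small sketch of the "stretch and shift" point (plain Python, hypothetical numbers): p_G(x; μ, σ) = (1/σ)·p_G(z) with z = (x − μ)/σ.

```python
import math

def std_gauss(z):
    """Standard Gaussian: mean 0, width 1."""
    return math.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)

def gauss(x, mu, sigma):
    """Stretch and shift the standard Gaussian to center mu and width sigma."""
    z = (x - mu) / sigma
    return std_gauss(z) / sigma

# direct evaluation of the formula agrees with the stretched/shifted version
x, mu, sigma = 12.3, 10.0, 2.0
direct = math.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * math.sqrt(2 * math.pi))
print(gauss(x, mu, sigma), direct)
```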
Interpretation of Gaussian errors
we measured x = x₀ ± σ₀; what does that tell us?
the Standard Gaussian covers 0.683 of the area from −1.0 to +1.0
the true value of x is contained by the interval [x₀ − σ₀, x₀ + σ₀] 68.3% of the time!
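
The 0.683 figure can be checked with the standard library alone, since the integral of the standard Gaussian from −1 to +1 equals erf(1/√2):

```python
import math

# P(-1 < z < +1) for a standard Gaussian
coverage_1sigma = math.erf(1 / math.sqrt(2))
print(coverage_1sigma)               # ~0.6827

# 2-sigma and 3-sigma coverage for comparison
print(math.erf(2 / math.sqrt(2)))    # ~0.9545
print(math.erf(3 / math.sqrt(2)))    # ~0.9973
```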
The Gaussian distribution coverage
Introduction to Error Analysis
Part 2: Fitting
Overview
principle of Maximum Likelihood
minimizing χ²
linear regression
fitting of an arbitrary curve
Likelihood
observed N data points, drawn from a parent population
assume a Gaussian parent distribution (mean μ, std. deviation σ)
probability to observe x_i, given the true μ, σ:
  P_i(x_i | μ, σ) = 1/(σ√(2π)) · exp[ −(1/2) ((x_i − μ)/σ)² ]
the probability to have measured μ′ in this single measurement, given the observed x_i and σ_i, is called the likelihood:
  P_i(μ′ | x_i, σ_i) = 1/(σ_i√(2π)) · exp[ −(1/2) ((x_i − μ′)/σ_i)² ]
for N observations, the total likelihood is
  L ≡ P(μ′) = ∏_{i=1}^{N} P_i(μ′)
Principle of Maximum Likelihood
maximizing P(μ′) gives μ′ as the best estimate of μ
(the most likely population from which the data might have come is assumed to be the correct one)
for Gaussian individual probability distributions (σ_i = const = σ):
  L = P(μ) = [ 1/(σ√(2π)) ]^N · exp[ −(1/2) Σ ((x_i − μ)/σ)² ]
maximizing the likelihood ⇔ minimizing the argument of the exponential:
  χ² ≡ Σ ((x_i − μ)/σ)²
Example: calculating the mean
cross-checking...
  dχ²/dμ′ = d/dμ′ Σ ((x_i − μ′)/σ)² = 0
the derivative is linear:
  Σ (x_i − μ′) = 0  ⇒  μ′ = x̄ ≡ (1/N) Σ x_i
The mean really is the best estimate of the measured quantity.
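
A quick numerical cross-check of this result (a sketch with made-up data; SciPy's minimize_scalar is one possible minimizer): the μ that minimizes Σ(x_i − μ)² coincides with the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([4.8, 5.1, 5.3, 4.9, 5.0])     # hypothetical measurements

chi2 = lambda mu: np.sum((x - mu)**2)        # argument of the exponential (sigma = const)
result = minimize_scalar(chi2)

print(result.x, x.mean())                    # the two agree
```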
Linear regression
simplest case: linear functional dependence
measurements y_i, model (prediction) y = f(x) = a + bx
at each point, y(x_i) = a + b·x_i
(special case f(x_i) = const = a = μ)
minimize
  χ²(a, b) = Σ [ (y_i − f(x_i)) / σ_i ]²
conditions for a minimum in the 2-dim parameter space:
  ∂χ²(a, b)/∂a = 0,  ∂χ²(a, b)/∂b = 0
can be solved analytically, but don't do it in real life
(e.g. see p. 105 in Bevington)
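
For reference, a sketch of the analytic solution for the equal-error case (the standard least-squares formulas, with made-up data); in practice the fitting program does this for you.

```python
import numpy as np

# hypothetical data with equal errors sigma on each y_i
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# minimizing chi^2(a, b) with constant sigma gives the usual least-squares result
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()

print(a, b)
print(np.polyfit(x, y, 1))   # same line: returns [slope, intercept]
```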
Familiar example from day one: linear fit
A program (e.g. ROOT) will do the minimization of χ² for you
[Figure: example analysis, straight-line fit of y label [units] vs. x label [units]]
The program will give you the answer for a and b (with their errors)
Fitting with an arbitrary curve
a set of measurement pairs (x_i, y_i ± σ_i)
(note: no errors on x_i!)
the theoretical model (prediction) may depend on several parameters {a_i} and doesn't have to be linear:
  y = f(x; a₀, a₁, a₂, ...)
identical approach: minimize the total χ²:
  χ²({a_i}) = Σ [ (y_i − f(x_i; a₀, a₁, a₂, ...)) / σ_i ]²
minimization proceeds numerically
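
A sketch of how this is typically done numerically (here with SciPy's curve_fit; the model and data are made up): you supply f(x; a₀, a₁, ...) and the per-point errors σ_i, and the minimizer does the rest.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a0, a1, a2):
    """Hypothetical non-linear model y = f(x; a0, a1, a2)."""
    return a0 + a1 * np.exp(-a2 * x)

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([3.1, 2.4, 2.0, 1.7, 1.5, 1.45, 1.4])
sigma_y = np.full_like(y, 0.1)

# minimizes chi^2({a_i}) = sum ((y_i - f(x_i; a)) / sigma_i)^2 numerically
popt, pcov = curve_fit(model, x, y, sigma=sigma_y, absolute_sigma=True, p0=[1.0, 2.0, 1.0])
perr = np.sqrt(np.diag(pcov))     # parameter errors from the covariance matrix

print(popt, perr)
```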
Fitting data points with errors on both x and y
data: x_i ± σ_{x_i}, y_i ± σ_{y_i}
each term in the χ² sum gets a correction from the σ_{x_i} contribution:
  Σ [ (y_i − f(x_i)) / σ_{y_i} ]²  →  Σ (y_i − f(x_i))² / [ σ_{y_i}² + ( (f(x_i + σ_{x_i}) − f(x_i − σ_{x_i})) / 2 )² ]
[Figure: sketch of f(x), f(x + σ_x) and f(x − σ_x); the error on x contributes to the y χ² through the slope of f]
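
A minimal sketch of that corrected χ² term (the helper name and the model are ours, for illustration): the x error is converted into an extra y error by propagating it through f.

```python
import numpy as np

def chi2_term_xy(xi, yi, sigma_x, sigma_y, f):
    """One chi^2 term with errors on both x and y:
    the sigma_x contribution is the model's variation over [x - sigma_x, x + sigma_x]."""
    dy_from_x = (f(xi + sigma_x) - f(xi - sigma_x)) / 2.0
    return (yi - f(xi))**2 / (sigma_y**2 + dy_from_x**2)

# example with a made-up straight-line model and one data point
f = lambda x: 0.5 + 2.0 * x
print(chi2_term_xy(xi=3.0, yi=6.9, sigma_x=0.1, sigma_y=0.2, f=f))
```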
Behavior of the χ² function near the minimum
when N is large, χ²(a₀, a₁, a₂, ...) becomes quadratic in each parameter near the minimum:
  χ² = (a_j − a′_j)² / σ_j² + C
known as the parabolic approximation
C tells us about the goodness of the overall fit (a function of all uncertainties and of the other {a_k}, k ≠ j)
shifting a_j by ±σ_j from the minimum corresponds to Δχ² = 1
  (valid in all cases!)
the parabolic error is given by the curvature at the minimum:
  ∂²χ²/∂a_j² = 2/σ_j²
χ² shapes near minimum: examples
[Figure: χ² vs. a_j near the minimum: a narrower parabola means better (smaller) errors, a higher minimum means a worse fit, a lower minimum a better fit, and a non-parabolic shape gives asymmetric errors]
Two methods for obtaining the error
1. from the curvature: σ_j² = 2 / (∂²χ²/∂a_j²)
2. scan each parameter around the minimum while the others are fixed, until Δχ² = 1 is reached
method #1 is much faster to calculate
method #2 is more generic and works even when the shape of χ² near the minimum is not exactly parabolic
the scan of Δχ² = 1 defines a so-called one-sigma contour.
It contains the truth with 68.3% probability
(assuming Gaussian errors)
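
A sketch of method #2 in the simplest setting (fitting a constant μ to made-up data with a common error σ): scan μ away from the minimum until χ² rises by 1 and read off the one-sigma error.

```python
import numpy as np

x = np.array([4.8, 5.1, 5.3, 4.9, 5.0])       # hypothetical measurements
sigma = 0.2                                   # assumed common error per point

chi2 = lambda mu: np.sum(((x - mu) / sigma)**2)

mu_hat = x.mean()                             # minimum of chi^2
chi2_min = chi2(mu_hat)

# scan upward until chi^2 - chi2_min = 1 (the one-sigma point)
mu = mu_hat
while chi2(mu) - chi2_min < 1.0:
    mu += 1e-4

print(mu - mu_hat, sigma / np.sqrt(len(x)))   # scanned error vs. analytic sigma/sqrt(N)
```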
What to remember
In the end the fit will be done for you by the program:
  you supply the data, e.g. (x_i, y_i)
  and the fit model, e.g. y = f(x; a₁, a₂, ...)
  the program returns a₁ ± σ₁, a₂ ± σ₂, ...
  and the plot with the model line through the points
You need to understand what is done
In more complex cases you may need to go deep into the code
Introduction to Error Analysis
Part 3: Combining measurements
Overview
propagation of errors
covariance
weighted average and its error
error on sample mean and sample standard deviation
Propagation of Errors
x is a known function of u, v, ...:
  x = f(u, v, ...)
assume that the most probable value for x is
  x̄ = f(ū, v̄, ...)
x̄ is the mean of the x_i = f(u_i, v_i, ...)
by the definition of variance:
  σ_x² = lim_{N→∞} [ (1/N) Σ (x_i − x̄)² ]
expand (x_i − x̄) in a Taylor series:
  x_i − x̄ ≈ (u_i − ū)(∂x/∂u) + (v_i − v̄)(∂x/∂v) + ...
Variance of x
σ_x² ≈ lim_{N→∞} (1/N) Σ [ (u_i − ū)(∂x/∂u) + (v_i − v̄)(∂x/∂v) + ... ]²
     = lim_{N→∞} (1/N) Σ [ (u_i − ū)²(∂x/∂u)² + (v_i − v̄)²(∂x/∂v)² + 2(u_i − ū)(v_i − v̄)(∂x/∂u)(∂x/∂v) + ... ]
     ≡ σ_u²(∂x/∂u)² + σ_v²(∂x/∂v)² + 2σ_uv²(∂x/∂u)(∂x/∂v) + ...
This is the error propagation equation.
σ_uv² is the COvariance. It describes the correlation between the errors on u and v.
For uncorrelated errors, σ_uv² → 0.
Examples
x = u + a, where a = const. Thus ∂x/∂u = 1:
  σ_x = σ_u
x = au + bv, where a, b = const.:
  σ_x² = a²σ_u² + b²σ_v² + 2ab·σ_uv²
the correlation can be negative, i.e. σ_uv² < 0
if an error on u is counterbalanced by a proportional error on v, σ_x can get very small!
More examples
x = auv:
  ∂x/∂u = av,  ∂x/∂v = au
  σ_x² = (av·σ_u)² + (au·σ_v)² + 2a²uv·σ_uv²
  σ_x²/x² = σ_u²/u² + σ_v²/v² + 2σ_uv²/(uv)
x = a·u/v:
  σ_x²/x² = σ_u²/u² + σ_v²/v² − 2σ_uv²/(uv)
etc., etc.
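
A sketch checking the x = auv formula for uncorrelated errors (made-up numbers) against a simple Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)

a = 2.0
u, sigma_u = 10.0, 0.3
v, sigma_v = 5.0, 0.2           # uncorrelated with u

# error propagation formula for x = a*u*v (no covariance term)
x = a * u * v
sigma_x = x * np.sqrt((sigma_u / u)**2 + (sigma_v / v)**2)

# Monte Carlo: draw u and v independently, look at the spread of x
u_mc = rng.normal(u, sigma_u, 100_000)
v_mc = rng.normal(v, sigma_v, 100_000)
x_mc = a * u_mc * v_mc

print(sigma_x, x_mc.std())      # the two agree to good approximation
```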
Weighted average
From part #2: calculation of the mean μ′ by minimizing
  χ² = Σ [ (x_i − μ′)/σ_i ]²
minimum at dχ²/dμ′ = 0, but now σ_i ≠ const:
  0 = Σ [ (x_i − μ′)/σ_i² ] = Σ (x_i/σ_i²) − μ′ Σ (1/σ_i²)
the so-called weighted average is
  μ′ = Σ (x_i/σ_i²) / Σ (1/σ_i²)
each measurement is weighted by 1/σ_i²!
Error on the weighted average
N points contribute to the weighted average μ′
straight application of the error propagation equation:
  σ_μ² = Σ σ_i² (∂μ′/∂x_i)²
  ∂μ′/∂x_i = ∂/∂x_i [ Σ_j (x_j/σ_j²) / Σ_k (1/σ_k²) ] = (1/σ_i²) / Σ_k (1/σ_k²)
putting both together:
  σ_μ² = Σ_i σ_i² [ (1/σ_i²) / Σ_k (1/σ_k²) ]²  ⇒  1/σ_μ² = Σ_k (1/σ_k²)
Example of a weighted average
x₁ = 25.0 ± 1.0
x₂ = 20.0 ± 5.0
error:
  σ_μ² = 1 / (1/1² + 1/5²) = 25/26 ≈ 0.96,  σ_μ ≈ 1.0
weighted average:
  x̄ = σ_μ² · (25/1² + 20/5²) = (25/26) · (25·25 + 20·1)/25 = 24.8
result: x = 24.8 ± 1.0
moral: x₂ practically doesn't matter!
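
The same example in a few lines (a sketch assuming NumPy):

```python
import numpy as np

x = np.array([25.0, 20.0])
sigma = np.array([1.0, 5.0])

w = 1.0 / sigma**2                        # each point weighted by 1/sigma_i^2
mean = np.sum(w * x) / np.sum(w)          # weighted average
err = 1.0 / np.sqrt(np.sum(w))            # its error

print(mean, err)                          # ~24.8 +/- ~1.0
```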
Error on the mean
N measurements from the same parent population (μ, σ)
from part #1: the sample mean x̄ and the sample standard deviation s are the best estimators of the parent population's μ and σ
but: more measurements still give the same σ:
  our knowledge of the shape of the parent population improves, and thus of the original true error on each point
but how well do we know the true value? (i.e. μ?)
if N points come from the same population with σ:
  1/σ_μ² = Σ (1/σ²) = N/σ²  ⇒  σ_μ = σ/√N
Standard deviation of the mean, or standard error.
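
A sketch illustrating σ_μ = σ/√N with toy experiments (all numbers made up): the spread of the sample means over many repeated experiments matches σ/√N.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma, N = 30.0, 7.0, 50

# repeat the "experiment" many times, taking the mean of N points each time
means = rng.normal(mu_true, sigma, size=(10_000, N)).mean(axis=1)

print(means.std())            # observed spread of the sample means
print(sigma / np.sqrt(N))     # predicted standard error of the mean
```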
Example: Lightness vs Lycopene content, scatter plot
[Figure: scatter plot of Lycopene content vs. Lightness]
Example: Lightness vs Lycopene content: RMS as Error
[Figure: Lycopene content vs. Lightness, with the RMS (spread) used as the error bar]
The points don't scatter enough; the error bars are too large!
Example: Lightness vs Lycopene content: Error on Mean
[Figure: Lycopene content vs. Lightness, with the error on the mean used as the error bar]
This looks much better!
Introduction to Error Analysis
Part 4: dealing with non-Gaussian cases
Overview
binomial p.d.f.
Poisson p.d.f.
Binomial probability density function
random process with exactly two outcomes (Bernoulli process)
probability for one outcome ("success") is p
probability for exactly r successes (0 ≤ r ≤ N) in N independent trials
the order of successes and failures doesn't matter
binomial p.d.f.:
  f(r; N, p) = N! / (r!(N − r)!) · p^r (1 − p)^(N−r)
mean: Np
variance: Np(1 − p)
if r and s are binomially distributed with (N_r, p) and (N_s, p), then t = r + s is distributed with (N_r + N_s, p).
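
A minimal sketch of the binomial p.d.f. using only the standard library (math.comb); the mean and variance come out as Np and Np(1 − p).

```python
import math

def binomial_pmf(r, N, p):
    """Probability of exactly r successes in N independent trials."""
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

N, p = 20, 0.3
probs = [binomial_pmf(r, N, p) for r in range(N + 1)]

mean = sum(r * pr for r, pr in enumerate(probs))
var = sum((r - mean)**2 * pr for r, pr in enumerate(probs))
print(mean, N * p)            # both 6.0
print(var, N * p * (1 - p))   # both 4.2
```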
Examples of binomial probability
The binomial distribution always shows up when the data exhibit binary properties:
  an event passes or fails
  efficiency (an important exp. parameter) defined as ε = N_pass / N_total
  particles in a sample are positive or negative
Poisson probability density function
probability of finding exactly n events in a given interval of x (e.g., space and time)
events are independent of each other and of x
average rate of μ per interval
Poisson p.d.f. (μ > 0):
  f(n; μ) = μⁿ e^(−μ) / n!
mean: μ
variance: μ
limiting case of the binomial for many events with low probability:
  p → 0, N → ∞ while Np = μ
the Poisson approaches a Gaussian for large μ
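
A sketch of the Poisson p.d.f. and of the binomial limit p → 0, N → ∞ with Np = μ fixed (standard library only; N = 10000 is an arbitrary "large" choice):

```python
import math

def poisson_pmf(n, mu):
    return mu**n * math.exp(-mu) / math.factorial(n)

def binomial_pmf(r, N, p):
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

mu = 5.0
for n in range(10):
    # a binomial with large N and small p = mu/N approaches the Poisson
    print(n, poisson_pmf(n, mu), binomial_pmf(n, 10_000, mu / 10_000))
```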
Examples of Poisson probability
Shows up in counting measurements with a small number of events:
  number of watermelons with circumference c ∈ [19.5, 20.5] in.
  nuclear spectroscopy in the tails of a distribution (e.g. high channel number)
  Rutherford experiment
Fitting a histogram with Poisson-distributed content
Poisson data require special treatment in terms of fitting!
histogram: the i-th channel contains n_i entries
for large n_i, P(n_i) is Gaussian
  Poisson σ = √μ, approximated by σ ≈ √(n_i)
  (WARNING: this is what ROOT uses by default!)
Gaussian p.d.f. ⇒ symmetric errors
  equal probability to fluctuate up or down
  minimizing χ² ⇒ the fit (the true value) is equally likely to be above and below the data!
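
A sketch of why the σ ≈ √(n_i) shortcut matters (made-up bin contents): a χ² fit of a constant with weights 1/n_i is pulled below the Poisson maximum-likelihood answer (the plain mean), because downward fluctuations get smaller errors and hence larger weights.

```python
import numpy as np

n = np.array([3, 7, 4, 6, 2, 5, 8, 4])     # made-up Poisson bin contents

# chi^2 fit of a constant with sigma_i = sqrt(n_i): a weighted average with w = 1/n_i
w = 1.0 / n
chi2_fit = np.sum(w * n) / np.sum(w)

# Poisson maximum likelihood for a constant is just the arithmetic mean
ml_fit = n.mean()

print(chi2_fit, ml_fit)                    # the chi^2 value is biased low
```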
Comparing Poisson and Gaussian p.d.f.
[Figure, left panel: Prob(5 | x) vs. the expected number of events x; right panel: cumulative probability vs. x]
5 observed events
Dashed: Gaussian at 5 with σ = √5
Solid: Poisson with μ = 5
Left: prob. density functions (note: the Gaussian extends to x < 0!)
Right: confidence integrals (p.d.f. integrated from 5)
