
Modern Methods of Data Analysis (WS 07/08)
Stephanie Hansmann-Menzemer
Lecture V (12.11.07)

Contents:
Central Limit Theorem
Uncertainties:
concepts, propagation and properties



Central Limit Theorem
Consider the sum X = x_1 + x_2 + ... + x_n of n independent variables x_i,
i = 1, 2, 3, ..., each taken from a different distribution with expectation
value μ_i and variance σ_i².

Then the distribution of X has the following properties:

- Its expectation value is E[X] = Σ_i μ_i
- Its variance is V[X] = Σ_i σ_i²
- It becomes Gaussian distributed for n → ∞
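A short numerical sketch of these properties (not from the lecture; it assumes NumPy, and the sample sizes are arbitrary). It sums n uniform and n exponential variables, as in the following slides, and compares the mean and variance of the sum with Σμ_i and Σσ_i²; the histogram of either sum approaches a Gaussian as n grows.

```python
import numpy as np

rng = np.random.default_rng(42)
n_vars, n_toys = 50, 100_000

# Sum of n uniform(0,1) variables: mu_i = 0.5, sigma_i^2 = 1/12
uniform_sum = rng.uniform(0.0, 1.0, size=(n_toys, n_vars)).sum(axis=1)
print("uniform:     mean %.3f (expected %.3f), var %.3f (expected %.3f)"
      % (uniform_sum.mean(), 0.5 * n_vars, uniform_sum.var(), n_vars / 12))

# Sum of n exponential(tau=1) variables: mu_i = 1, sigma_i^2 = 1
expo_sum = rng.exponential(1.0, size=(n_toys, n_vars)).sum(axis=1)
print("exponential: mean %.3f (expected %d), var %.3f (expected %d)"
      % (expo_sum.mean(), n_vars, expo_sum.var(), n_vars))

# Both sums are approximately Gaussian distributed for large n
```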



Uniform distribution



Uneven Distribution



Exponential Distribution



But ....
For the Central Limit Theorem to work, no single source may contribute
significantly to the overall variance.
Example: multiple scattering, a distribution with few events in large tails,
needs much more statistics to converge.

Unfortunately, the CLT doesn't work for many physics applications!



Application: Many repeated Measurements



m(B0) = 5279.63 ± 0.53 (stat) ± 0.33 (sys) MeV

CDF has a mass resolution of 16 MeV, i.e. the reconstructed mass of a single
B meson is spread around the true B mass with σ = 16 MeV.
Nevertheless, the B mass can be measured with far better precision: the
uncertainty of the mean of N such measurements is σ/√N.
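A minimal numerical check of this point (assuming NumPy; the sample size of 1000 events is my own illustrative choice, not the CDF analysis): smear many true-mass values with a 16 MeV Gaussian resolution, average them, and compare the spread of the mean with σ/√N.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mass, resolution = 5279.63, 16.0    # MeV, single-event resolution
n_events = 1000                          # hypothetical sample size

# Repeat the "experiment" many times: reconstruct n_events masses, take the mean
means = rng.normal(true_mass, resolution, size=(2000, n_events)).mean(axis=1)
print("spread of the mean:     %.3f MeV" % means.std())
print("expected sigma/sqrt(N): %.3f MeV" % (resolution / np.sqrt(n_events)))
```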



Law of large numbers ...



Weighted Mean (I)
Combining measurements with different uncertainties: two measurements of the
speed of a car:

v1 = 67 ± 4 m/s
v2 = 62 ± 2 m/s

Simple (unweighted) mean: v = 64.5 ± 2.2 m/s.
The uncertainty of this mean is larger than the smaller single uncertainty ... ???
Weighted Mean (II)
To reach σ = 2 m/s, 4 single measurements with σ = 4 m/s are needed.
Therefore this measurement should get 4 times the weight!

More generally, with weights w_i = 1/σ_i²:

x̄ = Σ_i w_i x_i / Σ_i w_i,   σ(x̄) = 1 / √(Σ_i w_i)
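A small sketch of this weighted mean for the two speed measurements above (assuming NumPy; the helper name is mine):

```python
import numpy as np

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty."""
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    mean = np.sum(w * np.asarray(values, dtype=float)) / np.sum(w)
    return mean, 1.0 / np.sqrt(np.sum(w))

mean, err = weighted_mean([67.0, 62.0], [4.0, 2.0])
print("v = %.1f +- %.1f m/s" % (mean, err))   # ~63.0 +- 1.8 m/s
```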



Weighted Mean (III)

v1 = 67 ± 4 m/s
v2 = 53 ± 2 m/s

Weighted mean: v = 55.8 ± 1.8 m/s ... ????
The two measurements differ by about 3σ; does quoting this average with its
small uncertainty still make sense?



New Scientist, 31 March 1988



Neutron Lifetime
(PDG 2006)



Particle Data Group: World Average
1) Calculate the weighted mean x̄ and its error δx̄
2) Calculate χ² = Σ_i (x_i − x̄)²/σ_i² and compare it with N − 1
   (N = number of measurements)

There are 3 cases to consider:

- χ²/(N−1) < 1: all fine, the simple weighted mean is OK
- χ²/(N−1) ≫ 1: depending on the reason, either make no average at all, or
  quote the calculated average and make an educated guess of the error,
  taking into account known problems with the data
- χ²/(N−1) > 1 (but not much larger): errors on some or all measurements may
  have been underestimated, scale all of them with:

  S = √(χ²/(N−1))

  (to compute S, measurements with much larger uncertainties than the others
  are rejected)
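A sketch of this averaging procedure (assuming NumPy; the input numbers are invented, and the treatment of which measurements enter S is simplified compared to the full PDG prescription):

```python
import numpy as np

def pdg_average(x, sigma):
    """Weighted mean with a PDG-style scale factor S = sqrt(chi2/(N-1))."""
    x, sigma = np.asarray(x, float), np.asarray(sigma, float)
    w = 1.0 / sigma**2
    mean = np.sum(w * x) / np.sum(w)
    err = 1.0 / np.sqrt(np.sum(w))
    chi2 = np.sum(w * (x - mean) ** 2)
    s = max(1.0, np.sqrt(chi2 / (len(x) - 1)))   # never scale errors down
    return mean, err * s, s

# Hypothetical measurements that scatter more than their errors suggest (S > 1)
mean, err, s = pdg_average([10.1, 9.7, 10.6], [0.2, 0.3, 0.2])
print("average = %.2f +- %.2f (scale factor S = %.2f)" % (mean, err, s))
```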


Ideogram

http://pdg.lbl.gov/2007/reviews/textrpp.pdf
Reminder: Covariance
The covariance of two variables x, y is defined as:

cov(x,y) = E[(x − E[x])(y − E[y])] = E[xy] − E[x] E[y]

If two variables are independent, then cov(x,y) = 0.
The converse is not true: cov(x,y) = 0 only means that x and y are
uncorrelated, not necessarily independent.

The correlation is defined as:

ρ(x,y) = cov(x,y) / (σ_x σ_y),   with −1 ≤ ρ ≤ 1


Properties of Correlations
Example: for two random variables x and y, the covariance V(x,y) or cov(x,y)
can be represented by a matrix (often called the error matrix):

V = | σ_x²      cov(x,y) |
    | cov(x,y)  σ_y²     |

In the general case of n random variables there is a covariance between each
pair of variables:

V_ij = cov(x_i, x_j),   V_ii = σ_i²

Analogously, the correlation matrix is defined as ρ_ij = V_ij / (σ_i σ_j).
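An illustrative NumPy check of these two matrices on toy data of my own choosing (y partially correlated with x):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 10_000)
y = 0.6 * x + rng.normal(0.0, 1.0, 10_000)   # partially correlated with x

V = np.cov(x, y)          # 2x2 covariance ("error") matrix
rho = np.corrcoef(x, y)   # correlation matrix, rho_ij = V_ij/(sigma_i*sigma_j)
print("covariance matrix:\n", V)
print("correlation matrix:\n", rho)
```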



Error Propagation (I)
x = (x_1, ..., x_n); the covariance matrix V_ij and the mean values x̄_i are known.
y(x) is a function of x → first-order Taylor expansion around x̄:

y(x) ≈ y(x̄) + Σ_i (∂y/∂x_i)|_{x̄} (x_i − x̄_i)



Error Propagation (II)



Gaussian error propagation
Error estimate for a function y of several correlated variables x_i:

σ_y² = Σ_i (∂y/∂x_i)² σ_i²  +  Σ_{i≠j} (∂y/∂x_i)(∂y/∂x_j) cov(x_i, x_j)

The first term gives the normal errors for uncorrelated variables; the
additional terms account for the correlations.

Special case, uncorrelated variables:

σ_y² = Σ_i (∂y/∂x_i)² σ_i²

This is called Gaussian error propagation; however, it has nothing to do with
Gaussian distributions.
And the same in more dimensions, for a vector-valued function y(x):

V_y = A V_x A^T,   with A_ij = ∂y_i/∂x_j

(A is the Jacobi matrix)
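A minimal sketch of this matrix form of error propagation (assuming NumPy; the function name, the numerical Jacobian and the example values are mine):

```python
import numpy as np

def propagate(f, x, V, eps=1e-6):
    """V_y = A V_x A^T, with the Jacobian A_ij = df_i/dx_j estimated numerically."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(np.asarray(f(x), dtype=float))
    A = np.empty((y0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        A[:, j] = (np.atleast_1d(np.asarray(f(x + dx), dtype=float)) - y0) / eps
    return A @ V @ A.T

# Tiny example: y = (x0*x1, x0/x1) with an assumed covariance matrix for x
V_x = np.array([[0.01, 0.002],
                [0.002, 0.04]])
print(propagate(lambda x: [x[0] * x[1], x[0] / x[1]], [2.0, 3.0], V_x))
```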



Example: Track Parametrization (I)

A typical tracking chamber measures in 3D and, due to its symmetry, uses
cylindrical coordinates (r, φ, z).

We want to know the uncertainties in Cartesian coordinates (x, y, z):
x = r cos φ, y = r sin φ, z = z.
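A sketch of propagating (r, φ) uncertainties to (x, y) with the analytic Jacobian (assuming NumPy; the numerical values of r, φ and their covariance matrix are invented for illustration):

```python
import numpy as np

def cartesian_cov(r, phi, V_rphi):
    """Covariance of (x, y) from (r, phi) via V_xy = A V_rphi A^T."""
    A = np.array([[np.cos(phi), -r * np.sin(phi)],   # dx/dr, dx/dphi
                  [np.sin(phi),  r * np.cos(phi)]])  # dy/dr, dy/dphi
    return A @ V_rphi @ A.T

# Hypothetical measurement: r = 30 cm, phi = 0.5 rad, uncorrelated errors
V_rphi = np.diag([0.1**2, 0.002**2])
print(cartesian_cov(30.0, 0.5, V_rphi))
```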



Example: Track Parametrization (II)



Exercise:
x, y are two random variables with the correlation matrix V:

a = 3/7 x + 1/7 y;   b = 1/7 x - 2/7 y

Give the correlation matrix of a, b.

(Note: this time it is not an approximation, since the transformation is linear ...)
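The exercise can be checked numerically along these lines (assuming NumPy; the matrix V below is only a placeholder, since its values are not reproduced here, and V is treated as the covariance matrix of (x, y)):

```python
import numpy as np

# Placeholder for the matrix V given in the exercise (values not reproduced here)
V = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# Linear transformation a = 3/7 x + 1/7 y,  b = 1/7 x - 2/7 y
A = np.array([[3/7,  1/7],
              [1/7, -2/7]])

V_ab = A @ V @ A.T                     # exact, since the transformation is linear
sig = np.sqrt(np.diag(V_ab))
rho_ab = V_ab / np.outer(sig, sig)     # correlation matrix of (a, b)
print(V_ab)
print(rho_ab)
```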



Orthogonal Transformation
For n variables one can always find a linear (orthogonal) transformation such
that the covariance matrix of the new set of variables is diagonal.



Exercise: Some standard formulae



Example: Measure Asymmetry (I)
Define A = (F-B)/(F+B), an asymmetry of events in the forward and backward
hemispheres of the detector (e.g. at LEP). Here F (B) is the number of events
in the forward (backward) hemisphere.

Case I: if the events in the forward and backward hemispheres are
uncorrelated, then error propagation gives

σ_A² = 4 (B² σ_F² + F² σ_B²) / (F + B)⁴

If the errors are individually Poisson distributed for F and B
(σ_F² = F, σ_B² = B), then

σ_A² = 4 F B / (F + B)³

i.e. the error is dominated by the smaller counting rate.
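A toy cross-check (assuming NumPy; the counts are invented) that the propagated error matches the spread of A over independent Poisson fluctuations of F and B:

```python
import numpy as np

rng = np.random.default_rng(7)
F_true, B_true = 1200.0, 800.0

# Error propagation for A = (F-B)/(F+B) with independent Poisson F, B
N = F_true + B_true
sigma_A = np.sqrt(4.0 * F_true * B_true / N**3)

# Toy Monte Carlo: fluctuate F and B independently and look at the spread of A
F = rng.poisson(F_true, 100_000)
B = rng.poisson(B_true, 100_000)
A = (F - B) / (F + B)
print("propagated: %.4f   toy MC: %.4f" % (sigma_A, A.std()))
```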
Example: Measure Asymmetry (II)
Case II: the events in the forward and backward hemispheres are completely
anti-correlated; N = F + B is thus fixed.

- The distribution of the events F and B is then given by the binomial
  distribution; let p be the probability that an event falls into the forward
  hemisphere:

  σ_F² = N p (1 − p), and with A = 2p − 1 one obtains σ_A² = 4 p (1 − p) / N,
  which for p ≈ F/N is again 4 F B / (F + B)³

- This is the same expression as before → it demonstrates the relationship
  between Poisson and binomial statistics:
  - either: the Poisson probability of obtaining N events altogether, times
    the binomial probability of having F of them in the forward hemisphere,
  - or: two independent Poisson probabilities for the numbers of forward and
    backward events.
Repetition: Histogram, Interpretation of Bins

A histogram can be equivalently regarded as:

1. A Poisson distribution in the overall number of events N together with the
   corresponding multinomial distribution of obtaining n_1, n_2, ... events
   in the individual bins.

2. Independent Poisson distributions of the number of entries in each bin
   of the histogram.
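A quick toy illustration that both views give the same bin-content statistics (assuming NumPy; the bin probabilities and expected total are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.5, 0.3, 0.2])   # hypothetical bin probabilities
mu_total = 100.0                # expected total number of events
n_toys = 50_000

# View 1: Poisson-distributed total N, then a multinomial split over the bins
N = rng.poisson(mu_total, n_toys)
bins1 = np.array([rng.multinomial(n, p) for n in N])

# View 2: independent Poisson in each bin with mean mu_total * p_i
bins2 = rng.poisson(mu_total * p, size=(n_toys, len(p)))

print("means:", bins1.mean(axis=0), bins2.mean(axis=0))
print("vars: ", bins1.var(axis=0), bins2.var(axis=0))
```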



Exercise: Detector Efficiency
Compute the error on a measured detector efficiency in two different ways:

- via the binomial distribution, with p the probability to detect a traversing
  particle and N the number of events;

- via error propagation, with n the number of detected events and m the number
  of not-detected events (ε = n/(n + m)), treating n and m as independent
  variables.
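A sketch of both approaches (symbols n, m as above; the counts are invented), which give the same uncertainty:

```python
import numpy as np

n, m = 800.0, 200.0          # detected / not detected (hypothetical counts)
N = n + m
eps = n / N

# 1) Binomial: sigma_eps = sqrt(eps*(1-eps)/N)
sigma_binomial = np.sqrt(eps * (1.0 - eps) / N)

# 2) Error propagation with independent Poisson n, m (sigma_n^2 = n, sigma_m^2 = m):
#    d(eps)/dn = m/N^2,  d(eps)/dm = -n/N^2
sigma_prop = np.sqrt((m / N**2) ** 2 * n + (n / N**2) ** 2 * m)

print("binomial: %.4f   propagation: %.4f" % (sigma_binomial, sigma_prop))
```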



Be aware ....
The approximation using the Taylor expansion breaks down if the function is
significantly non-linear in the region of ±1σ around the mean value.
Example: momentum estimate in a B field, where p is proportional to the
inverse of the measured curvature (p ~ 1/κ):

→ 10% momentum bias!
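A toy illustration of how the non-linearity of 1/κ biases the mean momentum (assuming NumPy; the 30% curvature resolution and the cut against the singularity are my own choices to make the effect visible):

```python
import numpy as np

rng = np.random.default_rng(5)
kappa_true = 1.0                     # arbitrary units, true p = 1/kappa_true = 1
sigma_kappa = 0.3                    # 30% curvature resolution (illustrative)

kappa = rng.normal(kappa_true, sigma_kappa, 1_000_000)
kappa = kappa[kappa > 0.2]           # avoid the singularity at kappa -> 0
p = 1.0 / kappa

# First-order propagation predicts <p> = 1/kappa_true; the toy shows a positive bias
print("mean reconstructed p: %.3f (true value 1.000)" % p.mean())
```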



Failure of (1st order) error propagation
An experiment had data to look for a non-zero mass of the electron neutrino.
The quantity R was measured:

You don't need to know the details; it is sufficient to know that a, b, c, d, e
are measured quantities and K is a constant. If R < 0.42, then the neutrino
must have mass.
The experiment measured R = 0.165 and found with error propagation
σ(R) = 0.073 → 3σ evidence for neutrino mass!
However, the formula for R is highly non-linear ...
→ 1st order error propagation is not applicable!



What to do instead?
Use MC methods!
Throw Gaussian distributed values for a, b, c, d, e, compute R; repeat the toy
many times and check how often you end up below the measured value of R.
In the example this happens in 4% of the cases. This is the so-called p-value
of this result.
In many physics analyses, simple Gaussian error propagation is not valid or is
too complicated (e.g. highly non-linear functions, many correlated variables).

→ The p-value from the MC method always works!
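A generic sketch of such a toy study (assuming NumPy). The real formula for R is not reproduced on this slide, so compute_R, the input values and the reference value below are hypothetical placeholders; only the mechanics (throw Gaussian inputs, build the R distribution, count the fraction on one side of a reference value) follow the recipe above.

```python
import numpy as np

rng = np.random.default_rng(11)

def compute_R(a, b, c, d, e, K=1.0):
    """Hypothetical non-linear stand-in for the real formula of R."""
    return K * a * b / (c * d * e) ** 2

# Illustrative central values and Gaussian uncertainties of the inputs
central = {"a": 1.2, "b": 0.8, "c": 1.1, "d": 0.9, "e": 1.0}
sigma   = {"a": 0.1, "b": 0.1, "c": 0.1, "d": 0.1, "e": 0.1}

n_toys = 200_000
toys = {k: rng.normal(central[k], sigma[k], n_toys) for k in central}
R = compute_R(**toys)

R_ref = compute_R(**central)      # reference value to compare against
frac = np.mean(R < R_ref)         # fraction of toys below the reference value
print("fraction of toys below reference: %.3f" % frac)
```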

