
Introduction to Statistics and Error Analysis II

Physics 116C, 4/14/06, D. Pellett

References:
- Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson
- Particle Data Group notes on probability and statistics, etc., online at http://pdg.lbl.gov/2004/reviews/contents_sports.html (reference is S. Eidelman et al., Phys. Lett. B 592, 1 (2004))

Any original presentations copyright D. Pellett 2006

Topics

- Propagation of errors: examples
- Background subtraction
- Error for x = a ln(bu)
- Estimates of mean and error: method of maximum likelihood and error of the mean
- Weighted mean, error of weighted mean
- Confidence intervals
- Chi-square (χ²) distribution and degrees of freedom (ν)
- Histogram χ²
- Format for lab writeup
- Next: least squares fit to a straight line and errors of the parameters

Overview: Propagation of Errors

Brief overview

Suppose we have x = f(u,v) and u and v are uncorrelated random variables with Gaussian distributions. Expand f in a Taylor series around x_0 = f(u_0, v_0), where u_0, v_0 are the mean values of u and v, keeping the lowest order terms:

    x − x_0 ≈ (∂f/∂u) (u − u_0) + (∂f/∂v) (v − v_0)

The distribution of x is a bivariate distribution in u and v. Under suitable conditions (see Bevington Ch. 3) we can approximate the standard deviation of x, σ_x, by

    σ_x² ≈ σ_u² (∂f/∂u)² + σ_v² (∂f/∂v)²
Bevington Problem 2.8 Solution

2.8) Cars at a fork in a road: two choices, right or left; p = 0.75 for right on a typical day. There are two discrete choices with a fixed probability for each choice, so this is an example of a binomial distribution. For a total of 1035 cars, we know that on a typical day 75% of people go right, 25% left. On this particular day 752 went right, so let's call a car going to the right a success. Let's assume the 75% probability is very well established. Thus,

    P_B(x; 1035, 0.75) = [1035! / (x! (1035 − x)!)] (3/4)^x (1/4)^(1035 − x)

where x is the number of cars going to the right on a typical day. Here x = 752 cars turned right on a particular day. Is this consistent with this being a typical day?

We are supposed to find the probability of getting a result as far or farther from the mean (assume the deviation can be less than or more than the mean). This would require summing this ugly function from 0 to 752 and from 800 to 1035 (or one minus the sum from 753 to 799). Since n ≫ 1 and x ≫ 1, we can approximate the binomial with a Gaussian if we are not too near x = n. The mean is μ = np = 1035 × 0.75 = 776.25 and the standard deviation is σ = √(npq) = 13.9, so the mean is (1035 − 776.25)/13.9 ≈ 19σ from n = 1035. We can approximate this with a Gaussian distribution.

[Figure: binomial distribution for 1035 cars with Gaussian approximation; the deviations larger than the observed one, either positive or negative, are summed]

Bevington 2.8 Solution (continued)

Use the Gaussian approximation with the same μ, σ.

Transform the interval to a unit normal distribution in the variable z = (x − μ)/σ. This has μ = 0 and σ = 1. Since we allow deviations in either direction from the mean, the limits of the interval in z are given by |z| = |752 − 776.25|/13.9 = 1.745.

The probability for z within this interval is given in Table C.2 of Bevington (probability of being within 1.745σ of μ):

    P(−1.745 < z < 1.745) = 0.9190  (using interpolation)

Probability that this is a typical day = 1 − P = 0.081 (or 8.1%)

Confidence interval: the above interval is a 91.9% confidence interval for z; z should lie within this interval 91.9% of the time. A larger probability interval would be required to exclude the hypothesis:

    ±3σ would correspond to a 99.73% confidence interval

Be careful: the distribution might not be truly Gaussian.

You could also sum P_B from 0 to 752 and from 800 to 1035 in LabVIEW.

Bevington 2.8 Solution (concluded)

You could also use the Gaussian cumulative distribution function to find P(z ≥ 1.745). This is available in LabVIEW (called Normal Dist in the figure below). This calculates

    P(z ≥ 1.745) = ∫ from 1.745 to ∞ of (1/√(2π)) exp(−z²/2) dz = 0.0405

so the probability of this being a typical day is 2 × (0.0405) = 0.081, as above.
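The Table C.2 lookups in this problem can be reproduced with the error function from the Python standard library. A sketch (using Python rather than the LabVIEW Normal Dist VI the slides describe):

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the unit normal,
    Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Bevington 2.8: 752 right turns vs. mu = np = 776.25, sigma = sqrt(npq)
mu = 1035 * 0.75
sigma = math.sqrt(1035 * 0.75 * 0.25)      # ~13.9
z = abs(752 - mu) / sigma                  # ~1.745
p_inside = normal_cdf(z) - normal_cdf(-z)  # ~0.919 (the Table C.2 value)
p_outside = 1.0 - p_inside                 # ~0.081, the two-sided tail

# The 3-sigma confidence interval quoted on the slide:
p_3sigma = normal_cdf(3.0) - normal_cdf(-3.0)  # ~0.9973
```

Using the exact σ = 13.93 rather than the rounded 13.9 shifts the tail probability only in the third decimal place, consistent with the slide's 8.1%.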

Chi-Square Test of a Distribution

Define

    χ² = Σ_{i=1}^{N} [(x_i − μ_i) / σ_i]²

For Gaussian random variables x_i, it can be shown that this follows the chi-square distribution with ν degrees of freedom.

The expectation value (i.e., mean value) for χ² is ν.

If this is based on a fit where some parameters are determined from the fit, the number of degrees of freedom is reduced by the number of fitted parameters.

One can set confidence intervals in χ² to evaluate the goodness of fit, as was done for Gaussian probability. (See Table C.4 in Bevington.)

χ² distributions can be found from tables, graphs or LabVIEW VIs.

The χ² distribution can be approximated by a Gaussian under suitable conditions.
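The χ² tail probabilities read from Table C.4 or a LabVIEW VI can also be computed directly. For an even number of degrees of freedom ν = 2k there is a simple closed form, P(χ² > x) = exp(−x/2) Σ_{j=0}^{k−1} (x/2)^j / j!. A stdlib-only sketch (the function name is mine, and the χ² = 8.28, ν = 12 check values come from the Prob. 4.13 solution later in this deck):

```python
import math

def chi2_tail_prob(x, nu):
    """P(chi^2 > x) for EVEN nu = 2k, via the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    (the regularized upper incomplete gamma function at integer k)."""
    if nu % 2 != 0 or nu <= 0:
        raise ValueError("this closed form needs a positive even nu")
    k = nu // 2
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

# Histogram fit later in the deck: chi^2 = 8.28 with nu = 12
p = chi2_tail_prob(8.28, 12)  # ~0.76, matching the quoted probability
```

For odd ν one would need the general incomplete gamma function (e.g. `scipy.special.gammaincc`), but the even case covers the ν = 12 example here.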

Comments on χ² of Histogram

    χ² = Σ_{i=1}^{N} (N_i − n_i)² / n_i

- Can model this with N individual mutually independent binomials, so long as a fixed total is not required. Then normalize to the actual N_total, using 1 degree of freedom in the fit (N = number of bins). (See discussion in Bevington, Ch. 3 and the Prob. 4.13 solution on the next page.)

- For small n_i, large N_total and a large number of bins, n_i is approximately Poisson.

- χ² follows the χ² distribution if all n_i ≫ 1 (n_i > 5 OK) or N ≫ 1.

- For fixed N_total, model with the multinomial distribution (see the Multinomial Distribution slide). But the n_i are not mutually independent with the multinomial.

Histogram Chi-Square: Bevington Prob. 4.13

4.13) I made a LabVIEW VI (see figures on the next page) to plot the histogram, calculate the Gaussian comparison values and find χ² according to Eq. 4.33 of Bevington:

    χ² = Σ_{j=1}^{n} [h(x_j) − N P(x_j)]² / (N P(x_j))

where n is the number of bins, N is the total number of trials, h(x_j) is the contents of the j-th bin and N P(x_j) is the expected contents from the Gaussian distribution (see the text for details).

Assume the bins are small enough so we can approximate the integral of the p.d.f. over the bin with the central value of the p.d.f. times the bin width. Then

    N P(x_j) = N ∫ over bin j of p(x) dx ≈ N p(x_j) Δx

where p(x_j) = (1/(σ√(2π))) exp(−(x_j − μ)²/(2σ²)) is the Gaussian p.d.f. evaluated at x_j, the center of the j-th bin, and Δx = 2 is the bin width.

Histogram Chi-Square Result

The VI front panel with the results and the diagram which produced them are shown in the following figures.

This analysis assumes the contents of each histogram bin h_j is an independent measurement. μ and σ are given, but the total number of trials is taken to agree with the experiment (200 trials). This represents one constraint and reduces the number of degrees of freedom by 1. We have 13 bins to compare with the Gaussian and ν = 12 degrees of freedom. The expectation value for χ² equals the number of degrees of freedom, ⟨χ²⟩ = 12. The resulting χ² = 8.28 (disagrees with the answer in the book but was checked independently). The probability for exceeding this value of χ² is 0.76 (calculated by LabVIEW but agrees with the interpolated value from Table C.4). The reduced χ², χ²/ν = 0.69 (expectation value of 1). This is not a bad fit. Of course, the χ² distribution is only valid for underlying Gaussians and this is not a good assumption for the bins with low occupancy.
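The Eq. 4.33 calculation the VI performs can be sketched in plain Python. The bin contents below are made-up illustrative numbers for a 200-trial histogram with bin width 2, not the data of Prob. 4.13; only the structure of the calculation follows the slides:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Unit-area Gaussian p.d.f."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def histogram_chi2(counts, centers, mu, sigma, width):
    """Chi-square of a histogram against a Gaussian expectation,
    per Bevington Eq. 4.33, approximating each bin integral by
    N * p(x_j) * width (valid for narrow bins)."""
    n_total = sum(counts)
    chi2 = 0.0
    for h, x in zip(counts, centers):
        expected = n_total * gaussian_pdf(x, mu, sigma) * width
        chi2 += (h - expected) ** 2 / expected
    return chi2

# Hypothetical 200-trial histogram (NOT the book's data), bin width 2:
centers = [4, 6, 8, 10, 12, 14, 16, 18, 20]
counts = [3, 10, 25, 42, 48, 38, 22, 9, 3]
chi2 = histogram_chi2(counts, centers, mu=12.0, sigma=3.0, width=2.0)
dof = len(counts) - 1  # total normalized to the data: one constraint
```

As on the slide, normalizing the expected total to the observed total costs one degree of freedom, so nine bins give ν = 8 here.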

Histogram Chi-Square: Comments

    N P(x_j) = N ∫ over bin j of p(x) dx ≈ N p(x_j) Δx

where p(x_j) is the Gaussian p.d.f. evaluated at the center of the j-th bin and Δx = 2 is the bin width.

Multinomial Distribution

Histogram with n bins, N total counts (partition N events into n bins), x_i counts in the i-th bin, with probability p_i to get a count in the i-th bin:

    P(x_1, x_2, …, x_n) = [N! / (x_1! x_2! … x_n!)] p_1^x_1 p_2^x_2 … p_n^x_n

with

    Σ_{i=1}^{n} x_i = N,    Σ_{i=1}^{n} p_i = 1.

Then

    μ_i = N p_i,    σ_i² = N p_i (1 − p_i),    σ_ij² = −N p_i p_j  (covariance, i ≠ j)

Reference:

http://mathworld.wolfram.com/MultinomialDistribution.html
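The multinomial moment formulas, including the negative covariance between bins (more counts in one bin means fewer available for the others), can be verified with a quick Monte Carlo. This is my own check, not from the slides; `sample_multinomial` is a hypothetical stdlib-only sampler:

```python
import random

def sample_multinomial(N, probs, rng):
    """Partition N events into len(probs) bins by drawing each
    event's bin from the cumulative probabilities."""
    counts = [0] * len(probs)
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)
    for _ in range(N):
        r = rng.random()
        for i, c in enumerate(cum):
            if r < c:
                counts[i] += 1
                break
    return counts

rng = random.Random(42)
N, probs = 100, [0.2, 0.3, 0.5]
trials = [sample_multinomial(N, probs, rng) for _ in range(20000)]

# Sample moments of bin 0 and the (0, 1) covariance:
m0 = sum(t[0] for t in trials) / len(trials)              # ~ N p0        = 20
v0 = sum((t[0] - m0) ** 2 for t in trials) / len(trials)  # ~ N p0 (1-p0) = 16
m1 = sum(t[1] for t in trials) / len(trials)
c01 = sum((t[0] - m0) * (t[1] - m1) for t in trials) / len(trials)  # ~ -N p0 p1 = -6
```

The sample covariance comes out near −N p_0 p_1, confirming that the bin counts are anticorrelated when N_total is fixed, which is exactly why the histogram-χ² slide warns that the n_i are not mutually independent under the multinomial model.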

Complete Lab Writeup

Similar to a research report. Outline as follows:

- Abstract (very brief overview stating results)
- Introduction (overview and theory related to the experiment)
- Experimental setup and procedure
- Analysis of data and results with errors
  - Graphs should have axes labeled with units, usually points should have error bars, and the graph should have a caption explaining briefly what is plotted
- Comparison with accepted values; discussion of results and errors; conclusions, if any
- References

Have a draft/outline of the paper and preliminary calculations at lab time Wednesday.
