
Statistical Methods of Data Analysis - WS 00/01 Chapter 2

Ian C. Brock, 13th October 2003, 13:05

Contents
2 Characterising Distributions
  2.1 Average Values
    2.1.1 Mean
    2.1.2 Mode
    2.1.3 Median
    2.1.4 Geometric Mean
    2.1.5 Harmonic Mean
  2.2 Measuring the Spread
    2.2.1 Variance
    2.2.2 Standard Deviation
    2.2.3 R.M.S. and FWHM
  2.3 Skewness, Kurtosis, Moments
  2.4 Correlations, Covariance
  2.5 Covariance Matrix

2 Characterising Distributions

Know how to display the data, but want to do more than that. Want some numbers that characterise (describe) the distributions.

2.1 Average Values

2.1.1 Mean

Simplest, most common is the mean. Take a data sample: $\{x_1, x_2, x_3, \ldots, x_N\}$.


Usually try to use $x_i$ for a measurement. The mean is defined as:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

Can also define the mean of any function of the measurements:

$$\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$
Typical functions are $f = ax$, $f = x^2$. If you have each measurement available, this can be done directly. More common is to have binned data: we have bin $j$ corresponding to measurement value $x_j$, with $n_j$ measurements in this bin, and the total number of bins is $N_{\mathrm{bin}}$. Then:

$$\bar{x} = \frac{1}{N} \sum_{j=1}^{N_{\mathrm{bin}}} n_j x_j \qquad \mathrm{with} \qquad N = \sum_{j=1}^{N_{\mathrm{bin}}} n_j$$

and

$$\bar{f} = \frac{1}{N} \sum_{j=1}^{N_{\mathrm{bin}}} n_j f(x_j)$$

The symbol $\bar{x}$ is used to denote the mean of a sample of measurements.
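As a sketch of how these formulas might look in code (plain Python; the function names and toy data are illustrative, not from the lecture):

```python
# Mean of raw measurements: xbar = (1/N) * sum(x_i)
def mean(xs):
    return sum(xs) / len(xs)

# Mean of binned data: xbar = (1/N) * sum(n_j * x_j), with N = sum(n_j)
def binned_mean(bin_centres, counts):
    n_total = sum(counts)
    return sum(n * x for x, n in zip(bin_centres, counts)) / n_total

data = [1.0, 2.0, 2.0, 3.0, 5.0]
print(mean(data))                                        # 2.6
print(binned_mean([1.0, 2.0, 3.0, 5.0], [1, 2, 1, 1]))   # 2.6, same data binned
```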

2.1.2 Mode

Most common value: implies binned data (or a theoretical distribution). Simple to define with large statistics; with lower statistics, the problem is fluctuations.

Transparency: Distribution of heights

Useful for asymmetric (skew) distributions.

2.1.3 Median

Value for which half the data points are below and half the data points are above. Basically simple if the number of measurements is odd: the median is the middle value. The problem is an even number of measurements, where one can take the midpoint of the two central values. If the data are binned, then one probably has to take the central value of the bin such that not more than half the data are below and not more than half are above.
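A minimal sketch of this rule (plain Python; for an even number of measurements it takes the midpoint of the two central values, as described above):

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                        # odd N: the middle value
    return 0.5 * (s[mid - 1] + s[mid])       # even N: midpoint of the two central values

print(median([3, 1, 2]))      # 2
print(median([4, 1, 3, 2]))   # 2.5
```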


2.1.4 Geometric Mean

Centre of gravity of the distribution:

$$\sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$

Useful for interpolating exponential growth etc., e.g. population growth.

Transparency: Compare mean, mode, median

If the data are symmetric, mean, mode and median all coincide. If not, a simple rule of thumb (if the data are not too skew) is:

$$\mathrm{Mean} - \mathrm{Mode} = 3(\mathrm{Mean} - \mathrm{Median})$$

2.1.5 Harmonic Mean

$$\frac{N}{1/x_1 + 1/x_2 + 1/x_3 + \cdots + 1/x_N}$$
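A minimal sketch of both means (plain Python; the growth-factor and speed examples are illustrative):

```python
import math

def geometric_mean(xs):
    # exp of the mean of the logs is the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # N divided by the sum of the reciprocals
    return len(xs) / sum(1.0 / x for x in xs)

growth = [1.02, 1.05, 1.03]          # e.g. yearly population growth factors
print(geometric_mean(growth))        # ~1.0333, the average growth factor per year
print(harmonic_mean([40.0, 60.0]))   # 48.0, e.g. average speed over two equal distances
```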

2.2 Measuring the Spread

Know how to say where the middle of the distribution is. The obvious thing to want to know next is the width/spread. How can one characterise that? A first thought may be to calculate the mean deviation from the mean:

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}) = \frac{1}{N} \sum_{i=1}^{N} x_i - \frac{1}{N} \sum_{i=1}^{N} \bar{x} = \bar{x} - \bar{x} = 0$$

i.e. no good. Could consider the average absolute deviation

$$\frac{1}{N} \sum_{i=1}^{N} | x_i - \bar{x} |$$

but this is not so easy to handle mathematically.

2.2.1 Variance

A much better quantity is:

$$V(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

the mean square deviation, called the variance. For any function $f(x_i)$:

$$V(f) = \frac{1}{N} \sum_{i=1}^{N} \left( f(x_i) - \bar{f} \right)^2$$

For data analysis, it is preferable to loop over the data only once and calculate the mean and variance at the same time. This is also possible:

$$V(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{N} \sum_{i=1}^{N} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) = \frac{1}{N} \left( \sum_{i=1}^{N} x_i^2 - 2N\bar{x}^2 + N\bar{x}^2 \right) = \overline{x^2} - (\bar{x})^2$$

i.e. the mean square minus the square of the mean. Fine, but be careful if you use your computer and/or calculator to do this: they have limited precision. Try it with:

-2, -1, 0, +1, +2
998, 999, 999, 1000, 1000, 1000, 1001, 1001, 1002
999998, 1000000, 1000002

Transparency: Calculating variance on the computer

In such cases, it is better to get a rough estimate $x_0$ of the mean and then subtract it from all values to keep the absolute numbers down:

$$\overline{(x - x_0)^2} = \overline{(x - \bar{x})^2} + (\bar{x} - x_0)^2$$

End of Lecture 1
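The precision problem is easy to demonstrate. A minimal sketch, using NumPy's float32 to mimic a machine with roughly 7 significant digits, applied to the third data set above:

```python
import numpy as np

x = np.array([999998.0, 1000000.0, 1000002.0], dtype=np.float32)

# One-pass formula: mean square minus square of the mean
naive = np.mean(x * x) - np.mean(x) ** 2

# Same formula after subtracting a rough estimate x0 of the mean
x0 = np.float32(1000000.0)
d = x - x0
shifted = np.mean(d * d) - np.mean(d) ** 2

print(naive)    # badly wrong: ~12 significant digits cancel in 7-digit floats
print(shifted)  # 2.6666667, the correct variance
```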

2.2.2 Standard Deviation

The square root of the variance is called the standard deviation:

$$\sigma = \sqrt{V(x)}$$

Called the standard deviation because, as we will see, it is the typical deviation that one expects to see from the measured average value. Most (68%) of the measurements are within $1\sigma$, and a further $(95 - 68)\%$ lie between $1\sigma$ and $2\sigma$.
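A quick check of these fractions on simulated Gaussian data (a sketch assuming NumPy is available; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)

mean = x.mean()
sigma = x.std()   # square root of the variance, same 1/N convention as above

print(np.mean(np.abs(x - mean) < 1 * sigma))   # ~0.68
print(np.mean(np.abs(x - mean) < 2 * sigma))   # ~0.95
```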


2.2.3 R.M.S. and FWHM

The R.M.S. is actually the same as the standard deviation, if the standard deviation is defined as above (later we mention that there are other definitions). It describes exactly what one does: calculate the mean square deviation and then take the square root. For distributions that have long tails, the r.m.s. or $\sigma$ do not describe the core of the distribution well, as the square of the deviation goes into the calculation. In the same way as one can use the mode for the average value, one can define the spread by calculating the full width at half-maximum (FWHM). The problem comes with low statistics (same as for the mode): the value of the maximum fluctuates, which means that the value at which one measures the width fluctuates. Fluctuations in the data around the half-maximum point also lead to errors. For a Gaussian:

$$\mathrm{FWHM} = 2.35\,\sigma$$

Can also follow a similar definition to the median and divide the data into quartiles. Define the width as the range from the value below which a quarter of the data lie to the value above which a quarter of the data lie.
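A sketch of estimating the FWHM from a histogram and comparing with $2.35\,\sigma$ (NumPy; the bin count and sample parameters are illustrative, and the estimate is only good to the bin width):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=200_000)

counts, edges = np.histogram(x, bins=100)
centres = 0.5 * (edges[:-1] + edges[1:])

half_max = counts.max() / 2.0
above = centres[counts >= half_max]   # bin centres whose content is above half-maximum
fwhm = above.max() - above.min()

print(fwhm)             # ~4.7, up to bin-width granularity
print(2.35 * x.std())   # ~4.7: FWHM = 2.35 sigma for a Gaussian
```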

2.3 Skewness, Kurtosis, Moments

What do we do if the data are asymmetric, or how can we tell if they are asymmetric? Take the next order! The skewness is:

$$\gamma = \frac{1}{N \sigma^3} \sum_i (x_i - \bar{x})^3 = \frac{1}{\sigma^3} \overline{(x - \bar{x})^3}$$

If the tail extends to the right the skewness is positive; to the left, the skewness is negative. As we usually worry more about the error on a measurement, the skewness is not often used, but it is a useful measure to decide if a distribution is symmetric or not. The fourth power is called the kurtosis:

$$c = \frac{1}{\sigma^4} \overline{(x - \bar{x})^4} - 3$$

Both skewness and kurtosis are dimensionless - that is why the $\sigma^3$ and $\sigma^4$ are there. More generally,

$$\frac{1}{N} \sum_i x_i^r$$

is called the $r$th moment and

$$\frac{1}{N} \sum_i (x_i - \bar{x})^r$$

is called the $r$th central moment. Skewness and kurtosis are effectively the 3rd and 4th central moments with some constants included. The $-3$ in the kurtosis is there to ensure that the kurtosis of a Gaussian is 0.
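A minimal sketch of these formulas (NumPy; an exponential sample illustrates positive skewness):

```python
import numpy as np

def skewness(x):
    d = x - x.mean()
    return np.mean(d**3) / x.std()**3

def kurtosis(x):
    d = x - x.mean()
    return np.mean(d**4) / x.std()**4 - 3.0   # the -3 makes a Gaussian give 0

rng = np.random.default_rng(3)
gauss = rng.normal(size=100_000)
expo = rng.exponential(size=100_000)          # tail extends to the right

print(skewness(gauss), kurtosis(gauss))       # both ~0
print(skewness(expo))                         # ~2, positive as expected
```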


2.4 Correlations, Covariance

What about if we have 2 variables: height and weight, height and IQ, ...? We know how to plot them: scatter plot, 2-D histogram, lego plot, ... Can also clearly calculate the mean and spread of each variable on its own, but how do we characterise the dependence between them?

Transparency: Different correlation coefficients

The data sample consists of pairs of values: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_N, y_N)$. Define $\bar{x}$, $\bar{y}$, $V(x)$, $V(y)$ exactly as before. The covariance tells you whether they are correlated, i.e. does one value depend on the other and if so how much?

$$\mathrm{cov}(x, y) = \frac{1}{N} \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{N} \sum_i x_i y_i - \frac{\sum_i x_i}{N} \frac{\sum_i y_i}{N} = \overline{xy} - \bar{x}\,\bar{y}$$

If $\overline{xy} = \bar{x}\,\bar{y}$ the variables are uncorrelated, as is the case when they are independent of each other. Can try to understand this by considering a point (value) with $(x_i - \bar{x})$ positive. If $(y_i - \bar{y})$ is just as often positive as it is negative, then:

$$\sum_i (x_i - \bar{x})(y_i - \bar{y}) = 0$$

The covariance can also be seen as a generalisation of the variance: $\mathrm{cov}(x, x) = V(x)$. If a positive $(x_i - \bar{x})$ means that $(y_i - \bar{y})$ is more often positive than negative, the variables are positively correlated. If a positive $(x_i - \bar{x})$ means that $(y_i - \bar{y})$ is more often negative than positive, the variables are negatively correlated. The covariance carries dimensions. Much more useful is to scale it by the standard deviations:

$$\rho = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}$$

$\rho$ is called the correlation coefficient, with $-1 \le \rho \le 1$:

-1: 100% anti-correlated (weight/stamina)
 0: uncorrelated (height/IQ)
+1: 100% correlated (height/weight)

The correlation coefficient is independent of the scale and also of a shift in the zero point of either or both of the variables. Note that if $x$ and $y$ are correlated, then this also affects their variances: if $y$ is fixed then $V(x)$ is small; if $y$ varies then $V(x)$ is larger for correlated variables.
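A minimal sketch of the covariance and correlation coefficient (NumPy; the height/weight/IQ toy data are illustrative):

```python
import numpy as np

def covariance(x, y):
    # cov(x, y) = mean(x*y) - mean(x) * mean(y)
    return np.mean(x * y) - x.mean() * y.mean()

def correlation(x, y):
    return covariance(x, y) / (x.std() * y.std())

rng = np.random.default_rng(4)
height = rng.normal(175.0, 10.0, size=10_000)
weight = 0.9 * height + rng.normal(0.0, 8.0, size=10_000)   # depends on height
iq = rng.normal(100.0, 15.0, size=10_000)                   # independent of height

print(correlation(height, weight))   # clearly positive, ~0.75
print(correlation(height, iq))       # ~0
```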

2.5 Covariance Matrix

Generalise things to more than 2 variables. Must first modify the notation a bit. One measurement now consists of $n$ elements with values: $x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(n)}$. Can define a covariance between each pair of variables:

$$\mathrm{cov}(x^{(i)}, x^{(j)}) = \overline{x^{(i)} x^{(j)}} - \overline{x^{(i)}}\;\overline{x^{(j)}}$$

These form the elements of an $n \times n$ symmetric matrix:

$$V_{ij} = \mathrm{cov}(x^{(i)}, x^{(j)})$$

This is called various names: covariance matrix, variance matrix or error matrix. Note that the diagonal elements are just the variances, so this matrix is a generalisation of the variances that we have already met. The correlation matrix is a dimensionless form of the above, obtained by dividing by the standard deviations:

$$\rho_{ij} = \frac{\mathrm{cov}(x^{(i)}, x^{(j)})}{\sigma_i \sigma_j}$$
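A sketch building the covariance and correlation matrices element by element (NumPy; note that NumPy's own np.cov uses an $N - 1$ normalisation instead of the $1/N$ used here):

```python
import numpy as np

def covariance_matrix(data):
    # data has shape (N_measurements, n_variables)
    # V_ij = mean(x_i * x_j) - mean(x_i) * mean(x_j)
    n = data.shape[1]
    V = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            V[i, j] = (np.mean(data[:, i] * data[:, j])
                       - data[:, i].mean() * data[:, j].mean())
    return V

def correlation_matrix(data):
    V = covariance_matrix(data)
    sigma = np.sqrt(np.diag(V))       # diagonal elements are the variances
    return V / np.outer(sigma, sigma)

rng = np.random.default_rng(5)
a = rng.normal(size=1000)
data = np.column_stack([a, 0.5 * a + rng.normal(size=1000), rng.normal(size=1000)])
print(correlation_matrix(data).round(2))   # 1s on the diagonal; element (0,1) clearly positive
```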
