
Notes on statistical methods

November 14, 2014


These notes are intended to cover the most general concepts that will be used in a recurring
manner throughout any course on Statistical Physics. This is the field of Physics which describes
Thermodynamics by adopting a microscopic point of view, assuming that every single particle
of a macroscopic system follows Newton's laws. To this end, an extensive use of probability
theory is required, which is why this introductory chapter is needed. We therefore start with
a brief survey of the most relevant concepts of probability theory. The starting point will be
a rudimentary definition of probability (without entering into deep mathematical details),
followed by some of the most famous properties of this measure of the occurrence
of a random event. After that, a discussion of conditional probabilities and mean values
will be carried out. Given that our ultimate purpose in this course is the study of macroscopic
systems, it is natural to extend the above study to the continuous case. We will then introduce
a quantity which will play a prominent role in the following chapters, the characteristic function,
and present some examples of distributions that are of general interest, such as the
binomial or the Normal distributions. Finally, the central limit theorem is stated and commented on.

Statistical concepts

Nature is full of examples of random variables. Even when we cannot predict the particular
value that one of these variables will take, for whatever reason, they can somehow be
characterised. Of course, whenever we want to characterise the properties of a random variable,
a mental abstraction must be made to consider an assembly (or ensemble) of a very large
(strictly speaking, infinite) number of identical systems, N. Thus, the probability of occurrence
of a given value can be defined as the relative frequency with respect to this particular ensemble.
Let us consider the particular random variable, X, to be the number resulting from rolling a
die. In the case of an ideal die, one would expect any number to appear in the long run
with the same frequency. This is actually the probability that we (a priori) think each of the
outcomes X = {1, 2, 3, 4, 5, 6} should have. Each time we roll the die we constitute one of the
N systems of the vast ensemble. To fix some notation, let us call the absolute frequency the number
of times we get a particular result, e.g. x = {1}. When this number is divided by the total
number of systems forming the ensemble we get the relative frequency. Imagine we consider a
system of N = 10³ dice and the result X = 1 has an absolute frequency of 150. Then the
relative frequency would be
p(x = {1}) = 150/N = 0.15
When N is large enough we can call this ratio the experimental probability. While this definition
of probability has severe limitations, it serves as a good starting point for our purpose. In order
to avoid these problems, we will now introduce a more exact definition of probability
(the so-called axiomatic definition of probability) that sets aside the problem of
determining the exact relative frequency with respect to the ensemble, assuming that this
can somehow be done.
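As an aside, the frequency-based definition above is easy to explore numerically. The following sketch (not part of the original notes; the variable names are our own) builds an ensemble of N = 10³ die rolls and measures the experimental probability of the outcome x = 1:

```python
import random

random.seed(42)  # fix the seed so the "ensemble" is reproducible

N = 10**3  # number of identical systems (die rolls) in the ensemble
rolls = [random.randint(1, 6) for _ in range(N)]

absolute_frequency = rolls.count(1)          # how many times x = 1 appeared
relative_frequency = absolute_frequency / N  # the experimental probability p(x = {1})

# For large N this ratio approaches the a priori value 1/6 ≈ 0.167.
print(relative_frequency)
```

Rerunning with ever larger N shows the ratio settling near 1/6, which is precisely the limitation the axiomatic definition sidesteps: the limit is only reached for an infinite ensemble.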
Firstly, we will in general denote (unless we specify other particular notation for some
reason) random variables by uppercase letters, e.g. X, and their particular values by
the same letter in lowercase, e.g. x. Each of the possible values a random variable can take is called
an elementary event or simply an event. The set of all possible outcomes or events of that variable is
called the sample space, which we will denote by Ω. Each of the elementary events, e_i,
will be postulated to have an (a priori) probability, p(x = {e_i}) ≡ p(e_i). This probability will
fulfil the following properties (which are characteristic of a measure):
- They are null or positive numbers: p(e_i) ≥ 0 for all e_i.
- They are normalised: p(Ω) = ∑_i p(e_i) = 1.
- The probability of any set of disjoint elementary events to occur is equal to the sum of the
probabilities of each elementary event (mathematically called σ-additivity):
p(A ∪ B) = p(A) + p(B),
with A, B ⊂ Ω and A ∩ B = ∅.
When we write p(A ∪ B), it can be read as the probability of A or B occurring. In the following
we will also see p(A ∩ B), which means the probability of A and B occurring simultaneously.
From this definition we can straightforwardly deduce the following consequences:
- p(∅) = 0
- p(A) ≤ p(B) if A ⊂ B
- p(x) ∈ [0, 1], for all x ∈ Ω
Thus, we see that this definition does not care about where the number p(e_i) comes from, but
only about its fundamental properties as a measure of occurrence. The numbers themselves must
therefore be deduced from external arguments that do not affect the measure itself, such as
symmetry or homogeneity reasons, or simply ignorance.
Thus, when an event x_i has a greater probability of occurring than another event x_j, what we
are really saying is nothing but
p(x_j) < p(x_i).
Moreover, if two events are equally probable, then p(x_i) = p(x_j). When the sample space is
formed by a single event, then such an event has probability one. Finally, when we say that an
event is not possible, we are actually saying that p(x_i) = 0.
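The three axioms are easy to verify for a concrete discrete distribution. The following sketch (an illustration of ours, assuming an ideal six-sided die) checks each property in turn:

```python
# A priori probabilities for an ideal six-sided die: p(e_i) = 1/6 for each face.
p = {e: 1/6 for e in range(1, 7)}

# Axiom 1: probabilities are null or positive.
assert all(p[e] >= 0 for e in p)

# Axiom 2: normalisation, p(Omega) = sum_i p(e_i) = 1.
assert abs(sum(p.values()) - 1.0) < 1e-12

# Axiom 3 (additivity for disjoint events): p(A ∪ B) = p(A) + p(B).
A, B = {1, 2}, {5, 6}            # disjoint subsets of the sample space
p_union = sum(p[e] for e in A | B)
assert abs(p_union - (sum(p[e] for e in A) + sum(p[e] for e in B))) < 1e-12

print(p_union)  # probability of rolling 1, 2, 5 or 6
```

Note that the code never asks where the value 1/6 comes from; as stated above, only the measure-like properties matter.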
As we can imagine from the above discussion, determining the set of elementary
events associated with a random variable can lead us to non-trivial problems. From the most
practical point of view, it is customary to choose them as those that make the process
of assigning probabilities a priori easiest.
Another property that we will need to bear in mind is the concept of independence. Two
events A and B are independent if and only if their joint probability equals the product of their
probabilities:
p(A ∩ B) ≡ p(A, B) = p(A) p(B)
(1)
As we will see below, this can be directly related with the concept of conditional probabilities.
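For example (a sketch of ours, not the authors' code), when two fair dice are rolled together, the events "first die shows 6" and "second die shows 4" satisfy Eq. (1) exactly:

```python
from fractions import Fraction
from itertools import product

# Sample space for two ideal dice: 36 equally likely ordered pairs.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

A = {pair for pair in omega if pair[0] == 6}  # first die shows 6
B = {pair for pair in omega if pair[1] == 4}  # second die shows 4

p_A = len(A) * prob          # 6/36 = 1/6
p_B = len(B) * prob          # 6/36 = 1/6
p_AB = len(A & B) * prob     # joint probability p(A ∩ B) = 1/36

# Independence, Eq. (1): the joint probability factorises.
assert p_AB == p_A * p_B
print(p_A, p_B, p_AB)
```

Using exact fractions rather than floats keeps the factorisation test free of rounding issues.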

Random (so-called stochastic) variables

Let us consider a general process, X, and assume that it is possible to relate each of the
elementary events, e_i, with a number x_i. As we said before, the outcomes of the stochastic
process X are denoted by x. The variable x thus defined is called a random variable.
Hereinafter the probability of obtaining x_i will be written as W(x_i).
For instance, a random variable x could be the sum of the results of rolling two dice.
In that case, the probability of each possible value x_i will equal the sum of the probabilities
of the corresponding elementary events producing such a sum. For instance,
W(x = 10) = p(6, 4) + p(4, 6) + p(5, 5)
Hence, the probability of obtaining any of the values between x_i and x_j would be given by the
sum
W(x_i ≤ x ≤ x_j) = ∑_{k=i}^{j} W(x_k)
(2)

recalling the normalisation condition
∑_i W(x_i) = 1
(3)

The most likely (or most probable) value is that with the highest probability. Such a value will be
represented by x̃. Coming back to the example of the two dice, the most probable sum is
x̃ = 7, an event which has a probability of W(x = 7) = 6/36 = 1/6.
Moreover, we can define two different variables from the same process, x and y. The pair
of values x_k, y_l then has a probability distribution W(x_k, y_l) which can be deduced from the a
priori probability distribution for each of the elementary events.
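The distribution W for the sum of two dice can be tabulated explicitly from the a priori probabilities of the elementary events. The following sketch (ours, with hypothetical variable names) reproduces W(x = 10) = 3/36 and the most probable value x̃ = 7:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Count the elementary events (i, j) producing each sum x = i + j.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
W = {x: Fraction(n, 36) for x, n in counts.items()}

# W(x = 10) = p(6,4) + p(4,6) + p(5,5) = 3/36
print(W[10])

# The most probable value is x~ = 7, with W(7) = 6/36 = 1/6.
most_probable = max(W, key=W.get)
print(most_probable, W[most_probable])
```

The normalisation condition (3) holds by construction, since the 36 elementary events are exhaustive.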
Given the joint probability W(x_k, y_l), we can compute the probability of obtaining x_k, no matter
what value y_l we get, by performing the sum
W_x(x_k) ≡ ∑_l W(x_k, y_l),
(4)
where W_x is commonly known as the marginal probability. Furthermore, if the following identity is
fulfilled,
W(x_k, y_l) = W(x_k) W(y_l)
(5)
the variables x and y are statistically independent.
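Both the marginal of Eq. (4) and the independence condition of Eq. (5) can be checked on the joint distribution of two dice read as separate variables x and y (again a sketch of ours, assuming ideal dice):

```python
from fractions import Fraction
from itertools import product

# Joint distribution for two ideal dice: W(x_k, y_l) = 1/36 for every pair.
W = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Marginal probability, Eq. (4): W_x(x_k) = sum over l of W(x_k, y_l).
Wx = {x: sum(W[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
Wy = {y: sum(W[(x, y)] for x in range(1, 7)) for y in range(1, 7)}

# Statistical independence, Eq. (5): the joint factorises into the marginals.
assert all(W[(x, y)] == Wx[x] * Wy[y] for x, y in W)
print(Wx[3])  # each marginal value is 1/6
```

Had the dice been coupled (say, by discarding pairs with equal faces), the factorisation test would fail while the marginals would still be well defined, which is exactly the distinction Eq. (5) draws.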

Conditional probability
