Probability theory is concerned with the description and calculation of the properties of
random phenomena, as occur in games of chance, computer and telecommunications
systems, financial markets, electronic and optical circuits and many other random
systems.
Although such systems are random, in the sense that it is difficult or impossible to
predict exactly how the system will behave in the future, probability theory can provide
characterisation of the type of randomness involved and yield useful measures, such as
average values of system parameters or the likelihood of certain events occurring in the
future.
Many random phenomena can be modelled by the notion of a random experiment, for
example:
Recording the output voltage of a noise generator
Observing the daily closing price of crude oil
Measuring the number of packets queueing at the input port of a network router
Each different random experiment E defines its own particular sample space, event
space and probability measure, which collectively form an abstract probability space for
the random experiment.
For a random experiment where the sample space is discrete (countable), it may appear
unnecessary to define events to which probabilities are assigned. Why not simply
assign probabilities directly to outcomes in the sample space?
For an uncountable sample space, however, this approach fails, since each individual
outcome must typically be assigned probability zero. If events are instead defined as
intervals of the real line (e.g. [0, 5]), the events can have non-zero probability values
(e.g. the probability of an outcome occurring within the interval [0, 5] will be non-zero).
So that we can form a useful theory for all random experiments (particularly those with
uncountable sample spaces), the probability measure is only defined on specified
subsets of the sample space (the events) rather than on individual outcomes in the
sample space.
Note that this stipulation does not preclude us from defining events consisting of a
single outcome, but we draw the distinction between an outcome ω ∈ Ω (an element of
Ω) and an event {ω} ⊂ Ω (a subset of Ω).
The definition of the event space as a σ-field further specifies which subsets of Ω can
belong to the same event space. That is, there is a certain relationship between the
subsets of the sample space Ω that are chosen as events in the event space.
The properties of a σ-field (and so of any event space) ensure that if events A and B
have probabilities defined then logical combinations of these events (e.g. the outcome
is in either A or B) are also events in the event space and so also have probabilities
defined. Any subset of Ω that does not belong to the event space of a random
experiment will simply not have a defined probability.
We next look at the sample space, event space and probability measure in some detail.
A sample space Ω is the non-empty set of all outcomes (also known as sample
points, elementary outcomes or elementary events) of a random experiment E.
The sample space takes different forms depending on the random experiment in
question. We have seen an example of a finite sample space {H, T }, in the case of the
coin tossing random experiment, and also an uncountable sample space (an interval of
the real line [0, 10]) in the case of the random number experiment.
Example 1
A finite sample space Ω = {ak : k = 1, 2, ..., K}. Specific examples are:
A binary space {0, 1}
A finite space of integers {0, 1, 2, ..., k − 1} (also denoted Z_k).
Example 2
A countably infinite space Ω = {ak : k = 1, 2, ...}. Specific examples are:
All non-negative integers {0, 1, 2, ...}, denoted Z+
All integers {..., −2, −1, 0, 1, 2, ...}, denoted Z
Example 3
An uncountably infinite space. Examples are the real line R or intervals of R such as
(a, b), [a, b), (a, b], [a, ∞), (−∞, ∞).
Example 4
A space consisting of k-dimensional vectors with coordinates taking values in one of
the previously described spaces. The usual name for such a vector space is a product
space. For example, let A denote one of the abstract spaces previously considered.
Define the cartesian product A^k as:
A^k = {(a_1, a_2, ..., a_k) : a_i ∈ A, i = 1, 2, ..., k}
Example 5
Let A be one of the sample spaces in examples 1-3. Form a new sample space
consisting of all waveforms (or functions of time) with values in A (e.g. all real valued
time functions). This space is a product space of infinite dimension. For example:
Exercise 1
Specify appropriate sample spaces that model the outcomes of the following random
systems: (i) tossing a coin where a head is assigned a value of 1 and a tail
a value of 0 (ii) rolling a die (iii) rolling three dice simultaneously (iv) choosing a
random coordinate within a cube (v) an infinite random binary waveform.
1 If F ∈ F then also F^c ∈ F
2 If, for some finite n, F_i ∈ F, i = 1, 2, ..., n, then also ∪_{i=1}^{n} F_i ∈ F
3 If F_i ∈ F, i = 1, 2, ..., then also ∪_{i=1}^{∞} F_i ∈ F
These properties specify that an event space is a σ-field (or σ-algebra) over Ω.
Note that the definition of the σ-field, as above, specifies only that the collection be
closed under complementation and countable unions. However, these requirements
immediately yield additional closure properties. The countably infinite version of De
Morgan's laws of elementary set theory requires that if F_i, i = 1, 2, ... are all
members of a σ-field then so is:
∩_{i=1}^{∞} F_i = ( ∪_{i=1}^{∞} F_i^c )^c
Thus the σ-field properties imply that the collection of events in an event space is
closed under all set-theoretic operations (union, intersection, complementation,
difference, etc.) so that performing set operations on events must result in other
events inside the event space.
It follows by similar set-theoretic arguments that any countable sequence of any of the
set-theoretic operations (union, intersection, complementation, difference, symmetric
difference, etc.) performed on events in an event space must yield other events in the
event space.
We next turn to the question of how such event spaces may be constructed.
Given a countable sample space Ω, the collection of all subsets of Ω is a σ-field (and
thus a valid event space).
Such a collection of all possible subsets of a sample space is called the Power Set P
of the space.
The power set is the largest possible event space since it contains all subsets of Ω.
Note that a finite sample space with n elements has a power set with 2^n
elements.
For example, the power set of the binary sample space Ω = {0, 1} is:
P({0, 1}) = {Ø, {0}, {1}, {0, 1}}
Although the power set of the sample space automatically yields a valid event space, it
is possible to find a smaller event space, given some set of events of interest.
For example, consider the experiment of tossing two coins together in a game where
we are only interested in the event of tossing one head and one tail. Denoting a head
as 1 and a tail as 0, the appropriate sample space is:
Ω = {0, 1}2 = {(0, 0), (0, 1), (1, 0), (1, 1)}
The event space for the experiment can be defined as the power set of Ω (which contains 2^4 = 16 events).
Can we find a smaller event space for this random experiment containing the event of
interest A = {(0, 1), (1, 0)}?
We can in fact generate the smallest event space (σ-field) G that contains A.
For our example, if we start with the event of interest A = {(0, 1), (1, 0)} and apply
the rules of the σ-field (all complements and countable unions are also in the field)
iteratively we arrive at the event space:
G = {A, A^c, A ∪ A^c, A ∩ A^c}
  = {{(0, 1), (1, 0)}, {(0, 0), (1, 1)}, {(0, 1), (1, 0), (0, 0), (1, 1)}, Ø}
We note that in this instance the chosen family of events of interest consisted of a
single event A. In general, the family may contain many events.
To give a more precise definition of a generated field we say that, given a family of
events A of interest, we may find the σ-field G generated by A by taking the
intersection of all σ-fields on Ω that contain A, that is:
G = ∩ {F : F is a σ-field on Ω with A ⊂ F}
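For finite sample spaces, the closure procedure just described can be sketched in code. The following is a minimal illustration (the function name and set representation are our own, not from the notes): starting from the events of interest, repeatedly add all complements and unions until no new subsets appear.

```python
from itertools import combinations

def generate_sigma_field(omega, seed_events):
    # Iterate closure under complementation and pairwise union until a
    # fixed point is reached; on a finite space this yields the smallest
    # sigma-field containing the seed events.
    omega = frozenset(omega)
    G = {frozenset(e) for e in seed_events}
    while True:
        new = set(G)
        new.update(omega - e for e in G)                   # complements
        new.update(a | b for a, b in combinations(G, 2))   # unions
        if new == G:
            return G
        G = new

# The two-coin example: A = {(0,1), (1,0)}, the "one head, one tail" event
omega = {(0, 0), (0, 1), (1, 0), (1, 1)}
A = frozenset({(0, 1), (1, 0)})
G = generate_sigma_field(omega, [A])
# G contains exactly the four events {A, A^c, Omega, empty set}
```

Running this on the example reproduces the four-element event space derived above.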
Exercise 2
What is the power set of Ω = {1, 2, 3, 4}?
Given Ω = {1, 2, 3, 4}, find the σ-field (event space) generated by the family of
events A = {{1}, {3, 4}}.
Although the notion of a generated σ-field has been introduced in the context of a
countable sample space, it is more usual to take the power set as the de facto event
space for countable sample spaces. Generated fields are most useful when defining
event spaces on uncountable sample spaces (for example the real line).
In the uncountable case, a mathematical technicality arises with some subsets of the
sample space (i.e. some elements of the power set). There can exist some subsets
which due to their complicated structure cannot be assigned a meaningful probability
measure and thus are not valid events. The approach, instead, is to start with a set of
simple subsets of the sample space which are known to be measurable and generate a
measurable event space from these. This leads us to the notion of a Borel field.
Given a family of events S = {(−∞, x] : x ∈ R}, we may generate from these events
a σ-field B(R), called the Borel Field on R.
Although this set of subsets of the real line B(R) is a smaller set than the power set of
the real line, it is large enough not to restrict a useful theory of probability for real
sample spaces.
We note that any such family of intervals (e.g. S′ = {(y, ∞) : y ∈ R}) will generate
the same Borel field. To illustrate this point, consider the intervals (a, ∞) ∈ S′ and
(−∞, ∞) ∈ S′; then the set (−∞, ∞) − (a, ∞) = (−∞, a], which lies in the generated
σ-field, is also in S.
Similar to our previous definition of a generated field, the Borel Field B may be
concisely defined as the σ-algebra generated by the set of all intervals:
The Borel field B = ∩ {F : F is a σ-field containing all intervals}
Ω = R is often a natural choice of sample space for many random systems and the
Borel field B(R) on the real line is the usual choice of event space in this case.
The structure of the Borel field, being generated from intervals, makes it easier to
specify a probability measure on the set of events. By specifying probabilities on the
intervals, we are assured that all events in the event space will have probabilities
defined.
We note that it is also possible to form a Borel field on a subset of the real line (e.g.
R+ ). It is also possible to form a Borel field on real product spaces.
We can see a relationship between the definition of the event space and the definition
of the probability measure.
The structure of the event space ensures that any countable series of set operations on
a set of events is also in the event space. The probability axioms ensure that knowing
the probability of the original set of events, the probability of the resulting set can be
calculated.
Examples of useful properties of the probability measure that can be derived from
these axioms:
(a) P(F c ) = 1 − P(F )
(b) P(F ) ≤ 1
(c) P(Ø) = 0
Other concepts related to the probability measure are reviewed below.
Conditional Probability
Given a probability space (Ω, F, P) and two events A and B ∈ F, the conditional
probability of A given B is defined by:
P(A|B) = P(A ∩ B) / P(B),  P(B) ≠ 0
Conditional probability can itself be viewed as the probability measure of a new probability space restricted to B:
(Ω′ = B, F′ = {F ∩ B : F ∈ F}, P′)
Independence
Two events A and B are independent if
P(A ∩ B) = P(A)P(B)
Note that, when P(A) ≠ 0 and P(B) ≠ 0, this condition implies that:
P(A|B) = P(A) and P(B|A) = P(B)
Let {B_1, ..., B_n} be events that form a partition of the sample space, that is
∪_i B_i = Ω and B_i ∩ B_j = Ø for all i ≠ j
Then
P(A) = Σ_{i=1}^{n} P(A ∩ B_i) = Σ_{i=1}^{n} P(A|B_i) P(B_i)
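The law of total probability can be checked on a small concrete example; the fair-die setup below (with the even/odd partition) is our own illustration, not from the notes.

```python
from fractions import Fraction

# Fair-die probability space: Omega = {1,...,6} with the uniform measure
def P(event):
    return Fraction(len(event), 6)

A = {1, 2, 3}
partition = [{2, 4, 6}, {1, 3, 5}]   # evens and odds partition Omega

# Law of total probability: P(A) = sum over i of P(A|B_i) P(B_i)
total = sum((P(A & Bi) / P(Bi)) * P(Bi) for Bi in partition)
assert total == P(A) == Fraction(1, 2)
```

Using exact fractions rather than floats makes the equality check exact.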
We have seen an example of a simple random experiment (tossing a fair coin) where
the value of the probability measure P can easily be specified explicitly for every event
in the event space.
For more complex probability spaces it is difficult to specify the set function P directly.
The notion of a probability function becomes useful for specifying P, in an indirect
way.
Suppose we have a point function p(ω) with the properties:
p(ω) ≥ 0, for all ω ∈ Ω
and
Σ_{ω∈Ω} p(ω) = 1
and define the set function P(F) = Σ_{ω∈F} p(ω), for all F ∈ F.
This set function P is a valid probability measure for the probability space (Ω,F,P) as
it satisfies the axioms and specifies a probability for all events in the event space F.
A function p(ω), with the properties specified above, is called a probability mass
function (pmf). It is a more easily specified point function from which the set
function P is induced.
The Binary pmf: Ω = {0, 1}; p(0) = 1 − ρ, p(1) = ρ, where ρ ∈ (0, 1) is a parameter.
The Geometric pmf: Ω = {1, 2, 3, ...} and p(k) = (1 − ρ)k−1 ρ; k = 1, 2, 3... where
ρ ∈ (0, 1) is a parameter.
The Poisson pmf: Ω = {0, 1, 2, ...} and p(k) = λ^k e^{−λ} / k!, where λ ∈ (0, ∞) is a parameter.
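As a quick numerical check (with illustrative parameter values of our own choosing), both pmfs can be verified to sum to 1 over their sample spaces by truncating the series at a point where the tail is negligible:

```python
import math

rho, lam = 0.3, 2.5   # illustrative parameter values

# Geometric pmf on {1, 2, 3, ...} and Poisson pmf on {0, 1, 2, ...}
geometric = sum((1 - rho) ** (k - 1) * rho for k in range(1, 200))
poisson = sum(lam ** k * math.exp(-lam) / math.factorial(k) for k in range(0, 100))

assert abs(geometric - 1.0) < 1e-12
assert abs(poisson - 1.0) < 1e-12
```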
Exercise 3
Show that the function p(k) = (1 − ρ)^{k−1} ρ; k = 1, 2, 3, ..., where ρ ∈ (0, 1) is a
parameter, satisfies the properties of a probability mass function (pmf).
Given a probability space (Ω = {1, 2, 3, ...}, P(Ω), P) where P is induced by this pmf
p(k), what is the probability of the event F = {1, 2, 3, 4}?
In the case of a probability space (Ω, F, P) with an uncountably infinite sample space
(e.g. R) can we make a similar simplification to specification of the probability
measure P? For example, considering the probability space (R, B(R), P), can we find a
function that induces P?
f(r) ≥ 0, for all r ∈ R
and
∫_Ω f(r) dr = 1
We now have an expression for the probability measure P, a difficult to specify set
function, in terms of a more easily specified point function f (r).
Like a pmf, a pdf is defined only for points in Ω and not for sets (events). The pmf
relates to a countable sample space and is summed over all points in an event to
produce its probability. The pdf relates to an uncountable sample space and is
integrated over all points in an event to produce its probability.
The pdf of a given probability measure does not always exist. If it does exist, then it is
unique.
We will discuss probability measures further in the next section on random variables.
The Gaussian pdf: f(r) = (2πσ²)^{−1/2} e^{−(r−m)²/(2σ²)}; r ∈ R
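A crude numerical check that the Gaussian pdf integrates to 1 (illustrative parameter values; a midpoint-rule sum over a wide interval stands in for the exact integral):

```python
import math

m, sigma = 1.0, 2.0   # illustrative mean and standard deviation

def f(r):
    # Gaussian pdf with mean m and variance sigma^2
    return (2 * math.pi * sigma ** 2) ** -0.5 * math.exp(-(r - m) ** 2 / (2 * sigma ** 2))

# Midpoint-rule approximation of the integral over [m - 10*sigma, m + 10*sigma];
# the tail outside this interval is negligible
dr = 0.001
n = int(20 * sigma / dr)
total = sum(f(m - 10 * sigma + (i + 0.5) * dr) * dr for i in range(n))
assert abs(total - 1.0) < 1e-9
```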
Exercise 4
Show that the exponential function f(r) = 2e^{−2r}; r ∈ [0, ∞) satisfies the
properties of a probability density function.
Given the probability space (R+ , B(R+ ), P) where P is induced by the pdf f (r),
find the probability of the event [0, 1].
Consider our example random experiment of tossing two coins simultaneously. The
probability space for the experiment is given as (Ω, F = P(Ω), P), where
Ω = {(T, T ), (T, H), (H, T ), (H, H)} and P(Ω) is the power set of Ω.
Suppose we are most interested in the probabilities of the number of heads turning up.
Define a mapping (a set function) X(ω) that maps the individual outcomes ω ∈ Ω to
the number of heads occurring:
X : (T, T) → 0
X : (T, H) → 1
X : (H, T) → 1
X : (H, H) → 2
We can also find the probabilities of combinations of values of the random variable.
For example, the probability of X(ω) > 0 is:
P(X > 0) = P({(T, H), (H, T), (H, H)}) = 3/4 (assuming fair coins)
It appears from this that the range of X has an associated event space of its own with
each event corresponding to an event (and thus a probability) in the original event
space F.
So, we can view a random variable X as being a mapping from the original probability
space to an output probability space:
X : (Ω, F, P) → (Ω_X, F_X, P_X)
under the condition that for every event in FX there must be a corresponding event in
the original domain event space F. In other words, the inverse mapping of any event
in the range event space of X must be an event in the original event space F. In the
case of our example, we can see that this requirement holds:
X⁻¹({0}) = {(T, T)}
X⁻¹({1}) = {(T, H), (H, T)}
X⁻¹({2}) = {(H, H)}
X⁻¹({0, 1}) = {(T, T), (T, H), (H, T)}
X⁻¹({0, 2}) = {(T, T), (H, H)}
X⁻¹({1, 2}) = {(T, H), (H, T), (H, H)}
X⁻¹({0, 1, 2}) = {(T, T), (T, H), (H, T), (H, H)}
X⁻¹(Ø) = Ø
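The induced probability measure P_X can be computed mechanically from these inverse mappings; a small sketch for the two-coin example (assuming fair coins, so each outcome has probability 1/4):

```python
from fractions import Fraction
from collections import defaultdict

# Two fair coins: each of the four outcomes has probability 1/4
p = {w: Fraction(1, 4) for w in [('T', 'T'), ('T', 'H'), ('H', 'T'), ('H', 'H')]}

def X(w):
    return w.count('H')   # number of heads in the outcome

# P_X({x}) = P(X^{-1}({x})): add up the probability of every outcome mapping to x
pX = defaultdict(Fraction)
for w, pw in p.items():
    pX[X(w)] += pw

assert pX[0] == Fraction(1, 4)
assert pX[1] == Fraction(1, 2)
assert pX[2] == Fraction(1, 4)
```

The same accumulation over inverse images gives the probability of any event in the range space, e.g. P(X > 0) = pX[1] + pX[2] = 3/4.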
Dr Conor McArdle EE414 - Probability & Stochastic Processes 33/60
Random Variables
Exercise 5
Consider the probability space (Ω, F, P) where Ω = {0, 1} and F = {Ø, Ω}.
Is the function X(ω) = ω a valid random variable? Explain your answer.
We have thus far considered the case where the original sample space is discrete and
so the random variable’s range is also discrete.
When the sample space is continuous, we have a continuous random variable X whose
range is ΩX = R (or a subset of R). We have seen previously that a suitable event
space for the real sample space is the Borel field over the reals and so the range event
space becomes FX = B(R) and probability measure on this range event space is
denoted PX .
again with the requirement that the inverse mapping of all events B ∈ B(R) must be
events in F. This leads us to the formal definition of a (real-valued) random variable.
We have noted earlier that the probability of an event in the range event space of the
random variable must be the same as the probability of the inverse mapping of
the event. Thus, given the probability measure P of the original space, the probability
measure P_X of the random variable can be derived, or in mathematical terms:
P_X(B) = P(X⁻¹(B)), for all B ∈ F_X
We now look at probability functions as they relate to random variables. As the range
space (R, B(R), PX ) is nothing other than a probability space, the concept of
probability functions must also apply to this space.
We have seen previously that, given a probability space (Ω, F, P) where Ω is discrete,
we can more easily describe P in terms of a probability mass function p(ω),
giving an expression for the probability measure in terms of the pmf p(ω) as:
P(F) = Σ_{ω∈F} p(ω), for all F ∈ F
In a similar way, for a discrete random variable, we can describe PX for the random
variable X in terms of a pmf pX (x), x ∈ R, where pX (x) is derived from p(ω) as:
p_X(x) = Σ_{ω : X(ω)=x} p(ω)
Let (Ω, F, P) be a discrete probability space with Ω = {1, 2, 3, ...}, F the power
set of Ω and P the probability measure induced by the geometric pmf
p(ω) = (1 − ρ)^{ω−1} ρ. Define the random variable X(ω) = 1 when ω is even and
X(ω) = 0 when ω is odd. Find the pmf p_X of X.
Solution
p_X(x) = Σ_{ω : X(ω)=x} p(ω)

⇒ p_X(1) = Σ_{ω even} p(ω) = Σ_{ω=2,4,...} (1 − ρ)^{ω−1} ρ

= (ρ/(1 − ρ)) Σ_{ω=1}^{∞} ((1 − ρ)²)^{ω} = ρ(1 − ρ) Σ_{ω=0}^{∞} ((1 − ρ)²)^{ω}

= ρ(1 − ρ) / (1 − (1 − ρ)²) = (1 − ρ)/(2 − ρ)

⇒ p_X(0) = 1 − (1 − ρ)/(2 − ρ) = 1/(2 − ρ)
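The closed form p_X(1) = (1 − ρ)/(2 − ρ) can be verified numerically for an illustrative value of ρ by summing the geometric pmf over the even outcomes directly (truncating where the tail is negligible):

```python
rho = 0.4   # illustrative parameter value

def p(w):
    return (1 - rho) ** (w - 1) * rho   # geometric pmf on {1, 2, 3, ...}

# p_X(1): sum the pmf over the even outcomes 2, 4, 6, ...
pX1 = sum(p(w) for w in range(2, 400, 2))
assert abs(pX1 - (1 - rho) / (2 - rho)) < 1e-12
```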
If we specify the probability of all intervals in S then the probability of any event (any
set combination of the intervals) can be determined. This prompts the definition of the
cumulative distribution function (cdf) of X:
F_X(x) = P_X((−∞, x]) = Pr(X ≤ x), x ∈ R
Given the cdf of X, probabilities of any event can be determined, for example:
Pr(a < X ≤ b) = PX ((−∞, b] − (−∞, a]) = FX (b) − FX (a); where a ≤ b
We note some properties of the cdf FX (x):
FX (−∞) = 0
FX (∞) = 1
FX is non-decreasing and continuous from the right
Continuous Random Variables and Probability Functions
We have seen earlier that the probability measure P can also be expressed in terms of a
probability density function (pdf) when the sample space is real-valued. Thus we
also have the notion of a pdf of a random variable, that is the pdf inducing PX .
f_X(x) ≥ 0, ∀x  and  ∫_R f_X(x) dx = 1
We note the significance of the wording 'well-defined integral' in the above definition.
Although the cdf always exists, the pdf may not.
Considering events of the form (−∞, α], the pdf gives probabilities:
P_X((−∞, α]) = ∫_{−∞}^{α} f_X(x) dx, ∀α ∈ R
We now have two ways of expressing the probability of an event of the form (−∞, α],
the cdf and the pdf. Thus they can be related as follows:
F_X(α) = P_X((−∞, α]) = ∫_{−∞}^{α} f_X(x) dx; α ∈ R
and also
f_X(α) = dF_X(α)/dα; α ∈ R
Also note that: Pr(a < X ≤ b) = F_X(b) − F_X(a) = ∫_{a}^{b} f_X(x) dx
We have previously derived the pmf of the discrete random variable from the pmf in
the original (domain) probability space. Can we also derive the pdf of a continuous
random variable X, given a pdf for the original space?
X : (Ω = R, F = B(R), P) → (Ω_X = R, F_X = B(R), P_X)
f given; f_X = ?
Method: first express the cdf of X in terms of the domain pdf f:
F_X(x) = P_X((−∞, x]) = ∫_{r∈Ω : X(r)≤x} f(r) dr
Assuming we can find the limits of integration (which requires evaluating X⁻¹), the
pdf of X may then be calculated as:
f_X(x) = (d/dx) ∫_{r∈Ω : X(r)≤x} f(r) dr
Find the probability density function (pdf) that induces PX , given that P is induced
by the uniform pdf on [0, 1] (that is, f (r) = 1, ∀r ∈ [0, 1] and is 0 otherwise).
Solution
f_X(x) = (1/2) x^{−1/2}; x ∈ (0, 1]
Check: ∫_{0}^{1} (1/2) x^{−1/2} dx = [x^{1/2}]_{0}^{1} = 1 ✓
For a discrete random variable X taking values x_1, ..., x_n, the expected value is
defined as:
E[X] = Σ_{i=1}^{n} x_i p_X(x_i)
We note that the expected value of a random variable X may also be referred to as
the mean value of X or the first moment of the random variable X.
Example
Find the expected value of the discrete random variable X with range space Z+
and pmf given by pX (k) = (1 − ρ)ρk , 0 ≤ ρ < 1.
Solution
E[X] = Σ_{k=0}^{∞} k p_X(k)

= (1 − ρ) Σ_{k=0}^{∞} k ρ^k = (1 − ρ)ρ Σ_{k=0}^{∞} (d/dρ) ρ^k

= (1 − ρ)ρ (d/dρ)(1/(1 − ρ)) = ρ/(1 − ρ)
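The result E[X] = ρ/(1 − ρ) can be checked numerically for an illustrative value of ρ by truncating the series where the tail becomes negligible:

```python
rho = 0.3   # illustrative parameter value

def pX(k):
    return (1 - rho) * rho ** k   # pmf on {0, 1, 2, ...}

# Truncated series for E[X]; the tail beyond k = 500 is negligible
EX = sum(k * pX(k) for k in range(0, 500))
assert abs(EX - rho / (1 - rho)) < 1e-12
```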
For a continuous random variable X, the expected value is defined as:
E[X] = ∫_R x f_X(x) dx
Example
Find the expected value of the continuous random variable X with range space
R+ and exponential pdf given by f (r) = λe−λr ; λ > 0.
Solution
E[X] = ∫_{0}^{∞} r λe^{−λr} dr

= [−r e^{−λr}]_{r=0}^{∞} + ∫_{0}^{∞} e^{−λr} dr

= [−(1/λ) e^{−λr}]_{r=0}^{∞}

= 1/λ
The expected value gives limited information about the distribution of a random
variable, as quite dissimilar random variables may have the same mean value. The
variance, Var(X) = E[(X − E[X])²], measures the spread of a distribution about its mean.
We note that Var(X) = E[X²] − E²[X] and that E[X²] is referred to as the second
moment of X.
Example
Find the variance of the continuous random variable X with range space R+ and
exponential pdf given by f (r) = λe−λr ; λ > 0.
Solution
From the previous example, E[X] = 1/λ. Integrating by parts twice gives
E[X²] = ∫_{0}^{∞} r² λe^{−λr} dr = 2/λ², so
Var(X) = E[X²] − E²[X] = 2/λ² − 1/λ² = 1/λ²
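As a numerical cross-check of this exercise (with an illustrative rate parameter of our own choosing), the first and second moments of the exponential pdf can be approximated by midpoint-rule integration, confirming Var(X) = 1/λ²:

```python
import math

lam = 1.5   # illustrative rate parameter

def f(r):
    return lam * math.exp(-lam * r)   # exponential pdf

# Midpoint-rule approximations of E[X] and E[X^2]
dr = 1e-4
upper = 40.0 / lam                    # the tail beyond this point is negligible
EX = EX2 = 0.0
for i in range(int(upper / dr)):
    x = (i + 0.5) * dr
    EX += x * f(x) * dr
    EX2 += x * x * f(x) * dr

assert abs(EX - 1 / lam) < 1e-6            # mean 1/lambda
assert abs((EX2 - EX ** 2) - 1 / lam ** 2) < 1e-6   # variance 1/lambda^2
```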
To model such systems, the notion of a stochastic process (or random process) is
useful.
The index set I may be discrete (e.g. I = Z+ ) or continuous (e.g. I = [0, ∞)). I is
usually interpreted as being time (either discrete or continuous).
We can view a stochastic process as being a mapping from each sample point ω ∈ Ω
to a function of time and note that:
X(t, ω) for a given ω is also called a trajectory or sample path of the random process.
We observe that the probability distribution governing the likelihood of different ω’s
dictates the likelihood of different trajectories that the output value of the stochastic
process will take over time.
We also observe that, at a given time t, X(t, ω) describes the likelihood of different
values (states) of the process. That is, at a given point in time (e.g. t = t1 ), the
random variable X(t1 , ω) has a cdf (or pmf if discrete) describing the likelihood of the
process being in different states at that time.
Consider a game where a coin is tossed repeatedly (ad infinitum) and the player’s score
is accumulated by adding 1 point when a head turns up and deducting 1 point when a
tail turns up. Let us describe this process as a stochastic process defined on a common
probability space (Ω, F, P).
A single outcome of the experiment is some infinite sequence of equally likely 1's and
−1's, that is, the sample space is a product space:
Ω = {−1, 1}^∞
We note that at any fixed value of t ∈ Z+ , we have a random variable. For example,
X(2, ω) is a random variable with associated pmf:
Pr(X(2, ω) = −2) = 1/4,  Pr(X(2, ω) = 0) = 1/2,  Pr(X(2, ω) = 2) = 1/4
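The pmf of X(2, ω) can be reproduced by enumerating the four equally likely two-step paths of the game:

```python
from fractions import Fraction
from itertools import product

# X(2, w) is the sum of the first two steps, each step +1 (head) or -1 (tail),
# with the four two-step outcomes equally likely
pmf = {}
for steps in product([1, -1], repeat=2):
    s = sum(steps)
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 4)

assert pmf == {2: Fraction(1, 4), 0: Fraction(1, 2), -2: Fraction(1, 4)}
```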
Stochastic Processes: Classifications
1 The State Space, the set of possible values (or states) that X(t, ω) can take on.
The state space can either be (i) discrete (finite or countable set of states) or (ii)
continuous (values over continuous intervals).
2 The Parameter Space, the permitted times at which changes in state may occur.
The parameter space can either be (i) discrete (discrete time process) or (ii)
continuous (continuous time process).
Statistical Dependencies
Firstly let us consider possible probabilistic relationships between two random variables
X and Y. Consider the events (X ≤ x) and (Y ≤ y). The events are independent if
Pr(X ≤ x, Y ≤ y) = Pr(X ≤ x) Pr(Y ≤ y)
Where this is not the case, there is a statistical dependency between the events.
Where this is not the case, the probabilistic dependencies between X and Y can be
described in terms of joint probability functions.
The notion of joint distributions and joint density functions can be extended to a
group of any number of random variables.
Consider the stochastic process X(t, ω) as an infinite series of random variables
X(t_i, ω), where i ∈ I, an infinite index set. The joint distribution function of these
random variables can be denoted:
F_X(x_1, ..., x_n; t_1, ..., t_n) = Pr(X(t_1, ω) ≤ x_1, ..., X(t_n, ω) ≤ x_n)
We note that independent processes are somewhat trivial, given that the state of the
process does not evolve from (depend on) previous states. For (more interesting)
processes that are not independent, the statistical dependence between states at different
times is expressed in the joint distribution function; however, in general, this function is
complex and so simpler mechanisms of specification are more useful.
An Ergodic Process is a stochastic process where a full description of the process can
be determined from a single (infinitely long) sample path of the process. This implies
that the behaviour of the process, after a long period of evolution, becomes
independent of the starting point of the process.
Exercise 6
Give a classification of the stochastic process described in the previous example
(the infinite coin tossing game).