2015/2016
EOS (eos@isep.ipp.pt)
ANADI
2015/2016
Attention:
This document is a collection of material gathered from the bibliography of the Curricular Unit.
Further details can be found in [1], [2] and [3].
Deterministic data:
data generated in accordance with known and precise laws.
Attributes:
the same data will be obtained, within the precision of the measurements, under repeated
experiments in well-defined conditions.
Random data:
no precise mathematical law describes the data.
there is no possibility of obtaining the same data in repeated experiments performed under
similar conditions.
A dataset is one realization (or one instance) of a set consisting of a possibly infinite number of
realizations of a generating process, called a random or stochastic process.
Discrete Variables
Digital Channel (adapted from Example 3-4 in [3])
There is a chance that a bit transmitted through a digital transmission channel is received in
error. Let X equal the number of bits in error in the next four bits transmitted. The possible
values for X are {0, 1, 2, 3, 4}. Let us assume that the probabilities are:
P(X = 0) = 0.6561,
P(X = 1) = 0.2916,
P(X = 2) = 0.0486,
P(X = 3) = 0.0036,
P(X = 4) = 0.0001.
The probability distribution of X is specified by the possible values along with the probability of
each.
Let X be a discrete (random) variable that takes the distinct values $x_1, x_2, \dots, x_n$.
Definition
The probability mass function f, of a discrete variable X, is such that:
$$f(x) = \begin{cases} P(X = x), & \text{if } x = x_i,\ i = 1, 2, \dots, n \\ 0, & \text{if } x \neq x_i \end{cases}$$
Properties
$0 \le f(x_i) \le 1$
$\sum_{i=1}^{n} f(x_i) = 1$
For the digital channel example:
f(0) = 0.6561, f(1) = 0.2916, f(2) = 0.0486, f(3) = 0.0036, f(4) = 0.0001
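The two properties can be checked numerically for the digital channel values. This is a minimal sketch; the names `pmf` and `f` are chosen here for illustration and are not part of the course material.

```python
import math

# Probability mass function of the digital channel example (values from the text).
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}


def f(x):
    """Return P(X = x); zero for any x that is not one of the listed values."""
    return pmf.get(x, 0.0)


# Property 1: every probability lies in [0, 1].
all_in_unit_interval = all(0.0 <= p <= 1.0 for p in pmf.values())

# Property 2: the probabilities sum to 1 (up to floating-point rounding).
sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=1e-12)

print(all_in_unit_interval, sums_to_one, f(5))
```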
Digital Channel
The probability that three or fewer bits are in error is given by $P(X \le 3)$. The event $\{X \le 3\}$ is
the union of the events {X = 0}, {X = 1}, {X = 2}, {X = 3} and all these events are mutually
exclusive. Therefore,
$$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.9999$$
Definition
The cumulative distribution function F, of a discrete variable X, is such that:
$$F(x) = P(X \le x) = \sum_{i=1}^{k} f(x_i),$$
where $x_1 < x_2 < \dots < x_k$ are the values of X that do not exceed x.
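Properties
$0 \le F(x) \le 1$
$F(x_2) \le F(x_1)$, $\forall x_1, x_2$ with $x_1 > x_2$ (F is monotonically non-decreasing)
$\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$
For the digital channel example:
$$F(x) = \begin{cases} 0, & x < 0 \\ 0.6561, & 0 \le x < 1 \\ 0.9477, & 1 \le x < 2 \\ 0.9963, & 2 \le x < 3 \\ 0.9999, & 3 \le x < 4 \\ 1, & x \ge 4 \end{cases}$$
F(x) is piecewise constant between the values $x_1, x_2, \dots$.
$P(X = x_i)$ can be determined from the jump at the value $x_i$.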
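The step shape of F and the recovery of $P(X = x_i)$ from a jump can be sketched as follows, using the digital channel probabilities. The helper names `F` and `jump`, and the small offset `eps`, are illustrative assumptions.

```python
# Digital channel example: possible values and their probabilities (from the text).
values = [0, 1, 2, 3, 4]
probs = [0.6561, 0.2916, 0.0486, 0.0036, 0.0001]


def F(x):
    """Cumulative distribution function: sum of f(x_i) over all x_i <= x."""
    total = 0.0
    for v, p in zip(values, probs):
        if v <= x:
            total += p
    return total


def jump(xi, eps=1e-9):
    """Size of the step of F at xi, which equals P(X = xi)."""
    return F(xi) - F(xi - eps)


print(F(-1), F(1.5), F(10), jump(2))
```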
$n = \sum_{i \ge 1} n_i$
$n_i$: absolute frequency
$f_i = n_i / n$: relative frequency
$P_i \equiv P(X = x_i)$
P(X): probability function
F(X): distribution function
See Table 1.5, p. 11 and Figure 1.5, p. 12 in [1].
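Mean or expected value of X:
$$\mu = E(X) = \sum_x x f(x)$$
Variance of X:
$$\sigma^2 = V(X) = \sum_x (x - \mu)^2 f(x) = \sum_x x^2 f(x) - \mu^2$$
Standard deviation of X:
$$\sigma = \sqrt{\sigma^2}$$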
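As a worked illustration, the mean, variance and standard deviation can be computed for the digital channel probabilities given earlier. This is a sketch with illustrative variable names, not part of the original slides.

```python
import math

# Digital channel example: P(X = x) for x = 0, ..., 4 (values from the text).
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

# Mean: sum of x * f(x).
mean = sum(x * p for x, p in pmf.items())

# Variance, computed both ways: sum of (x - mean)^2 f(x), and sum of x^2 f(x) - mean^2.
var_central = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

# Standard deviation: square root of the variance.
sd = math.sqrt(var_central)

print(mean, var_central, var_shortcut, sd)
```

Both variance expressions agree, and the results are approximately 0.4 for the mean, 0.36 for the variance and 0.6 for the standard deviation.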
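Outcomes of the four transmitted bits (E: bit in error; O: bit received correctly; x: number of bits in error)

Outcome   x     Outcome   x
OOOO      0     EOOO      1
OOOE      1     EOOE      2
OOEO      1     EOEO      2
OOEE      2     EOEE      3
OEOO      1     EEOO      2
OEOE      2     EEOE      3
OEEO      2     EEEO      3
OEEE      3     EEEE      4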
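The table of outcomes can be generated by enumerating all sequences of four bits and counting the errors in each. This is a small sketch; the letters E and O follow the table, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# All 2^4 sequences of four bits, each bit being O (correct) or E (in error).
outcomes = ["".join(bits) for bits in product("OE", repeat=4)]

# x is the number of bits in error in each outcome.
table = [(outcome, outcome.count("E")) for outcome in outcomes]

# Number of outcomes that lead to each value of x.
outcomes_per_value = Counter(x for _, x in table)

for outcome, x in table:
    print(outcome, x)
```

Counting the outcomes per value of x gives 1, 4, 6, 4 and 1 for x = 0, 1, 2, 3 and 4.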
Binomial Distribution
Description:
Probability of k successes in n independent Bernoulli trials with constant success probability p.
Sample space: {0, 1, . . . , n}.
Probability function:
$$b_{n,p}(k) \equiv P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \dots, n$$
Distribution function:
$$P(X \le k) = \sum_{i=0}^{k} b_{n,p}(i)$$
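The probability function can be evaluated directly. The sketch below uses n = 4 and p = 0.1, a choice made here because it reproduces the digital channel probabilities listed earlier; the slides do not state these parameters, and the function name `binom_pmf` is illustrative.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Assumed parameters: four transmitted bits, each in error with probability 0.1.
n, p = 4, 0.1
channel = [binom_pmf(k, n, p) for k in range(n + 1)]

print([round(value, 4) for value in channel])
```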
Bernoulli Distribution: Success or failure in one trial. A dichotomous trial is also called a Bernoulli
trial.
See B.1.1 in [1], p. 431.
Geometric Distribution: Probability of an event occurring for the first time at the kth trial, in a
sequence of independent Bernoulli trials, when it has a probability p of
occurrence in one trial.
See B.1.3 in [1], p. 433.
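For a binomial variable X with n = 10 and p = 0.1, P(X <= 3) is obtained in R with:
> pbinom(3, 10, 0.1)
[1] 0.9872048
For P(X > 3) we have two options:
> 1 - pbinom(3, 10, 0.1)
or
> pbinom(3, 10, 0.1, lower.tail = FALSE)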
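The same quantities can be cross-checked outside R. The following Python sketch sums the binomial probability function directly; `binom_pmf` and `binom_cdf` are hypothetical helper names, not library functions.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for a binomial variable with parameters n and p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k): the quantity R returns from pbinom(k, n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p = 10, 0.1
p_at_most_3 = binom_cdf(3, n, p)

# Upper tail P(X > 3), computed in two ways: by complement and by direct summation.
p_more_than_3 = 1 - p_at_most_3
p_more_than_3_direct = sum(binom_pmf(i, n, p) for i in range(4, n + 1))

print(round(p_at_most_3, 7), p_more_than_3, p_more_than_3_direct)
```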
Multinomial Distribution: Generalization of the binomial law when there are more than two
categories of events in n independent trials with constant probability, $p_i$ (for
i = 1, 2, . . . , k categories), throughout the trials.
See B.1.6 in [1], p. 436.
Continuous Variables
Let X be a continuous random variable:
the variable can assume an infinite number of possible values;
the probability associated with each particular value is zero, i.e. P(X = x) = 0;
probabilities associated with intervals of the variable domain can be non-zero:
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
$P(x_1 \le X \le x_2) = P(x_1 < X \le x_2) = P(x_1 \le X < x_2) = P(x_1 < X < x_2)$, for any $x_1$ and $x_2$.
F(x): distribution function
$$F(u) = P(X \le u) = \int_{-\infty}^{u} f(x)\,dx$$
$$f(x) = \frac{dF}{dx}$$
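The relations between f and F can be illustrated numerically. The sketch below assumes an exponential density with a hypothetical rate `lam = 0.5`, chosen only because its distribution function has a simple closed form; the integration helper and all names are illustrative.

```python
import math

lam = 0.5  # hypothetical rate, chosen only for illustration


def pdf(x):
    """Density f(x) of the assumed exponential variable."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def cdf(x):
    """Distribution function F(x) of the assumed exponential variable."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


a, b = 1.0, 3.0
p_from_cdf = cdf(b) - cdf(a)         # interval probability as F(b) - F(a)
p_from_pdf = integrate(pdf, a, b)    # interval probability as the integral of f

# f(x) = dF/dx, checked with a central difference at x = 2.
h = 1e-6
slope = (cdf(2.0 + h) - cdf(2.0 - h)) / (2 * h)

print(p_from_cdf, p_from_pdf, slope, pdf(2.0))
```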
Let X be a continuous random variable with domain D:
Mean or expected value of X:
$$\mu = E(X) = \int_D x f(x)\,dx$$
Variance of X:
$$\sigma^2 = V(X) = \int_D (x - \mu)^2 f(x)\,dx = \int_D x^2 f(x)\,dx - \mu^2$$
Standard deviation of X:
$$\sigma = \sqrt{\sigma^2}$$
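A normal random variable with $\mu = 0$ and $\sigma^2 = 1$ is called a standard normal random
variable and is denoted as Z.
The cumulative distribution function is
$$\Phi(z) = P(Z \le z)$$
A normal random variable X with mean $\mu$ and variance $\sigma^2$ is standardized as (in [3])
$$Z = \frac{X - \mu}{\sigma}$$
so that
$$P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P(Z \le z), \quad z = \frac{x - \mu}{\sigma}$$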
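Standardization can be checked with Python's `statistics.NormalDist`. The values of the mean, standard deviation and x below are hypothetical, chosen only to illustrate that both routes give the same probability.

```python
from statistics import NormalDist

# Hypothetical parameters of a normal variable X and a point x of interest.
mu, sigma = 10.0, 2.0
x = 13.0

# Standardize: z = (x - mu) / sigma.
z = (x - mu) / sigma

# P(X <= x) computed directly, and through the standard normal as Phi(z).
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, p_direct, p_standard)
```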
Exponential Distribution: Distribution of decay phenomena, where the rate of decay is constant,
such as in radioactivity phenomena.
See B.2.3 in [1], p. 442.
Gamma Distribution: A sort of generalization of the exponential distribution, since the sum of
independent random variables, each with the same exponential distribution, follows
the Gamma distribution. Several continuous distributions can be regarded as a
generalization of the Gamma distribution.
See B.2.5 in [1], p. 445.
Chi-Square Distribution: The sum of squares of independent random variables, with standard
normal distribution, follows the chi-square ($\chi^2$) distribution. The number n of
added terms is the so-called number of degrees of freedom, df = n
(number of terms that can vary independently, achieving the same sum).
See B.2.7 in [1], p. 448.
Student's t Distribution: The distribution followed by the ratio of the mean deviations over the
sample standard deviation.
See B.2.8 in [1], p. 449.
F Distribution: Introduced by Ronald A. Fisher (1890-1962) in order to study the ratio of
variances. The ratio of two independent Gamma-distributed random variables,
each divided by its mean, also follows the F distribution.
See B.2.9 in [1], p. 451.
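For a binomial random variable X with parameters n and p,
$$Z = \frac{X - np}{\sqrt{np(1 - p)}}$$
is approximately a standard normal random variable.
To approximate a binomial probability with a normal distribution, a continuity correction is
applied as follows:
$$P(X \le x) = P(X \le x + 0.5) \approx P\left(Z \le \frac{x + 0.5 - np}{\sqrt{np(1 - p)}}\right)$$
and
$$P(X \ge x) = P(X \ge x - 0.5) \approx P\left(Z \ge \frac{x - 0.5 - np}{\sqrt{np(1 - p)}}\right)$$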
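The effect of the continuity correction can be seen by comparing an exact binomial probability with its normal approximation. The values n = 50, p = 0.3 and x = 15 are hypothetical, picked only for illustration, and the variable names are not from the slides.

```python
import math
from statistics import NormalDist

# Hypothetical binomial parameters and threshold.
n, p, x = 50, 0.3, 15

# Exact P(X <= x) by summing the binomial probability function.
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

mean = n * p
sd = math.sqrt(n * p * (1 - p))
standard_normal = NormalDist()

# Normal approximation with the continuity correction: P(Z <= (x + 0.5 - np) / sd).
approx_corrected = standard_normal.cdf((x + 0.5 - mean) / sd)

# The same approximation without the correction, for comparison.
approx_plain = standard_normal.cdf((x - mean) / sd)

print(exact, approx_corrected, approx_plain)
```

In this case the corrected value is closer to the exact probability than the uncorrected one.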
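For a Poisson random variable X with $E(X) = \lambda$ and $V(X) = \lambda$ (in [3]),
$$Z = \frac{X - \lambda}{\sqrt{\lambda}}$$
is approximately a standard normal random variable. The same continuity correction used
for the binomial distribution can also be applied. The approximation is good for
$\lambda > 5$.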
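$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$$
$$P((X, Y) \in R) = \iint_R f_{XY}(x, y)\,dx\,dy$$
$$f_X(x) = \int f_{XY}(x, y)\,dy$$
and
$$f_Y(y) = \int f_{XY}(x, y)\,dx$$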
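Example, with a joint density defined for $x < y$:
$$f_{XY}(x, y) = 6 \times 10^{-6} \exp(-0.001x - 0.002y)$$
Probabilities for this density, as in equation (1), are double integrals of $f_{XY}$ over the region of interest, taken in the order $dy\,dx$ or $dx\,dy$, with limits built from 0, 1000, 2000 and $x$.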
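As an illustration of integrating the joint density $6 \times 10^{-6} \exp(-0.001x - 0.002y)$ on $x < y$ over a region, the sketch below estimates $P(X \le 1000, Y \le 2000)$ with a midpoint rule. The choice of this region is an assumption suggested by the limits 1000 and 2000; the function names and grid sizes are hypothetical.

```python
import math


def f_xy(x, y):
    """Joint density from the text; zero outside the region 0 < x < y."""
    if 0.0 < x < y:
        return 6e-6 * math.exp(-0.001 * x - 0.002 * y)
    return 0.0


def prob_region(x_max, y_max, nx=200, ny=200):
    """Midpoint-rule estimate of P(X <= x_max, Y <= y_max), assuming x_max <= y_max.

    For each x in (0, x_max), y is integrated from x up to y_max.
    """
    hx = x_max / nx
    total = 0.0
    for i in range(nx):
        x = (i + 0.5) * hx
        hy = (y_max - x) / ny
        inner = sum(f_xy(x, x + (j + 0.5) * hy) for j in range(ny))
        total += inner * hy * hx
    return total


p_region = prob_region(1000.0, 2000.0)
print(p_region)
```

The estimate can be compared with the closed form of the same integral, $(1 - e^{-3}) - 3e^{-4}(1 - e^{-1})$.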
$$E[h(X, Y)] = \begin{cases} \sum_x \sum_y h(x, y) f_{XY}(x, y), & X, Y \text{ discrete} \\ \iint h(x, y) f_{XY}(x, y)\,dx\,dy, & X, Y \text{ continuous} \end{cases}$$
$$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sqrt{V[X] V[Y]}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$
Given that $\sigma_X > 0$ and $\sigma_Y > 0$, if the covariance between X and Y is positive, negative, or zero,
the correlation between X and Y is positive, negative, or zero, respectively.
$-1 \le \rho_{XY} \le +1$
The correlation is a dimensionless quantity that can be used to compare the linear relationships
between pairs of variables in different units.
Two random variables with nonzero correlation are said to be correlated.
The correlation is a measure of the linear relationship between random variables.
If X and Y are independent random variables, $\sigma_{XY} = \rho_{XY} = 0$.
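Covariance and correlation can be computed from a joint probability function by applying the expected-value sum for discrete variables. The joint probabilities below are hypothetical, invented only for illustration, and the helper `expect` is not from the slides.

```python
import math

# Hypothetical joint probability function of two binary variables X and Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expect(h):
    """E[h(X, Y)] as the sum of h(x, y) * f_XY(x, y) over all pairs."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Covariance and variances as expected values of centered products.
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)

# Correlation: covariance divided by the product of the standard deviations.
rho_xy = cov_xy / math.sqrt(var_x * var_y)

print(cov_xy, rho_xy)
```

For these assumed probabilities the covariance is about 0.15 and the correlation about 0.6, so the sign of the correlation follows the sign of the covariance.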
Exercises
References