
ANADI

Probabilities and Distributions

Licenciatura em Engenharia Informatica

2015/2016

EOS (eos@isep.ipp.pt)

ANADI

2015/2016

1 / 38

Attention:
This document is a collection of material gathered from the bibliography of the Curricular Unit.
Further details can be found in [2], [3] and [1].


Deterministic and Random Variables

Deterministic data:
data generated in accordance with known and precise laws.
Attributes: the same data will be obtained, within the precision of the measurements, under repeated experiments in well-defined conditions.

Random data:
there is no precise mathematical law describing the data.
there is no possibility of obtaining the same data in repeated experiments performed under similar conditions.
A dataset is one realization (or one instance) of a set consisting of a possibly infinite number of realizations of a generating process: a random or stochastic process.


Discrete Variables
Digital Channel (adapted from Example 3-4 in [3])
There is a chance that a bit transmitted through a digital transmission channel is received in error. Let X equal the number of bits in error in the next four bits transmitted. The possible values for X are {0, 1, 2, 3, 4}. Let us assume that the probabilities are:

P(X = 0) = 0.6561, P(X = 1) = 0.2916, P(X = 2) = 0.0486, P(X = 3) = 0.0036, P(X = 4) = 0.0001

The probability distribution of X is specified by the possible values along with the probability of each.

Practical Interpretation: A random experiment can often be summarized with a random variable and its distribution. The details of the sample space can often be omitted.

Let X be a discrete (random) variable that takes the distinct values x1, x2, ..., xn.

Definition
The probability mass function f of a discrete variable X is such that:

f(x) = P(X = x)  if x = xi, i = 1, 2, ..., n
f(x) = 0         if x ≠ xi

Properties

0 ≤ f(xi) ≤ 1
Σ_{i=1}^{n} f(xi) = 1

For the previous example we get:

f(0) = 0.6561, f(1) = 0.2916, f(2) = 0.0486, f(3) = 0.0036, f(4) = 0.0001


Discrete Variables

Digital Channel
The probability that three or fewer bits are in error is given by P(X ≤ 3). The event {X ≤ 3} is the union of the events {X = 0}, {X = 1}, {X = 2}, {X = 3}, and all these events are mutually exclusive. Therefore,

P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
         = 0.6561 + 0.2916 + 0.0486 + 0.0036 = 0.9999

There is a probability of 99.99% that three or fewer bits are in error.

Definition
The cumulative distribution function F of a discrete variable X is such that:

F(x) = P(X ≤ x) = Σ_{i=1}^{k} f(xi),

where xk is the largest value of the random variable X that is less than or equal to x.


Discrete Variables
Properties

0 ≤ F(x) ≤ 1
F(x2) ≥ F(x1) for all x1, x2 with x2 > x1 (F is monotonically non-decreasing)
lim_{x→-∞} F(x) = 0 and lim_{x→+∞} F(x) = 1
P(x1 < X ≤ x2) = F(x2) - F(x1), for all x1, x2 with x2 > x1

Even if the random variable X can assume only integer values, the cumulative distribution function is defined at noninteger values. In the Digital Channel example we have:

F(1.5) = P(X ≤ 1.5) = P(X = 0) + P(X = 1) = 0.6561 + 0.2916 = 0.9477

F(x) = 0,      x < 0
       0.6561, 0 ≤ x < 1
       0.9477, 1 ≤ x < 2
       0.9963, 2 ≤ x < 3
       0.9999, 3 ≤ x < 4
       1,      x ≥ 4

F(x) is piecewise constant between the values x1, x2, ....
P(X = xi) can be determined from the jump at the value xi.
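As a sketch, the step-function F(x) above can be computed by summing the pmf over the values not exceeding x; the `cdf` helper below is illustrative, not part of the slides:

```python
# Digital Channel pmf from the slides
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

def cdf(x):
    """Piecewise-constant CDF: sum f(xi) over all xi <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(1.5))  # -> 0.9477 (up to float rounding), matching F(1.5) above
print(cdf(3))    # -> 0.9999 (up to float rounding)
```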

Discrete Variables

Some notation and concepts (see e.g. [1], p. 10):

n: size of a sample dataset
X: discrete random variable taking k distinct values xi, each one occurring ni times, with n = Σ_{i=1}^{k} ni
ni: absolute frequency
fi = ni / n: relative frequency
pi = P(X = xi)
P(X): probability function
F(X): distribution function

See Table 1.5, p. 11 and Figure 1.5, p. 12 in [1].



Discrete Variables

Let X be a discrete random variable:

Mean or expected value of X:
μ = E(X) = Σ_x x f(x)

Variance of X:
σ² = V(X) = E[(X - μ)²] = Σ_x (x - μ)² f(x) = Σ_x x² f(x) - μ²

Standard deviation of X:
σ = √σ²

Expected value of a function of a discrete random variable:
E[h(X)] = Σ_x h(x) f(x)
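Applied to the Digital Channel pmf, these formulas give μ = 0.4 and σ² = 0.36, which match np and np(1 - p) for n = 4, p = 0.1. A quick Python check (the slides otherwise use R; this stdlib sketch is only illustrative):

```python
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

mu = sum(x * p for x, p in pmf.items())               # E(X) = sum x f(x)
var = sum(x**2 * p for x, p in pmf.items()) - mu**2   # E(X^2) - mu^2

print(mu, var)  # -> 0.4 0.36 (up to float rounding)
```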


Examples of Discrete Distributions


Digital Channel
The chance that a bit transmitted through a digital transmission channel is received in error is
0.1. Also, assume that the transmission trials are independent. Let X be the number of bits in
error in the next four bits transmitted. Determine P(X = 2).
Let the letter E denote a bit in error, and let the letter O denote that the bit is okay, that is,
received without error.
We can represent the outcomes of this experiment as a list of four letters that indicate the bits
that are in error and those that are okay. For example, the outcome OEOE indicates that the
second and fourth bits are in error and the other two bits are okay. The corresponding values for
x are
Outcome  x    Outcome  x
OOOO     0    EOOO     1
OOOE     1    EOOE     2
OOEO     1    EOEO     2
OOEE     2    EOEE     3
OEOO     1    EEOO     2
OEOE     2    EEOE     3
OEEO     2    EEEO     3
OEEE     3    EEEE     4

Examples of Discrete Distributions

Digital Channel (cont.)
The event {X = 2} consists of the six outcomes EEOO, EOEO, EOOE, OEEO, OEOE, OOEE. Using the assumption that the trials are independent, the probability of EEOO is

P(EEOO) = P(E) P(E) P(O) P(O) = (0.1)² (0.9)² = 0.0081.

Also, any one of the six mutually exclusive outcomes for which X = 2 has the same probability of occurring. Therefore,

P(X = 2) = 6 × 0.0081 = 0.0486.

In general, P(X = x) = (number of outcomes that result in x errors) × (0.1)^x (0.9)^(4-x).

Binomial Distribution
Description: probability of k successes in n independent Bernoulli trials with constant success probability p.
Sample space: {0, 1, ..., n}.
Probability function: b_{n,p}(k) ≡ P(X = k) = C(n, k) p^k (1 - p)^(n-k), with k = 0, 1, 2, ..., n.
Distribution function: B_{n,p}(k) = Σ_{i=0}^{k} b_{n,p}(i).
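The probability function is straightforward to evaluate. A minimal Python sketch (the slides use R's dbinom; this stdlib version is only an illustration) reproducing the Digital Channel probabilities with n = 4, p = 0.1:

```python
from math import comb

def binom_pmf(k, n, p):
    """b_{n,p}(k) = C(n,k) p^k (1-p)^(n-k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduces the Digital Channel probabilities (n = 4, p = 0.1):
for k in range(5):
    print(k, binom_pmf(k, 4, 0.1))
# k = 2 gives 6 * 0.0081 = 0.0486, as in the example
```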

Examples of Discrete Distributions

Bernoulli Distribution Success or failure in one trial. A dichotomous trial is also called a Bernoulli
trial.
See B.1.1 in [1], p. 431.


Examples of Discrete Distributions (cont.)

Uniform Distribution Probability of occurring one out of n equiprobable events.


See B.1.2 in [1], p. 432.


Examples of Discrete Distributions (cont.)

Geometric Distribution Probability of an event occurring for the first time at the kth trial, in a
sequence of independent Bernoulli trials, when it has a probability p of
occurrence in one trial.
See B.1.3 in [1], p. 433.


Examples of Discrete Distributions (cont.)

Hypergeometric Distribution Probability of obtaining k items, of one out of two categories, in a sample of n items extracted without replacement from a population of N items that has D = pN items of that category (and (1 - p)N = qN items of the other category). In quality control, the category of interest is usually that of the defective items.
See B.1.4 in [1], p. 434.


Examples of Discrete Distributions (cont.)


Binomial Distribution Probability of k successes in n independent and constant probability Bernoulli trials.
See B.1.5 in [1], p. 435.

In R: P(X = x) is given by dbinom and P(X ≤ x) by pbinom.

Example
If X ~ B(10, 0.1), then to compute P(X ≤ 3) in R we do:
> pbinom(3, 10, 0.1)
[1] 0.9872048
For P(X > 3) we have two options:
> 1 - pbinom(3, 10, 0.1)
or
> pbinom(3, 10, 0.1, lower.tail = F)
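The pbinom value can also be cross-checked by summing the pmf directly; an illustrative Python equivalent using only the standard library:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) as the cumulative sum of the binomial pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(binom_cdf(3, 10, 0.1))      # ~0.9872048, matching pbinom(3, 10, 0.1)
print(1 - binom_cdf(3, 10, 0.1))  # P(X > 3)
```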

Examples of Discrete Distributions (cont.)

Multinomial Distribution Generalization of the binomial law when there are more than two categories of events in n independent trials with constant probabilities pi (for i = 1, 2, ..., k categories) throughout the trials.
See B.1.6 in [1], p. 436.

In R: P(X = x) is given by dmultinom (base R provides no cumulative pmultinom).


Examples of Discrete Distributions (cont.)


Poisson Distribution Probability of obtaining k events when the probability of an event is very small and events occur at an average rate of λ events per unit time or space (probability of rare events).
See B.1.7 in [1], p. 438.

In R: P(X = x) is given by dpois and P(X ≤ x) by ppois.

For the expressions of the means and variances, and main properties, see e.g. Appendix B in [1].

Type ?distribution in R for all available distributions!
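A stdlib Python sketch of dpois/ppois equivalents (λ = 2 is an arbitrary illustrative rate, not from the slides):

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) = e^(-lambda) lambda^k / k!  (dpois equivalent)"""
    return exp(-lam) * lam**k / factorial(k)

def pois_cdf(k, lam):
    """P(X <= k)  (ppois equivalent)"""
    return sum(pois_pmf(i, lam) for i in range(k + 1))

print(pois_pmf(0, 2.0))  # e^-2 ~ 0.1353
print(pois_cdf(3, 2.0))  # ~0.8571
```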



Continuous Variables
Let X be a continuous random variable:
the variable can assume an infinite number of possible values;
the probability associated with each particular value is zero, i.e. P(X = x) = 0;
probabilities associated with intervals of the variable domain can be non-zero.

f(x): the probability density function (pdf)

f(x) ≥ 0
P(a ≤ X ≤ b) = ∫_a^b f(x) dx = area under f from a to b, for any a and b.
∫_D f(x) dx = 1, where D is the domain of the variable.
P(x1 ≤ X ≤ x2) = P(x1 < X ≤ x2) = P(x1 ≤ X < x2) = P(x1 < X < x2), for any x1 and x2.

F(x): distribution function

F(u) = P(X ≤ u) = ∫_{-∞}^{u} f(x) dx,   f(x) = dF/dx
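The relation f(x) = dF/dx can be checked numerically; a sketch using the exponential distribution with an arbitrary illustrative rate λ = 0.5:

```python
from math import exp

lam = 0.5  # arbitrary rate parameter, chosen for illustration

def F(u):
    """Exponential CDF: F(u) = 1 - e^(-lam*u), for u >= 0."""
    return 1 - exp(-lam * u)

def f(x):
    """Exponential pdf: f(x) = lam * e^(-lam*x), for x >= 0."""
    return lam * exp(-lam * x)

# Central difference approximates the derivative dF/dx
x, h = 2.0, 1e-6
dF_dx = (F(x + h) - F(x - h)) / (2 * h)
print(dF_dx, f(x))  # nearly equal, as f = dF/dx
```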


Continuous Variables
Let X be a continuous random variable:

Mean or expected value of X:
μ = E(X) = ∫_D x f(x) dx

Variance of X:
σ² = V(X) = ∫_D (x - μ)² f(x) dx = ∫_D x² f(x) dx - μ²

Standard deviation of X:
σ = √σ²

Expected value of a function of a continuous random variable:
E[h(X)] = ∫_D h(x) f(x) dx


Examples of Continuous Distributions

Uniform Distribution Equiprobable equal-length sub-intervals of an interval. Approximation of


discrete uniform distributions, an example of which is the random number
generator routine of a computer.
See B.2.1 in [1], p. 439.


Examples of Continuous Distributions (cont.)


Normal Distribution Is an approximation of the binomial distribution for large n and not too small p and q = 1 - p. It is also an approximation of large sums of random variables acting independently (Central Limit Theorem).
Measurement errors often fall into this category.
Sequences of measurements whose deviations from the mean satisfy this distribution law, the so-called normal sequences, were studied by Karl F. Gauss (1777-1855).
See B.2.2 in [1], p. 441.


Examples of Continuous Distributions (cont.)

Standard normal distribution: a random variable with μ = 0 and σ² = 1 is called a standard normal random variable and is denoted as Z.
The cumulative distribution function is Φ(z) = P(Z ≤ z).

Standardizing a Normal Random Variable (in [3]):
If X is a normal random variable with E(X) = μ and V(X) = σ², the random variable

Z = (X - μ) / σ

is a normal random variable with E(Z) = 0 and V(Z) = 1, i.e. Z is a standard normal random variable. Hence

P(X ≤ x) = P((X - μ)/σ ≤ (x - μ)/σ) = P(Z ≤ z), with z = (x - μ)/σ.
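The standardization step can be sketched numerically: Python's math.erf gives Φ directly. The values μ = 100, σ = 10, x = 110 below are arbitrary illustrative choices:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) by standardizing: z = (x - mu) / sigma."""
    return phi((x - mu) / sigma)

print(phi(0))                    # -> 0.5, by symmetry
print(normal_cdf(110, 100, 10))  # equals phi(1) ~ 0.8413
```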

Examples of Continuous Distributions (cont.)

Exponential Distribution Distribution of decay phenomena, where the rate of decay is constant,
such as in radioactivity phenomena.
See B.2.3 in [1], p. 442.


Examples of Continuous Distributions (cont.)


Weibull Distribution The Weibull distribution describes the failure rate of equipment and the
wearing-out of materials.
See B.2.4 in [1], p. 444.


Examples of Continuous Distributions (cont.)

Gamma Distribution Is a sort of generalization of the exponential distribution, since the sum of
independent random variables, each with the exponential distribution, follows
the Gamma distribution. Several continuous distributions can be regarded as a
generalization of the Gamma distribution.
See B.2.5 in [1], p. 445.


Examples of Continuous Distributions (cont.)


Beta Distribution The Beta distribution is a continuous generalization of the binomial
distribution.
See B.2.6 in [1], p. 446.


Examples of Continuous Distributions (cont.)

Chi-Square Distribution The sum of squares of independent random variables, each with standard normal distribution, follows the chi-square (χ²) distribution. The number n of added terms is the so-called number of degrees of freedom, ν = df = n (number of terms that can vary independently, achieving the same sum).
See B.2.7 in [1], p. 448.


Examples of Continuous Distributions (cont.)

Student's t Distribution Is the distribution followed by the ratio of the mean deviations over the sample standard deviation.
See B.2.8 in [1], p. 449.


Examples of Continuous Distributions (cont.)

F Distribution was introduced by Ronald A. Fisher (1890-1962), in order to study the ratio of
variances. The ratio of two independent Gamma-distributed random variables,
each divided by its mean, also follows the F distribution.
See B.2.9 in [1], p. 451.


Normal approximation to the Binomial and Poisson distributions

Normal Approximation to the Binomial Distribution:
If X is a binomial random variable with parameters n and p,

Z = (X - np) / √(np(1 - p))

is approximately a standard normal random variable.
To approximate a binomial probability with a normal distribution, a continuity correction is applied as follows:

P(X ≤ x) = P(X ≤ x + 0.5) ≈ P(Z ≤ (x + 0.5 - np) / √(np(1 - p)))

and

P(X ≥ x) = P(X ≥ x - 0.5) ≈ P(Z ≥ (x - 0.5 - np) / √(np(1 - p)))

The approximation is good for np > 5 and n(1 - p) > 5.
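The continuity correction can be checked against the exact binomial sum. A Python sketch (n = 60 and p = 0.1 are arbitrary values chosen so that np > 5 and n(1 - p) > 5):

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binom_cdf(x, n, p):
    """Exact P(X <= x) for the binomial distribution."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def normal_approx_cdf(x, n, p):
    """P(X <= x) ~ P(Z <= (x + 0.5 - np) / sqrt(np(1-p))), with continuity correction."""
    return phi((x + 0.5 - n * p) / sqrt(n * p * (1 - p)))

n, p = 60, 0.1   # np = 6 > 5 and n(1-p) = 54 > 5, so the approximation applies
print(binom_cdf(4, n, p))          # exact value
print(normal_approx_cdf(4, n, p))  # approximation; close to the exact value
```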


Normal approximation to the Binomial and Poisson distributions

If X is a Poisson random variable with E(X) = λ and V(X) = λ, then

Z = (X - λ) / √λ

is approximately a standard normal random variable. The same continuity correction used for the binomial distribution can also be applied. The approximation is good for λ > 5.

In [3]


Joint Probability Distributions

Joint Probability Mass Function, fXY(x, y), of two discrete variables X and Y, is such that:
fXY(x, y) ≥ 0
Σ_x Σ_y fXY(x, y) = 1
fXY(x, y) = P(X = x, Y = y)

Joint Probability Density Function, fXY(x, y), of two continuous variables X and Y, is such that:
fXY(x, y) ≥ 0 for all x, y
∫_{-∞}^{+∞} ∫_{-∞}^{+∞} fXY(x, y) dx dy = 1
For any region R of two-dimensional space, P((X, Y) ∈ R) = ∫∫_R fXY(x, y) dx dy

Marginal probability density functions:

fX(x) = ∫ fXY(x, y) dy   and   fY(y) = ∫ fXY(x, y) dx

Joint Probability Distributions

Server Access Time (adapted from Example 5-2 in [3])
Let the random variable X denote the time until a computer server connects to your machine (in milliseconds), and let Y denote the time until the server authorizes you as a valid user (in milliseconds).
Each of these random variables measures the wait from a common starting time and X < Y.
Assume that the joint probability density function for X and Y is

fXY(x, y) = 6 × 10⁻⁶ exp(-0.001x - 0.002y),   x < y

The probability that X ≤ 1000 and Y ≤ 2000 is:

P(X ≤ 1000, Y ≤ 2000) = ∫_0^1000 ∫_x^2000 fXY(x, y) dy dx = ... = 0.915

The probability that Y exceeds 2000 milliseconds is

P(Y > 2000) = ∫_0^2000 ∫_2000^{+∞} 6 × 10⁻⁶ exp(-0.001x - 0.002y) dy dx
            + ∫_2000^{+∞} ∫_x^{+∞} 6 × 10⁻⁶ exp(-0.001x - 0.002y) dy dx
            = ... = 0.0475 + 0.0025 = 0.05
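The double integral P(X ≤ 1000, Y ≤ 2000) = 0.915 can be verified numerically. A crude midpoint Riemann sum in Python (the grid size is an arbitrary choice for illustration):

```python
from math import exp

def f(x, y):
    """Joint density from the example; nonzero only for x < y."""
    return 6e-6 * exp(-0.001 * x - 0.002 * y) if x < y else 0.0

# Midpoint Riemann sum over [0, 1000] x [0, 2000]
nx, ny = 500, 500
dx, dy = 1000 / nx, 2000 / ny
total = sum(f((i + 0.5) * dx, (j + 0.5) * dy) * dx * dy
            for i in range(nx) for j in range(ny))
print(total)  # ~0.915, matching the slide
```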



Joint Probability Distributions

Two random variables X and Y are Independent Random Variables if

fXY(x, y) = fX(x) fY(y), for all x and y.

All previous concepts generalize to more than two variables!

Expected Value of a Function of Two Random Variables:

E[h(X, Y)] = Σ_x Σ_y h(x, y) fXY(x, y),    X, Y discrete
E[h(X, Y)] = ∫∫ h(x, y) fXY(x, y) dx dy,   X, Y continuous

Covariance between two random variables:

cov(X, Y) = σXY = E[(X - μX)(Y - μY)] = E(XY) - μX μY

Covariance is a measure of the linear relationship between the random variables.
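For discrete variables, cov(X, Y) = E(XY) - μX μY can be computed directly from a joint pmf. A sketch with a small made-up joint pmf (the numbers are illustrative, not from the slides):

```python
# Hypothetical joint pmf f_XY(x, y); the probabilities sum to 1
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6}

mu_x = sum(x * p for (x, y), p in joint.items())   # E(X)
mu_y = sum(y * p for (x, y), p in joint.items())   # E(Y)
e_xy = sum(x * y * p for (x, y), p in joint.items())  # E(XY)

cov = e_xy - mu_x * mu_y
print(cov)  # 0.6 - 0.7*0.7 = 0.11 > 0: X and Y tend to move together
```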


Joint Probability Distributions

Correlation between random variables X and Y:

ρXY = corr(X, Y) = σXY / √(V[X] V[Y]) = σXY / (σX σY)

Given that σX > 0 and σY > 0, if the covariance between X and Y is positive, negative, or zero, the correlation between X and Y is positive, negative, or zero, respectively.

-1 ≤ ρXY ≤ +1

The correlation is a dimensionless quantity that can be used to compare the linear relationships between pairs of variables in different units.
Two random variables with nonzero correlation are said to be correlated.
The correlation is a measure of the linear relationship between random variables.

If X and Y are independent random variables,

σXY = ρXY = 0


Exercises

Selected exercises from [3]:

Binomial: 3-105, 3-107
Geometric: 3-129, 3-137
Hypergeometric: 3-153
Poisson: 3-165, 3-169
Normal Distribution: 4-72, 4-75, 4-91, 4-101, 4-105
Exponential Distribution: 4-119
Weibull Distribution: 4-157, 4-163

Solutions:
3-85. μ = 7, σ = 1.414; 3-87. 0.0001; 3-105. (a) 1 (b) 0.999997 (c) E(X) = 12.244, V(X) = 2.179; 3-107. (a) Binomial, p = 104/369, n = 1E09 (b) 0 (c) E(X) = 4593.9, V(X) = 4593.9; 3-129. (a) 3.91 × 10⁻¹⁹ (b) 200 (c) 2.56 × 10⁻¹⁸; 3-137. (a) 0.0604 (b) 0.1808 (c) 15; 3-153. (a) 0.25 (b) 0.72 (c) 0.16 (d) 5; 3-165. (a) 0.2 (b) 99.89%; 3-169. (a) 0.026 (b) 0.287 (c) 0.868; 4-75. (a) 0.0082 (b) 0.72109 (c) 0.564; 4-91. σ = 0.912; 4-105. (a) 0 (b) 0.156 (c) 10,233 (d) 8.3 days/year (e) 0.0052; 4-119. (a) 0.0498 (b) 0.8775; 4-157. (a) 803.68 hours (b) 85319.64 (c) 0.1576; 4-163. (a) 0.5698 (b) 0.1850 (c) 0.4724


References

[1] Joaquim P. Marques de Sá. Applied Statistics Using SPSS, STATISTICA, MATLAB and R. John Wiley & Sons, 2nd edition, 2007.
[2] Douglas C. Montgomery. Design and Analysis of Experiments. John Wiley & Sons, New York, 8th edition, 2013. Available in the Library.
[3] Douglas C. Montgomery and George C. Runger. Applied Statistics and Probability for Engineers. John Wiley & Sons, Hoboken, NJ, 5th edition, 2013. Available in the Library.

