
Handling Distributions

Statistical distributions: Statistical distributions, or probability distributions, describe the possible outcomes of a random variable and the probability of occurrence of those outcomes. When the random variable takes only discrete values, the corresponding probability distributions are called discrete probability distributions; examples of this kind are the binomial, Poisson, and hypergeometric distributions. On the other hand, when the random variable takes continuous values, the corresponding probability distributions are called continuous probability distributions; examples of this kind are the normal, exponential, and gamma distributions.

Random sampling: In statistics, a finite subset of individuals from a population is called a sample. In random sampling, the samples are drawn at random from the population, which implies that each unit of the population has an equal chance of being included in the sample.

Random number generator (RNG): A random number generator is a computational or physical device designed to generate a sequence of numbers that appear to be independent draws from a population, and that also pass a series of statistical tests. They are also called pseudo-random number generators, since the numbers they produce are not truly random but simulated. In this article, we will consider RNGs which generate random numbers between 0 and 1, also called uniform RNGs.

The following steps are typically performed for the Monte Carlo simulation of a physical process.

Static Model Generation Every Monte Carlo simulation starts off with developing a deterministic
model which closely resembles the real scenario. In this deterministic model, we use the most likely
value (or the base case) of the input parameters. We apply mathematical relationships which use the
values of the input variables, and transform them into the desired output.
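
As a minimal sketch of such a model (written here in Python), the profit calculation below is purely hypothetical; the function name, parameters, and base-case values are illustrative assumptions rather than anything given in the text:

import math

def profit(unit_price, unit_cost, demand):
    # Deterministic model: transform the input values into the desired output
    return (unit_price - unit_cost) * demand

# Base case: most likely values of the input parameters
print("Base-case profit:", profit(unit_price=10.0, unit_cost=6.0, demand=1000))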

Input Distribution Identification When we are satisfied with the deterministic model, we add the risk
components to the model. As mentioned before, since the risks originate from the stochastic nature
of the input variables, we try to identify the underlying distributions, if any, which govern the input
variables. This step needs historical data for the input variables. There are standard statistical
procedures to identify input distributions.
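
As a simple illustration of this step, assuming the historical data for one input variable look roughly normal, the parameters can be estimated by moment matching (the data values below are made up; in practice a formal goodness-of-fit procedure would be applied):

import statistics

# Hypothetical historical observations of one input variable
history = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5]

# Moment-matching estimate of a normal distribution's parameters
mu = statistics.mean(history)
sigma = statistics.stdev(history)
print(f"Fitted normal input distribution: mu = {mu:.2f}, sigma = {sigma:.2f}")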

Random Variable Generation After we have identified the underlying distributions for the input
variables, we generate a set of random numbers (also called random variates or random samples)
from these distributions. One set of random numbers, consisting of one value for each of the input
variables, will be used in the deterministic model, to provide one set of output values. We then
repeat this process by generating more sets of random numbers, one for each input distribution, and
collect different sets of possible output values. This part is the core of Monte Carlo simulation.
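
A minimal sketch of this step, reusing the hypothetical profit model from above and assuming (purely for illustration) that demand is normally distributed:

import random

def profit(unit_price, unit_cost, demand):
    return (unit_price - unit_cost) * demand

outputs = []
for _ in range(10000):                         # repeat with many sets of random numbers
    demand = random.gauss(1000, 150)           # one random variate per input variable
    outputs.append(profit(10.0, 6.0, demand))  # one output value per set of inputs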

Analysis and Decision Making After we have collected a sample of output values from the simulation, we perform statistical analysis on those values. This step provides us with statistical confidence for the decisions which we might make after running the simulation.
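
For example, a sketch of this analysis on the collected output values (the list below is a placeholder standing in for the outputs gathered during the simulation):

import math
import statistics

outputs = [3900.0, 4100.0, 4050.0, 3980.0, 4200.0]   # placeholder simulation outputs

mean = statistics.mean(outputs)
std_err = statistics.stdev(outputs) / math.sqrt(len(outputs))
# Approximate 95% confidence interval for the mean output
print(f"mean = {mean:.1f}, 95% CI = ({mean - 1.96 * std_err:.1f}, {mean + 1.96 * std_err:.1f})")
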
We have so far encountered a number of stochastic processes. Examples are the arrival process in a queuing network, the time taken to service a job in a server, the failure process of components in a system, etc. You can list as many as you wish. In fact, many physical processes contain some degree of randomness, and the study of random processes is of paramount importance to any professional. You, as engineers, will encounter randomness in your professional life on a daily basis.

Power systems engineer: the actual load in a given city is a random variable. You may be able to describe the load by some distribution function.

Telecoms engineer: the utilization of your network is a random variable, etc.

In the context of queuing networks, we have so far assumed a nice Poisson distribution both for the arrival and for the service processes. I remind you that this distribution implies two important properties: the process is memory-less, and the probability of an event during a small time interval dt is given by λ·dt, where λ is the average event rate. The distribution of time between Poisson events is given by:

f(t) = \lambda e^{-\lambda t}

Simulation of a uniformly distributed variable:


Suppose that we need to simulate a variable that is uniformly distributed between two values, a, b.

We know the probability density function is given by:

f(t) = \frac{1}{b - a}, \quad a \le t \le b

and zero otherwise.

The variable t is equally likely to occur anywhere within this interval. We can generate a value for t
satisfying above distribution using the following simple algorithm:

Generate p = Random ( )

Compute t = a + p(b - a)

Note, the function Random ( ) is intended to generate a real number in the range 0..1. Most software tools provide a function to generate such random numbers (in the range 0..1), see Excel's RAND(). Some tools generate a random integer in the range 0..MAX_INT; this could then be used to generate a number in the range 0..1 (i.e. divide the generated number by MAX_INT).

If the algorithm above is repeated a large number of times, you can show that the resulting t's will follow a uniform distribution over (a, b).
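
A minimal sketch of this algorithm in Python (the interval end-points are arbitrary):

import random

def uniform_variate(a, b):
    # Generate t uniformly distributed between a and b
    p = random.random()          # Random(): real number in the range 0..1
    return a + p * (b - a)

print([uniform_variate(2.0, 5.0) for _ in range(5)])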

Simulation of a Poisson Process:


Suppose that we need to simulate the arrival of messages into a server. Let us take this arrival process to be a Poisson process. We need to generate times t1, t2, t3, t4, t5, ... such that the distribution of times between events conforms to the exponential inter-event distribution given above.

Starting at any time, we know that the probability that the next event occurs after time t is given by:

p = e^{-\lambda t}

We know that p ≤ 1. If we generate a probability p, then we can compute the variable t as:

t = -\frac{1}{\lambda} \ln(p)

The algorithm is:

Generate p = Random ( )

Compute t = -\frac{1}{\lambda} \ln(p)
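
A sketch of generating Poisson arrival times this way, assuming (for illustration) an average rate of 2 events per unit time:

import math
import random

rate = 2.0                        # assumed average event rate (lambda)
t = 0.0
arrival_times = []
for _ in range(5):
    p = 1.0 - random.random()     # in (0, 1]; avoids log(0)
    t += -math.log(p) / rate      # exponential inter-event time
    arrival_times.append(t)
print(arrival_times)              # t1, t2, t3, ...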

Simulation of a Normally Distributed Process:


Suppose that we need to simulate a variable that is known to have a Gaussian distribution:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
The cumulative probability is

p = \int_{-\infty}^{x} f(u) \, du


To generate a random number from this Normal distribution we can do the following:

Generate p = Random ( )

Solve for x in: p = \int_{-\infty}^{x} f(u) \, du

This could be solved numerically or using the Normal distribution tables. For those of you who consider yourselves 'lazy', you can copy the Normal distribution tables into your computer code and use search and/or interpolation techniques to find x.
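
A sketch of the numerical approach, assuming a standard Normal distribution (μ = 0, σ = 1) and using bisection on the cumulative probability instead of a lookup table:

import math
import random

def normal_cdf(x):
    # Cumulative probability of the standard Normal distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_variate():
    p = random.random()
    lo, hi = -8.0, 8.0            # the CDF is essentially 0 and 1 outside this range
    for _ in range(60):           # bisection: solve normal_cdf(x) = p
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(normal_variate())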

This algorithm for the Normal distribution is not particularly efficient; use it only when you do not care about the time spent solving for x. Remember that in many applications you will need to repeat the process a very large number of times, and the algorithm will sooner or later become unbearably slow.

Hit-Miss Algorithm:
This is a very powerful algorithm that can be used to generate random numbers from a distribution whose function is complicated and does not lend itself to an analytical solution. The Normal distribution above is a typical example.
Let us state the following:

• The random variable x is limited to the range x1 < x < x2.

• The distribution function satisfies fmin ≤ f(x) ≤ fmax, where fmin and fmax are the minimum and maximum values that could be taken by the function, respectively.

For the Normal distribution above, f(x) becomes negligibly small for values of x < μ - 4σ and x > μ + 4σ (please convince yourself by consulting the widely available Normal distribution tables).

Again, for the Normal distribution, fmin = 0 and fmax = 1/(σ√(2π)).

The algorithm proceeds as follows:

Generate x0 in the range (x1, x2): x0 = x1 + (x2 - x1)·Random()

Generate y1 in the range (fmin, fmax): y1 = fmin + (fmax - fmin)·Random()

Compute y0 = f(x0)

If (y1 < y0 ) then

Hit: Accept x0 as a number from the given distribution & exit.

Else

Miss: repeat above procedure.

End.

Using the Normal distribution, this algorithm typically returns a value within 2-4 iterations.
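
A sketch of the hit-miss algorithm for the standard Normal distribution (μ = 0, σ = 1), limiting the range to ±4σ as suggested above:

import math
import random

def f(x):
    # Standard Normal density
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def hit_miss_normal(x1=-4.0, x2=4.0):
    f_min, f_max = 0.0, 1.0 / math.sqrt(2.0 * math.pi)
    while True:
        x0 = x1 + (x2 - x1) * random.random()           # candidate x0 in (x1, x2)
        y1 = f_min + (f_max - f_min) * random.random()  # candidate y1 in (f_min, f_max)
        if y1 < f(x0):                                  # hit: accept x0
            return x0
        # miss: repeat the procedure

print(hit_miss_normal())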
