
Chapter 4

Random Variables and Probability Distributions


4.1 Random Variables
A random variable is a function that assigns a real number to each outcome in the
sample space of a random experiment. Mathematically, a random variable is defined as
a function from the sample space S to the set of real numbers $\mathbb{R}$, i.e.
$X: S \to \mathbb{R}$
The value of a random variable is determined by the result of a random experiment.
Example: Flip a coin three times and let X be the number of heads in the three tosses.
S = {(HHH), (HHT), (HTH), (HTT), (THH), (THT), (TTH), (TTT)}
X(HHH) = 3,
X(HHT) = X(HTH) = X(THH) = 2
X(HTT) = X(THT) = X(TTH) = 1
X(TTT) = 0
The possible values of X are {0, 1, 2, 3}; X takes each of these values with some probability.
Random variables are of two types:
1. Discrete random variables: variables that can assume only a finite or countably
infinite number of distinct values, typically whole numbers obtained by counting.
Examples:
Toss a coin n times and count the number of heads.
Number of children in a family.
Number of car accidents per week.
Number of bacteria per two cubic centimeters of water.


2. Continuous random variables: variables that can assume any value in an interval,
i.e. all values between any two given values, including decimals and fractions.
Examples:
Height of students at a certain college.
Mark of a student.
Lifetime of light bulbs.
Length of time required to complete some task.
4.2 Probability Distribution
Definition: The probability distribution of a random variable X is a description of the
probabilities associated with the possible values of X. Since there are two types of
random variables, there are also two types of probability distributions: discrete and
continuous probability distributions.


Example: Consider the experiment of tossing a coin three times. Let X be the number
of heads observed. Here X is a random variable that assumes the values 0, 1, 2 and 3.
The probability distribution of X can be constructed as follows:
1. Find all possible values of X: {0, 1, 2, 3}.
2. Find the corresponding probability for each possible value of X.
3. Present them in a table, associating each possible value with its probability, i.e.,
X = x       0     1     2     3     Total
P(X = x)    1/8   3/8   3/8   1/8   8/8 = 1
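As a quick check, the table can be reproduced by brute-force enumeration of the sample space. The following is a minimal Python sketch (standard library only; the variable names are illustrative, not from the text) that lists all 8 equally likely outcomes, counts heads in each, and turns the counts into exact probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the sample space S of three coin tosses: 2**3 = 8 equally likely outcomes.
outcomes = list(product("HT", repeat=3))

# X(s) = number of heads in outcome s; count how many outcomes give each value of X.
counts = Counter(s.count("H") for s in outcomes)

# Each outcome has probability 1/8, so P(X = x) = (#outcomes with x heads) / 8.
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
print("Total =", sum(pmf.values()))
```

Using `Fraction` keeps the probabilities exact, so the total comes out as exactly 1 rather than a floating-point approximation.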

Note: Listing the possible values as above is practical only for a discrete variable
with few possible values. Otherwise, the probability distribution is expressed as a
function. The probability distribution of a discrete random variable is called a
probability mass function (pmf), denoted P(x), while the probability distribution of a
continuous random variable is called a probability density function (pdf), denoted f(x).
Properties of Probability Distribution
The probability distribution of a random variable should satisfy the following
conditions.
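A. If X is a discrete random variable, then
1. $P(X = x) \ge 0$ for every value x.
2. $\sum_{x} P(X = x) = 1$.
3. For an integer-valued X and integers a < b:
$P(a < X < b) = \sum_{x=a+1}^{b-1} P(X = x)$
$P(a \le X < b) = \sum_{x=a}^{b-1} P(X = x)$
$P(a < X \le b) = \sum_{x=a+1}^{b} P(X = x)$
$P(a \le X \le b) = \sum_{x=a}^{b} P(X = x)$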
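Property 3 amounts to choosing where the summation starts and stops. A minimal Python sketch, assuming an integer-valued X stored as a `{value: probability}` dictionary; the helper name `prob_between` and its keyword flags are hypothetical, introduced only for this illustration. It reuses the three-toss pmf from Section 4.2.

```python
from fractions import Fraction

# pmf of X = number of heads in three tosses (table in Section 4.2)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def prob_between(pmf, a, b, include_a=True, include_b=True):
    """Hypothetical helper: P(a <= X <= b) for integer-valued X, with optional strict ends.

    A strict lower end starts the sum at a + 1; a strict upper end stops it at b - 1.
    """
    lo = a if include_a else a + 1
    hi = b if include_b else b - 1
    return sum((pmf.get(x, 0) for x in range(lo, hi + 1)), Fraction(0))

print(prob_between(pmf, 1, 3))                                    # P(1 <= X <= 3)
print(prob_between(pmf, 1, 3, include_a=False, include_b=False))  # P(1 < X < 3)
```

For a discrete variable the four versions generally give different answers, which is exactly why the distinction between `<` and `<=` matters here but not for continuous variables.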

B. If X is a continuous random variable, then
1. $f(x) \ge 0$ for all x.
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
3. $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$.
4. The probability of any fixed value of a continuous random variable is zero, $P(X = a) = 0$
(probability at a point is zero for continuous variables).
5. Consequently, $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$.
6. For a continuous random variable, probability means area under the density curve.

4.3 Cumulative Probability Distributions
The cumulative distribution function (CDF) of a random variable X is the function
$F(x) = P(X \le x), \quad x \in \mathbb{R}$.
The cumulative distribution function exists and is defined in the same manner for all
random variables (discrete and continuous). Note that $F: \mathbb{R} \to [0, 1]$.

For any CDF, we have the following properties:
1. The CDF is non-negative: $F(x) \ge 0$. Probabilities can never be negative.
2. The CDF goes to zero on the far left: $\lim_{x \to -\infty} F(x) = 0$. X is never less than $-\infty$.
3. The CDF goes to one on the far right: $\lim_{x \to \infty} F(x) = 1$. X is never more than $\infty$.
4. The CDF is non-decreasing: $F(b) \ge F(a)$ if $b \ge a$, because the event $X \le a$ is a
subset of the event $X \le b$, and subsets never have higher probabilities.
Any function that satisfies these four properties and is also right-continuous can be used
as the CDF of some random variable.
Let X be a discrete random variable; then its CDF F(x) satisfies
I. $F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i)$
II. $0 \le F(x) \le 1$
III. If $x \le y$, then $F(x) \le F(y)$

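Example: Suppose that a day's production of 850 manufactured parts contains 50
parts that do not conform to customer requirements. Two parts are selected at
random, without replacement, from the batch. Let the random variable X equal the
number of nonconforming parts in the sample. What is the cumulative distribution
function of X?
Solution: First find the pmf of X:
$P(X = 0) = \left(\frac{800}{850}\right)\left(\frac{799}{849}\right) = 0.886$
$P(X = 1) = 2\left(\frac{800}{850}\right)\left(\frac{50}{849}\right) = 0.111$
$P(X = 2) = \left(\frac{50}{850}\right)\left(\frac{49}{849}\right) = 0.003$
(The factor 2 in P(X = 1) accounts for the nonconforming part being drawn either first or second.)
Therefore,
$F(0) = P(X \le 0) = 0.886$
$F(1) = P(X \le 1) = 0.886 + 0.111 = 0.997$
$F(2) = P(X \le 2) = 0.886 + 0.111 + 0.003 = 1$
Note that F(x) is defined for all x with $-\infty < x < \infty$, not only for 0, 1 and 2:
$F(x) = 0$ for $x < 0$; $F(x) = 0.886$ for $0 \le x < 1$; $F(x) = 0.997$ for $1 \le x < 2$; $F(x) = 1$ for $x \ge 2$.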
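The same pmf and CDF can be computed exactly in code. This Python sketch swaps the ordered-draw products used above for the equivalent hypergeometric counting formula (choose x of the 50 nonconforming parts and 2 - x of the 800 conforming ones); the function names `pmf` and `cdf` are illustrative. Because `cdf` accepts any real argument, it also shows that F is a step function defined on the whole real line.

```python
from fractions import Fraction
from math import comb

N, D, n = 850, 50, 2   # batch size, nonconforming parts, sample size (without replacement)

def pmf(x):
    # Hypergeometric count: x of the D nonconforming and n - x of the N - D conforming parts.
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

def cdf(t):
    # F(t) = P(X <= t) is defined for every real t; it jumps only at 0, 1 and 2.
    return sum((pmf(x) for x in range(n + 1) if x <= t), Fraction(0))

for x in range(n + 1):
    print(x, round(float(pmf(x)), 3), round(float(cdf(x)), 3))

# F below 0, between the jumps at 1 and 2, and far to the right
print(float(cdf(-1)), round(float(cdf(1.5)), 3), float(cdf(10)))
```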

The cumulative distribution function, used above to describe a discrete random variable,
can also be used for continuous random variables. Let X be a continuous random variable;
then its CDF F(x) satisfies
I. $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
II. $0 \le F(x) \le 1$
III. If $x \le y$, then $F(x) \le F(y)$

Extending the definition of f(x) to the entire real line enables us to define the
cumulative distribution function for all real numbers. Formally, the probability density
function (pdf) of a random variable X with cumulative distribution function F(x) is the
derivative of the CDF:
$f(x) = \frac{dF(x)}{dx}$, with $P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$.

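Example: Let X denote the amount of space occupied by an article placed in a 1-ft³
packing container. The probability density function of X is
$f(x) = 90x^{8}(1 - x)$ for $0 < x < 1$, and $f(x) = 0$ otherwise.
What are $P(X \le 0.5)$ and $P(0.25 < X \le 0.5)$?
The CDF of X is
$F(x) = 0$ for $x \le 0$; $F(x) = 10x^{9} - 9x^{10}$ for $0 < x < 1$; $F(x) = 1$ for $x \ge 1$.
The derivative of F(x) exists on $(-\infty, \infty)$ and we get
$F'(x) = 90x^{8} - 90x^{9}$ for $0 < x < 1$, and $F'(x) = 0$ for $-\infty < x \le 0$ and $1 \le x < \infty$,
which is just the pdf of X. Hence
$P(X \le 0.5) = F(0.5) = 10(0.5)^{9} - 9(0.5)^{10} = \frac{11}{1024} \approx 0.0107$
$P(0.25 < X \le 0.5) = F(0.5) - F(0.25) \approx 0.0107 - 0.00003 \approx 0.0107$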
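To cross-check the closed-form CDF, the density can be integrated numerically. A minimal Python sketch; `midpoint_integral` is a hypothetical helper implementing the plain midpoint rule, chosen so the example needs no third-party library, and is not a method the text prescribes.

```python
def f(x):
    """pdf of the space occupied: 90 x^8 (1 - x) on (0, 1), zero elsewhere."""
    return 90 * x**8 * (1 - x) if 0 < x < 1 else 0.0

def F(x):
    """CDF obtained by integrating f: 10 x^9 - 9 x^10 on (0, 1)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 10 * x**9 - 9 * x**10

def midpoint_integral(g, a, b, steps=100_000):
    # Midpoint rule, used here only to check the closed-form CDF numerically.
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

print(F(0.5))                           # P(X <= 0.5)
print(F(0.5) - F(0.25))                 # P(0.25 < X <= 0.5)
print(midpoint_integral(f, 0.25, 0.5))  # should agree closely with the line above
```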

4.4 Mean and Variance of a random variable


Expected Value of a Random Variable
Definition:
1. Let a discrete random variable X assume the values $x_1, x_2, \ldots, x_n$ with the
probabilities $P(x_1), P(x_2), \ldots, P(x_n)$ respectively. Then the expected value of
X, denoted E(X), is defined as
$E(X) = x_1 P(x_1) + x_2 P(x_2) + \cdots + x_n P(x_n) = \sum_{i=1}^{n} x_i P(x_i)$
2. Let X be a continuous random variable taking values in the interval (a, b) such that
$\int_{a}^{b} f(x)\,dx = 1$; then
$E(X) = \int_{a}^{b} x f(x)\,dx$
For any random variable X:
1. The expected value of X is its mean: Mean of X = E(X).
2. The variance of X is given by
Variance of X = $Var(X) = E(X^2) - (E(X))^2$,
where
$E(X^2) = \sum_{i=1}^{n} x_i^2 P(x_i)$ if X is discrete, or
$E(X^2) = \int_{a}^{b} x^2 f(x)\,dx$ if X is continuous.

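Example 1: Compute the mean and variance of the random variable X, which denotes
the number showing up when a single die is rolled.
Solution: First, we find the probability distribution:
X = x_i       1     2     3     4     5     6
P(X = x_i)    1/6   1/6   1/6   1/6   1/6   1/6
Then
$E(X) = \sum x_i P(x_i) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5$
$E(X^2) = \sum x_i^2 P(x_i) = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$
$Var(X) = E(X^2) - (E(X))^2 = \frac{91}{6} - 3.5^2 = \frac{35}{12} \approx 2.917$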
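The shortcut formula $Var(X) = E(X^2) - (E(X))^2$ translates directly into code. A minimal Python sketch, assuming the distribution is given as a `{value: probability}` dictionary; `mean_and_variance` is a hypothetical helper name. Exact fractions avoid rounding.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """E(X) and Var(X) = E(X^2) - (E(X))^2 for a discrete pmf {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x**2 * p for x, p in pmf.items())
    return mean, second_moment - mean**2

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die: each face has probability 1/6
mean, var = mean_and_variance(die)
print(mean, var)   # exact fractions
```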

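Example 2: Compute the mean and variance of the following probability density function:
$f(x) = \frac{1}{4}$ for $x \in [0, 4]$, and $f(x) = 0$ otherwise.
$E(X) = \int x f(x)\,dx = \int_{0}^{4} \frac{x}{4}\,dx = \frac{1}{4}\left[\frac{x^2}{2}\right]_{0}^{4} = \frac{1}{4} \cdot \frac{16}{2} = 2$
$V(X) = \int (x - \mu)^2 f(x)\,dx = \frac{1}{4}\int_{0}^{4} (x - 2)^2\,dx = \frac{1}{4}\int_{0}^{4} (x^2 - 4x + 4)\,dx = \frac{1}{4}\left[\frac{x^3}{3} - 2x^2 + 4x\right]_{0}^{4} = \frac{1}{4} \cdot \frac{16}{3} \approx 1.333$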
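The two integrals above can be approximated numerically as a sanity check. A short Python sketch using a hypothetical `midpoint_integral` helper (the midpoint rule, picked only to stay dependency-free); it computes the total area, the mean and the variance from the definition $E[(X - \mu)^2]$.

```python
def midpoint_integral(g, a, b, steps=10_000):
    # Midpoint-rule approximation of the integral of g over [a, b].
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def f(x):
    # Uniform density on [0, 4]
    return 0.25 if 0 <= x <= 4 else 0.0

total = midpoint_integral(f, 0, 4)                                    # area, should be 1
mean = midpoint_integral(lambda x: x * f(x), 0, 4)                    # E(X)
variance = midpoint_integral(lambda x: (x - mean) ** 2 * f(x), 0, 4)  # E[(X - mu)^2]
print(total, mean, variance)
```

The midpoint rule is exact for linear integrands, so the mean comes out as 2 up to floating-point rounding; the variance carries a tiny discretisation error.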

Let X and Y be random variables and k a constant.
RULE 1: $E(k) = k$
RULE 2: $Var(k) = 0$
RULE 3: $E(kX) = kE(X)$
RULE 4: $Var(kX) = k^2 Var(X)$
RULE 5: $E(X \pm Y) = E(X) \pm E(Y)$
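The rules can be checked exactly on a concrete distribution. This Python sketch uses a fair die as an assumed example, the hypothetical helpers `expectation` and `variance`, and k = 3; Rule 5 is checked on the joint distribution of two fair dice.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete pmf {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    return expectation(pmf, lambda x: (x - mu) ** 2)

die = {x: Fraction(1, 6) for x in range(1, 7)}
k = 3
scaled = {k * x: p for x, p in die.items()}   # distribution of kX

# Rules 1 and 2: a constant is a random variable with a single value of probability 1
print(expectation({k: Fraction(1)}) == k, variance({k: Fraction(1)}) == 0)
# Rules 3 and 4, checked exactly with fractions
print(expectation(scaled) == k * expectation(die))
print(variance(scaled) == k**2 * variance(die))

# Rule 5 via the joint distribution of two fair dice (X, Y): E(X + Y) = E(X) + E(Y)
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}
e_sum = sum((x + y) * p for (x, y), p in joint.items())
print(e_sum == expectation(die) + expectation(die))
```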

Exercise
1. A test consists of 4 true/false questions. Assume a student answers all
questions by guessing. Let X denote the number of questions not answered
correctly.
a. Construct the probability distribution of X.
b. Compute the mean and variance of the random variable X.
2. Compute the mean and variance of the following probability distribution:
$f(x) = \frac{x}{8}$ for $x \in [0, 4]$, and $f(x) = 0$ otherwise.
3. Two dice are rolled. Let X be a random variable denoting the sum of
the numbers on the two dice.
i) Give the probability distribution of X
ii) Compute the expected value of X and its variance

4. Suppose that X is a continuous random variable whose probability density
function is given by
$f(x) = C(4x - 2x^2)$ for $0 < x < 2$, and $f(x) = 0$ otherwise.
a) What is the value of C?
b) Find P(X > 1).
c) Find P(1/2 < X < 1).
d) Find the expected value and variance of X.
5. A construction firm has recently sent in bids for 3 jobs worth (in profits)
10, 20, and 40 (thousand) dollars. If its probabilities of winning the jobs are
respectively .2, .8, and .3, what is the firm's expected total profit and its variance?

Simple probabilities can be computed from elementary considerations. However, in dealing with
probabilities of whole classes of events, we have to consider more efficient ways of analysing
probability. For this purpose we need the concept of a probability distribution.
A probability distribution is defined for random variables (rvs). A variable X which takes values $x_1, x_2, \ldots,
x_n$ with associated probabilities $p_1, p_2, \ldots, p_n$, respectively, is called a random variable (rv). The summary
of the distinct values $x_i$ of a random variable X together with their probabilities P(x) or f(x) is known as the
probability distribution of the random variable. The function P(x) is called the probability mass function
(pmf), and f(x) is called the probability density function (pdf).
Example: Let a pair of fair dice be tossed and let X denote the sum of the points observed. Then the
probability distribution is
X       2     3     4     5     6     7     8     9     10    11    12
P(x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
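The table can be generated by enumerating all 36 equally likely outcomes of the two dice. A minimal Python sketch (standard library only, names illustrative) that counts how many outcomes produce each sum:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes for a pair of fair dice
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(sums.items())}

for x in pmf:
    # Print the count over 36 to match the table (Fraction would reduce 2/36 to 1/18)
    print(x, f"{sums[x]}/36")
```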

The probability distribution can be represented graphically by plotting P(X) against X, just like relative
frequency distributions. By cumulating probabilities, we obtain cumulative probability distributions,
which are analogous to cumulative relative frequency distributions.
In practice, a detailed list like the one in the example will not be presented. Instead, a model or formula
is used. These models have names such as the normal distribution, binomial distribution and Poisson
distribution. If the random variable is discrete (continuous), the model is known as a discrete (continuous)
probability distribution.
4.4 Common Probability Distributions
4.4.1 Binomial Distribution

The binomial distribution is used to represent the probability distribution of a discrete random variable.
Binomial means two categories. Each repetition of an observation (trial) may result in an outcome that
either possesses or does not possess a specified characteristic. Our primary interest will be in one of these
possibilities. Conventionally, the outcome of primary interest is termed a success and the alternative outcome
a failure. These terms are used irrespective of the nature of the outcome; for example, non-germination of a
seed may be termed a success.
A binomial experiment satisfies the following criteria:
- Each trial has only two outcomes (success or failure); such trials are called Bernoulli trials.
- The number of trials n is fixed (a positive integer).
- The probability of success p remains the same at each trial.
- The n trials are independent.

The variable X, which counts the number of successes in n Bernoulli trials, is a discrete random variable.
The probability distribution of such a discrete random variable X is called the binomial distribution.
The binomial distribution is given by the probability mass function (pmf)
$P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n.$
In the formula,
n = number of trials
x = number of successes in the n trials
n - x = number of failures in the n trials
p = probability of success in a single trial
q = 1 - p = probability of failure in a single trial
$\binom{n}{x}$ = the number of ways in which x successes can occur among the n trials.

Remark: If X is a binomial random variable with parameters n and p, then
$E(X) = np$ and $Var(X) = npq$.

Example 1
A mid-exam contains 10 multiple-choice questions, and each question has four alternatives with
exactly one correct answer. A student guesses the answer to every question. Find the probability that
the student correctly answers
i. exactly 3 questions
ii. exactly 8 questions
iii. at most 2 questions
iv. at least 3 questions
v. Find the mean and variance of X.
Solution: Using the binomial distribution we can get the probabilities easily. Here n = 10,
p = 1/4 = 0.25 (the chance of picking the correct answer among 4 alternatives) and
q = 1 - p = 1 - 1/4 = 3/4 = 0.75.
The possible number of correct answers out of 10 questions is X = 0, 1, 2, ..., 10, and
$P(X = x) = \binom{10}{x}(0.25)^{x}(0.75)^{10-x}$
i. $P(X = 3) = \binom{10}{3}(0.25)^{3}(0.75)^{7} = 0.2503$
ii. $P(X = 8) = \binom{10}{8}(0.25)^{8}(0.75)^{2} = 0.000386$
iii. $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$, where
$P(X = 0) = \binom{10}{0}(0.25)^{0}(0.75)^{10} = 0.0563$
$P(X = 1) = \binom{10}{1}(0.25)^{1}(0.75)^{9} = 0.1877$
$P(X = 2) = \binom{10}{2}(0.25)^{2}(0.75)^{8} = 0.2816$
so $P(X \le 2) = 0.0563 + 0.1877 + 0.2816 = 0.5256$.
iv. $P(X \ge 3) = 1 - P(X < 3) = 1 - \{P(X = 0) + P(X = 1) + P(X = 2)\} = 1 - 0.5256 = 0.4744$
v. The mean is $np = 10 \times 0.25 = 2.5$ and the variance is $npq = 10 \times 0.25 \times 0.75 = 1.875$.
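The whole example can be reproduced from the pmf formula. A minimal Python sketch using `math.comb` (Python 3.8+) for $\binom{n}{x}$; `binom_pmf` is a hypothetical helper name, not a library function.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x q^(n - x) for a binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.25
p3 = binom_pmf(3, n, p)                                  # i.   exactly 3
p8 = binom_pmf(8, n, p)                                  # ii.  exactly 8
at_most_2 = sum(binom_pmf(x, n, p) for x in range(3))    # iii. X <= 2
at_least_3 = 1 - at_most_2                               # iv.  complement of X <= 2
mean, var = n * p, n * p * (1 - p)                       # v.   np and npq
print(round(p3, 4), round(p8, 6), round(at_most_2, 4), round(at_least_3, 4), mean, var)
```

Computing "at least 3" through the complement needs only three pmf terms instead of eight.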

4.4.2 Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability distribution
is given by:
$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$
where λ is the average number of occurrences.

The Poisson distribution depends only on the average number of occurrences per unit of
time or space. The Poisson distribution is used as a distribution of rare events, such
as:
- Number of misprints.
- Hereditary.
- Natural disasters like earthquakes.
- Arrivals.
- Accidents.
The process that gives rise to such events is called a Poisson process.

Examples: 1. If 1.6 accidents can be expected on any given day, what is the probability
that there will be 3 accidents on any given day?
Solution: Let X = the number of accidents; then $X \sim \text{Poisson}(\lambda = 1.6)$ and
$P(X = x) = \frac{1.6^{x} e^{-1.6}}{x!}$, so $P(X = 3) = \frac{1.6^{3} e^{-1.6}}{3!} = 0.1378$.

2. On average, five smokers pass a certain street corner every ten minutes. What is
the probability that during a given 10 minutes the number of smokers passing will be
a. 6 or fewer
b. 7 or more
c. exactly 8?
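A short Python sketch of the Poisson pmf, applied to this question with λ = 5 per 10 minutes and, for comparison, to Example 1 with λ = 1.6. The helper name `poisson_pmf` is hypothetical; the values are computed directly rather than read from tables.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x e^(-lam) / x! for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 5  # average number of smokers per 10 minutes
six_or_fewer = sum(poisson_pmf(x, lam) for x in range(7))   # a. P(X <= 6)
seven_or_more = 1 - six_or_fewer                             # b. complement of a.
exactly_eight = poisson_pmf(8, lam)                          # c. P(X = 8)
print(round(six_or_fewer, 4), round(seven_or_more, 4), round(exactly_eight, 4))

print(round(poisson_pmf(3, 1.6), 4))   # Example 1: three accidents when lambda = 1.6
```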
If X is a Poisson random variable with parameter λ, then
$E(X) = \lambda$ and $Var(X) = \lambda$.

Note: The Poisson probability distribution provides a close approximation to the
binomial probability distribution when n is large and p is quite small or quite large,
with $\lambda = np$:
$P(X = x) \approx \frac{(np)^{x} e^{-np}}{x!}, \quad x = 0, 1, 2, \ldots$
where λ = np is the average value. Usually we use this approximation if $np \le 5$. In other
words, if $n \ge 20$ and $np \le 5$ [or $n(1 - p) \le 5$ when p is large], then we may use the
Poisson distribution as an approximation to the binomial distribution.


Example: Find the binomial probability P(X = 3) by using the Poisson distribution
if p = 0.01 and n = 200.
Using Poisson: $\lambda = np = 0.01 \times 200 = 2$, so $P(X = 3) \approx \frac{2^{3} e^{-2}}{3!} = 0.1804$.
Using binomial: n = 200, p = 0.01, so $P(X = 3) = \binom{200}{3}(0.01)^{3}(0.99)^{197} = 0.1814$.
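Both numbers in this example can be reproduced side by side. A minimal Python sketch with the example's values (n = 200, p = 0.01, x = 3), computing the exact binomial probability and its Poisson approximation:

```python
from math import comb, exp, factorial

n, p, x = 200, 0.01, 3
lam = n * p   # Poisson parameter lambda = np = 2

exact = comb(n, x) * p**x * (1 - p) ** (n - x)   # binomial probability
approx = lam**x * exp(-lam) / factorial(x)        # Poisson approximation
print(round(exact, 4), round(approx, 4))
```

The two agree to about 0.001 here, which is the kind of closeness the rule of thumb ($n \ge 20$, $np \le 5$) is meant to guarantee.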
In the formula $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$,
λ = np = mean number of times an event occurs,
x = the number of times an event occurs,
e = Napierian base, approximately 2.7183.
The value of $e^{-\lambda}$ can be obtained directly from mathematical tables. In the case of the Poisson
distribution, the counts of the alternative events, i.e., failures, are not of interest. This is a contrast
between the binomial and Poisson distributions. For the Poisson distribution all that we need is np,
the mean number of successes; we need not know n and p individually. Thus, the Poisson distribution
is determined by the single parameter λ. A special property of the Poisson distribution is that its mean
and variance are both equal to λ, i.e. mean = variance = λ.
Example 1: In Tekure Ambessa Hospital, the average number of female babies born in every 24 hours
is 7. What is the probability that
i. no female babies are born in a day?
ii. exactly three female babies are born in a day?
iii. 2 female babies are born in 12 hours?
Solution: In this case λ = 7 per day.
i. No female baby born in a day:
$P(X = 0) = \frac{e^{-7} 7^{0}}{0!} = e^{-7} = 0.000912$
ii. Exactly three female babies born in a day:
$P(X = 3) = \frac{e^{-7} 7^{3}}{3!} = 0.0521$
iii. 2 female babies born in 12 hours: in this case $\lambda = 7/2 = 3.5$, so
$P(X = 2) = \frac{e^{-3.5}(3.5)^{2}}{2!} = 0.184959$
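A quick numerical check of the three answers in Python. The only modelling step is part iii, where the daily rate is scaled to half a day (λ = 3.5), as in the solution; `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) for a Poisson random variable with mean lam
    return lam**x * exp(-lam) / factorial(x)

per_day = 7
p_none = poisson_pmf(0, per_day)              # i.   no female babies in a day
p_three = poisson_pmf(3, per_day)             # ii.  exactly three in a day
p_two_half_day = poisson_pmf(2, per_day / 2)  # iii. rate scales to 3.5 for 12 hours
print(round(p_none, 6), round(p_three, 4), round(p_two_half_day, 4))
```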

Example 2: In some experiments it was observed that the incidence of stem fly in black gram
was 6 percent. Suppose we examine 50 black gram plants in a field at random. What is the
probability that at most 3 plants will be found to be affected by stem fly?
Solution: The probability that a plant is affected by stem fly is given as p = 0.06, and the number of
plants observed is n = 50. Hence λ = np = 3, which is less than 5, so we can use the Poisson
distribution to calculate the probability. The required probability is
$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$, where $P(X = x) = \frac{e^{-3}3^{x}}{x!}$:
$P(X = 0) = \frac{e^{-3}3^{0}}{0!} = e^{-3}$
$P(X = 1) = \frac{e^{-3}3^{1}}{1!} = 3e^{-3}$
$P(X = 2) = \frac{e^{-3}3^{2}}{2!} = 4.5e^{-3}$
$P(X = 3) = \frac{e^{-3}3^{3}}{3!} = \frac{27e^{-3}}{6} = 4.5e^{-3}$
so $P(X \le 3) = (1 + 3 + 4.5 + 4.5)e^{-3} = 13e^{-3}$.
From mathematical tables, $e^{-3} = 0.0498$. Therefore $P(X \le 3) = 13 \times 0.0498 = 0.6474$.
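Since n and p are both known here, the Poisson answer can be compared with the exact binomial probability. A minimal Python sketch; note that the exact $13e^{-3}$ is about 0.6472, slightly below the 0.6474 obtained with the rounded table value $e^{-3} = 0.0498$.

```python
from math import comb, exp, factorial

n, p = 50, 0.06
lam = n * p   # = 3, small enough (np <= 5) for the Poisson approximation

poisson = sum(lam**x * exp(-lam) / factorial(x) for x in range(4))          # P(X <= 3)
binomial = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(4))    # exact
print(round(poisson, 4), round(binomial, 4))
```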

4.4.3 Normal Distribution

The most important and widely used probability distribution in statistical inference is the normal
distribution. It is also known as the Gaussian distribution. Many distributions occurring in practice,
for instance the binomial and Poisson, can be approximated by the normal distribution under suitable
conditions (e.g., a large number of trials or a large mean).
Further, many sampling distributions, like Student's t, F and χ² distributions, tend to normality for
large samples. Therefore, the normal distribution occupies an important place in statistical inference.
The normal distribution is used to represent the probability distribution of a continuous random variable,
such as the life expectancy of some product or the volume of a shipping container.
Its probability density function is

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$
In the above formula,
π = a constant approximately equal to 3.1416 (often approximated by 22/7)
e = Napierian base, approximately 2.7183
μ = population mean
σ = population standard deviation
x = a given value of the random variable, in the range $-\infty < x < \infty$
Properties of Normal Distribution:
1. It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at
$x = \mu$ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
2. It is asymptotic to the x-axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal
distribution. Thus, the normal distribution is completely described by two parameters: mean and standard
deviation.
5. The total area under the curve sums to 1, i.e., $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and the area of the distribution on
each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Mean = Median = Mode.
8. The probability that a random variable will have a value between any two points is equal to the area under the
curve between those points.
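Several of these properties can be checked numerically from the density formula. A Python sketch, using μ = 72 and σ = 15 as assumed example values and a hypothetical `normal_pdf` helper; the area is approximated with the midpoint rule over μ ± 8σ, where the neglected tails are far below floating-point noise.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2 pi)) * exp(-((x - mu) / sigma)**2 / 2)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

mu, sigma = 72, 15
peak = normal_pdf(mu, mu, sigma)
print(peak, 1 / (sigma * sqrt(2 * pi)))                                   # property 1: maximum ordinate
print(normal_pdf(mu - 10, mu, sigma) == normal_pdf(mu + 10, mu, sigma))   # symmetry about mu

# Property 5: area under the curve over mu +/- 8 sigma (midpoint rule)
steps = 20_000
h = 16 * sigma / steps
area = h * sum(normal_pdf(mu - 8 * sigma + (i + 0.5) * h, mu, sigma) for i in range(steps))
print(area)
```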
For a normal distribution the frequency curve is symmetrical and bell shaped. However, not all
symmetrical curves are normal. The shape of the normal curve is completely determined by the two parameters
μ and σ. For any given μ, there can be many normal curves, each with a different σ. Likewise, for any
given σ, there can be many normal curves, each with a different μ. To make all such distributions readily
comparable with each other, their individuality, as expressed by their mean and standard deviation, has to be
suppressed. This is done by transforming the normal variable into a standard normal variable.

The standard normal variable is denoted by Z and is given by $Z = \frac{X - \mu}{\sigma}$. The distribution of the
standard normal variable is known as the standard normal distribution. It is given by
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}, \quad -\infty < z < \infty$
where E(Z) = 0 and Var(Z) = 1.
Properties of the Standard Normal Distribution:
The same as a normal distribution, but also:
- Mean is zero
- Variance is one
- Standard deviation is one

Areas under the standard normal distribution curve have been tabulated in various ways. The most common
ones are the areas between Z = 0 and a positive value of Z. Tables are readily available for different values of Z.
Given a normally distributed random variable X with mean μ and standard deviation σ,
$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$
Note: $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b)$.
For the standard normal distribution, μ = 0 and σ = 1.
Because of the symmetrical nature of the normal distribution, the tables are presented only for positive values
of Z. The area under the curve is equal to one, and the area above or below z = 0 is 0.5.
Example 1: On a final examination in mathematics, the mean mark was 72 and the standard deviation
was 15.
I. Determine the standard score of students receiving the marks:
a) 60  b) 93  c) 72
II. Determine the marks of students who have standard scores:
a) -1  b) 1.6
III. Find the probability that a student's score is
a) between 60 and 93, i.e. P(60 < X < 93), where X is the mark of a student
b) less than 70, i.e. P(X < 70)
c) greater than 30, i.e. P(X > 30)
Solution:
I. a) $Z = \frac{X - \mu}{\sigma} = \frac{60 - 72}{15} = -0.8$
b) $Z = \frac{93 - 72}{15} = 1.4$
c) $Z = \frac{72 - 72}{15} = 0$
II. a) $X = \mu + Z\sigma = 72 + (-1)(15) = 57$
b) $X = \mu + Z\sigma = 72 + 1.6(15) = 96$
III. a) $P(60 < X < 93) = P\left(\frac{60 - 72}{15} < Z < \frac{93 - 72}{15}\right) = P(-0.8 < Z < 1.4)$
$= P(-0.8 < Z < 0) + P(0 < Z < 1.4) = P(0 < Z < 0.8) + P(0 < Z < 1.4) = 0.2881 + 0.4192 = 0.7073$
(this is from the standard normal table).
b) P(X < 70) (exercise)
c) P(X > 30) (exercise)
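Instead of a printed table, Python's standard library provides `statistics.NormalDist` (Python 3.8+), whose `cdf` method gives $P(X \le x)$ directly. A minimal sketch of the three parts; `z_score` and `mark_from_z` are hypothetical helper names. The exact result for part III a is about 0.7074, which differs from the table answer only in the fourth decimal because the table entries are rounded.

```python
from statistics import NormalDist

mu, sigma = 72, 15
marks = NormalDist(mu, sigma)

def z_score(x):
    # Standard score Z = (X - mu) / sigma
    return (x - mu) / sigma

def mark_from_z(z):
    # Inverse transformation X = mu + Z * sigma
    return mu + z * sigma

print(z_score(60), z_score(93), z_score(72))   # part I
print(mark_from_z(-1), mark_from_z(1.6))       # part II
p = marks.cdf(93) - marks.cdf(60)              # part III a: P(60 < X < 93)
print(round(p, 4))
```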

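Example 2: A normal distribution has mean 62.4. Find its standard deviation if 20.05% of the area under the
normal curve lies to the right of 72.9.
Solution:
$P(X > 72.9) = 0.2005 \Rightarrow P\left(Z > \frac{72.9 - 62.4}{\sigma}\right) = 0.2005 \Rightarrow P\left(Z > \frac{10.5}{\sigma}\right) = 0.2005$
$\Rightarrow P\left(0 < Z < \frac{10.5}{\sigma}\right) = 0.50 - 0.2005 = 0.2995$.
From the table, $P(0 < Z < 0.84) = 0.2995$, so $\frac{10.5}{\sigma} = 0.84$ and $\sigma = \frac{10.5}{0.84} = 12.5$.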
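The table lookup in this solution is an inverse-CDF step, which `statistics.NormalDist().inv_cdf` performs directly for the standard normal. A minimal Python sketch, assuming the 20.05% right-tail area used in the solution:

```python
from statistics import NormalDist

mu, x, right_tail = 62.4, 72.9, 0.2005
# z such that P(Z > z) = 0.2005, i.e. P(Z <= z) = 0.7995
z = NormalDist().inv_cdf(1 - right_tail)
sigma = (x - mu) / z   # from (x - mu) / sigma = z
print(round(z, 4), round(sigma, 2))
```

The computed z is about 0.8398 rather than the table's 0.84, so σ comes out marginally above 12.5.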

From the table of normal curves it can be seen that 68.26% of the area lies within the range $\mu \pm \sigma$,
95.44% within the range $\mu \pm 2\sigma$, and 99.74% within the range $\mu \pm 3\sigma$. This is an important property of the
normal distribution which is frequently used in statistical inference.

Exercises

1. A random variable has a normal distribution with standard deviation σ = … . Find its mean if the probability that the
random variable will assume a value less than 52.5 is 0.6915.
2. Of a large group of men, 5% are less than 60 inches in height and 40% are between 60 and 65 inches.
Assuming a normal distribution, find the mean and standard deviation of heights.
3. Suppose that an examination consists of six true/false questions, and assume that a student has no
knowledge of the subject matter. The probability that the student will guess the correct answer to the
first question is 30%. Likewise, the probability of guessing each of the remaining questions correctly is
also 30%.
i. What is the probability of getting more than three correct answers?
ii. What is the probability of getting at least two correct answers?
iii. What is the probability of getting at most three correct answers?
iv. What is the probability of getting less than five correct answers?
4. A computer shop sells both desktop and laptop computers online. Assume that 80% of the online sales are desktops
and 20% are laptops.
a. Find the probability that 3 of the next 4 online sales are laptops.
b. Find the probability that 2 of the next 4 online sales are desktops.
5. Suppose the number X of a company's employees who are absent on Mondays has approximately a
Poisson distribution. Furthermore, assume that the average number of Monday absentees is 2.6.
a. Find the mean and standard deviation of X.
b. Find the probability that fewer than two employees are absent on a given Monday.
c. Find the probability that more than five employees are absent on a given Monday.
d. Find the probability that exactly five employees are absent on a given Monday.
6. The number X of people who arrive at a cashier's counter in a bank during a specified period of time
often exhibits (approximately) a Poisson probability distribution. Suppose that the mean number of
arrivals per minute for cashier service at a bank is one person per minute.
a. What is the probability that in a given minute the number of arrivals will equal 3 or more?
b. Can you tell the bank manager that the number of arrivals will rarely exceed two per minute?

7. Assume that the length of time X between charges of a cellular phone is normally distributed with a mean of
10 hours and a standard deviation of 1.5 hours. Find the probability that the cell phone will last
a. Between 8 and 12 hours
b. Below 8 hours
c. Below 9 hours
8. Suppose a paint manufacturer has a daily production that is normally distributed with a mean of 100,000
gallons and a standard deviation of 1,000 gallons. Management wants to create an incentive bonus for the
production crew when the daily production exceeds the 90th percentile of the distribution, in the hope that
the crew will, in turn, become more productive. At what level of production should management pay the
incentive bonus?
