
Outline Joint r.v.s Conditional distribution Conditional E() Conditional Variance Conditional Probability MGF Bayesian
Chapter 2
Conditional Probability and Distributions
References:
Chapters 6 and 7, Ross, S.M. A First Course in Probability. 7th+ Ed., Pearson
Chapters 15 & 18, Klugman, S.A., Panjer, H.H., and Willmot, G.E. Loss Models: From Data to Decisions. 4th Ed., Wiley
Chapters 2 & 3, Ross, S.M. Introduction to Probability Models. 8th+ Ed., Pearson
Chapters 4.1.8-4.1.10, Bean, M.A. Probability: The Science of Uncertainty. 9th Ed. Pearson.
AMA528 (By Dr. Catherine Liu) Ch2 Conditional Probability and Distributions September 17, 2014 2 / 29
Outline
1 Joint distributions
  1.1 Joint distribution
  1.2 Marginal distribution
2 Conditional Distribution
3 Conditional Expectation
4 Conditional Variance
5 Conditional Probability
6 Joint MGF
7 Introduction to Bayesian statistics
Joint and marginal cdf of X and Y
F(a, b) = P{X ≤ a, Y ≤ b},  −∞ < a, b < ∞.
F_X(a) = P(X ≤ a) = P(X ≤ a, Y < ∞) = F(a, ∞);
F_Y(b) = P(Y ≤ b) = P(X < ∞, Y ≤ b) = F(∞, b).
NB: All joint probability statements about X and Y can be answered in terms of their joint distribution function, e.g.
P{a₁ < X ≤ a₂, b₁ < Y ≤ b₂}
  = F(a₂, b₂) + F(a₁, b₁) − F(a₁, b₂) − F(a₂, b₁),  a₁ < a₂, b₁ < b₂.
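The rectangle identity above can be verified numerically; a quick check (the example values and the choice of independent uniforms are made up for illustration), using F(a, b) = ab for independent Uniform(0, 1) variables:

```python
# A quick check (example values are made up): for independent Uniform(0, 1)
# X and Y, F(a, b) = a*b, so the identity should give (a2 - a1)*(b2 - b1).
F = lambda a, b: a * b
a1, a2, b1, b2 = 0.2, 0.7, 0.1, 0.9

rect = F(a2, b2) + F(a1, b1) - F(a1, b2) - F(a2, b1)
print(rect, (a2 - a1) * (b2 - b1))  # both ≈ 0.4
```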
Example 0
Example 1
Joint and marginal pmfs
Joint probability mass function of discrete r.v.s X and Y:
p(x, y) = P(X = x, Y = y).
Marginal probability mass functions of X and Y:
p_X(x) = P(X = x) = Σ_{y: p(x,y)>0} p(x, y)
p_Y(y) = P(Y = y) = Σ_{x: p(x,y)>0} p(x, y)
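The marginal pmfs are just the row and column sums of the joint pmf table; a small sketch (the table values are made up for illustration):

```python
# A small sketch (table values are made up): the marginal pmfs are the row
# and column sums of the joint pmf table.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

pX, pY = {}, {}
for (x, y), p in joint.items():
    pX[x] = pX.get(x, 0.0) + p    # p_X(x) = sum over y of p(x, y)
    pY[y] = pY.get(y, 0.0) + p    # p_Y(y) = sum over x of p(x, y)
print(pX, pY)  # p_X ≈ {0: 0.3, 1: 0.7}, p_Y ≈ {0: 0.4, 1: 0.6}
```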
Joint and marginal pdfs
Joint probability density function of continuous r.v.s X and Y is a function f(x, y) with the property that for all sets A and B of real numbers:
P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy.
Marginal probability density functions of X and Y:
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy,  f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx;
and
P(X ∈ A) = P{X ∈ A, Y ∈ (−∞, ∞)} = ∫_A ∫_{−∞}^{∞} f(x, y) dy dx = ∫_A f_X(x) dx.
Example 2
Example 3
Let X represent the age of an insured automobile involved in an accident. Let Y represent the length of time the owner has
insured the automobile at the time of the accident.
X and Y have joint probability density function
f(x, y) = (1/64)(10 − xy²) for 2 ≤ x ≤ 10 and 0 ≤ y ≤ 1,
f(x, y) = 0 otherwise.
Calculate the expected age of an insured automobile involved in an accident.
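The slides leave the calculation to the reader; a numeric sketch of the double integral (the answer worked out here is a check, not part of the slides):

```python
from scipy.integrate import dblquad

# Numeric check of Example 3: f(x, y) = (10 - x*y^2)/64 on
# 2 <= x <= 10, 0 <= y <= 1. dblquad integrates its first argument (y) innermost.
f = lambda y, x: (10 - x * y**2) / 64.0

total, _ = dblquad(f, 2, 10, lambda x: 0, lambda x: 1)            # should be 1
EX, _ = dblquad(lambda y, x: x * f(y, x), 2, 10, lambda x: 0, lambda x: 1)
print(total, EX)  # ≈ 1.0 and ≈ 5.778 (= 52/9)
```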
Independence
X independent of Y:
P{X ∈ A, Y ∈ B} = P{X ∈ A} P{Y ∈ B}
P{X ≤ a, Y ≤ b} = P{X ≤ a} P{Y ≤ b}
F(a, b) = F_X(a) F_Y(b)
p(x, y) = p_X(x) p_Y(y) for discrete X and Y.
f(x, y) = f_X(x) f_Y(y) for continuous X and Y.
NB: Independence is a symmetric relation.
Conditional-jointly continuous
Conditional pdf of X given Y = y:
f_{X|Y}(x|y) = f(x, y) / f_Y(y)
for all y such that f_Y(y) > 0.
Conditional cdf of X given Y = y:
F_{X|Y}(a|y) = P(X ≤ a | Y = y) = ∫_{−∞}^{a} f_{X|Y}(x|y) dx
and
P(X ∈ A | Y = y) = ∫_A f_{X|Y}(x|y) dx
Example 4.1
Let X and Y be continuous random variables with joint density function
f(x, y) = 24xy for 0 < x < 1, 0 < y < 1 − x,
f(x, y) = 0 elsewhere.
Calculate P(Y < X | X = 1/3).
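A numeric sketch of Example 4.1 (the calculation is mine, not from the slides): conditioning on X = 1/3, Y ranges over (0, 2/3), and the conditional probability is the ratio of two one-dimensional integrals.

```python
from scipy.integrate import quad

# Example 4.1 as a numeric sketch: condition on X = 1/3, where Y ranges over (0, 2/3).
x0 = 1 / 3
f = lambda y: 24 * x0 * y        # joint density at x = 1/3, valid for 0 < y < 1 - x0

fX0, _ = quad(f, 0, 1 - x0)      # marginal density f_X(1/3)
num, _ = quad(f, 0, x0)          # joint mass on the region y < x0
p = num / fX0                    # P(Y < X | X = 1/3)
print(p)  # ≈ 0.25
```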
Conditional-jointly discrete
Conditional pmf of X given Y = y:
p_{X|Y}(x|y) = P(X = x | Y = y) = p(x, y) / p_Y(y)
Conditional cdf of X given Y = y:
F_{X|Y}(x|y) = P(X ≤ x | Y = y) = Σ_{a ≤ x} p_{X|Y}(a|y)
for all y such that p_Y(y) > 0.
Eg 4.2: If X and Y are independent Poisson random variables with respective parameters λ₁ and λ₂, calculate the conditional distribution of X given that X + Y = n.
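The standard answer to Eg 4.2 is X | X+Y = n ~ Binomial(n, λ₁/(λ₁+λ₂)); a quick numeric check (the parameter values here are illustrative, not from the slides):

```python
from scipy.stats import binom, poisson

# Eg 4.2 check: X | X+Y = n should be Binomial(n, lam1/(lam1+lam2)).
lam1, lam2, n = 2.0, 3.0, 5   # illustrative parameters

# P(X = k | X+Y = n) = P(X = k) P(Y = n-k) / P(X+Y = n), X+Y ~ Poisson(lam1+lam2)
cond = [poisson.pmf(k, lam1) * poisson.pmf(n - k, lam2) / poisson.pmf(n, lam1 + lam2)
        for k in range(n + 1)]
target = [binom.pmf(k, n, lam1 / (lam1 + lam2)) for k in range(n + 1)]
print(max(abs(c - t) for c, t in zip(cond, target)))  # ≈ 0
```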
Example 5
The joint density of N and X is given by
f_{N,X}(n, x) = x^n e^{−2x} / n!,  x ≥ 0; n = 0, 1, 2, …
Determine the conditional probability density function for X | N = n.
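Integrating out x gives p_N(n) = 2^{−(n+1)}, so the conditional density f_{X|N}(x|n) = 2^{n+1} x^n e^{−2x}/n! is a Gamma(shape n+1, rate 2) density; a numeric sketch of that claim (the particular n and test point are illustrative):

```python
import math
from scipy.integrate import quad
from scipy.stats import gamma

# Example 5 sketch: p_N(n) = 2^-(n+1), and f_{X|N}(x|n) = Gamma(n+1, rate 2).
n, x0 = 3, 1.7                                   # illustrative values
joint = lambda x: x**n * math.exp(-2 * x) / math.factorial(n)

pN, _ = quad(joint, 0, math.inf)                 # marginal pmf of N at n
cond_pdf = joint(x0) / pN                        # conditional density at x0
print(pN, cond_pdf, gamma.pdf(x0, a=n + 1, scale=0.5))
```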
Distributional form of the Law of Total Probability
If X₁ and X₂ are jointly continuous,
f_{X₁}(x₁) = ∫ f_{X₁|X₂=t}(x₁) f_{X₂}(t) dt,  f_{X₂}(x₂) = ∫ f_{X₂|X₁=s}(x₂) f_{X₁}(s) ds;
F_{X₁}(x₁) = ∫ F_{X₁|X₂=t}(x₁) f_{X₂}(t) dt,  F_{X₂}(x₂) = ∫ F_{X₂|X₁=s}(x₂) f_{X₁}(s) ds.
If X₁ is continuous and X₂ is discrete,
f_{X₁}(x₁) = Σ_{all x₂} f_{X₁|X₂=x₂}(x₁) p_{X₂}(x₂).
If X₁ is discrete and X₂ is continuous,
p_{X₁}(x₁) = ∫ p_{X₁|X₂=t}(x₁) f_{X₂}(t) dt.
Examples
Eg 6: An auto insurer is trying to develop a model for claim size in an attempt to better price its products. On the basis of historical data, it has determined that the claim size distribution for policyholders classified as good risks has density
f_X(x) = 2e^{−2x},  x > 0,
and the claim size distribution for policyholders classified as bad risks has density
f_X(x) = (1/3)e^{−x/3},  x > 0,
where claims are measured in thousands of dollars. There is a 30% chance that a given policyholder is a bad risk. What is the probability that this policyholder's claim exceeds $1,000?
Eg 7: An insurance company has just sold a group health insurance plan to a new employer. Experience with similar employers suggests that the number of hospitalization claims per month can be modeled using the mass function
p(n) = λ^n e^{−λ} / n!,  n = 0, 1, 2, …,
where λ is a parameter that describes the group's expected utilization. The insurer is uncertain what the true value of λ for this group is, i.e. how many claims to expect, and decides to model this uncertainty using the density function
f_Λ(λ) = e^{−λ},  λ > 0.
Determine the probability that the group has two or more hospitalization claims in the next month.
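Both examples yield to the law of total probability; a numeric sketch of the answers (the arithmetic is mine, not spelled out on the slides):

```python
import math

# Eg 6 sketch: condition on risk class and use the exponential survival
# functions (claims in thousands, so "exceeds $1,000" means X > 1).
p_exceed = 0.7 * math.exp(-2 * 1) + 0.3 * math.exp(-1 / 3)
print(p_exceed)  # ≈ 0.3097

# Eg 7 sketch: mixing Poisson(lam) over f(lam) = e^-lam gives the geometric
# marginal P(N = n) = (1/2)^(n+1), hence
p_two_plus = 1 - 1 / 2 - 1 / 4   # P(N >= 2)
print(p_two_plus)  # 0.25
```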
Distributional form of the Bayes Theorem
f_{X₁|X₂=x₂}(x₁) = f_{X₂|X₁=x₁}(x₂) f_{X₁}(x₁) / f_{X₂}(x₂)
Eg 8: An insurance company has just sold a group health insurance plan to a new employer. Experience with similar employers suggests that the number of hospitalization claims per month can be modeled using the mass function
p(n) = λ^n e^{−λ} / n!,  n = 0, 1, 2, …,
where λ is a parameter that describes the group's expected utilization. The insurer is uncertain what the true value of λ for this group is, i.e. how many claims to expect, and decides to model this uncertainty using the density function
f_Λ(λ) = e^{−λ},  λ > 0.
Suppose that there is one hospitalization claim in the first month. How does this information alter the insurer's belief about the true value of λ?
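Applying the distributional Bayes' theorem here, the unnormalized posterior is λe^{−λ}·e^{−λ} = λe^{−2λ}, i.e. a Gamma(shape 2, rate 2) density; a numeric sketch of that update (the evaluation point 0.8 is illustrative):

```python
import math
from scipy.integrate import quad
from scipy.stats import gamma

# Eg 8 sketch: prior f(lam) = e^-lam; likelihood of one claim is lam*e^-lam.
# The unnormalized posterior lam*e^(-2*lam) is Gamma(shape 2, rate 2).
unnorm = lambda lam: lam * math.exp(-lam) * math.exp(-lam)

Z, _ = quad(unnorm, 0, math.inf)   # marginal probability of one claim (= 1/4)
post = lambda lam: unnorm(lam) / Z
print(Z, post(0.8), gamma.pdf(0.8, a=2, scale=0.5))  # posterior matches Gamma(2, rate 2)
```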
Conditional expectations
Definition: The conditional expectation of u(X) given Y = y is
E[u(X)|Y = y] = ∫ u(x) f_{X|Y}(x|y) dx  (continuous X)
E[u(X)|Y = y] = Σ_x u(x) p_{X|Y}(x|y)  (discrete X)
Schwarz inequality: [E(XY)]² ≤ E(X²)E(Y²), where equality holds iff there exists a scalar c such that Y = cX.
Examples
Eg 9: Suppose a r.v. X is to be observed and, based on its value, one must predict the value of a r.v. Y. Show that, among all predictors g(X), the one that minimizes the mean squared prediction error is g(X) = E(Y|X); that is,
E[(Y − E(Y|X))²] ≤ E[(Y − g(X))²].
Eg 10: Suppose X₁ > 0 and X₂ > 0 are independent and identically distributed. Then E(X₁/X₂) ≥ 1, where equality holds iff the Xᵢ are degenerate (constant).
Double expectation
E[X] = E[E(X|Y)]
E[X] = Σ_y E[X|Y = y] P(Y = y)  (Y discrete)
E[X] = ∫ E[X|Y = y] f_Y(y) dy  (Y continuous)
Eg 11: Suppose that a point X is chosen in accordance with a uniform distribution on the interval [0, 1]. Also suppose that after the value X = x has been observed (0 < x < 1), a point Y is chosen in accordance with a uniform distribution on the interval [x, 1]. Determine the value of E(Y).
Eg 12: A coin, having probability p of coming up heads, is to be successively flipped until the first head appears. What is the expected number of flips required?
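For Eg 11, double expectation gives E(Y) = E[(X+1)/2] = 3/4; a Monte Carlo sketch of that answer (the simulation setup is mine):

```python
import numpy as np

# Eg 11 by simulation (a sketch): E(Y) = E[E(Y|X)] = E[(X+1)/2] = 3/4.
rng = np.random.default_rng(0)
m = 1_000_000
x = rng.random(m)                  # X ~ Uniform(0, 1)
y = x + (1 - x) * rng.random(m)    # Y | X = x ~ Uniform(x, 1)
print(y.mean())  # ≈ 0.75
```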
Covariance
Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}] = E(XY) − E(X)E(Y).
Properties:
Cov(X, X) = Var(X);
Cov(X, Y) = Cov(Y, X);
Cov(cX, Y) = c Cov(X, Y);
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
Conditional variance
The conditional variance of X given that Y = y is defined by
Var(X|Y = y) = E[(X − E[X|Y = y])² | Y = y]
Conditional variance formula:
Var(X) = E[Var(X|Y)] + Var(E[X|Y])
Eg 13: Suppose that by any time t the number of people that have arrived at a train depot is a Poisson random variable with mean λt. If the initial train arrives at the depot at a time (independent of when the passengers arrive) that is uniformly distributed over (0, T), what are the mean and variance of the number of passengers who enter the train?
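Conditioning on the train's arrival time gives E(N) = λT/2 and, by the conditional variance formula, Var(N) = λT/2 + λ²T²/12; a Monte Carlo sketch with illustrative parameter values (the numbers λ = 2, T = 3 are my assumptions, not from the slides):

```python
import numpy as np

# Eg 13 sketch with illustrative lam = 2, T = 3: conditioning on the arrival
# time T' ~ Uniform(0, T) gives E(N) = lam*T/2 = 3 and
# Var(N) = E[lam*T'] + Var(lam*T') = lam*T/2 + lam^2*T^2/12 = 6.
rng = np.random.default_rng(1)
lam, T, m = 2.0, 3.0, 1_000_000
t = rng.uniform(0, T, m)        # train arrival times
n = rng.poisson(lam * t)        # passengers waiting when the train arrives
print(n.mean(), n.var())  # ≈ 3 and ≈ 6
```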
Computing probabilities by conditioning
Define the indicator r.v. X by
X = 1 if E occurs; X = 0 if E does not occur.
Then for any r.v. Y,
E(X) = P(E),
E[X|Y = y] = P(E|Y = y).
Therefore
P(E) = Σ_y P(E|Y = y) P(Y = y), if Y is discrete;
P(E) = ∫ P(E|Y = y) f_Y(y) dy, if Y is continuous.
NB: When Y is a discrete r.v. with sample space {y₁, …, y_n}, define the events F_i = {Y = y_i}. Then
P(E) = Σ_{i=1}^{n} P(E|F_i) P(F_i),
where F₁, …, F_n are mutually exclusive.
Example 14
An insurance company supposes that the number of accidents that each of its policyholders will have in a year is Poisson distributed, with the mean of the Poisson depending on the policyholder. If the Poisson mean of a randomly chosen policyholder has a gamma distribution with density function
g(λ) = λe^{−λ},  λ ≥ 0,
what is the probability that a randomly chosen policyholder has exactly n accidents next year?
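Conditioning on λ, the integral ∫ (e^{−λ}λ^n/n!) λe^{−λ} dλ evaluates to (n+1)/2^{n+2}; a numeric sketch of that answer (the closed form is my working, not stated on the slide):

```python
import math
from scipy.integrate import quad

# Example 14 sketch: P(N = n) = ∫ (e^-lam lam^n / n!) * lam e^-lam d lam,
# which should evaluate to (n+1)/2^(n+2).
def p_claims(n):
    val, _ = quad(lambda lam: math.exp(-lam) * lam**n / math.factorial(n)
                  * lam * math.exp(-lam), 0, math.inf)
    return val

for n in range(4):
    print(n, p_claims(n), (n + 1) / 2 ** (n + 2))
```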
MGF of the sum of random number of r.v.s
Random sample: X₁, X₂, … i.i.d. as X;
N: a nonnegative integer-valued r.v., independent of the Xᵢ.
The MGF of Y = Σ_{i=1}^{N} Xᵢ is, by double expectation,
M_Y(t) = E[{M_X(t)}^N].
Then
M′_Y(t) = E[N {M_X(t)}^{N−1} M′_X(t)].
Therefore,
E(Y) = E(N)E(X),
Var(Y) = E(N)Var(X) + Var(N)[E(X)]².
Example 15: Let Y denote a uniform r.v. on (0, 1), and suppose that, conditional on Y = p, the r.v. X has a binomial distribution with parameters n and p. By using MGFs, establish the result that X is equally likely to take on any of the values 0, 1, …, n.
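The compound-sum moment formulas above can be checked by simulation; a sketch with illustrative choices (N ~ Poisson(3) and X ~ Exponential with mean 0.5 are my assumptions, not from the slides):

```python
import numpy as np

# Sketch with illustrative choices N ~ Poisson(3), X ~ Exponential(mean 0.5):
# E(Y) = E(N)E(X) = 1.5 and Var(Y) = E(N)Var(X) + Var(N)E(X)^2 = 0.75 + 0.75 = 1.5.
rng = np.random.default_rng(2)
m = 500_000
counts = rng.poisson(3, m)
# Sum of k iid Exponential(mean 0.5) draws is Gamma(k, scale 0.5); guard the
# shape-0 case, where the empty sum is exactly 0.
y = np.where(counts > 0, rng.gamma(np.maximum(counts, 1), 0.5), 0.0)
print(y.mean(), y.var())  # both ≈ 1.5
```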
Joint MGFs
For any n r.v.s X₁, …, X_n, the joint MGF, M(t₁, …, t_n), is defined for all real values of t₁, …, t_n by
M(t₁, …, t_n) = E(e^{t₁X₁ + ⋯ + t_nX_n}).
The individual MGFs can be obtained from M(t₁, …, t_n) by letting all but one of the t_j be 0, that is,
M_{Xᵢ}(t) = E(e^{tXᵢ}) = M(0, …, 0, t, 0, …, 0),
where the t is in the i-th place.
Example 16: The joint probability density of X and Y is given by
f(x, y) = (1/√(2π)) e^{−y} e^{−(x−y)²/2},  −∞ < x < ∞, 0 < y < ∞;
f(x, y) = 0, elsewhere.
Compute the joint moment generating function of X and Y and hence the individual moment generating functions as well.
Risk parameters
Assumption: The risk level of each policyholder in the rating class may be characterized by a risk parameter θ (possibly vector valued), but the value of θ varies by policyholder.
Therefore there is a probability distribution with pf π(θ) of the various values of θ across the rating class; say, if θ is a scalar parameter, the cdf Π(θ) may be interpreted as the proportion of policyholders in the rating class with risk parameter less than or equal to θ.
Risk parameter Θ: In statistical terms, Θ is a r.v. with distribution Π(θ) = P(Θ ≤ θ).
Example 17: There are two types of driver. Good drivers make up 75% of the population and in one year have zero claims with probability 0.7, one claim with probability 0.2, and two claims with probability 0.1. Bad drivers make up the other 25% of the population and have zero, one, or two claims with probabilities 0.5, 0.3, and 0.2, respectively. Describe this process and how it relates to an unknown risk parameter.
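In Example 17 the risk parameter Θ takes two values with prior (0.75, 0.25), and an observed claim count updates it by Bayes' theorem; a small sketch (conditioning on zero claims is my illustrative choice):

```python
# Example 17 sketch: risk parameter theta in {good, bad} with prior
# (0.75, 0.25); Bayes' theorem updates it after observing a year's claim count.
prior = {"good": 0.75, "bad": 0.25}
claims_pmf = {"good": [0.7, 0.2, 0.1], "bad": [0.5, 0.3, 0.2]}

n_obs = 0   # suppose the driver had zero claims this year
num = {t: prior[t] * claims_pmf[t][n_obs] for t in prior}
marginal = sum(num.values())                       # P(0 claims) = 0.65
posterior = {t: num[t] / marginal for t in prior}
print(posterior)  # P(good | 0 claims) ≈ 0.808
```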
Bayesian statistics
The prior distribution is a probability distribution over the space of possible parameter values, denoted by π(θ) and representing our opinion concerning the relative chances that various values of θ are the true value of the parameter.
The posterior distribution is the conditional probability distribution of the parameters given the observed data, denoted by π_{Θ|x}(θ|x).
The model distribution is the probability distribution for the data as collected given a particular value for the parameter, denoted by f_{X|Θ}(x|θ), where vector notation for x is used to remind us that all the data appear here.
The joint distribution has pdf
f_{X,Θ}(x, θ) = f_{X|Θ}(x|θ) π(θ).
The marginal distribution of x has pdf
f_X(x) = ∫ f_{X|Θ}(x|θ) π(θ) dθ.
Therefore the posterior distribution can be computed as
π_{Θ|x}(θ|x) = f_{X|Θ}(x|θ) π(θ) / ∫ f_{X|Θ}(x|θ) π(θ) dθ.
NB: In Bayesian analysis, it is customary to use "pdf" for discrete and mixed distributions in addition to those that are continuous. In the formulas, integrals should be replaced by sums as appropriate.