
JOINT PROBABILITY DISTRIBUTION

Joint Density Function of Discrete Variables:


Recall that for discrete variables, the density function represents individual probabilities. This notion
extends to the joint distribution of two or more discrete variables. For two discrete variables, X and Y,
their simultaneous randomness is completely captured by their joint density function f(.,.), which
represents nothing but the joint probabilities, i.e.

f(x, y) = P[(X = x) and (Y = y)] for all possible values of x and y.

Properties of basic probability demand that such a density function must satisfy:

f(x, y) ≥ 0 for all x, y, and ∑_x ∑_y f(x, y) = 1.        (1)

If this were not so, some event would have a negative probability, or the total probability would not
equal one.

Example 1: To illustrate, let X be the number of LCD televisions sold by an electronics store in a day,
while Y denotes the number of local daily newspapers which carried an advertisement from the store on
that day. The following table provides a possible joint density function:

Table (Example 1): Joint density of the no. of LCD TVs sold (X) and the no. of newspapers carrying the store ad (Y)

                                        No. of LCD TVs sold in a day (X)
                                     0      1      2      3      4     Marginal density of Y
No. of local daily          0      0.25   0.10   0.05   0.02   0.01          0.43
newspapers carrying         1      0.05   0.10   0.15   0.06   0.04          0.40
the store ad (Y)            2      0.01   0.02   0.03   0.06   0.05          0.17
Marginal density of X              0.31   0.22   0.23   0.14   0.10          1.00

Thus, there is a 10% chance that on a day no local daily carries the store advertisement and the store
sells 1 LCD television. The chance that at most 1 daily newspaper carries the store advertisement and
the store sells more than 2 LCD televisions can be computed as: 0.02+0.01+0.06+0.04=0.13.
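Such table computations are easy to automate; a minimal sketch in Python (assuming NumPy is
available; the array simply mirrors the table above):

    import numpy as np

    # Rows index Y = 0, 1, 2; columns index X = 0, 1, 2, 3, 4.
    joint = np.array([
        [0.25, 0.10, 0.05, 0.02, 0.01],
        [0.05, 0.10, 0.15, 0.06, 0.04],
        [0.01, 0.02, 0.03, 0.06, 0.05],
    ])

    assert abs(joint.sum() - 1.0) < 1e-12   # total probability is 1, as in (1)

    # P[at most 1 newspaper (Y <= 1) and more than 2 TVs sold (X > 2)]
    print(joint[:2, 3:].sum())              # 0.13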

Joint Density Function of Continuous Variables:


On the other hand, suppose X and Y are two continuous variables whose randomness we want
to depict through a density function in two arguments, f(.,.). For any such function f(.,.) to qualify
as the (joint) density function of X and Y, we must have



P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy,  for any subset A of the plane.

In particular, if A is a rectangle of the form (x1, x2) × (y1, y2), we get

P[(x1 < X < x2) and (y1 < Y < y2)] = ∫_{x1}^{x2} ∫_{y1}^{y2} f(x, y) dy dx,  for any x1 < x2 and y1 < y2.

It is easy to see that the condition analogous to (1), for the density function of continuous random
variables, is:

f(x, y) ≥ 0 for all x, y, and ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dy dx = 1.

Example 2: Consider a young individual who is still living with her parents and has recently got a job;
the only money she spends is on (a) entertainment (eating out, movies, etc.) and (b) shopping. Let X and Y
denote the proportions (fractions) of her salary she spends on the two categories above in a random month.
She is careful not to overspend, and hence X + Y ≤ 1. She believes that, in a random month, (X, Y) is
`equally likely' to be anywhere on the triangle with vertices (0, 0), (1, 0) and (0, 1), i.e. a uniform
distribution over this triangle describes the random pattern of proportions spent on the two categories.

Or, equivalently, in functional form the joint density function is given by:

f(x, y) = 2   if x ≥ 0, y ≥ 0, x + y ≤ 1,
        = 0   otherwise.

So, in particular, the chance she spends less than 25% of her salary on entertainment and she spends
less than 50% of her salary on shopping can be computed as:

∫_0^{0.25} ∫_0^{0.5} f(x, y) dy dx = 2 × 0.25 × 0.5 = 0.25.
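Probabilities like this one can also be sanity-checked by simulation; a minimal sketch, assuming
rejection sampling is an acceptable way to draw uniformly from the triangle:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    x = rng.uniform(0, 1, n)
    y = rng.uniform(0, 1, n)
    keep = x + y <= 1            # keep only the points falling in the triangle
    x, y = x[keep], y[keep]      # (X, Y) is now uniform over the triangle

    print(np.mean((x < 0.25) & (y < 0.5)))   # close to 0.25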



Marginal Density Function:
        The density function of X, to be denoted by f_X(.), can be obtained from the joint density
function by summing/integrating out all the possible values of Y; in other words,


f_X(x) = ∫_{-∞}^{∞} f(x, y) dy  (continuous case),        (2a)

or  f_X(x) = ∑_y f(x, y)  (discrete case).                (2b)

In the present context, such a density function is called the marginal density of X. Similarly, the marginal
density of Y is given by:


fY ( y ) = ∫ f ( x, y ) dx, or ∑ f ( x, y ) ,
−∞ x

depending on whether the variables are continuous or discrete. In the discrete case, as in Example 1
above, the univariate densities (probabilities) are obtained as the row/column totals of the probabilities
associated with the joint distribution; these are usually reported in the margins of the table, hence the
name marginal probability distributions (densities). At times they are also called row marginals and
column marginals, as would be obvious and appropriate in a given context.
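In the discrete case, (2b) amounts to row and column sums. Continuing with the NumPy array
`joint` built for Example 1 earlier (an assumption carried over from that sketch):

    f_X = joint.sum(axis=0)   # column totals: [0.31, 0.22, 0.23, 0.14, 0.10]
    f_Y = joint.sum(axis=1)   # row totals:    [0.43, 0.40, 0.17]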

Conditional Density:
        Conditional densities (distributions) are another form of density of interest in a bivariate
context. Thus, for example, the conditional density of X given Y = y is denoted and defined as
follows:

f_{X|Y=y}(x | y) = f(x, y) / f_Y(y).        (3)

        When the variables are discrete, the above is consistent with the notion of conditional
probability. In the setting of Example 1, the conditional density of the number of LCD televisions sold by
the store on a random day on which exactly 1 local daily newspaper carries an advertisement of the
store is given by:

No. of LCD TVs sold in a day, given 1 local daily newspaper carrying the store ad

values      0       1       2       3       4
density   0.125   0.25    0.375   0.15    0.1

        In the context of Example 2, given that in a random month she spends 40% of her salary on
entertainment, the proportion of her salary she spends on shopping would be uniformly distributed on
(0, 0.6), since the conditional density in (3) is then constant over that interval.



        Conditional densities have the usual properties of univariate densities. So you can compute the
expected value, variance (or any other characteristic, like percentiles or the IQR) and interpret them in
the usual manner. Thus, in the context of Example 1, the expected number of LCD televisions sold on a
day when 1 local newspaper is carrying an advertisement is given by

0 × 0.125 + 1 × 0.25 + 2 × 0.375 + 3 × 0.15 + 4 × 0.1 = 1.85.
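Continuing with the `joint` array of Example 1 (again assuming the earlier NumPy sketch), the
conditional density (3) and its mean can be computed as:

    f_X_given_Y1 = joint[1] / joint[1].sum()   # (3): f(x, 1) / f_Y(1), with f_Y(1) = 0.4
    print(f_X_given_Y1)                        # [0.125, 0.25, 0.375, 0.15, 0.1]

    x_values = np.arange(5)                    # possible values 0, 1, 2, 3, 4
    print(x_values @ f_X_given_Y1)             # conditional mean: 1.85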

In the context of Example 2, among other properties, the inter-quartile range of the proportion of her
salary spent on shopping, given that she spends 40% of her salary on entertainment, can be found as
(the lower and upper quartiles of the Uniform(0, 0.6) distribution being 0.15 and 0.45):

0.45 − 0.15 = 0.3 .

INDEPENDENT Random Variables:


        If X and Y are independent random variables, then the conditional distribution of X given Y
should be the (unconditional/marginal) distribution of X, i.e. we must have:

f_{X|Y=y}(x | y) = f_X(x), for any x, y.
Hence, from (3), this would amount to:

f(x, y) = f_X(x) × f_Y(y), for any x, y.        (4)

Indeed, (4) is usually taken as the definition of X and Y being independent of each other.

From now onwards, for the remaining part of this section, we will write only the representations
corresponding to the continuous case, with the obvious understanding that if the variables are discrete,
then summation will replace integrals in the expressions provided.

Function of (Two) Random Variables and its Expected Value:


Note that, as an extension to the univariate case, for any function of X and Y, say g(X,Y), which
will also be a random variable, we can compute its expected value as:

E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dy dx.        (5)

In the above, we have taken the possible values of X and Y to be between minus infinity and infinity. If
in a given case, the ranges are finite, the density function would be zero outside this range, and
accordingly, the integrals need to be computed over the finite range only.

Thus, taking g(x, y) = xy or g(x, y) = x + y, for example, we have:



E(XY) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x y f(x, y) dy dx,        (6)

E(X + Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x + y) f(x, y) dy dx.        (7)

Of course, the mean and variance of X (or Y) can be computed either from the corresponding marginal
density or from the joint density function of X and Y, whichever is more convenient in a given case:

E(X) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x f(x, y) dy dx = ∫_{-∞}^{∞} x f_X(x) dx;
E(Y) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} y f(x, y) dy dx = ∫_{-∞}^{∞} y f_Y(y) dy.

Covariance and Correlation:


        However, the main objective of studying the joint density is to explore the dependency
between X and Y. This is captured through the covariance or the correlation. Intuitively, we say that two
variables are positively associated if an increase (decrease) in one of the variables tends to be
associated with an increase (decrease) in the values of the other variable. A negative association is
implied by the reverse pattern.

The covariance between two random variables is defined as

Covariance(X, Y) = σ_XY = E[(X − µ_X)(Y − µ_Y)],        (8)

where µ_X = E(X) and µ_Y = E(Y). Note the similarity of this definition with that of the variance of a
random variable. If the variables are positively associated, higher (lower) than average values of X will
generally tend to be associated with higher (lower) than average values of Y, leading to σ_XY being
positive. On the other hand, if the variables are negatively associated, higher (lower) than average
values of X will, more often than not, be associated with lower (higher) than average values of Y,
resulting in σ_XY being negative. Thus the sign of the covariance reflects the direction of the
association. An alternative, equivalent form of (8) is often useful in evaluating the covariance:

σ_XY = E(XY) − µ_X µ_Y.        (9)
Unfortunately, the magnitude of the covariance does not by itself indicate the strength of the
dependency; it also depends on the individual variances. The Cauchy-Schwarz inequality, a celebrated
result in probability theory, guarantees that the absolute value of the covariance can never exceed the
product of the standard deviations of the two random variables. Thus, to get an idea of the strength of
the dependency, one needs to consider the standardised form of the covariance; this is known as the
correlation coefficient:



Correlation(X, Y) = ρ_XY = σ_XY / (σ_X × σ_Y).        (10)

By the Cauchy-Schwarz inequality, the correlation coefficient always lies between -1 and 1. If it is
exactly equal to 1 or -1, the variables are said to be perfectly correlated.

        On the other hand, if the correlation coefficient is zero, we say that the variables are
uncorrelated. Let us now understand the similarity as well as the difference between uncorrelated
variables and independent variables. If X and Y are independent of each other, you should be able to
see from (4) and (6) that:

E(XY) = E(X) × E(Y) = µ_X µ_Y ⇒ σ_XY = 0 ⇒ ρ_XY = 0.

        In other words, independent variables are necessarily uncorrelated. However, the converse is
NOT true. To see this, consider a random variable X which has a Uniform distribution on (−1, 1), and let
Y be nothing but X². You should be able to see that:

E(X) = 0 and E(XY) = E(X³) = 0, so that σ_XY = E(XY) − E(X)E(Y) = 0 ⇒ ρ_XY = 0,

i.e. X and Y are uncorrelated. However, X and Y are obviously very much dependent on each other (you
would know the exact value of Y if somebody told you the value of X). Because of this, we often say that
correlation is (only) a measure of linear dependency; this will become clearer when we explore
regression models later on.
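This counterexample is easy to reproduce numerically; a minimal sketch:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 1_000_000)   # X ~ Uniform(-1, 1)
    y = x ** 2                          # Y is completely determined by X ...

    print(np.corrcoef(x, y)[0, 1])      # ... yet the correlation is close to 0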

Illustrating the Computations in Example 2:


First note that the marginal density of X can be obtained using (2a):

f_X(x) = 2(1 − x)   if 0 < x < 1,
       = 0          otherwise,

implying
µ_X = E(X) = 2 ∫_0^1 x(1 − x) dx = 1/3;
by symmetry (since the joint density is symmetric in its two arguments), this would also be equal to µ_Y.
Similarly,

E(X²) = 2 ∫_0^1 x²(1 − x) dx = 1/6 = E(Y²)  ⇒  σ_X² = 1/6 − (1/3)² = 1/18 = σ_Y².

Now,

E(XY) = ∫_0^1 ∫_0^{1−x} 2 x y dy dx = 1/12  ⇒  Cov(X, Y) = σ_XY = 1/12 − (1/3) × (1/3) = −1/36,

leading to:

ρ_XY = σ_XY / (σ_X × σ_Y) = (−1/36) / (1/18) = −0.5.
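These closed-form values can be cross-checked by simulating a uniform sample over the triangle;
a sketch along the lines of the earlier one:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 1, 2_000_000)
    y = rng.uniform(0, 1, 2_000_000)
    keep = x + y <= 1                   # rejection sampling: uniform on the triangle
    x, y = x[keep], y[keep]

    print(x.mean(), x.var())            # close to 1/3 and 1/18
    print(np.cov(x, y)[0, 1])           # close to -1/36
    print(np.corrcoef(x, y)[0, 1])      # close to -0.5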
Can you intuitively see why the correlation had to be negative in this example?

Expected Value and Variance of a Linear Combination of Random Variables:


        For any constants (positive or negative) a and b, aX + bY is another random variable, and it is
easy to see [try to prove this from (5)] that

E[aX + bY] = a µ_X + b µ_Y.        (11)

You may note that (11) holds true irrespective of whether X and Y are independent. On the other hand,
the variance is given by:

Var[aX + bY] = a² σ_X² + b² σ_Y² + 2ab σ_XY.        (12)

Naturally, the last term in (12) vanishes when X and Y are independent, or even merely uncorrelated,
random variables.
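Identities (11) and (12) are easy to verify on simulated data; a sketch in which the means,
covariances and constants are made-up illustrative numbers:

    import numpy as np

    rng = np.random.default_rng(3)
    mean = [1.0, 2.0]                        # hypothetical mu_X, mu_Y
    cov = np.array([[2.0, -0.6],             # sigma_X^2 and sigma_XY
                    [-0.6, 1.0]])            # sigma_XY and sigma_Y^2
    x, y = rng.multivariate_normal(mean, cov, 1_000_000).T
    a, b = 3.0, -2.0

    print(np.mean(a * x + b * y), a * mean[0] + b * mean[1])            # (11)
    print(np.var(a * x + b * y),
          a**2 * cov[0, 0] + b**2 * cov[1, 1] + 2 * a * b * cov[0, 1])  # (12)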

Application to Portfolio Diversification and Asset Allocation:


        Suppose an investor is planning to invest some amount (say, for convenience of notation, a unit
amount). He is looking at the prospect of investing in two stocks, S1 and S2. Naturally, the returns from
either stock are not known with certainty, i.e. they are random variables. Let us denote these returns by
X1 and X2 respectively. While X1 and X2 are random variables, an investor can form a reasonable
opinion about the respective randomness, possibly from analyzing past data as well as from domain
knowledge. In particular, he should know the expected returns from the 2 stocks, denoting them by µ1
and µ2. While the mean or expected return from a stock is important, so is the standard deviation of the
return. Investors would look to invest in stocks having a high expected return, but they would also want
the standard deviation of the return to be low in order to reduce their risk. However, such stocks are
rare, if they exist at all. More often than not, most reasonable stocks will either have high return and
high risk (high µ as well as high σ) or low return and low risk (low µ as well as low σ). How does an
investor optimize his or her choice in that case?

        Going back to the specific case given above, the investor may want to split the amount of
investment into w1 and w2, invested respectively in S1 and S2: stocks which presumably have high µ1
and µ2. As remarked earlier, this would typically result, to his disappointment, in high σ1 and σ2 as well.
The return from his investment portfolio will be the random variable w1 X1 + w2 X2, having an expected
return of

E(w1 X1 + w2 X2) = w1 µ1 + w2 µ2,        (13)



and the standard deviation of the return:

{Var(w1 X1 + w2 X2)}^{1/2} = {w1² σ1² + w2² σ2² + 2 w1 w2 ρ σ1 σ2}^{1/2},        (14)

where ρ denotes the correlation between the returns from S1 and S2. Thus, the investor would want to
maximize (13) and minimize (14) through his selection of the weights w1 and w2, as well as of the stocks
S1 and S2. Of course, there are constraints on the weights w1 and w2: they must be nonnegative and add
up to 1 (or to the pre-specified total amount). In an optimization course, you will learn how to achieve
this by maximizing the expected return of the portfolio subject to the standard deviation of the return
being below σ*, a pre-specified acceptable level of uncertainty for the investor; or by minimizing the
standard deviation of the return of the portfolio subject to the expected return being above µ*, a
pre-specified acceptable level of return for the investor.

        However, let us now discuss the direction of the possible solution from a probabilistic
perspective. Observe that, other factors like the individual variabilities of the returns remaining the
same, (14) is higher when ρ, the correlation between the returns, is positive, and lower when ρ is
negative. What does this imply for the investor? He should be looking to split his investment across
stocks which yield high expected returns and have negative correlation between them. Generally,
returns from stocks of the same sector are positively correlated, while it may be possible to select
high-yielding stocks from two different sectors. This is indeed the basic principle behind portfolio
diversification.

        We have illustrated the above example with only 2 stocks. Obviously, the principle is valid more
generally and can easily be extended to any number of stocks. Thus, for example, if a particular
investment portfolio has amounts w1, w2, …, wk invested in stocks S1, S2, …, Sk, where the return from
stock Si has expected value µi and standard deviation σi, with the covariance between the returns from
Si and Sj being σij, then (13) and (14) can be generalized to obtain the expected return and variance of
the portfolio as:

E[∑_{i=1}^{k} w_i X_i] = ∑_{i=1}^{k} w_i µ_i,    Var[∑_{i=1}^{k} w_i X_i] = ∑_{i=1}^{k} w_i² σ_i² + ∑_{i=1}^{k} ∑_{j≠i} w_i w_j σ_ij.

If you are familiar with matrix algebra, you may find it convenient to express the above as products of
vectors and matrices. At any rate, to get a closer understanding of the portfolio diversification principle
(and an illustration of the computation of the expected value and variance of a linear combination of
random variables), you may wish to carry out the optimization (along the lines indicated in the last
paragraph) using the SOLVER routine in MS-EXCEL, for a set of stocks whose means and
variance-covariances of returns are possibly estimated from past data.
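As a sketch of that matrix form in Python (the expected returns and covariance matrix below are
hypothetical illustrative numbers, not estimates from real data):

    import numpy as np

    mu = np.array([0.10, 0.08, 0.12])            # hypothetical expected returns
    Sigma = np.array([[0.040,  0.006, -0.004],   # hypothetical covariance matrix
                      [0.006,  0.020,  0.002],   # of the returns
                      [-0.004, 0.002,  0.060]])

    def portfolio(w):
        # expected return w'mu and standard deviation sqrt(w'Sigma w)
        return w @ mu, np.sqrt(w @ Sigma @ w)

    w = np.array([0.4, 0.4, 0.2])   # nonnegative weights summing to 1
    print(portfolio(w))

A constrained optimizer, for instance scipy.optimize.minimize with the weight constraints, could then
play the role of SOLVER.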



Practice Exercises: (need not submit)

1. Calculate the correlation coefficient between the no. of LCD TVs sold and the no. of newspapers
carrying the store advertisement in Example 1.

2. Soon after their marriage, Haquib and Waheeda bought a flat on Bannerghatta Road, between
IIMB and Bannerghatta National Park. Both of them are software engineers, but their offices are
far apart; hence they have to drive different vehicles to their respective offices. However, they
make sure to leave home at exactly 8:15 A.M., and almost invariably they are back home within
an hour of each other. Waheeda receives an 'L' mark in her office if she reaches after 9 A.M.,
while Haquib can avoid a similar 'L' mark as long as he reaches before 9:10 A.M. The times
taken, in minutes, to reach office from their home by Haquib (H) and by Waheeda (W) are
random variables having the following joint probability density function (pdf):

f_{H,W}(h, w) = (h − 45)(50 − w) / 5000   if 45 < h < 55, 40 < w < 50,
              = (50 − w) / 500            if 55 < h < 60, 40 < w < 50,
              = 0                         otherwise.

a) Find the correlation between H and W.


b) How much time, on average, does Waheeda spend in her daily commute to office?
c) What is the standard deviation of the total time spent by the couple in their daily commute
to and from their respective offices?

3. In the computer exercise assigned to you, apply the portfolio diversification principle to
determine your optimal portfolio.

Prepared by Shubhabrata Das, IIMB
