To deal with this type of data we need a distribution for more than one variable. For two discrete RV's X and Y, the joint probability mass function is
$$p(x, y) = P(X = x, Y = y)$$
where p(x, y) ≥ 0 for all (x, y) and $\sum_x \sum_y p(x, y) = 1$.
Two tire-quality experts examine stacks of tires and assign quality ratings
to each tire on a 3-point scale. Let X denote the grade given by expert
A and let Y denote the grade given by B . The following table gives the
joint distribution of X and Y:
X \ Y     1     2     3
  1      ___   ___   ___
  2      ___   ___   ___
  3      ___   ___   ___
• P (X = 1, Y = 2) =
• P (X ≥ 2, Y ≥ 3) =
The marginal probability mass functions of X and Y, denoted by
p_X(x) and p_Y(y) respectively, are given by:
$$p_X(x) = \sum_y p(x, y) \qquad p_Y(y) = \sum_x p(x, y)$$

X        1     2     3              Y        1     2     3
p_X(x)  ___   ___   ___             p_Y(y)  ___   ___   ___
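As a computational sketch (the joint probabilities below are hypothetical placeholders, not the lecture's values), the marginals are just row and column sums of the joint table:

```python
# Hypothetical joint PMF p(x, y); illustration only
joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.05,
    (2, 1): 0.10, (2, 2): 0.20, (2, 3): 0.10,
    (3, 1): 0.05, (3, 2): 0.15, (3, 3): 0.20,
}

# p_X(x) = sum over y of p(x, y);  p_Y(y) = sum over x of p(x, y)
p_X, p_Y = {}, {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p
    p_Y[y] = p_Y.get(y, 0.0) + p

print(p_X)  # ≈ {1: .20, 2: .40, 3: .40}
print(p_Y)  # ≈ {1: .25, 2: .40, 3: .35}
```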
1.2 Continuous Joint PDF for Two Variables
For two continuous random variables X and Y, the joint PDF f(x, y) has the
following properties:
1. f(x, y) ≥ 0 for all x and y.
2. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
Example:
$$f(x, y) = \frac{x(1 + 3y^2)}{4} \qquad 0 < x < 2, \; 0 < y < 1$$
Show that condition 1 holds:
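On the support, x > 0 and 1 + 3y² > 0, so f(x, y) > 0 there; elsewhere f(x, y) = 0. Hence f(x, y) ≥ 0 everywhere.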
Show that condition 2 holds:
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy =$$
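Carrying out the computation over the support:
$$\int_0^2 \int_0^1 \frac{x(1 + 3y^2)}{4}\,dy\,dx = \int_0^2 \frac{x}{4}\Big[y + y^3\Big]_0^1\,dx = \int_0^2 \frac{x}{2}\,dx = \Big[\frac{x^2}{4}\Big]_0^2 = 1$$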
The marginal probability density functions of X and Y, denoted by
f_X(x) and f_Y(y) respectively, are given by:
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$
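For the example PDF above, these work out to:
$$f_X(x) = \int_0^1 \frac{x(1 + 3y^2)}{4}\,dy = \frac{x}{2}, \quad 0 < x < 2 \qquad f_Y(y) = \int_0^2 \frac{x(1 + 3y^2)}{4}\,dx = \frac{1 + 3y^2}{2}, \quad 0 < y < 1$$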
1.4 Conditional Distributions
Given that a student scored > 1200 on the SAT, what is the
probability the student will have a college GPA ≥ B ?
Let X and Y be two continuous RV's with joint pdf f(x, y) and marginal
pdf of X, f_X(x). Then for any x value for which f_X(x) > 0, the
conditional probability density function of Y given that X = x is:
$$f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)} \qquad -\infty < y < \infty$$
For the discrete case:
$$p_{Y|X}(y|x) = \frac{p(x, y)}{p_X(x)}$$
Example: What is the probability that tire expert A (rv X ) will assign a
grade of 2 given that tire expert B (rv Y ) assigned a grade of
1?
(Joint distribution table of X and Y, as above.)
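In symbols, with the roles in the definition reversed, the quantity asked for is
$$p_{X|Y}(2|1) = \frac{p(2, 1)}{p_Y(1)}$$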
Example: For the joint PDF:
$$f(x, y) = \frac{x(1 + 3y^2)}{4} \qquad 0 < x < 2, \; 0 < y < 1$$
what is $P(Y \ge \frac{1}{2} \mid X = 1)$?
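Using the marginal f_X(x) = x/2 computed earlier, f_X(1) = 1/2, so
$$f_{Y|X}(y|1) = \frac{f(1, y)}{f_X(1)} = \frac{(1 + 3y^2)/4}{1/2} = \frac{1 + 3y^2}{2}$$
$$P\Big(Y \ge \tfrac{1}{2} \mid X = 1\Big) = \int_{1/2}^{1} \frac{1 + 3y^2}{2}\,dy = \frac{1}{2}\Big[y + y^3\Big]_{1/2}^{1} = \frac{1}{2}\Big(2 - \frac{5}{8}\Big) = \frac{11}{16}$$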
1.5 Joint PDF’s of n Random Variables
The concept of a joint PDF for 2 RV's can be extended to that of a joint
PDF of n RV's.
Joint PMF:
$$p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$$
Joint PDF:
$$P(a_1 \le X_1 \le b_1, a_2 \le X_2 \le b_2, \ldots, a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1, x_2, \ldots, x_n)\,dx_n \cdots dx_2\,dx_1$$
An important discrete example is the multinomial distribution:
$$p(x_1, x_2, \ldots, x_r) = \frac{n!}{x_1!\,x_2! \cdots x_r!}\, p_1^{x_1} p_2^{x_2} \cdots p_r^{x_r}$$
$$x_i = 0, 1, 2, \ldots \qquad x_1 + x_2 + \cdots + x_r = n$$
where
n = number of trials,
p_i = probability of outcome i on any one trial,
x_i = number of trials resulting in outcome i.
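A quick computational sketch of the multinomial PMF (the grade probabilities here are made up for illustration):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """p(x1, ..., xr) = n!/(x1! ... xr!) * p1^x1 * ... * pr^xr."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)  # multinomial coefficient stays an integer
    return coef * prod(p ** x for p, x in zip(probs, counts))

# Probability that, in n = 10 graded tires with hypothetical grade
# probabilities .2/.5/.3, exactly two get grade 1, five get grade 2,
# and three get grade 3:
print(multinomial_pmf([2, 5, 3], [0.2, 0.5, 0.3]))  # 0.08505
```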
2 Expected Values
PMF:
$$E[h(X, Y)] = \sum_x \sum_y h(x, y) \cdot p(x, y)$$
PDF:
$$E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y)\,dx\,dy$$
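For instance, with the joint PDF f(x, y) = x(1 + 3y²)/4 used earlier and h(X, Y) = XY:
$$E[XY] = \int_0^2 \int_0^1 xy \cdot \frac{x(1 + 3y^2)}{4}\,dy\,dx = \int_0^2 \frac{x^2}{4}\,dx \int_0^1 y(1 + 3y^2)\,dy = \frac{2}{3} \cdot \frac{5}{4} = \frac{5}{6}$$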
2.1 Covariance
Often the RV's X, Y are related to each other (i.e., they are not
independent) and we want to know something about the strength of the
linear relationship between X and Y .
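The measure used is the covariance (with µ_X = E[X] and µ_Y = E[Y]):
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$$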
(Two scatterplots illustrating different strengths of linear relationship between X and Y.)
Example: Calculate the covariance between the two tire experts:
(Joint distribution table of X and Y, as above.)
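A computational sketch using the same hypothetical joint table as in the marginals example above (not the lecture's actual values):

```python
# Hypothetical joint PMF p(x, y); illustration only
joint = {
    (1, 1): 0.10, (1, 2): 0.05, (1, 3): 0.05,
    (2, 1): 0.10, (2, 2): 0.20, (2, 3): 0.10,
    (3, 1): 0.05, (3, 2): 0.15, (3, 3): 0.20,
}

EX = sum(x * p for (x, y), p in joint.items())
EY = sum(y * p for (x, y), p in joint.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
cov = EXY - EX * EY  # Cov(X, Y) = E[XY] - E[X]E[Y]
print(EX, EY, cov)   # ≈ 2.2, 2.1, 0.18
```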
2.2 Correlation
$$\rho_{X,Y} = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} \qquad -1 \le \rho_{X,Y} \le 1$$
Properties:
1. Corr(aX + b, cY + d) = Corr(X, Y) provided a and c have the same sign.
2. If ρ_{X,Y} = 1 or ρ_{X,Y} = −1 then Y = aX + b for some a ≠ 0.
3. If X, Y are independent, then ρ_{X,Y} = 0. However,
ρ_{X,Y} = 0 does not imply independence, as the simulation sketch below illustrates:
(Scatterplot of a relationship with zero correlation but clear dependence.)
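A minimal simulation sketch (assuming NumPy) of zero correlation without independence, taking Y = X² with X symmetric about 0:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2                        # Y is completely determined by X

# Correlation is (near) zero even though X and Y are dependent:
print(np.corrcoef(x, y)[0, 1])    # ≈ 0.0
```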
Example: Calculate the correlation between the two tire experts:
(Joint distribution table of X and Y, as above.)
2.3 Correlation and Causation
Just because two variables are highly correlated does not mean that one
“causes the other”.
Example: “Pet a day keeps doctor away.” This headline appeared in the
August 2, 1990 issue of the Santa Barbara News-Press. It was
referring to a study of Medicare enrollees in an HMO.
Participants were followed for 1 year and the frequency of
doctor contacts noted. Those who owned pets had contact
with doctors an average of 8.42 times, while those without pets
had an average of 9.49 contacts. The study's conclusion was
that pet ownership has a moderating role in helping the elderly
through times of stress.
Example: In October 1994, The New York Times ran a front-page article
indicating that “men from traditional families, in which the
wives stay at home to care for children, earn more and get
higher raises than men from two-career families.” The
statement is based on studies of the salary histories of male
managers. Does this study suggest cause and effect: to make
more money, have a traditional family?
Example: In a German town it was found that the number of births and
stork sightings were positively correlated.
If two variables are correlated, the explanation could be any of the following:
2.4 Establishing Cause and Effect
However, more recently, many scientists have come to believe that in the
absence of such experimentation a good case can be made for cause
and effect if:
Example: Let X be the amount a customer is charged for a tune-up, with pmf:
x      40    45    50    55
p(x)   .1    .2    .3    .4
(Bar chart of this pmf.)
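For later reference, the mean and variance of X are:
$$\mu = E[X] = 40(.1) + 45(.2) + 50(.3) + 55(.4) = 50$$
$$\sigma^2 = V[X] = E[X^2] - \mu^2 = 2525 - 2500 = 25 \qquad \sigma = 5$$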
Continuing with our tune-up example, suppose that we sample two
customers and take the average of what they are charged. The
outcomes for that statistic are
x1    x2    x̄      p(x̄)
40 40 40 (.1)(.1) = .01
40 45 42.5 (.1)(.2) = .02
40 50 45 (.1)(.3) = .03
40 55 47.5 (.1)(.4) = .04
45 40 42.5 (.2)(.1) = .02
45 45 45 (.2)(.2) = .04
45 50 47.5 (.2)(.3) = .06
45 55 50 (.2)(.4) = .08
50 40 45 (.3)(.1) = .03
50 45 47.5 (.3)(.2) = .06
50 50 50 (.3)(.3) = .09
50 55 52.5 (.3)(.4) = .12
55 40 47.5 (.4)(.1) = .04
55 45 50 (.4)(.2) = .08
55 50 52.5 (.4)(.3) = .12
55 55 55 (.4)(.4) = .16
So, the table above represents every possible outcome of averaging two
bills.
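A short computational sketch in plain Python that reproduces this enumeration and scales to larger sample sizes:

```python
from itertools import product

# pmf of a single tune-up bill, from the example above
pmf = {40: 0.1, 45: 0.2, 50: 0.3, 55: 0.4}

def sampling_dist_of_mean(pmf, n):
    """Enumerate all ordered samples of size n and accumulate P(x̄ = v)."""
    dist = {}
    for sample in product(pmf, repeat=n):
        xbar = sum(sample) / n
        p = 1.0
        for x in sample:
            p *= pmf[x]
        dist[xbar] = dist.get(xbar, 0.0) + p
    return dict(sorted(dist.items()))

print(sampling_dist_of_mean(pmf, 2))  # matches the 16-row table above
print(sampling_dist_of_mean(pmf, 4))  # the 13-value distribution shown below
```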
What is the distribution of the statistic?
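Collecting the rows of the table above by the value of x̄:
x̄      40     42.5   45     47.5   50     52.5   55
p(x̄)   .01    .04    .10    .20    .25    .24    .16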
Suppose that we sample four customers and take the average of what
they are charged. The (partial) outcomes for that statistic are
x1   x2   x3   x4   x̄       p(x̄)
40   40   40   40   40.00    .0001
40   45   40   40   41.25    .0002
...
55   50   40   40   46.25    .0012
55   55   40   40   47.50    .0016
...
What is the distribution of the statistic?
x̄        p(x̄)
40.00 .0001
41.25 .0008
42.50 .0036
43.75 .0120
45.00 .0310
46.25 .0648
47.50 .1124
48.75 .1608
50.00 .1905
51.25 .1840
52.50 .1376
53.75 .0768
55.00 .0256
(Bar chart of the sampling distribution of x̄ for n = 4.)
If we sample 10 customers (over 1 million different possible samples!)
and take the average of what they are charged the distribution of the
mean would be:
(Bar chart of the sampling distribution of x̄ for n = 10, over the range 40 to 55.)
(Four panels comparing the sampling distributions of x̄ for the sample sizes considered above, each over the range 40 to 55.)
This pattern is not a coincidence. It turns out that, for most underlying
distributions, the distribution of the sample mean will be approximately
normal if we have enough observations. This is known as the Central
Limit Theorem (CLT).
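A quick simulation sketch (assuming NumPy) of the CLT for the tune-up pmf:

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.array([40, 45, 50, 55])
probs = np.array([0.1, 0.2, 0.3, 0.4])

n = 30                                          # sample size
xbars = rng.choice(values, size=(100_000, n), p=probs).mean(axis=1)

mu = (values * probs).sum()                     # E[X] = 50
sigma2 = (((values - mu) ** 2) * probs).sum()   # V[X] = 25
print(xbars.mean(), xbars.std())                # ≈ 50 and ≈ sqrt(25/30) ≈ 0.91
```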
Notice how the mean and variance of the distributions changed: in each case
E[X̄] = µ, while V[X̄] = σ²/n shrinks as n grows. If X1, . . . , Xn is a random
sample from a distribution with mean µ and variance σ², then for large n
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ (approximately)}$$
How large is “large”? The answer depends on the underlying distribution.
A good rule of thumb is that if n ≥ 30 the CLT generally applies.
Example: Breaking strength of a rivet has µ = 10,000 psi and
σ = 500 psi. What is the probability that the sample mean
breaking strength for a random sample of 40 rivets is between
9,900 and 10,200 psi?
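A sketch of the standard CLT calculation:
$$\sigma_{\bar{X}} = \frac{500}{\sqrt{40}} \approx 79.06$$
$$P(9900 \le \bar{X} \le 10200) \approx \Phi\left(\frac{10200 - 10000}{79.06}\right) - \Phi\left(\frac{9900 - 10000}{79.06}\right) = \Phi(2.53) - \Phi(-1.26) \approx 0.9943 - 0.1038 = 0.8905$$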
If X1, . . . , Xn are RV's with:
• mean: µ1, . . . , µn
• variance: σ1², . . . , σn²
then:
1. E[a1 X1 + . . . + an Xn ] = a1 E[X1 ] + . . . + an E[Xn ]
2. $V[a_1 X_1 + \ldots + a_n X_n] = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \,\mathrm{Cov}(X_i, X_j)$
Note that the RV's Xi do not have to be independent for the above to
hold. However, if they are mutually independent, then:
$$V[a_1 X_1 + \ldots + a_n X_n] = \sum_{i=1}^{n} a_i^2 \sigma_i^2$$
This follows from the fact that:
• Cov(Xi, Xj) = 0 for i ≠ j when Xi and Xj are independent.
• Cov(Xi, Xi) = V(Xi).
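As a check, these rules recover the variance of the sample mean used earlier: with ai = 1/n and the Xi independent with common variance σ²,
$$V[\bar{X}] = \sum_{i=1}^{n} \frac{1}{n^2} \sigma^2 = \frac{\sigma^2}{n}$$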