AMS/ECON 11B
This supplement covers four topics. In §1, I briefly review the properties of the binomial distribution, which is covered in the textbook in section 9.2. In §2, I introduce the hypergeometric distribution, which is a close relative of the binomial distribution. In §3, I introduce Tchebychev's inequality, which provides a simple estimate for the probability of observing a value of a random variable that is far from its expected value. The last topic is the normal approximation to the binomial distribution, which I discuss in §4 (and which is covered in the textbook in sections 16.2 and 16.3). This is a historically important special case of the Central Limit Theorem, which is one of the most important results in mathematics.
1. The binomial distribution
Recall that a random variable X follows the binomial distribution with parameters n and p if and only if the probability distribution of X is given by

P(X = k) = nCk p^k (1 − p)^(n−k),  for k = 0, 1, 2, …, n,    (1.1)

where n is a positive integer and 0 < p < 1. It is standard to write X ∼ B(n, p) in this case.
Binomial random variables arise in situations where an experiment with two outcomes is repeated a certain number of times. Suppose that the two outcomes are A and B, and the probability of outcome A is equal to p, and further suppose that the experiment is repeated n times in such a way that the outcomes are all independent of each other. If X is the number of times that outcome A is observed in these n repetitions, then X ∼ B(n, p), as was explained in class and in the textbook.

For example, if you toss a fair coin 60 times, and H = the number of heads you observe, then H ∼ B(60, 1/2). The two outcomes in this case are heads and tails. But the binomial distribution is not limited to coin tosses. If you roll a fair, six-sided die there are six basic outcomes, but we can still describe the die-roll as a Bernoulli trial with two outcomes by grouping the outcomes into two groups. E.g., if X = the number of 5's that are observed in 600 rolls of a fair die, then X ∼ B(600, 1/6), because the two outcomes we are observing are A = 5 and B = (not 5).
In the book, it is stated without proof that if X ∼ B(n, p), then

E(X) = np  and  Var(X) = np(1 − p).
I will show you the proof that E(X) = np below. The proof that Var(X) = np(1 − p) is similar but slightly more technical, so I will leave it out. Knowing the proof below is not a requirement of the course, so you can skip it if you want. But if you like to challenge yourself, then read on. I have left out a few small details that you will need to fill in to completely understand the proof.
Starting from the definition of the expected value,

E(X) = Σ_{k=0}^{n} k·P(X = k) = Σ_{k=0}^{n} k(nCk) p^k (1 − p)^(n−k).

The first thing that we do is drop the first term in the sum, since when k = 0 the term is 0·[nC0 p^0 (1 − p)^n] = 0, so

E(X) = Σ_{k=1}^{n} k(nCk) p^k (1 − p)^(n−k).    (1.2)

Next we manipulate the expression k(nCk) to put it into a more convenient form, namely we show that k(nCk) = n(n−1Ck−1):

k(nCk) = k · n!/(k!(n − k)!) = n!/((k − 1)!(n − k)!) = n · (n − 1)!/((k − 1)!(n − k)!) = n(n−1Ck−1).

Using this identity in the second expression for E(X), Eq.(1.2), we get

E(X) = Σ_{k=1}^{n} n(n−1Ck−1) p^k (1 − p)^(n−k) = n Σ_{k=1}^{n} (n−1Ck−1) p^k (1 − p)^(n−k).

Now, we can also pull a factor of p out of the last sum on the right to obtain

E(X) = np Σ_{k=1}^{n} (n−1Ck−1) p^(k−1) (1 − p)^(n−k).

Finally, substituting j = k − 1 in the last sum gives

Σ_{k=1}^{n} (n−1Ck−1) p^(k−1) (1 − p)^(n−k) = Σ_{j=0}^{n−1} (n−1Cj) p^j (1 − p)^((n−1)−j),

and if Y ∼ B(n − 1, p), then

Σ_{j=0}^{n−1} (n−1Cj) p^j (1 − p)^((n−1)−j) = Σ_{j=0}^{n−1} P(Y = j) = 1,

since the probabilities in any distribution sum to 1. (This is one of the places where you have to fill in some small technical details.) It follows that E(X) = np. ∎
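The identity E(X) = np can also be checked numerically by summing k·P(X = k) directly. The sketch below uses Python's standard library and the die-rolling parameters n = 600, p = 1/6 from the earlier example (any values would do):

```python
from math import comb

# Numerical check of E(X) = n*p for a binomial random variable.
n, p = 600, 1/6

# E(X) = sum over k of k * P(X = k), with P(X = k) = nCk p^k (1-p)^(n-k)
expected = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(expected)  # agrees with n*p = 100 up to floating-point rounding
```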
Example 1
An urn contains 20 red marbles and 30 blue marbles. If 8 marbles are selected from the urn with replacement, then what is the probability that no more than 3 red marbles are selected?

Since the marbles are sampled with replacement, the probability that a selected marble is red is 2/5, and if X = the number of red marbles observed in the sample of 8, then X ∼ B(8, 0.4). Then

P(0 ≤ X ≤ 3) = Σ_{k=0}^{3} 8Ck (0.4)^k (0.6)^(8−k) ≈ 0.5941.
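The four-term sum in Example 1 can be evaluated in a couple of lines (a sketch using only Python's standard library; the value it produces is ≈ 0.5941):

```python
from math import comb

# Example 1: X ~ B(8, 0.4), probability of at most 3 red marbles.
n, p = 8, 0.4
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, 4))
print(round(prob, 4))  # 0.5941
```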
2. The hypergeometric distribution

Suppose that an urn contains M marbles, R of which are red, and that n marbles are drawn from the urn without replacement. If X = the number of red marbles in the sample, then X follows the hypergeometric distribution with parameters M, R and n, written X ∼ HG(M, R, n), and

P(X = k) = (RCk · M−RCn−k) / MCn,  for k = 0, 1, 2, …, n.    (2.1)

The mean and variance of the hypergeometric distribution are given by

E(X) = n·(R/M)  and  Var(X) = n·(R/M)·((M − R)/M)·((M − n)/(M − 1)).    (2.2)

Compare this with sampling with replacement. On each draw,

P(red marble is selected) = R/M.

So, if Y is the number of red marbles observed when the marbles are drawn with replacement, then Y ∼ B(n, R/M).

In other words, if X is the number of red marbles selected from the urn without replacement and Y is the number of red marbles selected with replacement, then X ∼ HG(M, R, n) and Y ∼ B(n, R/M). But the similarity doesn't end there. The expected number of red marbles in both cases is identical, as you can check from the properties of the binomial and hypergeometric distributions, and the variances are also closely related. Specifically,

Var(Y) = n·(R/M)·((M − R)/M),  while  Var(X) = Var(Y)·((M − n)/(M − 1)).

The variance of the hypergeometric distribution is smaller than the variance of the binomial distribution (assuming the same distribution of marbles in the urn), because of the additional factor of (M − n)/(M − 1) in the formula for the variance of the hypergeometric distribution. In fact, if n is very small compared to M, then the two distributions are almost identical. But even when this is not the case, the two distributions can give similar results, as the example below illustrates.
Example 2
Returning to the urn in Example 1, suppose now that 8 marbles are randomly selected from the urn without replacement. What is the probability that no more than 3 red marbles are observed?

If Y = the number of red marbles that are observed, then Y ∼ HG(50, 20, 8), and the probability that no more than 3 red marbles are observed is

P(0 ≤ Y ≤ 3) = (20C0·30C8)/50C8 + (20C1·30C7)/50C8 + (20C2·30C6)/50C8 + (20C3·30C5)/50C8 ≈ 0.599470691.
|
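The hypergeometric sum in Example 2 is equally easy to evaluate by machine (a sketch with Python's standard library; note how close the answer is to the ≈ 0.5941 obtained with replacement):

```python
from math import comb

# Example 2: Y ~ HG(50, 20, 8), probability of at most 3 red marbles.
M, R, n = 50, 20, 8
prob = sum(comb(R, k) * comb(M - R, n - k) for k in range(0, 4)) / comb(M, n)
print(round(prob, 6))  # 0.599471
```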
3. Tchebychev's inequality
Recall that if X is a finite random variable with range {x1, x2, …, xn}, then the expected value of X is defined by

E(X) = Σ_{k=1}^{n} xk·P(X = xk).

The expected value is also denoted by μX (or simply μ, when there is no chance of confusion), and called the mean of X. Also, recall that the variance of X is defined by

Var(X) = Σ_{k=1}^{n} (xk − μX)²·P(X = xk).

The variance is also denoted by σ² or σX². It is the expected square deviation from the mean. If the variance is relatively small, then the observed values of X are more likely to be close to the mean; if the variance is large, then it is more likely to observe values of X that are far from the mean. The 19th century Russian mathematician P.L. Tchebychev gave this intuition a more precise form with a simple estimate for the probability of observing values of X that are far from the mean.
Fact 3 (Tchebychev) If X is a finite random variable with mean μ and variance σ², then for any number d > 0,

P(|X − μ| > d) < σ²/d².    (3.1)
Proof (You don't need to be able to reproduce the proof, but it is always better to understand why something is true than to simply memorize it.): Suppose that the range of X is {x1, x2, …, xn}. The first step is to write down an expression for the probability that |X − μX| > d:

P(|X − μ| > d) = Σ_{|xk−μ|>d} P(X = xk).    (3.2)

Next, write down the definition of the variance:

σ² = Σ_{k=1}^{n} (xk − μ)²·P(X = xk).    (3.3)

Since the terms in this sum are positive, if we remove some of the terms, then the remaining sum is smaller. So, if we drop the terms in Eq.(3.3) for which |xk − μ| ≤ d, then the result will be smaller than the variance:

σ² = Σ_{k=1}^{n} (xk − μ)²·P(X = xk) ≥ Σ_{|xk−μ|>d} (xk − μ)²·P(X = xk).

In the last sum (on the right), the factors (xk − μ)² are all larger than d², since the sum only includes terms for which |xk − μ| > d. Therefore

σ² > Σ_{|xk−μ|>d} d²·P(X = xk) = d² Σ_{|xk−μ|>d} P(X = xk).    (3.4)

The right-hand sum in Eq.(3.4) is the same as the sum that appears in Eq.(3.2), and is equal to P(|X − μ| > d). So, it follows that

σ²/d² > P(|X − μ| > d). ∎
Example 3
The IQ score of an individual from a large population is a random variable with a mean μ = 100 and standard deviation σ = 15. What is the probability that a randomly selected person will have an IQ score that is more than 30 points away from 100?

Since 30 = 2·15 = 2σ, we can use Eq.(3.1) to conclude that

P(|X − 100| > 30) < σ²/(2σ)² = 1/2² = 0.25.

So there is less than a 25% chance that the IQ of a randomly selected individual is more than 30 points away from 100. In other words, at least 75% of the population have IQ scores between 70 and 130.
|
The last comment in the example above can be generalized, which gives an alternate form of Tchebychev's inequality:

P(|X − μ| ≤ d) > 1 − σ²/d².    (3.5)

Since the events |X − μ| ≤ d and |X − μ| > d are complementary, and P(|X − μ| > d) < σ²/d² by the first version of Tchebychev's inequality, it follows that P(|X − μ| ≤ d) > 1 − (σ²/d²).

The power of Tchebychev's inequality is that it provides an estimate based on very little information: only the mean and variance of the random variable are necessary. But this also means that the estimates may not be very sharp. If more information about the random variable is available, then more precise estimates are possible.
Example 4
Suppose that X has the probability distribution in the table below.

   xk      │ −2    −1     0     1     2     3
P(X = xk)  │ 0.15  0.20  0.20  0.25  0.10  0.10

You should verify that μX = 0.25 and σX² = 2.2875. According to Tchebychev's inequality,

P(|X − 0.25| > 2) < σ²/2² = 2.2875/4 = 0.571875.

On the other hand, in this case we know the complete probability distribution of X. If |X − 0.25| > 2, then either X > 2.25 or X < −1.75. There are only two values in the range of X that satisfy one of these conditions, namely x1 = −2 and x6 = 3. So, in fact,

P(|X − 0.25| > 2) = P(X = −2) + P(X = 3) = 0.25 < 0.571875.

The estimate provided by Tchebychev's inequality is correct, but not very precise.
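The comparison in Example 4 can be reproduced in a few lines. In the sketch below the probabilities are those consistent with the stated values μX = 0.25 and σX² = 2.2875 (an assumption worth double-checking against your own copy of the table):

```python
# Example 4: exact tail probability vs. the Tchebychev bound.
xs = [-2, -1, 0, 1, 2, 3]
ps = [0.15, 0.20, 0.20, 0.25, 0.10, 0.10]

mu  = sum(x * p for x, p in zip(xs, ps))              # mean: 0.25
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # variance: 2.2875

# Exact probability that X is more than 2 away from the mean.
exact = sum(p for x, p in zip(xs, ps) if abs(x - mu) > 2)  # 0.25
bound = var / 2 ** 2                                        # 0.571875

print(mu, var, exact, bound)
```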
Example 5
Suppose that the student population of a large university consists of 11400 men and 14200 women. If 200 students are selected randomly for a survey of study habits, then what is the probability that the number of women in the survey is between 100 and 120?

We'll begin by assuming that the students are selected without replacement, since you typically don't want to include the same person twice in a survey. If X is the number of women included in the survey, then X ∼ HG(25600, 14200, 200), and

P(100 ≤ X ≤ 120) = Σ_{k=100}^{120} (14200Ck · 11400C200−k) / 25600C200.

This sum can be computed using desktop software, and even by hand if you're extraordinarily patient, but if you don't have the patience or the software and you want to get a quick estimate, then Tchebychev's inequality can be used.

First of all, we have

E(X) = 200·(14200/25600) = 110.9375  and  Var(X) = 200·(14200/25600)·(11400/25600)·(25400/25599) ≈ 49.0178.

Next, since we will be using Tchebychev's inequality, we need an interval that is symmetric around E(X), i.e., something like |X − E(X)| ≤ d. Now, E(X) − 100 = 10.9375, and 120 − E(X) = 9.0625, so the best we can do with Tchebychev's inequality is to say that

P(100 ≤ X ≤ 120) > P(|X − 110.9375| ≤ 9.0625) > 1 − 49.0178/9.0625² ≈ 1 − 0.59684 = 0.40316.

The actual probability is roughly 85%, so once again, Tchebychev doesn't provide a very precise estimate, but it is still correct.
|
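The "actual probability" quoted above can be computed exactly, since Python's integer arithmetic handles the enormous binomial coefficients without overflow (a sketch; the loop has only 21 terms):

```python
from math import comb

# Exact value of P(100 <= X <= 120) for X ~ HG(25600, 14200, 200).
M, R, n = 25600, 14200, 200
num  = sum(comb(R, k) * comb(M - R, n - k) for k in range(100, 121))
prob = num / comb(M, n)
print(round(prob, 3))  # roughly 0.85, as quoted in the text
```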
There are situations, though, where Tchebychev's inequality is the only game in town. We'll see this when we study confidence intervals in the second supplemental note.
4. The normal approximation to the binomial distribution

Note: The material in this section is given a more complete treatment in sections 16.2 and 16.3 of the text.
The last theoretical tool that we need is an easy method for computing (approximately) the probability of events like a ≤ X ≤ b, where X ∼ B(n, p). Now, it is easy to write down a correct formula for this probability, namely

P(a ≤ X ≤ b) = Σ_{k=a}^{b} nCk p^k (1 − p)^(n−k),

and when n is relatively small, and a and b are relatively close to each other, then this can be done 'by hand' using a calculator and some patience. On the other hand, if you have to evaluate a sum like

Σ_{k=75}^{125} 600Ck (1/6)^k (5/6)^(600−k),

you won't want to do it by hand. Using a desktop computer and appropriate software, these sums are simple to compute directly, and there are also websites with various applets for doing these calculations. But in the absence of advanced technology, it's nice to have a simple method for estimating these sums.
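For the record, the "unpleasant" sum above is a one-liner for software (a sketch using Python's standard library; the exact binomial coefficients keep the computation accurate):

```python
from math import comb

# The sum from the text: P(75 <= X <= 125) for X ~ B(600, 1/6).
n, p = 600, 1/6
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(75, 126))
print(prob)  # very close to 1, as the mean is 100 and sd is about 9.1
```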
We will use the following important result, called the normal approximation to the binomial distribution. If X ∼ B(n, p), then

P(a ≤ X ≤ b) = Σ_{k=a}^{b} nCk p^k (1 − p)^(n−k) ≈ (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz,    (4.1)

where

z1 = (a − np − 0.5)/√(np(1 − p))  and  z2 = (b − np + 0.5)/√(np(1 − p)).

Comments:
1. The proof of this fact is beyond the scope of our course, but it is essentially just an exercise in advanced calculus.
2. This is the earliest form of one of the most important results in mathematics, called the Central Limit Theorem.
3. The same method may be used to approximate the hypergeometric distribution, as long as the sample size is very small relative to the total population size.
The graph of the function φ(z) = (1/√(2π)) e^(−z²/2) appears in Figure 1.

[Figure 1: Graph of φ(z) = (1/√(2π)) e^(−z²/2)]

This curve is called the standard normal curve, also known as the bell-shaped curve. It is symmetric around 0, and the total area under the curve, from −∞ to ∞, is exactly 1.
The function φ(z) is differentiable, and so continuous, which implies that it has an antiderivative. However, the antiderivative of φ(z) cannot be expressed as a simple combination of algebraic, exponential, logarithmic or trigonometric functions, so we can't compute the integrals that arise in Eq.(4.1) using the fundamental theorem of calculus. This difficulty is overcome by using a table, like the one that appears at the end of this note. This table, called the standard normal table, contains values of the function

A(z0) = (1/√(2π)) ∫_0^{z0} e^(−z²/2) dz,

for 0 ≤ z0 ≤ 3.99. In words, A(z0) is the area under the normal curve between 0 and z0. These values were computed using sophisticated numerical approximation methods. Because of the properties of φ(z), the values in this table are sufficient to compute anything we need involving the normal distribution.
Specifically, for any z1 and z2, the value of (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz may be computed using the table and the simple rules below.

N1. If 0 < z1 < z2, then (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz = A(z2) − A(z1).

N2. If z1 < 0 < z2, then (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz = A(z2) + A(|z1|).

N3. If z1 < z2 < 0, then (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz = A(|z1|) − A(|z2|).

N4. If 4 ≤ z0, you may assume that A(z0) = 0.5.

There is nothing deep or mysterious about these rules. They follow directly from the definition of A(z0), the symmetry of the curve around 0, and the fact that the total area under the curve equals 1. Also, you should note that rules N1, N2 and N3 work just as well when z1 = −∞ or z2 = +∞, with the understanding that A(∞) = 0.5.
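If software is available, A(z0) can be computed directly from the error function instead of the printed table; the sketch below uses `math.erf`, and a single subtraction then reproduces rules N1–N3, since A(−z) = −A(z) by symmetry:

```python
from math import erf, sqrt

# A(z0): area under the standard normal curve between 0 and z0.
def A(z0):
    return 0.5 * erf(z0 / sqrt(2))

# Area between z1 and z2; covers rules N1-N3 in one formula,
# because A is an odd function of its argument.
def normal_area(z1, z2):
    return A(z2) - A(z1)

print(round(A(1.00), 4))             # 0.3413, matching the table entry
print(round(normal_area(-1, 1), 4))  # 0.6827
```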
Example 6
I'll use the normal approximation to estimate P(120 ≤ X ≤ 600), assuming that X ∼ B(600, 1/6). First we compute z1 and z2:

z1 = (120 − 100 − 0.5)/√(500/6) ≈ 2.14  and  z2 = (600 − 100 + 0.5)/√(500/6) ≈ 54.83.

Next, use the table in Appendix A, and apply N1 (together with N4, since z2 > 4) to compute

P(120 ≤ X ≤ 600) ≈ A(54.83) − A(2.14) = 0.5 − 0.4838 = 0.0162,

so the probability is about 1.6%.
Example 7
The normal approximation can also be used to estimate P(X = a). To do this, we write P(X = a) = P(a ≤ X ≤ a), and proceed as before. For example, continuing with X ∼ B(600, 1/6), let's estimate P(X = 85) = P(85 ≤ X ≤ 85). In this case a = b = 85, but the corresponding values of z1 and z2 will be different, because of the −0.5 in the numerator of z1 and the +0.5 in the numerator of z2. Specifically, we have

z1 = (85 − 100 − 0.5)/√(500/6) ≈ −1.70  and  z2 = (85 − 100 + 0.5)/√(500/6) ≈ −1.59.

Using the table and N3, we have

P(X = 85) ≈ A(1.70) − A(1.59) = 0.4554 − 0.4441 = 0.0113.
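Since n = 600 is small enough for exact computation, the quality of the approximation in Example 7 can be checked directly (a sketch with Python's standard library; the two values agree to about three decimal places):

```python
from math import comb, erf, sqrt

# Normal approximation vs. the exact value of P(X = 85), X ~ B(600, 1/6).
n, p = 600, 1/6
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Continuity-corrected z-values, as in Eq.(4.1) with a = b = 85.
z1 = (85 - mu - 0.5) / sigma
z2 = (85 - mu + 0.5) / sigma
approx = 0.5 * (erf(z2 / sqrt(2)) - erf(z1 / sqrt(2)))

# Exact binomial probability.
exact = comb(n, 85) * p**85 * (1 - p)**(n - 85)
print(round(approx, 4), round(exact, 4))
```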
Example 8
Suppose that a die is rolled 2400 times, and a 5 is observed 441 times. What can we say
about the die? This question and similar ones will be the topic of the second supplementary
note, but I'll give a brief preview below.
The issue in this case is whether the die is fair or not. One approach to answering this question is to begin with the assumption that the die is indeed fair (which implies that the probability of rolling a 5 is exactly 1/6), estimate the probability of the observed result given this assumption, and then draw a conclusion. If, given our assumption, the probability of observing the given number of 5's is too low, we may choose to reject this assumption and conclude that the die is not fair. This type of procedure is called a hypothesis test.

The second approach I'll discuss makes no assumptions about the fairness, or lack thereof, of the die. Instead, we use the data to estimate the probability of rolling a 5 with the die in question. This approach uses what is called a confidence interval. A confidence interval is an interval together with an estimate for the probability that the parameter of interest lies in the given interval. In our example above, the parameter of interest is the probability of rolling a 5.

The two approaches are closely related, and both begin by defining a random variable.
Exercises

1. Suppose that X is a random variable with mean μ = 121 and variance σ² = 15. Use Tchebychev's inequality to estimate the probabilities

a. P(|X − 121| > 25) =

b. P(108 ≤ X ≤ 134) =

5. Suppose that X ∼ B(n, p). Use Tchebychev's inequality to estimate P(|X − np| > n^(2/3)) in terms of n and p. What happens to this probability as n gets large?
Appendix A

The table below gives the values of A(z0) = (1/√(2π)) ∫_0^{z0} e^(−z²/2) dz, for 0 ≤ z0 ≤ 3.99.
z0    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.00  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.10  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.20  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.30  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.40  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.50  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.60  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.70  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.80  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.90  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.00  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.10  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.20  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.30  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.40  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.50  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.60  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.70  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.80  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.90  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.00  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.10  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.20  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.30  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.40  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.50  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.60  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.70  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.80  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.90  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.00  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.10  0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.20  0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.30  0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.40  0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.50  0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.60  0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.70  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.80  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.90  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999