AMS/ECON 11B
This supplement covers four topics. In §1, I briefly review the properties of the binomial distribution, which is covered in the textbook in section 9.2. In §2, I introduce the hypergeometric distribution, which is a close relative of the binomial distribution. In §3, I introduce Tchebychev's inequality, which provides a simple estimate for the probability of observing a value of a random variable that is far from its expected value. The last topic is the normal approximation to the binomial distribution, which I discuss in §4 (and which is covered in the textbook in sections 16.2 and 16.3). This is a historically important special case of the Central Limit Theorem, which is one of the most important results in mathematics.
1. The binomial distribution
Recall that a random variable X follows the binomial distribution with parameters n and p if and only if the probability distribution of X is given by

P(X = k) = nCk p^k (1 − p)^(n−k),  for k = 0, 1, 2, …, n,    (1.1)

where n is a positive integer and 0 < p < 1. It is standard to write X ∼ B(n, p) in this case.
Binomial random variables arise in situations where an experiment with two outcomes is repeated a certain number of times. Suppose that the two outcomes are A and B, and the probability of outcome A is equal to p, and further suppose that the experiment is repeated n times in such a way that the outcomes are all independent of each other. If X is the number of times that outcome A is observed in these n repetitions, then X ∼ B(n, p), as was explained in class and in the textbook.

For example, if you toss a fair coin 60 times, and H = the number of heads you observe, then H ∼ B(60, 1/2). The two outcomes in this case are heads and tails. But the binomial distribution is not limited to coin tosses. If you roll a fair, six-sided die there are six basic outcomes, but we can still describe the die-roll as a Bernoulli trial with two outcomes by grouping the outcomes into two groups. E.g., if X = the number of 5's that are observed in 600 rolls of a fair die, then X ∼ B(600, 1/6), because the two outcomes we are observing are A = 5 and B = (not 5).
In the book, it is stated without proof that if X ∼ B(n, p), then

E(X) = np  and  Var(X) = np(1 − p).
I will show you the proof that E(X) = np below. The proof that Var(X) = np(1 − p) is similar but slightly more technical, so I will leave it out. Knowing the proof below is not a requirement of the course, so you can skip it if you want. But if you like to challenge yourself, then read on. I have left out a few small details that you will need to fill in to completely understand the proof.
Starting from the definition of the expected value,

E(X) = Σ_{k=0}^{n} k·P(X = k) = Σ_{k=0}^{n} k(nCk) p^k (1 − p)^(n−k).

The first thing that we do is drop the first term in the sum, since when k = 0 the term is 0·[nC0 p^0 (1 − p)^n] = 0, so

E(X) = Σ_{k=1}^{n} k(nCk) p^k (1 − p)^(n−k).    (1.2)

Next we manipulate the expression k(nCk) to put it into a more convenient form, namely we show that k(nCk) = n(n−1Ck−1):

k(nCk) = k · n!/(k!(n − k)!) = n!/((k − 1)!(n − k)!) = n · (n − 1)!/((k − 1)!(n − k)!) = n(n−1Ck−1).

Using this identity in the second expression for E(X), Eq.(1.2), we get

E(X) = Σ_{k=1}^{n} n(n−1Ck−1) p^k (1 − p)^(n−k) = n Σ_{k=1}^{n} (n−1Ck−1) p^k (1 − p)^(n−k).

Now, we can also pull a factor of p out of the last sum on the right to obtain

E(X) = np Σ_{k=1}^{n} (n−1Ck−1) p^(k−1) (1 − p)^(n−k).

Finally, substituting j = k − 1 in the last sum gives

Σ_{k=1}^{n} (n−1Ck−1) p^(k−1) (1 − p)^(n−k) = Σ_{j=0}^{n−1} (n−1Cj) p^j (1 − p)^((n−1)−j),

and if Y ∼ B(n − 1, p), then

Σ_{j=0}^{n−1} (n−1Cj) p^j (1 − p)^((n−1)−j) = Σ_{j=0}^{n−1} P(Y = j) = 1,

since the probabilities in any distribution sum to 1. (This is one of the places where you have to fill in some small technical details.) It follows that E(X) = np. ∎
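The identity E(X) = np can also be checked numerically by summing k·P(X = k) directly. The sketch below uses Python's standard library and the die-rolling parameters n = 600, p = 1/6 from the earlier example (any values would do):

```python
from math import comb

# Numerical check of E(X) = n*p for a binomial random variable.
n, p = 600, 1/6

# E(X) = sum over k of k * P(X = k), with P(X = k) = nCk p^k (1-p)^(n-k)
expected = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(expected)  # agrees with n*p = 100 up to floating-point rounding
```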
Example 1
An urn contains 20 red marbles and 30 blue marbles. If 8 marbles are selected from the urn with replacement, then what is the probability that no more than 3 red marbles are selected?

Since the marbles are sampled with replacement, the probability that a selected marble is red is 2/5, and if X = the number of red marbles observed in the sample of 8, then X ∼ B(8, 0.4). Then

P(0 ≤ X ≤ 3) = Σ_{k=0}^{3} 8Ck (0.4)^k (0.6)^(8−k) ≈ 0.5941.
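The four-term sum in Example 1 can be evaluated in a couple of lines (a sketch using only Python's standard library; the value it produces is ≈ 0.5941):

```python
from math import comb

# Example 1: X ~ B(8, 0.4), probability of at most 3 red marbles.
n, p = 8, 0.4
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, 4))
print(round(prob, 4))  # 0.5941
```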
2. The hypergeometric distribution

Suppose that an urn contains M marbles, R of which are red, and that n marbles are drawn from the urn without replacement. If X = the number of red marbles in the sample, then X follows the hypergeometric distribution with parameters M, R and n, written X ∼ HG(M, R, n), and

P(X = k) = (RCk · M−RCn−k) / MCn,  for k = 0, 1, 2, …, n.    (2.1)

The mean and variance of the hypergeometric distribution are given by

E(X) = n·(R/M)  and  Var(X) = n·(R/M)·((M − R)/M)·((M − n)/(M − 1)).    (2.2)

Compare this with sampling with replacement. On each draw,

P(red marble is selected) = R/M.

So, if Y is the number of red marbles observed when the marbles are drawn with replacement, then Y ∼ B(n, R/M).

In other words, if X is the number of red marbles selected from the urn without replacement and Y is the number of red marbles selected with replacement, then X ∼ HG(M, R, n) and Y ∼ B(n, R/M). But the similarity doesn't end there. The expected number of red marbles in both cases is identical, as you can check from the properties of the binomial and hypergeometric distributions, and the variances are also closely related. Specifically,

Var(Y) = n·(R/M)·((M − R)/M),  while  Var(X) = Var(Y)·((M − n)/(M − 1)).

The variance of the hypergeometric distribution is smaller than the variance of the binomial distribution (assuming the same distribution of marbles in the urn), because of the additional factor of (M − n)/(M − 1) in the formula for the variance of the hypergeometric distribution. In fact, if n is very small compared to M, then the two distributions are almost identical. But even when this is not the case, the two distributions can give similar results, as the example below illustrates.
Example 2
Returning to the urn in Example 1, suppose now that 8 marbles are randomly selected from the urn without replacement. What is the probability that no more than 3 red marbles are observed?

If Y = the number of red marbles that are observed, then Y ∼ HG(50, 20, 8), and the probability that no more than 3 red marbles are observed is

P(0 ≤ Y ≤ 3) = (20C0·30C8)/50C8 + (20C1·30C7)/50C8 + (20C2·30C6)/50C8 + (20C3·30C5)/50C8 ≈ 0.599470691.
|
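The hypergeometric sum in Example 2 is equally easy to evaluate by machine (a sketch with Python's standard library; note how close the answer is to the ≈ 0.5941 obtained with replacement):

```python
from math import comb

# Example 2: Y ~ HG(50, 20, 8), probability of at most 3 red marbles.
M, R, n = 50, 20, 8
prob = sum(comb(R, k) * comb(M - R, n - k) for k in range(0, 4)) / comb(M, n)
print(round(prob, 6))  # 0.599471
```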
3. Tchebychev's inequality
Recall that if X is a finite random variable with range {x1, x2, …, xn}, then the expected value of X is defined by

E(X) = Σ_{k=1}^{n} xk·P(X = xk).

The expected value is also denoted by μX (or simply μ, when there is no chance of confusion), and called the mean of X. Also, recall that the variance of X is defined by

Var(X) = Σ_{k=1}^{n} (xk − μX)²·P(X = xk).

The variance is also denoted by σ² or σX². It is the expected square deviation from the mean. If the variance is relatively small, then the observed values of X are more likely to be close to the mean; if the variance is large, then it is more likely to observe values of X that are far from the mean. The 19th century Russian mathematician P.L. Tchebychev gave this intuition a more precise form with a simple estimate for the probability of observing values of X that are far from the mean.
Fact 3 (Tchebychev) If X is a finite random variable with mean μ and variance σ², then for any number d > 0,

P(|X − μ| > d) < σ²/d².    (3.1)
Proof (You don't need to be able to reproduce the proof, but it is always better to understand why something is true than to simply memorize it.): Suppose that the range of X is {x1, x2, …, xn}. The first step is to write down an expression for the probability that |X − μX| > d:

P(|X − μ| > d) = Σ_{|xk−μ|>d} P(X = xk).    (3.2)

Next, write down the definition of the variance:

σ² = Σ_{k=1}^{n} (xk − μ)²·P(X = xk).    (3.3)

Since the terms in this sum are positive, if we remove some of the terms, then the remaining sum is smaller. So, if we drop the terms in Eq.(3.3) for which |xk − μ| ≤ d, then the result will be smaller than the variance:

σ² = Σ_{k=1}^{n} (xk − μ)²·P(X = xk) ≥ Σ_{|xk−μ|>d} (xk − μ)²·P(X = xk).

In the last sum (on the right), the factors (xk − μ)² are all larger than d², since the sum only includes terms for which |xk − μ| > d. Therefore

σ² > Σ_{|xk−μ|>d} d²·P(X = xk) = d² Σ_{|xk−μ|>d} P(X = xk).    (3.4)

The right-hand sum in Eq.(3.4) is the same as the sum that appears in Eq.(3.2), and is equal to P(|X − μ| > d). So, it follows that

σ²/d² > P(|X − μ| > d). ∎
Example 3
The IQ score of an individual from a large population is a random variable with a mean μ = 100 and standard deviation σ = 15. What is the probability that a randomly selected person will have an IQ score that is more than 30 points away from 100?

Since 30 = 2·15 = 2σ, we can use Eq.(3.1) to conclude that

P(|X − 100| > 30) < σ²/(2σ)² = 1/2² = 0.25.

So there is less than a 25% chance that the IQ of a randomly selected individual is more than 30 points away from 100. In other words, at least 75% of the population have IQ scores between 70 and 130.
|
The last comment in the example above can be generalized, which gives an alternate form of Tchebychev's inequality:

P(|X − μ| ≤ d) > 1 − σ²/d².    (3.5)

Since the events |X − μ| ≤ d and |X − μ| > d are complementary, and P(|X − μ| > d) < σ²/d² by the first version of Tchebychev's inequality, it follows that P(|X − μ| ≤ d) > 1 − (σ²/d²).

The power of Tchebychev's inequality is that it provides an estimate based on very little information: only the mean and variance of the random variable are necessary. But this also means that the estimates may not be very sharp. If more information about the random variable is available, then more precise estimates are possible.
Example 4
Suppose that X has the probability distribution in the table below.

   xk      │ −2    −1     0     1     2     3
P(X = xk)  │ 0.15  0.20  0.20  0.25  0.10  0.10

You should verify that μX = 0.25 and σX² = 2.2875. According to Tchebychev's inequality,

P(|X − 0.25| > 2) < σ²/2² = 2.2875/4 = 0.571875.

On the other hand, in this case we know the complete probability distribution of X. If |X − 0.25| > 2, then either X > 2.25 or X < −1.75. There are only two values in the range of X that satisfy one of these conditions, namely x1 = −2 and x6 = 3. So, in fact,

P(|X − 0.25| > 2) = P(X = −2) + P(X = 3) = 0.25 < 0.571875.

The estimate provided by Tchebychev's inequality is correct, but not very precise.
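The comparison in Example 4 can be reproduced in a few lines. In the sketch below the probabilities are those consistent with the stated values μX = 0.25 and σX² = 2.2875 (an assumption worth double-checking against your own copy of the table):

```python
# Example 4: exact tail probability vs. the Tchebychev bound.
xs = [-2, -1, 0, 1, 2, 3]
ps = [0.15, 0.20, 0.20, 0.25, 0.10, 0.10]

mu  = sum(x * p for x, p in zip(xs, ps))              # mean: 0.25
var = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))  # variance: 2.2875

# Exact probability that X is more than 2 away from the mean.
exact = sum(p for x, p in zip(xs, ps) if abs(x - mu) > 2)  # 0.25
bound = var / 2 ** 2                                        # 0.571875

print(mu, var, exact, bound)
```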
Example 5
Suppose that the student population of a large university consists of 11400 men and 14200 women. If 200 students are selected randomly for a survey of study habits, then what is the probability that the number of women in the survey is between 100 and 120?

We'll begin by assuming that the students are selected without replacement, since you typically don't want to include the same person twice in a survey. If X is the number of women included in the survey, then X ∼ HG(25600, 14200, 200), and

P(100 ≤ X ≤ 120) = Σ_{k=100}^{120} (14200Ck · 11400C200−k) / 25600C200.

This sum can be computed using desktop software, and even by hand if you're extraordinarily patient, but if you don't have the patience or the software and you want to get a quick estimate, then Tchebychev's inequality can be used.

First of all, we have

E(X) = 200·(14200/25600) = 110.9375  and  Var(X) = 200·(14200/25600)·(11400/25600)·(25400/25599) ≈ 49.0178.

Next, since we will be using Tchebychev's inequality, we need an interval that is symmetric around E(X), i.e., something like |X − E(X)| ≤ d. Now, E(X) − 100 = 10.9375, and 120 − E(X) = 9.0625, so the best we can do with Tchebychev's inequality is to say that

P(100 ≤ X ≤ 120) > P(|X − 110.9375| ≤ 9.0625) > 1 − 49.0178/9.0625² ≈ 1 − 0.59684 = 0.40316.

The actual probability is roughly 85%, so once again, Tchebychev doesn't provide a very precise estimate, but it is still correct.
|
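The "actual probability" quoted above can be computed exactly, since Python's integer arithmetic handles the enormous binomial coefficients without overflow (a sketch; the loop has only 21 terms):

```python
from math import comb

# Exact value of P(100 <= X <= 120) for X ~ HG(25600, 14200, 200).
M, R, n = 25600, 14200, 200
num  = sum(comb(R, k) * comb(M - R, n - k) for k in range(100, 121))
prob = num / comb(M, n)
print(round(prob, 3))  # roughly 0.85, as quoted in the text
```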
There are situations, though, where Tchebychev's inequality is the only game in town. We'll see this when we study confidence intervals in the second supplemental note.
4. The normal approximation to the binomial distribution

Note: The material in this section is given a more complete treatment in sections 16.2 and 16.3 of the text.
The last theoretical tool that we need is an easy method for computing (approximately) the probability of events like a ≤ X ≤ b, where X ∼ B(n, p). Now, it is easy to write down a correct formula for this probability, namely

P(a ≤ X ≤ b) = Σ_{k=a}^{b} nCk p^k (1 − p)^(n−k),

and when n is relatively small, and a and b are relatively close to each other, then this can be done 'by hand' using a calculator and some patience. On the other hand, if you have to evaluate a sum like

Σ_{k=75}^{125} 600Ck (1/6)^k (5/6)^(600−k),

you won't want to do it by hand. Using a desktop computer and appropriate software, these sums are simple to compute directly, and there are also websites with various applets for doing these calculations. But in the absence of advanced technology, it's nice to have a simple method for estimating these sums.
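For the record, the "unpleasant" sum above is a one-liner for software (a sketch using Python's standard library; the exact binomial coefficients keep the computation accurate):

```python
from math import comb

# The sum from the text: P(75 <= X <= 125) for X ~ B(600, 1/6).
n, p = 600, 1/6
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(75, 126))
print(prob)  # very close to 1, as the mean is 100 and sd is about 9.1
```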
We will use the following important result, called the normal approximation to the binomial distribution. If X ∼ B(n, p), then

P(a ≤ X ≤ b) = Σ_{k=a}^{b} nCk p^k (1 − p)^(n−k) ≈ (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz,    (4.1)

where

z1 = (a − np − 0.5)/√(np(1 − p))  and  z2 = (b − np + 0.5)/√(np(1 − p)).

Comments:
1. The proof of this fact is beyond the scope of our course, but it is essentially just an exercise in advanced calculus.
2. This is the earliest form of one of the most important results in mathematics, called the Central Limit Theorem.
3. The same method may be used to approximate the hypergeometric distribution, as long as the sample size is very small relative to the total population size.
The graph of the function φ(z) = (1/√(2π)) e^(−z²/2) appears in Figure 1.

[Figure 1: Graph of φ(z) = (1/√(2π)) e^(−z²/2)]

This curve is called the standard normal curve, also known as the bell-shaped curve. It is symmetric around 0, and the total area under the curve, from −∞ to ∞, is exactly 1.
The function φ(z) is differentiable, and so continuous, which implies that it has an antiderivative. However, the antiderivative of φ(z) cannot be expressed as a simple combination of algebraic, exponential, logarithmic or trigonometric functions, so we can't compute the integrals that arise in Eq.(4.1) using the fundamental theorem of calculus. This difficulty is overcome by using a table, like the one that appears at the end of this note. This table, called the standard normal table, contains values of the function

A(z0) = (1/√(2π)) ∫_0^{z0} e^(−z²/2) dz,

for 0 ≤ z0 ≤ 3.99. In words, A(z0) is the area under the normal curve between 0 and z0. These values were computed using sophisticated numerical approximation methods. Because of the properties of φ(z), the values in this table are sufficient to compute anything we need involving the normal distribution.
Specifically, for any z1 and z2, the value of (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz may be computed using the table and the simple rules below.

N1. If 0 < z1 < z2, then (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz = A(z2) − A(z1).

N2. If z1 < 0 < z2, then (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz = A(z2) + A(|z1|).

N3. If z1 < z2 < 0, then (1/√(2π)) ∫_{z1}^{z2} e^(−z²/2) dz = A(|z1|) − A(|z2|).

N4. If 4 ≤ z0, you may assume that A(z0) = 0.5.

There is nothing deep or mysterious about these rules. They follow directly from the definition of A(z0), the symmetry of the curve around 0, and the fact that the total area under the curve equals 1. Also, you should note that rules N1, N2 and N3 work just as well when z1 = −∞ or z2 = +∞, with the understanding that A(∞) = 0.5.
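If software is available, A(z0) can be computed directly from the error function instead of the printed table; the sketch below uses `math.erf`, and a single subtraction then reproduces rules N1–N3, since A(−z) = −A(z) by symmetry:

```python
from math import erf, sqrt

# A(z0): area under the standard normal curve between 0 and z0.
def A(z0):
    return 0.5 * erf(z0 / sqrt(2))

# Area between z1 and z2; covers rules N1-N3 in one formula,
# because A is an odd function of its argument.
def normal_area(z1, z2):
    return A(z2) - A(z1)

print(round(A(1.00), 4))             # 0.3413, matching the table entry
print(round(normal_area(-1, 1), 4))  # 0.6827
```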
Example 6
I'll use the normal approximation to estimate P(120 ≤ X ≤ 600), assuming that X ∼ B(600, 1/6). First we compute z1 and z2:

z1 = (120 − 100 − 0.5)/√(500/6) ≈ 2.14  and  z2 = (600 − 100 + 0.5)/√(500/6) ≈ 54.83.

Next, use the table in Appendix A, and apply N1 (together with N4, since z2 > 4) to compute

P(120 ≤ X ≤ 600) ≈ A(54.83) − A(2.14) = 0.5 − 0.4838 = 0.0162,

so the probability is about 1.6%.
Example 7
The normal approximation can also be used to estimate P(X = a). To do this, we write P(X = a) = P(a ≤ X ≤ a), and proceed as before. For example, continuing with X ∼ B(600, 1/6), let's estimate P(X = 85) = P(85 ≤ X ≤ 85). In this case a = b = 85, but the corresponding values of z1 and z2 will be different, because of the −0.5 in the numerator of z1 and the +0.5 in the numerator of z2. Specifically, we have

z1 = (85 − 100 − 0.5)/√(500/6) ≈ −1.70  and  z2 = (85 − 100 + 0.5)/√(500/6) ≈ −1.59.

Using the table and N3, we have

P(X = 85) ≈ A(1.70) − A(1.59) = 0.4554 − 0.4441 = 0.0113.
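Since n = 600 is small enough for exact computation, the quality of the approximation in Example 7 can be checked directly (a sketch with Python's standard library; the two values agree to about three decimal places):

```python
from math import comb, erf, sqrt

# Normal approximation vs. the exact value of P(X = 85), X ~ B(600, 1/6).
n, p = 600, 1/6
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Continuity-corrected z-values, as in Eq.(4.1) with a = b = 85.
z1 = (85 - mu - 0.5) / sigma
z2 = (85 - mu + 0.5) / sigma
approx = 0.5 * (erf(z2 / sqrt(2)) - erf(z1 / sqrt(2)))

# Exact binomial probability.
exact = comb(n, 85) * p**85 * (1 - p)**(n - 85)
print(round(approx, 4), round(exact, 4))
```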
Example 8
Suppose that a die is rolled 2400 times, and a 5 is observed 441 times. What can we say
about the die? This question and similar ones will be the topic of the second supplementary
note, but I'll give a brief preview below.
The issue in this case is whether the die is fair or not. One approach to answering this question is to begin with the assumption that the die is indeed fair (which implies that the probability of rolling a 5 is exactly 1/6), estimate the probability of the observed result given this assumption, and then draw a conclusion. If, given our assumption, the probability of observing the given number of 5's is too low, we may choose to reject this assumption and conclude that the die is not fair. This type of procedure is called a hypothesis test.

The second approach I'll discuss makes no assumptions about the fairness, or lack thereof, of the die. Instead, we use the data to estimate the probability of rolling a 5 with the die in question. This approach uses what is called a confidence interval. A confidence interval is an interval together with an estimate for the probability that the parameter of interest lies in the given interval. In our example above, the parameter of interest is the probability of rolling a 5.

The two approaches are closely related, and both begin by defining a random variable.
Exercises

1. Suppose that X is a random variable with mean μ = 121 and variance σ² = 15. Use Tchebychev's inequality to estimate the probabilities

a. P(|X − 121| > 25) =

b. P(108 ≤ X ≤ 134) =

5. Suppose that X ∼ B(n, p). Use Tchebychev's inequality to estimate P(|X − np| > n^(2/3)) in terms of n and p. What happens to this probability as n gets large?
Appendix A

The table below gives the values of A(z0) = (1/√(2π)) ∫_0^{z0} e^(−z²/2) dz, for 0 ≤ z0 ≤ 3.99.
z0    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.00  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.10  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.20  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.30  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.40  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.50  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.60  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.70  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.80  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.90  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.00  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.10  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.20  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.30  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.40  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.50  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.60  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.70  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.80  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.90  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.00  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.10  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.20  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.30  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.40  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.50  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.60  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.70  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.80  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.90  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.00  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.10  0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.20  0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.30  0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.40  0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.50  0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.60  0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.70  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.80  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.90  0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999