Professional Documents
Culture Documents
Chapter 3 Outline
• Review
o Random Variables
o Relative Frequency Interpretation of Probability
• Populations, Samples, Estimation Procedures, and the Estimate’s
Probability Distribution
o Why Is the Mean of the Estimate’s Probability Distribution
Important? Biased and Unbiased Estimation Procedures
o Why Is the Variance of the Estimate’s Probability Distribution
Important? Reliability of Unbiased Estimation Procedures
• Interval Estimates
• Central Limit Theorem
• The Normal Distribution: A Way to Estimate Probabilities
o Properties of the Normal Distribution:
o Using the Normal Distribution Table: An Example
o Justifying the Use of the Normal Distribution
o Normal Distribution’s Rules of Thumb
• Clint’s Dilemma and His Opinion Poll
6. Would you have more confidence in a poll that queries a small number of
individuals or a poll that queries a large number?
Review
• Random Variables: Before the experiment is conducted:
o Bad news. What we do not know: We cannot determine the
numerical value of the random variable with certainty.
o Good news. What we do know: On the other hand, we can often
calculate the random variable’s probability distribution telling us
how likely it is for the random variable to equal each of its possible
numerical values.
• Relative Frequency Interpretation of Probability: After many, many
repetitions of the experiment the distribution of the numerical values from
the experiments mirrors the random variable’s probability distribution
Next, focus on the variance. Apply the arithmetic of variances and what we know
about the vt’s:
1
[
Var[ EstFrac ] = Var ( v1 + v2 + … + vT )
T
]
Since Var[cx] = c2Var[x]
1
[
= 2 Var ( v1 + v2 + … + vT )
T
]
Since Var[x + y] = Var[x] + Var[y] when x and y
are independent; hence, the covarainces are all 0.
1
= 2 (Var[v1 ] + Var[v2 ] + … + Var[vT ] )
T
Since Var[v1] = Var[v2] = … = Var[vT] = p(1
− p)
1
= 2 ( p (1 − p ) + p (1 − p ) + … + p (1 − p ) )
T
5
To summarize:
p (1 − p )
Mean[ EstFrac ] = p Var[ EstFrac ] =
T
where p = ActFrac = Actual fraction of the population supporting Clint
T = Sample Size
Econometrics Lab 3.1: Polling – Checking the Mean and Variance Equations
Our simulation results are consistent with the equations we derived. After many,
many repetitions, the means of the numerical values equal the means of the
estimated fraction’s probability distribution. The same is true for the variances.
7
Public opinion polls use procedures very similar to that described in our
experiment. A specific number of people are asked who or what they support and
then the results are reported. We can think of a poll as one repetition of our
experiment. Pollsters use the numerical value of the estimated fraction from one
repetition of the experiment to estimate the actual fraction. But how reliable is
such an estimate? We shall now show that the reliability of an estimate depends
on the mean and variance of the estimate’s probability distribution.
Recall Clint’s poll in which 12 of the 16 individuals queried supported him. The
estimated fraction, EstFrac, equaled .75:
12
EstFrac = = .75
16
In Chapter 2, we used our Opinion Poll simulation to show that this poll result did
not prove with certainty that the actual fraction of the population supporting Clint
exceeded .50. More generally, we observed that while it is possible for the
estimated fraction to equal the actual population fraction, it is more likely for the
estimated fraction to be greater than or less than the actual fraction. In other
words, we cannot expect the estimated fraction from a single poll to equal the
actual population fraction.
Probability Distribution
EstFrac
Mean[EstFrac] = ActFrac
Figure 3.1: Probability Distribution of EstFrac Values
When the estimation procedure is unbiased, the average of the numerical values
of the estimated fractions equaled the actual population fraction after many, many
repetitions. Table 3.1 reports that this is true.
By default, an actual population fraction of .50 and a sample size of 100 are
specified. Two new lists appear in the lower left of the window: a From list and a
10
To list. By default, a From value of .000 and a To value of .500 are selected. The
From-To Percent line reports the percent of repetitions in which the estimated
fraction lies between the From value, .000, and the To value, .500.
To summarize, there are two important points to make about Clint’s poll:
• Bad news: We cannot expect the estimated fraction from Clint’s poll, .75,
to equal the actual population fraction.
• Good news: The estimation procedure that Clint used is unbiased. The
mean of the estimated fraction’s probability distribution equals the actual
population fraction:
Mean[EstFrac] = Actual Population Fraction = ActFrac
The estimation procedure does not systematically underestimate or
overestimate the actual population fraction. If the probability distribution
is symmetric, the chances that the estimated fraction will be less than the
actual population fraction equal the chances that the estimated fraction
will be greater.
We shall use the Polling Simulation to illustrate the importance of the probability
distribution’s variance.
In addition to specifying the actual population fraction and the sample size, the
simulation includes the From-To lists.
As before, the actual population fraction equals .50 by default. Select .450 from
the “From” list and .550 from the “To” list. The simulation will now calculate the
percent of the repetitions in which the numerical value of the estimated fraction
lies within the .450-.550 interval. Since we have specified the actual population
fraction to be .50, the simulation will report the percent of repetitions in which the
numerical value of the estimate fraction lies within .05 of the actual fraction.
Initially, a sample size of 25 is selected. Note that the Pause checkbox is cleared.
Click Start and then after many, many repetitions, click Stop. Next, consider
sample sizes of 100 and 400. Table 3.2 reports the results for the three sample
sizes:
When the sample size is 25, the numerical value of the estimated fraction
falls within .05 of the actual population fraction in about 39 percent of the
repetitions. When the sample size is 100, the numerical value of the estimated
fraction falls within .05 of the actual population fraction in about 69 percent of the
repetitions. When the sample size is 400, the numerical value of the estimated
fraction falls within .05 of the actual population fraction in about 95 percent of the
repetitions.
95%
39% 69%
The variance plays the key role here. When the variance is large, the
distribution is “spread out”; the numerical value of the estimated fraction falls
12
within .05 of the actual fraction relatively infrequently. On the other hand, when
the variance is small, the distribution is tightly “cropped” around the actual
population fraction, .50; consequently, the numerical value of the estimated
fraction falls within .05 of the actual population fraction more frequently.
13
Interval Estimates
Since we are focusing on the interval from .450 to .550 and the actual population
fraction is specified as .50, we can enter .05 in the first blank:
Interval Estimate Question: What is the probability that the numerical value
of the estimated fraction, EstFrac, from one repetition of the experiment lies
within .05 of the actual population fraction, ActFrac? ______
.95
.39 .69
As the sample size becomes larger, it becomes more likely that the estimated
fraction resulting from a single poll will be close to the actual population fraction.
This is consistent with our intuition, is it not? When more people are polled we
have more confidence that the estimated fraction will be close to the actual value.
You may have noticed that the distributions of the numerical values produced by
the simulation for the samples of 25, 100, and 400 look like bell shaped curves.
Although we shall not provide a proof, it can be shown that as the sample size
increases, the distribution gradually approaches what mathematicians and
statisticians call the normal distribution. Formally, this result is known as the
Central Limit Theorem.
We shall illustrate the Central Limit Theorem by using our Opinion Poll
Simulation. Again, let the actual population fraction equal .50 and consider three
different sample sizes: 25, 100, and 400. In each case, we shall use our simulation
to calculate interval estimates for 1, 2, and 3 standard deviations around the mean.
When the sample size equals 25, the standard deviation is .10. Since the
distribution mean equals the actual population fraction, .50, 1 standard deviation
around the mean would be from .400 to .600, 2 standard deviations from .300 to
.700, and 3 standard deviations from .200 to .800. In each case, specify the
appropriate From and To values and be certain that the Pause checkbox is cleared.
Click Start and then after many, many repetitions click Stop. The simulation
results are reported in Table 3.4:
Interval: Simulation:
Standard Deviations within Percent of repetitions
Random Variable’s Mean From To within interval
1 .400 .600 69.25%
2 .300 .700 96.26%
3 .200 .800 99.85%
Table 3.4: Interval Percentages for a Sample Size of 25
When the sample size equals 100, the standard deviation is .05. Since the
distribution mean equals the actual population fraction, .50, 1 standard
deviation around the mean would be from .450 to .550, 2 standard deviations
from .400 to .600, and 3 standard deviations from .350 to .650.
Interval: Simulation:
Standard Deviations within Percent of repetitions
Random Variable’s Mean From To within interval
1 .450 .550 68.50%
2 .400 .600 95.64%
3 .350 .650 99.77%
Table 3.5: Interval Percentages for a Sample Size of 100
17
Interval: Simulation:
Standard Deviations within Percent of repetitions
Distribution Mean From To within interval
1 .475 .525 68.29%
2 .450 .550 95.49%
3 .425 .575 99.73%
Table 3.6: Interval Percentages for a Sample Size of 400
Sample Sizes
Mean[EstFrac] 25 100 400
SD[EstFrac] .100 .050 .025
Interval: 1 SD
From-To Values .400-.600 .450-.550 .475-.525
Percent of Repetitions 69.25% 68.50% 68.29%
Interval: 2 SD
From-To Values .300 .700 .400-.600 .450-.550
Percent of Repetitions 96.26% 95.64% 95.49%
Interval: 3 SD
From-To Values 200-.800 .350-.650 .425-.575
Percent of Repetitions 99.85% 99.77% 99.73%
Table 3.7: Summary of Interval Percentages Results
18
Clearly, standard deviations play a crucial and consistent role here. Regardless of
the sample size, approximately 68 or 69 percent of the repetitions fall within one
standard deviation of the mean, approximately 95 or 96 percent within two
standard deviations, and more than 99 percent within three. The normal
distribution exploits the key role played by standard deviations.
v
Distribution
Mean
Figure 3.6: Normal Distribution
v
z SD’s
Distribution
Mean
Figure 3.7: Normal Distribution Right Tail Probabilities
In words, z tells us by how many standard deviations the value lies from the
mean. If the value of the random variable equals the mean, z equals .0; if the value
is one standard deviation above the mean, z equals 1.0; if the value is two
standard deviations above the mean, z equals 2.0; etc.
z. 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
..
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6... 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 ... 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
Table 3.8: Right Tail Probabilities for the Normal Distribution
20
The row of the normal distribution table specifies the z value’s whole number and
its tenths; the column the z value’s hundredths. The numbers within the body of
the table estimate the probability that the random variable lies more than z
standard deviations above its mean.
For purposes of illustration, suppose that we want to use the normal distribution to
calculate the probability that the estimated fraction from one repetition of the
experiment would fall between .525 and .575 when the actual population fraction
was .50 and the sample size was 100. We begin by calculating the probability
distribution’s mean and standard deviation:
1
Sample Size = T = 100 Actual Population Fraction = ActFrac = = .50
2
1 1 1
×
1 p (1 − p) 2 2 1
Mean[ EstFrac ] = p = = .50 Var[ EstFrac ] = = = 4 =
2 T 100 100 400
1 1
SD[ EstFrac ] = Var[ EstFrac ] = = = .05
400 20
21
Probability Distribution
Prob[EstFrac Between .525 and .575]
To calculate the probability that the estimated fraction lies between .525
and .575, we first calculate the z-values for .525 and .575; that is, we calculate the
number of standard deviations that .525 and .575 lie from the mean. Since the
mean equals .500 and the standard deviation equals .05,
z 0.00 0.01
0.4 0.3446 0.3409
0.5 0.3085 0.3050
0.6 0.2743 0.2709
Next, consider the right tail probabilities for the normal distribution. When
we use this table we implicitly assume that the normal distribution accurately
describes the estimated fraction’s probability distribution. For the moment,
assume that this is true. The entry corresponding to z equaling .50 is .3085; this
tells us that the probability that the estimated fraction lies above .525 is .3085.
Prob[EstFrac Greater Than .525] = .3085
Probability Distribution
.3085
EstFrac
.50 SD
.50 .525
Figure 3.9a: Probability of EstFrac Greater than .525
23
The entry corresponding to z equaling 1.50 is .0668; this tells us that the
probability that the estimated fraction lies above .525 is .0668. These
probabilities are illustrated in Figures 3.9a and 3.9b.
Prob[EstFrac Greater Than .575] = .0668
Probability Distribution
.0668
EstFrac
1.5 SD’s
.50 .575
Figure 3.9b: Probability of EstFrac Greater than .575
It is now easy to calculate the probability that the estimated fraction will lie
between .525 and .575:
Prob[EstFrac Between .525 and .575]
Just subtract the probability that the estimated fraction will be greater than .525
from the probability that the estimated fraction will be greater than .575:
Prob[EstFrac Greater Than .525] = .3085
Prob[EstFrac Greater Than .575] = .0668
Prob[EstFrac Between .525 and .575] = .2417
With a sample size of 100, the probability that EstFrac will lie between .525 and
.575 equals .2417. This of course assumes that the normal distribution describes
EstFrac’s probability distribution accurately.
24
We can now calculate the probability of being within one, two, and three
standard deviations of the mean by reviewing two important properties of the
normal distribution:
• The normal distribution is symmetric about its mean.
• The area beneath the normal distribution equals 1.0.
We begin with the one standard deviation (SD) case. Table 3.10 reports that the
right hand tail probability for z = 1.00 equals .1587:
Prob[1 SD Above] = .1587
We shall now use that to calculate the probability of being within 1 standard
deviation of the mean:
• Since the normal distribution is symmetric, the probability of being more
than one standard deviation above the mean equals the probability of
being more than one standard deviation below the mean.
Prob[1 SD Below] = Prob[1 SD Above]
= .1587
• Since the area beneath the normal distribution equals 1.0, the probability
of being within one standard deviation of the mean equals 1.0 less the sum
of the probabilities of being more than one standard deviation above the
mean and the probalibity of being more than one standard deviation below
the mean.
Prob[1 SD Within] = 1.0 − (Prob[1 SD Below] + Prob[1 SD Above])
= 1.0 − .1587 + .1587
= 1.0 − .3174
= .6826
25
Probability Distribution
Prob[Within 1 SD] = .6826
1 SD 1 SD
Distribution
Mean
Figure 3.10: Normal Distributions Calculations
Table 3.11 compares the percentages calculated from our simulations with the
percentages that would be predicted by the normal distribution.
Simulation:
Percent of Repetitions
Interval: Within Interval Normal
Standard Deviations within Sample Size Distribution
Distribution Mean 25 100 400 Percentages
1 69.25% 68.50% 68.29% 68.26%
2 96.26% 95.64% 95.49% 95.44%
3 99.85% 99.77% 99.74% 99.74%
Table 3.11: Interval Percentages Results and Normal Distribution Percentages
Table 3.11 reveals that the normal distribution percentages are good
approximations of the simulation percentages. Furthermore, as the sample size
increases, the percent of repetitions within each interval gets closer and closer to
the normal distribution percentages. This is precisely what the Central Limit
Theorem concludes. We use the normal distribution to calculate interval estimates
because it provides estimates that are close to the actual values.
Table 3.12 illustrates what are sometimes called the normal distribution’s “rules
of thumb”:
More specifically, he wrote the name of each student on a 3×5 card and repeated
the following procedure 16 times:
• Thoroughly shuffle the cards.
• Randomly draw one card.
• Ask that individual if he/she supports Clint and record the answer.
• Replace the card.
After conducting his poll, Clint learns that 12 of the 16 students polled support
him. That is, the estimated fraction of the population supporting Clint is .75:
12 3
Estimated Fraction of the Population Supporting Clint: EstFrac = = = .75
16 4
Based on the results of the poll, it looks like Clint is ahead. But how confident
should he be that this is in fact true? We shall address this question in the next
chapter.
28
z
0
Figure 3.11: Right Tail Probability
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
29
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
SD[ x ] 2π