This Unit covers some of the topics of Chapters 8 and 9 of Quantitative Approaches in
Business Studies. The reader may wish to download the files PROBABILITY.XLS,
NORMALDISTA.XLS, NORMALDISTB.XLS and NORMALDISTC.XLS from the web site
containing this supplement. In this Unit we will use the Data Analysis tool to generate
random numbers and we will explore the behaviour of these numbers.
We can see that the probabilities for the face values 3 and 6 (these were chosen
arbitrarily) do seem to tend to the theoretical value of 1/6 or 0.166667 as the number of
throws increases. We may use this experiment to indicate that Excel has indeed generated
a more-or-less uniform set of random numbers.
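To reproduce the experiment outside Excel, here is a minimal Python sketch. It assumes, as the Random and Dice columns of Figure 1 appear to show, that each throw is a uniform random number on [0, 6) truncated and increased by 1. The function name dice_relative_frequencies is hypothetical. With a fixed seed the run is repeatable, and the relative frequencies should settle near 1/6 as n grows.

```python
import random
from collections import Counter

def dice_relative_frequencies(n, seed=1):
    """Simulate n throws of a die and return the relative frequency of each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # A uniform number on [0, 6), truncated and shifted by 1, gives a face value 1..6.
    throws = [int(rng.random() * 6) + 1 for _ in range(n)]
    counts = Counter(throws)
    return {face: counts[face] / n for face in range(1, 7)}

for n in (5, 10, 100, 1000, 100_000):
    freqs = dice_relative_frequencies(n)
    print(f"{n:>6}", " ".join(f"{freqs[face]:.3f}" for face in range(1, 7)))
```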
A B C D E F G H I J
1 Probability
2 Probability
3 Random Dice Face value
4 1.847407 2 n 1 2 3 4 5 6
5 4.107181 5 5 0.000 0.400 0.200 0.000 0.400 0.000
6 1.524766 2 10 0.200 0.200 0.200 0.100 0.300 0.000
7 4.306772 5 20 0.150 0.200 0.200 0.150 0.150 0.150
8 2.977935 3 30 0.167 0.233 0.167 0.167 0.133 0.133
9 0.547685 1 40 0.175 0.200 0.175 0.175 0.175 0.100
10 4.185553 5 50 0.160 0.200 0.160 0.140 0.220 0.120
11 3.93524 4 100 0.160 0.150 0.180 0.150 0.200 0.160
12 0.585772 1 200 0.175 0.145 0.140 0.170 0.190 0.180
13 2.27131 3 500 0.162 0.144 0.174 0.180 0.176 0.164
14 3.292337 4 1000 0.166 0.149 0.162 0.181 0.164 0.178
15 0.028565 1
16 1.325724 2 1 0.166667
17 5.20603 6 1000 0.166667
18 2.379711 3
19 5.026032 6
20 3.754143 4
21 5.061373 6
22 1.185461 2
23 2.114933 3
24 0.104373 1
25 4.749168 5
26 1.13657 2
27 1.525864 2
28 3.152989 4
29 0.536515 1
30 2.642293 3
31 1.4008 2
32 5.363323 6
33 3.365215 4
34 4.884304 5
Figure 1
[Chart: Probability (0.00 to 0.25) against N (1 to 1000, logarithmic scale), with a reference line at 0.166667]
Figure 2
Statistics Distributions 3
Figure 3 shows the worksheet on the TwoCurve sheet. The blue curve shows a standard
normal distribution (mean = 0 and standard deviation = 1). On the x-axis we have z
(measurement) values and on the y-axis the probability density at each z. The probability
p that a measurement will fall within a given range of z values is the area under the curve
over that range. The probability is expressed as a number from 0 to 1; of course, we could
also talk about percentage probabilities: just multiply p by 100. It can be shown that the
total area under the curve must be one, since a measurement must result in some value.
Note how the curve is essentially zero for any value of z that lies more than 3 standard
deviations from the mean on either side.
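The distinction between the height of the curve (a density) and probabilities (areas) can be checked numerically. This sketch uses Python's statistics.NormalDist as a stand-in for the worksheet and a simple trapezoidal rule (an assumption for illustration, not the method the workbook uses) to confirm that the total area is one and that very little probability lies more than 3 standard deviations from the mean.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Trapezoidal estimate of the total area under the curve on [-8, 8];
# the area outside that interval is far too small to matter here.
step = 0.001
zs = [-8 + i * step for i in range(16001)]
heights = [std_normal.pdf(z) for z in zs]      # density values, not probabilities
area = step * (sum(heights) - (heights[0] + heights[-1]) / 2)

# Probability of a value more than 3 standard deviations from the mean (both tails).
beyond_3sd = 2 * std_normal.cdf(-3)

print(f"peak height at z = 0 : {std_normal.pdf(0):.4f}")
print(f"total area           : {area:.6f}")
print(f"P(|z| > 3)           : {beyond_3sd:.4f}")
```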
The two parameters of the red curve (its mean and standard deviation) may be changed
with the spinners. Just click on a spinner arrow to increase or decrease the Mean and/or
StdDev, and you will see the shape and position of the red curve alter.
If you set the mean for the adjustable curve to zero and experiment with the standard
deviation (σ), you will see that as σ increases the curve gets wider while its height
decreases. The area, of course, remains constant at one.
Figure 3
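A short Python sketch of the same experiment, with statistics.NormalDist standing in for the worksheet: as σ grows, the peak height 1/(σ√(2π)) falls while the numerically estimated area stays at one. The helper area_under is a hypothetical trapezoidal-rule function written only for this illustration.

```python
import math
from statistics import NormalDist

def area_under(dist, lo, hi, n=20000):
    """Trapezoidal estimate of the area under dist's density curve between lo and hi."""
    h = (hi - lo) / n
    ys = [dist.pdf(lo + i * h) for i in range(n + 1)]
    return h * (sum(ys) - (ys[0] + ys[-1]) / 2)

for sigma in (0.5, 1.0, 2.0):
    curve = NormalDist(mu=0, sigma=sigma)
    peak = curve.pdf(0)
    expected_peak = 1 / (sigma * math.sqrt(2 * math.pi))  # height at the mean
    area = area_under(curve, -10 * sigma, 10 * sigma)     # +/-10 sd covers essentially all of it
    print(f"sigma={sigma}: peak={peak:.4f} (formula {expected_peak:.4f}), area={area:.6f}")
```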
On the second sheet (AboutM) you can select z1 and z2, one on each side of the mean,
and find the probability that a measurement z will lie within that range (see Figure 4). You
will see this probability written in textbooks as P(z1 < z < z2).
The slider objects can be used in three ways: (1) drag the slider bar; (2) for large
jumps, click on the space on either side of the slider bar; and (3) for more precise
control, click either arrow on the slider object.
Figure 4
As shown in Figure 4, 95.45% of all observations lie within two standard deviations of the
mean. What are the corresponding percentages for observations within one standard
deviation of the mean (-1 < z < 1) and within three (-3 < z < 3)?
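One way to check your answers is a short Python sketch using statistics.NormalDist in place of the AboutM sheet. The helper prob_within is hypothetical; it returns P(-k < z < k) as the difference of two cumulative probabilities.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def prob_within(k):
    """P(-k < z < k) for a standard normal variable."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"P({-k} < z < {k}) = {prob_within(k):.2%}")
```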
The sheet AnyP (see Figure 5) is similar to the previous sheet except that z1 and z2 may
take any values; they need not lie on opposite sides of the mean. To create a different
visual effect, the area is plotted as a series of columns.
Figure 5
Figure 6
Also in the workbook NORMALDISTA you will find the sheet Student, which lets you
compare the standard normal distribution with the Student t distribution for varying
degrees of freedom (see Figure 6).
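Python's standard library has no Student t distribution, so this sketch writes the density out directly from its textbook formula (student_t_pdf is a hypothetical name) and compares it with the standard normal density from statistics.NormalDist. With few degrees of freedom the t curve is lower at the centre and higher in the tails; as the degrees of freedom grow it approaches the normal curve.

```python
import math
from statistics import NormalDist

def student_t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work with log-gamma so that large df values do not overflow.
    log_coeff = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_coeff) * (1 + t * t / df) ** (-(df + 1) / 2)

normal = NormalDist()
for df in (1, 5, 30, 1000):
    print(f"df={df:4d}: height at 0 = {student_t_pdf(0, df):.4f}, "
          f"height at 3 = {student_t_pdf(3, df):.4f}")
print(f"normal:   height at 0 = {normal.pdf(0):.4f}, height at 3 = {normal.pdf(3):.4f}")
```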
The next Unit has some calculations using the Student distribution.
Sample Calculations
We will show how to perform some simple calculations involving the normal distribution.
These will help the reader become familiar with the Excel function NORMDIST and its
inverse NORMINV. The syntax for the former is NORMDIST(x, mean, standard deviation,
cumulative), where x is the measured value, mean and standard deviation have obvious
meanings, and cumulative is a logical value (i.e. you may use TRUE or FALSE, or 1 or 0,
for its value). A TRUE value returns the cumulative probability (the area under the curve
to the left of x), while a FALSE value returns the value of the probability density function
at x.
There are also the functions NORMSDIST and NORMSINV, which are used only for the
standard normal distribution. These, of course, do not require the mean and standard
deviation arguments, since for the standard normal distribution these are fixed at 0 and
1, respectively.
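For readers who want to check worksheet results outside Excel, the following Python sketch mirrors the four functions using the standard library's statistics.NormalDist (Python 3.8 or later). The names normdist, norminv, normsdist and normsinv are hypothetical wrappers chosen to echo the Excel names; they are not part of any library, and Excel's own algorithms may differ in the last decimal places.

```python
from statistics import NormalDist

def normdist(x, mean, standard_dev, cumulative):
    """Rough Python analogue of Excel's NORMDIST(x, mean, standard_dev, cumulative)."""
    dist = NormalDist(mu=mean, sigma=standard_dev)
    # TRUE -> cumulative probability (area to the left of x); FALSE -> density height at x.
    return dist.cdf(x) if cumulative else dist.pdf(x)

def norminv(probability, mean, standard_dev):
    """Rough Python analogue of Excel's NORMINV(probability, mean, standard_dev)."""
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(probability)

def normsdist(z):
    """Analogue of NORMSDIST(z): the cumulative standard normal probability."""
    return normdist(z, 0, 1, True)

def normsinv(probability):
    """Analogue of NORMSINV(probability): the inverse of NORMSDIST."""
    return norminv(probability, 0, 1)

print(normdist(1.84, 0, 1, True))   # area to the left of z = 1.84
print(normdist(0, 0, 1, False))     # height of the density curve at z = 0
print(norminv(0.25, 120, 12))       # value below which 25% of the distribution lies
```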
Each problem will be solved in three ways: (1) using the worksheet AnyProb or AboutM,
(2) using Appendix 5 in Quantitative Approaches in Business Studies, and (3) using the
NORMDIST or NORMINV function. In this way the use of the two Excel functions should
become clearer.
a) For a standard normal distribution, find the area under the curve to the left of z = 1.84.
i) The worksheet AnyProb may be used to determine the answer (see Figure 7). You may
be concerned that this computes the area from -4 to 1.84, so part of the left tail is
missing. We will see that this is insignificant. This method yields 96.71%.
A B C D E F G H
1 z1 -4 Probability that Z lies between Z1 and Z2
2 z2 1.84 96.71%
[Chart: P (0.00 to 0.45) against z (-3 to 3)]
Figure 7
ii) Set up a worksheet as shown in Figure 8 and you can answer all questions of this type
by entering the appropriate value in B4. For z = 1.84 we get 0.967116, which agrees with
the result above.
A B C D E
3 Question (a)
4 z 1.84
5 NORMDIST 0.967116 =NORMDIST(B4,0,1,TRUE)
6
Figure 8
iii) Look up the value 1.84 in Appendix 5 of Quantitative Approaches in Business Studies
and you should get 0.0329. Do not panic! The difference is explainable. The Appendix
lists areas to the right of z, as shown in the diagram in its heading. Since the total
area under the curve is 1, it follows that for any given x value: Area(left of x) +
Area(right of x) = 1. So the area to the left is 1 - 0.0329 = 0.9671, and we do get the
same answer!
It should be apparent by now that when we find an area to the right of x, we are finding the
probability of an observation that is greater than x. Conversely, an area to the left is the
probability of an observation being less than x.
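The three routes to question (a) can be compared in one short Python sketch, using statistics.NormalDist as a stand-in for NORMDIST (an assumption; Excel's algorithm may differ in the last digits). It also shows how small the area below -4 is, which is why the AnyProb cut-off does not change the answer at four decimal places.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

left_area = std_normal.cdf(1.84)                          # like NORMDIST(1.84,0,1,TRUE)
anyprob_area = std_normal.cdf(1.84) - std_normal.cdf(-4)  # AnyProb: area from -4 to 1.84
right_area = 1 - left_area                                # what Appendix 5 tabulates
ignored_tail = std_normal.cdf(-4)                         # area the -4 cut-off leaves out

print(f"left of 1.84    : {left_area:.6f}")
print(f"from -4 to 1.84 : {anyprob_area:.2%}")
print(f"right of 1.84   : {right_area:.4f}")
print(f"below -4        : {ignored_tail:.6f}")
```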
b) How different would our solutions be if z were negative, say -1.84? The problem is now:
what is the area to the left of z = -1.84?
i) The AnyProb worksheet gives 3.29% or 0.0329. Does that value look familiar?
ii) The worksheet function NORMDIST gives the same value, i.e. 0.032884 with the
default format.
iii) Appendix 5 in Quantitative Approaches in Business Studies does not differentiate
between positive and negative values since the normal distribution is symmetrical
about the mean. So again we get 0.0329.
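The symmetry argument can be confirmed with the same stand-in, statistics.NormalDist: the area to the left of -1.84 equals the area to the right of +1.84.

```python
from statistics import NormalDist

std_normal = NormalDist()
left_of_minus = std_normal.cdf(-1.84)       # like NORMDIST(-1.84,0,1,TRUE)
right_of_plus = 1 - std_normal.cdf(1.84)    # area to the right of +1.84, as in Appendix 5
print(f"{left_of_minus:.6f} {right_of_plus:.6f}")
```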
A B C D
7 Question (c)
8 z1 0.86 z2 -1.97
9 NORMDIST 0.805106 NORMDIST 0.024419
10 Difference 0.780686
Figure 9
iii) If you look up the two z values in Appendix 5 the two areas are 0.1949 and 0.0244. The
result we need is 1 - (the sum of these two), or 1 - (0.2193) = 0.7807. If necessary,
draw a diagram to convince yourself that this is the way to proceed.
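The full text of question (c) is not shown here; assuming, as Figure 9 and part iii) suggest, that it asks for the area between z = -1.97 and z = 0.86, this Python sketch computes that area both as a difference of cumulative probabilities (the worksheet route) and as one minus the two tail areas (the Appendix 5 route), with statistics.NormalDist standing in for NORMDIST.

```python
from statistics import NormalDist

std_normal = NormalDist()
z1, z2 = 0.86, -1.97

# Worksheet route (Figure 9): difference of two cumulative probabilities.
between = std_normal.cdf(z1) - std_normal.cdf(z2)

# Appendix 5 route: one minus the two tail areas.
tail_above_z1 = 1 - std_normal.cdf(z1)   # area to the right of 0.86
tail_below_z2 = std_normal.cdf(z2)       # area to the left of -1.97 (= right of +1.97)
via_tails = 1 - (tail_above_z1 + tail_below_z2)

print(f"{between:.6f} {tail_above_z1:.4f} {tail_below_z2:.4f} {via_tails:.4f}")
```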
Note: If you try to enter the text (c) in an Excel cell, it is most likely that the copyright
symbol will be displayed. To overcome this use Tools|AutoCorrect, select the
appropriate entry and click the Delete button.
d) Given a normal distribution with μ = 400 and σ = 50, what is the probability that x will
have a value greater than 469?
i) To use Appendix 5 of Quantitative Approaches in Business Studies we must convert
the x value to a z-score using
\[ z = \frac{x - \mu}{\sigma} \quad\text{or}\quad z = \frac{469 - 400}{50} = 1.38. \]
Then we look up the z value in the Appendix to get 0.0838.
ii) The worksheet solution is shown in Figure 10. Remember that NORMDIST finds the
area to the left of the x value or the probability that the observation will be less than x.
We need the probability of it being greater so we use 1 - NORMDIST.
A B C D E F
12 Question (d)
13 mean 400
14 stdev 50
15 critical value 469
16 P(x<value) 0.916207 =NORMDIST(B15,B13,B14,TRUE)
17 P(x>value) 0.083793 =1-B16
Figure 10
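A Python sketch of question (d), with statistics.NormalDist standing in for NORMDIST: the probability above 469 is one minus the cumulative probability, and the same value follows from the z-score on the standard normal curve.

```python
from statistics import NormalDist

mean, sd, critical = 400, 50, 469

z_score = (critical - mean) / sd                  # 1.38
p_less = NormalDist(mean, sd).cdf(critical)       # like NORMDIST(469, 400, 50, TRUE)
p_greater = 1 - p_less                            # probability that x exceeds 469
p_greater_from_z = 1 - NormalDist().cdf(z_score)  # same value via the z-score

print(f"z = {z_score:.2f}, P(x<469) = {p_less:.6f}, P(x>469) = {p_greater:.6f}")
```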
e) For a normal distribution with a mean of 120 and a standard deviation of 12, determine
the value of x below which the first (lowest) 25% of the values lie.
i) The worksheet solution is shown in Figure 11 which shows an answer of 111.91.
A B C D E F
20 Question (e)
21 mean 120
22 stdev 12
23 P 25%
24 x 111.9061 =NORMINV(B23,B21,B22)
Figure 11
ii) To solve this with the AnyProb worksheet we first find the z value that will include the
first 25%. We do this by setting z1 to -4 and varying z2 until the probability reads 25%, or
as close to that value as we can get. We find that a z value of -0.67 gives P = 25.14%.
Now we must convert the z value to an x. We have already met the relationship
\[ z = \frac{x - \mu}{\sigma}, \quad\text{so we may write}\quad x = z\sigma + \mu. \]
Thus \( x = -0.67 \times 12 + 120 = 111.96 \). This is not exactly the same as the first
solution because we did not find the z corresponding to exactly 25%.
iii) To solve the problem with Appendix 5 we use a similar hunting process. Look in the
table until you find an area value close to 0.250. Did you find 0.2514 with a z value of
0.67? We complete the solution as in (ii) above. However, you must realise that you
need the left tail of the curve so use -0.67 as the z value.
In methods (ii) and (iii) interpolation could be used. From Appendix 5 we have the two
right-tail areas P(z > 0.67) = 0.2514 and P(z > 0.68) = 0.2483. At the midpoint we can say
that approximately P(z > 0.675) = 0.24985, which is closer to the required 0.25 value.
With 0.675 as the value of z (that is, z = -0.675 in the left tail), we find
x = -0.675 × 12 + 120 = 111.90. This agrees with the worksheet approach.
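Question (e) in a Python sketch: the inv_cdf method of statistics.NormalDist plays the role of NORMINV, and the table route uses linear interpolation between the two Appendix 5 areas quoted above. Note that this interpolates to an area of exactly 0.25, a slight variation on the midpoint approach in the text.

```python
from statistics import NormalDist

mean, sd, p = 120, 12, 0.25

# Worksheet route: like NORMINV(0.25, 120, 12).
x_exact = NormalDist(mean, sd).inv_cdf(p)

# Table route: right-tail areas read from Appendix 5 for z = 0.67 and z = 0.68.
z_lo, area_lo = 0.67, 0.2514
z_hi, area_hi = 0.68, 0.2483
# Linear interpolation for the z whose tail area is exactly 0.25.
z_interp = z_lo + (area_lo - p) / (area_lo - area_hi) * (z_hi - z_lo)
# The first 25% lies in the left tail, so the z value used is -z_interp.
x_interp = mean - z_interp * sd

print(f"NORMINV-style: {x_exact:.4f}, interpolated: {x_interp:.4f} (z = -{z_interp:.4f})")
```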
A B C D E F G H I J
1 Random Normal Distribution Samples
2 average
3 -0.16673 0.328766 -0.28371 -1.20275 -0.50711 1.856251 0.61199 -0.45234 -1.97114 -0.19853
4 0.226639 -1.68125 -0.51268 1.231635 0.912933 -0.99342 0.469188 -0.7506 0.15325 -0.10492
5 -0.63241 0.439417 0.028652 -0.00922 -0.15163 -1.11076 -0.45031 1.230819 -0.32078 -0.10847
6 -1.45999 2.6817 -0.57594 -0.22664 -1.38391 -0.3894 0.983568 0.274481 0.87663 0.086722
7 -0.0373 0.394685 1.75377 1.381718 2.097477 0.809821 -0.19418 0.313062 -0.21152 0.700837
Figure 12
A B C
1 sample size 1 9
2 bin frequency frequency
3 -1.8 3 0
4 -1.6 0 0
5 -1.4 4 0
6 -1.2 4 0
7 -1 3 0
8 -0.8 1 1
9 -0.6 13 5
10 -0.4 12 12
11 -0.2 6 13
12 0 4 27
13 0.2 6 21
14 0.4 12 12
15 0.6 6 6
16 0.8 3 3
17 1 4 0
18 1.2 1 0
19 1.4 5 0
20 1.6 5 0
21 1.8 2 0
[Charts: histograms of frequency against bin (-1.8 to 1.8), titled n=1 and n=9]
Figure 13
The reader may wish to generate a new data set with the Random Number Generator
found in the Data Analysis tool (see Figure 14).
Figure 14
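Figure 13 appears to compare single values (n=1) with averages of 9 values (n=9); the narrowing of the n=9 histogram can be sketched in Python. This sketch uses random.gauss instead of Excel's Random Number Generator, and the seed and number of samples are arbitrary choices. The standard deviation of averages of 9 standard normal values should come out near 1/√9 = 1/3.

```python
import random
from statistics import stdev

rng = random.Random(2024)  # fixed seed so the run is repeatable
n_samples = 20000

# Sample size 1: individual standard normal values.
singles = [rng.gauss(0, 1) for _ in range(n_samples)]
# Sample size 9: the average of nine standard normal values.
averages_of_9 = [sum(rng.gauss(0, 1) for _ in range(9)) / 9 for _ in range(n_samples)]

print(f"spread of single values       : {stdev(singles):.3f}")
print(f"spread of averages of 9 values: {stdev(averages_of_9):.3f} (theory 1/3)")
```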
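Solution: From the Central Limit Theorem, the sampling distribution of the average
(\(\bar{x}\)) will be approximately normal with \(\mu_{\bar{x}} = 800\) and
\(\sigma_{\bar{x}} = 40/\sqrt{16} = 10\). To find z corresponding to \(\bar{x} = 775\), use
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{775 - 800}{10} = -2.5. \]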
i) Again we can use the AnyProb worksheet. Strictly speaking we need the cumulative
probability for the range from minus infinity to -2.5, but we will settle for -4 to -2.5. Set
the spinners to these values to get an answer of 0.62% or 0.0062.
ii) Look up the absolute value of z (2.5) in Appendix 5 and you find the value 0.0062. This
is the probability P(z > 2.5), but because of the symmetry of the curve it is also
P(z < -2.5).
iii) The answer may be readily found with the NORMDIST function as shown in Figure 15.
The result for P(<x) in E5 is 0.00621 or 0.62%.
A B C D E F
1 Calculations with samples.
2
3 Population values Probability calculation
4 mean 800 critical value 775
5 std dev 40 P(<x) 0.62% =NORMDIST(E4,B4,B9,TRUE)
6 P(>x) 99.38% =1-E5
7 Sample values
8 Size 16
9 Error of mean 10 =B5/SQRT(B8) Interval calculation
10 P 95%
11 Tail size 2.50% =(1-E10)/2
12 x(low) 780.40 =NORMINV(E11,B4,B9)
13 x(high) 819.60 =NORMINV(1-E11,B4,B9)
14
Figure 15
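The probability calculation in Figure 15 can be sketched in Python, again using statistics.NormalDist as a stand-in for NORMDIST. The standard error σ/√n plays the role of the "Error of mean" cell; the variable names are illustrative.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, sample_size = 800, 40, 16
critical = 775

standard_error = pop_sd / math.sqrt(sample_size)      # =B5/SQRT(B8), i.e. 10
z = (critical - pop_mean) / standard_error            # -2.5
sampling_dist = NormalDist(pop_mean, standard_error)  # distribution of the sample mean
p_below = sampling_dist.cdf(critical)                 # like NORMDIST(775, 800, 10, TRUE)
p_above = 1 - p_below

print(f"z = {z}, P(<x) = {p_below:.2%}, P(>x) = {p_above:.2%}")
```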
b) If we measure a number of averages for samples of size 16, what interval around the
population mean will include 95% of the sample means?
i) If we want 95% distributed about the mean, then there will be 47.5% on each side. We
can use the worksheet AboutM or AnyProb to find that a z value of 1.96 yields this
percentage (see Figure 16). Of course, on the other side of the mean, a z of -1.96 will
encompass 47.5%. The corresponding x values are obtained from:
\[ x_U = \mu + z\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad x_L = \mu - z\frac{\sigma}{\sqrt{n}} \]
These give \(x_U = 819.6\) and \(x_L = 780.4\).
Figure 16
ii) Recall that Appendix 5 gives us areas to the right of a z value, i.e. the white area on
the right side of the curve in Figure 16. So we need to search the table for an area value
of 0.5 - 0.475 = 0.025, which gives z = 1.96, and we finish the problem as before.
iii) A spreadsheet solution is shown in Figure 15 above where we use the NORMINV
function. The same results are obtained.
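The interval calculation in Figure 15 translates directly into Python; the inv_cdf method of statistics.NormalDist stands in for NORMINV, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

pop_mean, pop_sd, sample_size, coverage = 800, 40, 16, 0.95

standard_error = pop_sd / math.sqrt(sample_size)
tail = (1 - coverage) / 2                          # 2.5% in each tail, like =(1-E10)/2
sampling_dist = NormalDist(pop_mean, standard_error)
x_low = sampling_dist.inv_cdf(tail)                # like =NORMINV(E11,B4,B9)
x_high = sampling_dist.inv_cdf(1 - tail)           # like =NORMINV(1-E11,B4,B9)
z = NormalDist().inv_cdf(1 - tail)                 # the z value of about 1.96

print(f"z = {z:.2f}, interval = {x_low:.2f} to {x_high:.2f}")
```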
Suppose you have 100 items for which it is possible to measure a quantity (weight,
diameter, etc.) with one of three progressively coarser devices. On the sheet Tables (see
Figure 17) the range A7:B39 is a table listing the frequency of specific x values in a
measured sample when the increment for the bins is 0.005 units. The ranges F7:G23 and
K7:L15 are similar tables when the bin increments are 0.01 and 0.02, respectively. Note
that the maxima for these three tables are approximately 8, 16 and 32, respectively. It is
not surprising, therefore, that we shall need to normalize the curve produced by the
NORMDIST function.
We will assume that the distribution of these measurements is normal (i.e. Gaussian).
The mean and standard deviation can be computed from such tables using the
relationships:
\[ \mu = \sum_{i=1}^{N} x_i P_i \quad\text{and}\quad \sigma = \sqrt{\sum_{i=1}^{N} (x_i - \mu)^2 P_i} \]
where \(P_i\) is the probability for measurement \(x_i\).
Our data is expressed in terms of percentage frequency rather than probability but we can
use the simple relationship that Pi = fi /100.
To compute the standard deviation we need the values of \((x_i - \mu)^2 P_i\); this is the
purpose of the third column in each table. The formula in C8 is =(A8-$B$3)^2*B8 and this
is copied down to row 39. Since column B holds percentage frequencies rather than
probabilities, the sum of the third column is divided by 100, and the square root of the
result gives the standard deviation. So in B4 we use =SQRT(SUM(C8:C39)/100). Analogous
formulas are used in the other tables.
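As a check on the formulas above, this Python sketch computes the mean and standard deviation from the third table of Figure 17 (bin increment 0.02). It follows the same steps as the worksheet: P_i = f_i/100, the weighted mean, a column of (x_i - μ)² f_i, then the square root of its sum divided by 100. The variable names are illustrative.

```python
import math

# Third table of Figure 17: bin values x_i and percentage frequencies f_i.
xs = [0.44, 0.46, 0.48, 0.50, 0.52, 0.54, 0.56, 0.58]
freq_pct = [1.03, 9.18, 21.40, 30.40, 23.80, 11.65, 2.52, 0.02]

probs = [f / 100 for f in freq_pct]                  # P_i = f_i / 100
mean = sum(x * p for x, p in zip(xs, probs))         # mu = sum of x_i P_i
# Same idea as the third column: (x_i - mu)^2 * f_i, summed, divided by 100, square-rooted.
diff_sq = [(x - mean) ** 2 * f for x, f in zip(xs, freq_pct)]
sd = math.sqrt(sum(diff_sq) / 100)

print(f"mean = {mean:.6f}, sd = {sd:.6f}")
```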
The fourth column in each table is used to compute the normal distribution values so as
to be able to display a histogram with a superimposed normal curve. In D8 we have
=NORMDIST(A8,$B$3,$B$4,FALSE)*$D$3. Carefully note the use of absolute cell
references for the mean $B$3, standard deviation $B$4 and normalization factor $D$3.
This formula is copied down to row 39.
The user may adjust the value of the normalization factor to give a total in D40 of
approximately 100. There is no merit in attempting great precision here. Now we may
construct a combination chart with the data from columns A, B and D. Similar methods are
used with the other two tables.
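Why do the normalization factors 0.5, 1 and 2 in Figure 17 work? A bin of width w around x holds roughly pdf(x) × w of the probability, which is pdf(x) × w × 100 percent, so a factor of 100 × w puts the curve on the freq (%) scale. This Python sketch applies that reasoning to the third table, with statistics.NormalDist standing in for NORMDIST(..., FALSE); it explains the numbers rather than describing how the factors in the workbook were chosen.

```python
from statistics import NormalDist

# Third table of Figure 17 (bin increment 0.02), with its mean and sdt values.
xs = [0.44, 0.46, 0.48, 0.50, 0.52, 0.54, 0.56, 0.58]
mean, sd, bin_width = 0.502378, 0.025249, 0.02

# A bin of width w holds about pdf(x) * w of the probability, i.e. pdf(x) * w * 100
# percent, so a normalization factor of 100 * w matches the freq (%) scale.
norm = 100 * bin_width
curve = [NormalDist(mean, sd).pdf(x) * norm for x in xs]   # like NORMDIST(x,mean,sd,FALSE)*norm

print([round(v, 2) for v in curve], round(sum(curve), 2))
```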
A B C D (bin increment 0.005)
1 Normal Gaussian Distribution
3 mean 0.494961 norm 0.5
4 sdt 0.024480
7 x freq (%) diff sq*prop normdist
8 0.425 0.00 0.0000 0.14
9 0.430 0.01 0.0000 0.24
10 0.435 0.30 0.0011 0.41
11 0.440 0.72 0.0022 0.66
12 0.445 1.50 0.0037 1.02
13 0.450 1.56 0.0032 1.51
14 0.455 2.70 0.0043 2.15
15 0.460 3.42 0.0042 2.94
16 0.465 3.08 0.0028 3.85
17 0.470 4.54 0.0028 4.85
18 0.475 6.78 0.0027 5.84
19 0.480 7.00 0.0016 6.76
20 0.485 7.90 0.0008 7.50
21 0.490 7.20 0.0002 7.98
22 0.495 7.80 0.0000 8.15
23 0.500 7.50 0.0002 7.98
24 0.505 6.70 0.0007 7.49
25 0.510 6.90 0.0016 6.75
26 0.515 5.90 0.0024 5.83
27 0.520 4.30 0.0027 4.83
28 0.525 4.40 0.0040 3.84
29 0.530 3.15 0.0039 2.93
30 0.535 2.00 0.0032 2.14
31 0.540 2.10 0.0043 1.50
32 0.545 1.10 0.0028 1.01
33 0.550 0.91 0.0028 0.65
34 0.555 0.26 0.0009 0.40
35 0.560 0.25 0.0011 0.24
36 0.565 0.00 0.0000 0.14
37 0.570 0.01 0.0001 0.07
38 0.575 0.01 0.0001 0.04
39 0.580 0.00 0.0000 0.02
40 100 99.83

F G H I (bin increment 0.01)
3 mean 0.497482 norm 1
4 sdt 0.024531
7 x freq (%) diff sq*prop normdist
8 0.43 0.01 0.0000 0.37
9 0.44 1.02 0.0034 1.04
10 0.45 3.06 0.0069 2.50
11 0.46 6.12 0.0086 5.06
12 0.47 7.62 0.0058 8.68
13 0.48 13.78 0.0042 12.62
14 0.49 15.10 0.0008 15.52
15 0.50 15.30 0.0001 16.18
16 0.51 13.60 0.0021 14.28
17 0.52 10.20 0.0052 10.67
18 0.53 7.55 0.0080 6.76
19 0.54 4.10 0.0074 3.62
20 0.55 2.01 0.0055 1.64
21 0.56 0.51 0.0020 0.63
22 0.57 0.01 0.0001 0.21
23 0.58 0.01 0.0001 0.06
24 total 100 99.84

K L M N (bin increment 0.02)
3 mean 0.502378 norm 2
4 sdt 0.025249
7 x freq (%) diff sq*prop normdist
8 0.44 1.03 0.0040 1.49
9 0.46 9.18 0.0165 7.73
10 0.48 21.40 0.0107 21.34
11 0.50 30.40 0.0002 31.46
12 0.52 23.80 0.0074 24.77
13 0.54 11.65 0.0165 10.41
14 0.56 2.52 0.0084 2.34
15 0.58 0.02 0.0001 0.28
16 total 100 99.82

[Charts: for each table, freq (%) plotted as a histogram against x with the normdist curve superimposed]
Figure 17
The Excel literature describes two methods of fitting histogram data to a normal curve that
reportedly improve on the method shown above. This author has serious doubts about the
supposed improvement: it is doubtful whether the apparent gains in precision have any
statistical significance. However, we will look at the two methods briefly.
The sheet Solver1 is shown in Figure 18. The table A8:B24 contains the same data as the
middle table in the previous sheet. As before, we use the NORMDIST function to produce
the normal curve. We wish to use Solver (see Unit 3) to vary the three quantities mean,
standard deviation and normalization factor in such a way as to make the normal curve
agree with the frequency data. If, for a given measured value \(x_i\), the experimental
frequency is \(f_i\) and the predicted frequency is \(g_i\), then minimizing the quantity
\[ \sum_i (f_i - g_i)^2 \]
will give the so-called least-squares fit. We may compute this sum of squares of residuals
(SSR) in B6 with the SUMXMY2 function as shown in Figure 18.
A B C
1 Normal Gaussian Distribution
2
3 mean 0.497124
4 sdt 0.025795
5 norm 1.012054
6 SSR 4.937021 =SUMXMY2(B9:B24,C9:C24)
7
8 x freq (%) normdist
9 0.43 0.01 0.53 =NORMDIST(A9,$B$3,$B$4,FALSE)*$B$5 (copied to row 24)
10 0.44 1.02 1.35
11 0.45 3.06 2.95
12 0.46 6.12 5.56
13 0.47 7.62 9.00
14 0.48 13.78 12.56
15 0.49 15.10 15.07
16 0.50 15.30 15.56
17 0.51 13.60 13.82
18 0.52 10.20 10.56
19 0.53 7.55 6.95
20 0.54 4.10 3.93
21 0.55 2.01 1.91
22 0.56 0.51 0.80
23 0.57 0.01 0.29
24 0.58 0.01 0.09
25 total 100 100.93
[Chart: Frequency (%) (0 to 16) against x (0.43 to 0.57), data with the fitted normal curve]
Figure 18
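The SSR that SUMXMY2 returns is easy to reproduce. This Python sketch (statistics.NormalDist standing in for NORMDIST; ssr is a hypothetical helper) evaluates it for the starting values from the Tables sheet and for the values reported after Solver in Figure 18, so the improvement can be seen directly.

```python
from statistics import NormalDist

# Data from the Solver1 sheet (the same as the middle table of Figure 17).
xs = [round(0.43 + 0.01 * i, 2) for i in range(16)]
freq_pct = [0.01, 1.02, 3.06, 6.12, 7.62, 13.78, 15.10, 15.30,
            13.60, 10.20, 7.55, 4.10, 2.01, 0.51, 0.01, 0.01]

def ssr(mean, sd, norm):
    """Sum of squared residuals, the quantity SUMXMY2(B9:B24,C9:C24) returns."""
    dist = NormalDist(mean, sd)
    predicted = [dist.pdf(x) * norm for x in xs]  # like =NORMDIST(A9,$B$3,$B$4,FALSE)*$B$5
    return sum((f - g) ** 2 for f, g in zip(freq_pct, predicted))

before = ssr(0.497482, 0.024531, 1.0)      # mean, sdt and norm from the Tables sheet
after = ssr(0.497124, 0.025795, 1.012054)  # values reported after Solver in Figure 18
print(f"SSR before Solver: {before:.3f}, after Solver: {after:.3f}")
```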
Solver is set up to minimize the SSR value in B6 by varying the mean, standard deviation
and normalization factor, i.e. cells B3, B4 and B5. As a precaution, the constraint
B4>=0.001 is used to ensure that Solver never tries to make the standard deviation zero,
because the NORMDIST functions would then return error values and terminate Solver's
activity. The settings for Solver are shown in Figure 19. To help Solver find a solution,
you may wish to start with the values found on the Tables sheet.
Figure 19
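Solver is an Excel add-in, so outside Excel another minimiser is needed. This Python sketch substitutes a plain compass (pattern) search, a much simpler method than the one Solver uses, to minimise the same SSR over mean, standard deviation and normalization factor. The names ssr and compass_search are hypothetical, the step sizes are arbitrary choices, and the sd >= 0.001 guard mirrors the constraint described above.

```python
from statistics import NormalDist

# Middle table of Figure 17 (bin increment 0.01), as used on the Solver1 sheet.
xs = [round(0.43 + 0.01 * i, 2) for i in range(16)]
freq_pct = [0.01, 1.02, 3.06, 6.12, 7.62, 13.78, 15.10, 15.30,
            13.60, 10.20, 7.55, 4.10, 2.01, 0.51, 0.01, 0.01]

def ssr(params):
    """SSR for (mean, sd, norm); rejects sd < 0.001, like the constraint B4>=0.001."""
    mean, sd, norm = params
    if sd < 0.001:
        return float("inf")
    dist = NormalDist(mean, sd)
    return sum((f - dist.pdf(x) * norm) ** 2 for x, f in zip(xs, freq_pct))

def compass_search(func, start, steps, tol=1e-10, max_iter=20_000):
    """Minimise func by trying +/- steps along each coordinate, halving steps on failure."""
    point, best, steps = list(start), func(start), list(steps)
    for _ in range(max_iter):
        improved = False
        for i in range(len(point)):
            for sign in (1, -1):
                trial = point.copy()
                trial[i] += sign * steps[i]
                value = func(trial)
                if value < best:
                    point, best, improved = trial, value, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]  # no move helped: refine the search
            if max(steps) < tol:
                break
    return point, best

start = (0.497482, 0.024531, 1.0)          # starting values from the Tables sheet
fitted, ssr_fit = compass_search(ssr, start, steps=(0.001, 0.001, 0.01))
print("mean, sd, norm =", [round(v, 6) for v in fitted], " SSR =", round(ssr_fit, 6))
```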
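A further refinement of the worksheet for use with Solver is given in Solver2 (see Figure
20). The formulas in column C are somewhat more complicated: instead of the height of the
normal curve at each x value, they use differences of cumulative probabilities, so that each
predicted value is the area under the curve for the corresponding bin. This approach may
give better results than the others when the increments of the x values are large. As
before, Solver is used to minimize the SSR in B6 by varying the mean and the standard
deviation.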
A B C
1 Normal Gaussian Distribution
2
3 mean 0.49216
4 std 0.02552
5 norm 100.95882
6 SSR 5.2485449
7
8 x freq (%) normdist
9 0.43 0.01 0.750283 =NORMDIST(A9,mean,std,TRUE)*norm
10 0.44 1.02 1.317731 =(NORMDIST(A10,mean,std,TRUE)-NORMDIST(A9,mean,std,TRUE))*norm
This is copied down to row 23.
11 0.45 3.06 2.905758
12 0.46 6.12 5.505988
13 0.47 7.62 8.965295
14 0.48 13.78 12.54459
15 0.49 15.10 15.08387
16 0.50 15.30 15.58608
17 0.51 13.60 13.83979
18 0.52 10.20 10.56056
19 0.53 7.55 6.924826
20 0.54 4.10 3.902014
21 0.55 2.01 1.889366
22 0.56 0.51 0.786109
23 0.57 0.01 0.28105
24 0.58 0.01 0.115506 =(1-NORMDIST(A23,mean,std,TRUE))*norm
25 total 100 101
The cells B3, B4 and B5 have been named mean, std and norm, respectively.
Figure 20
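The cumulative-difference idea of the Solver2 sheet can be sketched in Python with statistics.NormalDist.cdf in place of NORMDIST(..., TRUE). Because the first bin takes everything below 0.43 and the last everything above 0.57, the predicted values add up exactly to the normalization factor, which explains why the total in row 25 is close to 101. The parameter values are those shown in Figure 20.

```python
from statistics import NormalDist

xs = [round(0.43 + 0.01 * i, 2) for i in range(16)]
mean, std, norm = 0.49216, 0.02552, 100.95882   # values shown in Figure 20

cdf = NormalDist(mean, std).cdf
# First bin: all of the area below 0.43.
predicted = [cdf(xs[0]) * norm]
# Middle bins: area between consecutive x values.
predicted += [(cdf(xs[i]) - cdf(xs[i - 1])) * norm for i in range(1, len(xs) - 1)]
# Last bin: all of the area above 0.57.
predicted.append((1 - cdf(xs[-2])) * norm)

print([round(v, 4) for v in predicted], round(sum(predicted), 4))
```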