Preface
You may already be familiar with Excel, the versatile spreadsheet program that is used widely in
business and management analysis of nearly everything from accounting and finance to production
and marketing. Much of the success of spreadsheets is due to the complete flexibility you have in
putting text, numbers, and graphics anywhere on your computer screen (and to have formulas
update themselves automatically). In addition, Excel includes a sophisticated set of statistical tools
and is therefore a natural computing environment for a business statistics course.
The purpose of this Excel Guide is to help you learn statistics by working through real-data
examples from the textbook Practical Business Statistics by Andrew F. Siegel. You don't need to
know a lot about computers when you begin. Because the material is presented “from scratch,”
once you launch into Excel, you will be able to get results right away. Just follow along and try
commands like the ones you see presented and discussed here.
This Excel Guide works with Excel only. If you would like to enhance the statistical capabilities
of Excel, we recommend StatPad, an Excel add-in which also provides non-technical
explanations of the results of your statistical analysis. Many statistical methods explained here are
much easier to do in StatPad. When you use StatPad, it seems as though the added conveniences
were built into Excel itself, so there is no need to leave the familiar spreadsheet environment of
Excel. StatPad comes with Practical Business Statistics by Andrew F. Siegel, published by
Irwin/McGraw-Hill.
The Excel Guide begins with an introductory chapter to tell you about Excel and get you up and
running with the basics. After that, the chapters here closely follow the same sequence as the
chapters of the textbook Practical Business Statistics, beginning with the Histograms chapter
(Chapter 3). While this Excel Guide gives enough information for you to see how to use the
computer, you may wish to keep Practical Business Statistics handy for reference and further
details about the examples because some of them are taken directly from the textbook. Once
you've seen how to work the textbook examples, it should be straightforward for you to do
homework and projects.
Each chapter contains discussion, examples, explanations, and the results of actual Excel sessions.
Don’t forget that many Excel data files are available with Practical Business Statistics, so that
there is no need to retype any data from your textbook.
In this case, the formula starts, as always, with an equals sign “=”. To enter the formula that adds
Jim’s and Adrian’s sales together, you might either type the formula directly and hit Enter, or
construct it by pointing to cells as follows:
1. Select cell C6 by clicking on it or moving to it with the cursor keys
2. Hit the = key
4 Excel Guide Introduction
3. Rules of arithmetic say that these operations are performed in the following order:
a. Exponentiation “^” is done first
b. Multiplication “*” and division “/” happen next. Use parentheses so that expressions
with multiplication after division, such as 2 / (3 * 4), are evaluated as intended.
c. Addition “+” and subtraction “-” are done last. Thus 6 + 4 * 2 ^ 3 is evaluated as 6 + 4
* 8, which is 6 + 32, which is 38.
d. If you want something to happen first, put it in parentheses. For example, (2 + 3) * 4
makes the addition happen before the multiplication.
e. If you have a minus sign that is not subtracting, be careful! It happens even before
exponentiation! Thus -2 ^ 4 is evaluated as (-2) ^ 4 which is 16. If you wanted -(2 ^ 4)
you would need to include the parentheses to make the exponentiation happen first and
to get -16 as the answer.
4. Percentages are used as if they were already divided by 100. For example, if you enter a
percent like “20%” directly into a cell, its value is taken to be 0.20. This makes it easy, for
example, to find 20% of a number: you simply multiply the number by 20%.
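The precedence rules above can be checked in any language with arithmetic operators. The following is a minimal Python sketch, not Excel itself. It assumes Python's ** operator as a stand-in for Excel's ^, and the number 150 is a made-up value for illustration. Note that Python, unlike Excel, applies exponentiation before a leading minus sign, so the Excel behavior has to be written with explicit parentheses.

```python
# Excel formulas from the rules above, rewritten as Python expressions.
# Assumption: Python's ** stands in for Excel's ^ operator.

order_of_operations = 6 + 4 * 2 ** 3   # Excel: =6+4*2^3  -> 38
parentheses_first = (2 + 3) * 4        # Excel: =(2+3)*4  -> 20

# Excel negates before exponentiating, so =-2^4 behaves like (-2)^4.
excel_leading_minus = (-2) ** 4        # -> 16
# Python negates after exponentiating, so -2 ** 4 means -(2 ** 4).
python_leading_minus = -2 ** 4         # -> -16

# Entering 20% in a cell stores 0.20, so "20% of 150" is a plain product.
# The 150 is a hypothetical number, not taken from the guide.
twenty_percent_of_150 = 0.20 * 150

print(order_of_operations, parentheses_first,
      excel_leading_minus, python_leading_minus, twenty_percent_of_150)
```

When in doubt about how a formula will be evaluated, in Excel or elsewhere, adding parentheses removes the ambiguity.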
This is a nice way to insert a function into the worksheet because Excel will help you fill in the
details in the correct order, so that you don’t have to memorize what goes where, which is
especially useful with functions that need more than one piece of information. To insert the
AVERAGE function, click OK to see a dialog box like this
that is ready for you to select one or more cells by clicking or dragging the mouse across cells
with the numbers you want to average. You may move this dialog box out of the way by dragging
most anywhere on it. Here is how it looks after dragging down cells B2 through B7:
When you click OK, the result is placed into the worksheet in the cell that was selected when you
first chose Insert/Function from the menu. Here is the result:
You could achieve exactly the same result by selecting cell B9 and typing “=AVERAGE(B2:B7)”
without the quotation marks and then hitting Enter. Another way to do this is to type
“=AVERAGE(” without the quotation marks, then use the mouse to drag down cells B2 through
B7, then type “)” without the quotation marks and hit Enter.
Another way to select these cells would be to use the cursor keys to move to one corner, say C70.
Then hold the Shift key while you move right → twice. Then hit the End key (with or without
Shift). Finally, hold the Shift key while you hit the down arrow ↓ . When you use the End key,
the next movement (left, right, up, or down) will go to the end of the row or column you are
working in. Holding the Shift key expands the selection.
2. Choose Insert/Name/Define from the main menu system. Because the label is at the top
and you have selected cells below it, Excel knows what you want to do and proposes to
give the range name “Sales” to the data in cells D3 through D6. Here is how it should
look:
You can also use this Define Name dialog box to see what other names are defined and to
check that they refer to the correct worksheet range.
You cannot just choose any name for a range. The first character must be a letter or the
underscore character “_”. The other characters can be letters, numbers, periods, and
underscore characters, but not spaces (use underscores instead). Names cannot be the
same as a cell reference (e.g. C16, R3C5, R and C are not allowed). There is no
distinction between uppercase and lowercase letters, so “Sales”, “sales”, “SALES”, and
“sALeS” all refer to the same worksheet cells.
3. When you choose OK, the range name is assigned. Whenever you select this range, its
name (“Sales”) will appear in the name box near the top left corner of the worksheet, at
the left end of the formula bar. You can select this range quickly by choosing its name in
the name box.
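The naming rules in step 2 can be summarized as a small validity check. This is a simplified Python sketch of the rules as stated above, not Excel's actual validation logic. The function names and the regular expressions are assumptions made for illustration, and the cell-reference patterns are deliberately rough (any one to three letters followed by digits is treated as a cell reference).

```python
import re

# First character: letter or underscore; the rest: letters, digits,
# periods, or underscores (no spaces).
_NAME_CHARS = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")
# Rough patterns for cell references such as C16, or R3C5, R, and C.
_A1_REF = re.compile(r"[A-Za-z]{1,3}[0-9]+")
_R1C1_REF = re.compile(r"(?:R[0-9]*)?(?:C[0-9]*)?", re.IGNORECASE)


def is_valid_range_name(name):
    """Return True if the name satisfies the rules listed in the text."""
    if not _NAME_CHARS.fullmatch(name):
        return False
    if _A1_REF.fullmatch(name) or _R1C1_REF.fullmatch(name):
        return False
    return True


def same_range_name(first, second):
    """Names are not case-sensitive, so compare them case-insensitively."""
    return first.lower() == second.lower()


for candidate in ["Sales", "Net_Sales", "Net Sales", "2Sales", "C16", "R3C5", "R"]:
    print(candidate, is_valid_range_name(candidate))
print(same_range_name("Sales", "sALeS"))
```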
Another nice thing the fill handle can do is automatically copy a selected cell’s formula down a
column by dragging the fill handle as far as you want. If the cell is next to a column with data in it,
then double-clicking the fill handle will automatically copy the cell’s contents down the column!
Using UNDO
Thank goodness for UNDO! No need to worry if you have just erased your precious data by
accidentally hitting the delete key, so long as you react reasonably quickly. Just choose Edit/Undo
from the main menu, and your valuable data will reappear as if by magic. Excel now has multiple
UNDO levels, so that you can undo more than one action.
To show numbers as percentages with one decimal place, you would use
To sort it by revenues, you may either start by selecting A6 through C9, or let Excel do it for you
when you choose Data/Sort from the main menu. Here is how it should look as you prepare to
sort by Revenues, with both columns of data selected along with the identifying labels.
When you choose OK, the cities are sorted in order by revenues, and their expenses have
correctly remained associated with them:
Making a Chart
Here is how to create a chart in Excel.
1. Select your data, either one column or multiple columns. In some cases you will want to
select the label at the top of the column for Excel to use.
2. Choose Insert/Chart from the main menu or click on the Chart Wizard icon on the
toolbar. The dialog box gives you many chart options:
Of particular interest in statistics are the XY (Scatter) chart, used for bivariate and multivariate
data, and the Line chart, used in time series analysis. Creating a histogram will require some
computation before the chart is created. Details on creating particular types of charts will
be covered as situations arise in this Excel Guide.
3. As you click on Next > to go through the sequence of dialog boxes, you will have the
option to add titles, as well as to add or take away gridlines or legends. If you choose to
put the chart back “As Object in” your worksheet, you will be able to move and size it
near the data it came from.
4. In addition, if you don’t like the gray background in a chart, double-click on it and set the
Patterns in the Area to None. To change the size of the chart, drag a sizing handle (sizing
handles appear in the corners and in the middle of the sides when you click just inside the edge of
the chart). To move the chart to a different place in the worksheet, drag just inside the
edge but not on a sizing handle. To add or change titles, right-click just inside the chart,
select Chart Options from the little pop-up menu, and choose the Titles tab. To change the
font size, right-click on the item (a title or an axis) and choose Format from the little pop-
up menu.
To find out if the Data Analysis ToolPak is installed on your system, look under the Tools menu
for Data Analysis. If you cannot find Data Analysis under Excel’s Tools menu, select Add-Ins
from the Tools menu and make sure the Analysis ToolPak is checked. If the Analysis ToolPak was
not installed when Excel was installed on your computer, you may need to install it from the Excel
CD-ROM.
Histograms (Chapter 3)
Here is how to produce a histogram in Excel by first creating a column of bins to hold the
frequencies, then using Excel’s COUNTIF function to count how many data values fall into each
bin, and finally creating a bar chart of these frequencies with labels and connected bars.
You have two alternatives to these procedures while staying in Excel. First, with StatPad, creating
a histogram is quick and easy. Second, with the data analysis add-in (“Analysis ToolPak”),
creating a histogram requires more steps and the final result (after eliminating gaps between bars)
can be counterintuitive because a data value that falls on a bin boundary may be placed in the bin
to its left, instead of the bin to its right (so that, e.g., 60 would be counted as “50 to 60” instead
of “60 to 70”).
Footnote 1: Typing “30%” in the cell is the same as typing “0.30” in the cell and then using
Format/Cells/Number to specify percentage format with two decimal places.
2. Compute the counted frequencies using the COUNTIF function. Select the cell to the
right of the first bin boundary amount. We want the number of data values from 30% to
35% (remember that 35% is the same as 0.35 in Excel). Since 30% is in cell E271 and
35% is in cell E272, we can use the formula
=COUNTIF(computer_owners,"<"&E272)-COUNTIF(computer_owners,"<"&E271)
which has been carefully crafted in this form so that all counts can be found by copying
down the column, to the next-to-last cell (representing data values from 65% to 70%).
For this formula to work, the column of data must have a name such as
“computer_owners” here (if your data does not yet have a name, then select the numbers
in the data column and use Excel’s menu command Insert/Name/Define to give your data
a name). To copy and paste after typing the formula and hitting enter, you may use the
menu command Edit/Copy, then select the cells of the column and then use Edit/Paste (or
just double-click the little fill handle at the lower right of the selected cell, then delete the
last one in the column). Here is the result so far:
3. Prepare for charting by selecting the bin boundaries and the counts, INCLUDING THE
BLANK TOP ROW, which will convince Excel to draw the bar chart correctly, using the
bin boundaries as the category axis. Here is how it should look as you select Insert/Chart
from the menu (or click on the Chart Wizard icon on the toolbar):
5. Click on Next > twice, then eliminate some unnecessary features. Delete the legend by
selecting the Legend tab and unselecting the “Show legend” checkbox, and eliminate
gridlines by selecting the Gridlines tab and unselecting anything checked there:
8. Choose the Options tab, then decrease the Gap Width to 0 to make it into a true
histogram:
9. Click OK to complete this task. You now have a histogram in the worksheet!
10. Here are some optional steps. If you don’t like the gray background, double-click on it
and set the Patterns in the Area to None. Similarly, by double-clicking inside a bar, you
may change or eliminate the color. To change the size of the histogram, drag a sizing
handle (sizing handles appear in the corners and in the middle of the sides when you click just
inside the edge of the chart). To move the chart to a different place in the worksheet, drag
just inside the edge but not on a sizing handle. To add titles, right-click just inside the
chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To
change the font size, right-click on the item (a title or an axis) and choose Format from the
little pop-up menu. To format the horizontal axis as percent, double-click on the axis, then
choose Number and Percent. Here is one possible result:
[Figure: histogram. Vertical axis: Number of States (0 to 20). Horizontal axis: Percent of Households (30% to 65%).]
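The counting logic behind the COUNTIF formula in step 2 (values below the upper boundary minus values below the lower boundary) can be sketched outside Excel as follows. This is a minimal Python illustration, not the guide's method. The data values are hypothetical stand-ins for the computer_owners range, and the function names are made up for this example.

```python
def count_less_than(data, boundary):
    """Same idea as COUNTIF(data, "<" & boundary)."""
    return sum(1 for value in data if value < boundary)


def bin_counts(data, boundaries):
    """Count values in each bin [lower, upper) by differencing two counts."""
    return [
        count_less_than(data, upper) - count_less_than(data, lower)
        for lower, upper in zip(boundaries, boundaries[1:])
    ]


# Hypothetical data; the real computer_owners values are in the textbook file.
computer_owners = [0.31, 0.35, 0.38, 0.42, 0.45, 0.45, 0.52, 0.60, 0.64]
boundaries = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]

counts = bin_counts(computer_owners, boundaries)
print(counts)
```

Because both counts use a strict "less than", a value that sits exactly on a boundary (such as 0.35 here) lands in the bin to its right, which is the behavior the chapter prefers over the Analysis ToolPak's.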
To compute the logarithms of the data values, begin by computing the logarithm of the first data
value. To do this, select the cell to its right, then use Excel’s Insert/Function menu command. You
will find the LOG10 function under the Math & Trig category:
Select OK to see the LOG10 dialog box, then click on the first data value (you may need to drag
the dialog box out of the way to see it) to tell Excel which number to take the logarithm of, as
follows (in this case, the number in cell E79, which you specify by clicking on it):
Select OK, then double-click on the fill handle to copy this formula down the column of data,
resulting in a new column containing the logarithms of the data (if you prefer, you may use
Edit/Copy and Edit/Paste instead):
Now give these logarithms a name, for example, logAssets, while they are still selected, by
choosing the Insert/Name/Define menu command and typing the name logAssets:
Now we are ready to construct the histogram of logAssets, using the methods explained earlier in
this chapter, but this time for the logAssets data. Here is the resulting histogram, which is much
less skewed than the original data:
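As a rough illustration of why the transformation helps, here is a short Python sketch that applies a base-10 logarithm to a column of values, as LOG10 does when copied down the column. The asset figures are hypothetical, chosen only to show how widely spread values become evenly spaced.

```python
import math

# Hypothetical, strongly skewed asset values (not from the textbook).
assets = [1, 10, 100, 1000, 100000]

# One logarithm per data value, like copying =LOG10(...) down a column.
log_assets = [math.log10(value) for value in assets]

print(log_assets)
```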
26 Landmark Summaries Chapter 4
Either way, the result is the same. After selecting another cell to hold the median and repeating
these steps to find the median, the result (average is 5.1, median is 4.5) is as follows:
Be sure each column of numbers has a name (select the column of numbers and use Excel’s
Insert/Name/Define menu command if needed). The weighted average can then be computed
using the expression “=SUMPRODUCT(Credits, Grade)/SUM(Credits)”. The SUMPRODUCT
function multiplies credits by grade for each course and adds them up, while the SUM function
finds the total credits. Remember always to divide by the sum of the weights (in this example, the
credits). The result here is a grade point average of 3.45:
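The same weighted-average calculation can be sketched in Python. The credits and grades below are hypothetical (they are not the data behind the 3.45 result), and the function name is an assumption for illustration.

```python
def weighted_average(weights, values):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    sum_product = sum(w * v for w, v in zip(weights, values))
    return sum_product / sum(weights)


# Hypothetical course credits and grade points.
credits = [3, 4, 3]
grades = [4.0, 3.0, 2.0]

print(weighted_average(credits, grades))
```

The numerator plays the role of SUMPRODUCT and the denominator the role of SUM, so forgetting the division would give total grade points rather than an average.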
The 5-number summary consists of the smallest, lower quartile, median, upper quartile, and
largest. You can use Excel’s MIN and MAX functions to find the smallest and largest. Here is the
5-number summary:
To find a percentile when you have the percentage, you may use Excel’s PERCENTILE function,
which needs to know the data set and the percentage. Here is the 85th percentile for the Defects
data:
Given a number (not necessarily a data value, but in the same units as the data values) you may
use Excel’s PERCENTRANK function to find the percentage that tells what percentile it is. This
example shows that 11 is the 94th percentile. That is, about 94% of the data values are smaller than
11. To get the number 0.944 to show as 94.4%, you may select the cell and format it as a
percentage (using the menu command Format/Cells/Number/Percentage).
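To make the two directions concrete (percentage to value, and value to percentage), here is a Python sketch based on linear interpolation between sorted data values. It is assumed here to approximate what PERCENTILE and PERCENTRANK compute; it is not taken from Excel's documentation, the function names are made up, and the data are hypothetical rather than the Defects data.

```python
def percentile(data, p):
    """Value at fraction p (0 to 1), interpolating between sorted values."""
    xs = sorted(data)
    k = p * (len(xs) - 1)          # fractional position in the sorted list
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (k - lo) * (xs[hi] - xs[lo])


def percent_rank(data, x):
    """Fraction (0 to 1) locating x within the sorted data."""
    xs = sorted(data)
    n = len(xs)
    if x < xs[0] or x > xs[-1]:
        raise ValueError("x lies outside the range of the data")
    below = sum(1 for v in xs if v < x)
    if x in xs:
        return below / (n - 1)
    # x falls strictly between two neighboring data values: interpolate.
    lower, upper = xs[below - 1], xs[below]
    fraction = (x - lower) / (upper - lower)
    return (below - 1 + fraction) / (n - 1)


sample = [1, 2, 3, 4, 5]            # hypothetical data
print(percentile(sample, 0.85))
print(percent_rank(sample, 4))
```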
2. To the left of these numbers, type in the numbers 1, 2, 3 in exactly the following sequence.
This will tell Excel how to draw the lines to create the box plot (the number 2 is in the
middle, while 1 will place it to the left and 3 to the right).
3. Select both columns of numbers all the way down (including the blank line) and choose
Insert/Chart from the menu as follows:
4. Choose “XY (Scatter)” as the Chart Type, and choose “Scatter with data points connected
by lines without markers” as the Chart sub-type, as follows:
5. Click Next > twice, then eliminate some unnecessary features. Delete the X Axis by
selecting the Axes tab and unselecting the “Value (X) Axis” checkbox. Delete the legend
by selecting the Legend tab and unselecting the “Show legend” checkbox, and eliminate
gridlines by selecting the Gridlines tab and unselecting anything checked there. You may
also add titles by clicking on the Titles tab:
6. Click on Finish to place the chart in the worksheet. The chart is selected so you see the
sizing handles around it and the data it was made from.
7. Drag the sizing handles to make it larger. In addition, if you don’t like the gray
background, double-click on it and set the Patterns in the Area to None. To move the
chart to a different place in the worksheet, drag just inside the edge but not on a sizing
handle. To add or change titles, right-click just inside the chart, select Chart Options from
the little pop-up menu, and choose the Titles tab. To change the font size, right-click on
the item (a title or an axis) and choose Format from the little pop-up menu. Here is the
result:
2. Click on a wide-open area of the worksheet with room for two columns not touching any
other data in your worksheet. Paste the data once (using Edit/Paste from the main menu),
then select the empty cell under the last data value (one quick way is to hit End, ↓ , and
↓ ) and paste it again. Here is how it looks after pasting once, just before the second
pasting:
3. Now sort this double data set as follows. First, select any single data value within the
column (Excel should sort the entire column). Then choose Data/Sort from Excel’s main
menu and select OK from the dialog box. You will then have two copies, sorted. Here is
the worksheet just before sorting:
4. Create the column of percentages. Place the number 0 in the empty cell just to the right of
the top cell of your sorted double data set by typing 0, Enter. Just below it, type the
formula “=1/COUNT(Defects)” where you would substitute your data set name for
“Defects” here. Just below that, type the = key, click on the cell with the 0 you just
entered, then type “+1/COUNT(Defects)”, substituting your data set name for “Defects”
and hit Enter. Finally, double-click the fill handle to complete the column (or copy this cell
to the cells under it to fill out the column). Here is the result just before double-clicking on
the fill handle - note that the cell P10 is where the zero was entered.
5. Select both columns of numbers and choose Insert/Chart from the menu as follows:
6. Choose “XY (Scatter)” as the Chart Type, and choose “Scatter with data points connected
by lines without markers” as the Chart sub-type, as follows:
7. Click Next > twice, then eliminate some unnecessary features. Delete the legend by
selecting the Legend tab and unselecting the “Show legend” checkbox, and eliminate
gridlines by selecting the Gridlines tab and unselecting anything checked there. You may
also add titles by clicking on the Titles tab:
8. Click on Finish to place the chart in the worksheet. The chart is selected so you see the
sizing handles around it and the data it was made from.
9. Drag the sizing handles to make it larger. Then double-click on the Cumulative Percent
axis (or on any number on this Y axis), select the Number tab, choose Percentage with 0
Decimal places as follows:
10. In addition, if you don’t like the gray background, double-click on it and set the Patterns
in the Area to None. To move the chart to a different place in the worksheet, drag just
inside the edge but not on a sizing handle. To add or change titles, right-click just inside
the chart, select Chart Options from the little pop-up menu, and choose the Titles tab. To
change the font size, right-click on the item (a title or an axis) and choose Format from the
little pop-up menu. Here is the result:
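The "double data set" construction in steps 2 through 4 produces the corner points of a step-shaped cumulative distribution. The following Python sketch mirrors those steps under the assumption that the copied-down formula refers to the cell two rows above it; the function name and the data are hypothetical.

```python
def cumulative_step_points(data):
    """Pair each doubled, sorted value with its cumulative fraction."""
    n = len(data)
    # Steps 2 and 3: paste the data twice, then sort the combined column.
    doubled = sorted(list(data) + list(data))
    # Step 4: start with 0 and 1/n, then add 1/n to the cell two rows up.
    percents = [0.0, 1 / n]
    while len(percents) < len(doubled):
        percents.append(percents[-2] + 1 / n)
    return list(zip(doubled, percents))


defects = [3, 1, 2]                 # hypothetical data
for value, percent in cumulative_step_points(defects):
    print(value, round(percent, 3))
```

Each data value appears twice with two different cumulative fractions, which is what makes the connected scatter chart draw a vertical jump at that value.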
Variability (Chapter 5)
Excel can quickly compute the basic variability measures. In this chapter we consider the standard
deviation, the range, the coefficient of variation, and the variance.
If you need the population standard deviation instead of the sample standard deviation, you may
use the function STDEVP instead of STDEV.
Probability (Chapter 6)
Most of the probability chapter requires thinking, and perhaps a calculator, to get the answers. Of
course you can use Excel to do your arithmetic for you - just select a cell, hit the = key, type an
expression such as (0.1+0.3)*0.4, and hit Enter to see the answer. Excel can also be used to
demonstrate the law of large numbers, to show you how the (random) relative frequency of an
event becomes closer to the probability as the number of trials grows larger.
4. Hit the F9 key (called the “Recalculation key”). Each time you do, a new random number
RAND() will be compared to the Probability: if it is smaller, then 1 is displayed and the
event “happens”, otherwise you will see 0. Hit F9 over and over to get a sense of how a
random event with probability 0.4 might occur. If you wish, select cell B2 and type in a
different probability number, hit Enter, then recalculate over and over again with F9. Try it
with probability 0.1 and 0.9 and others if you wish.
5. Now select cell A8 and choose Edit/Copy from the main menu. Next, click once with the
mouse on cell A9. To select lots of cells from A9 on down, hold the Shift key while you hit
Pg Dn over and over. When you have selected a few hundred or a few thousand cells,
choose Edit/Paste from the main menu. You now have repeated the random experiment
many times, once in each cell starting with A8:
6. Hit the F9 key to see how the relative frequencies might change. Here is one possibility:
note that the relative frequencies are 0 for the first two trials because the event didn’t
happen yet. After 3 trials, the relative frequency is 1 out of 3, or 0.333333. After 4 trials it
drops to 1 out of 4, or 0.25, and so forth:
7. To create a graph, first select the column of relative frequencies. This might be done by
selecting cell B8, hitting End, then holding down Shift while you hit the down arrow ↓ .
Then choose Insert/Chart from the main menu and choose a Line Chart with the first
Chart sub-type:
8. Click Next > twice, then delete the legend by selecting the Legend tab and unselecting the
“Show legend” checkbox. You may also add titles by clicking on the Titles tab:
9. Click on Finish to place the chart in the worksheet, and resize it with the sizing handles.
Note how the graph of the relative frequencies hovers fairly near to the probability of 0.4.
Hit the recalculation key (F9) a few times to see how else it might have come out, with
different randomness each time.
10. You can see what relative frequencies look like with different probabilities. Here is how
they might look if you change Probability to 0.9:
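The worksheet experiment in the steps above can also be sketched as a short simulation. This Python version is an illustration only: it uses Python's random number generator in place of RAND(), the function name is made up, and the optional seed is an added convenience so that a run can be repeated (omit it to get different randomness each time, like pressing F9).

```python
import random


def relative_frequencies(probability, trials, seed=None):
    """Running relative frequency of an event over repeated trials."""
    rng = random.Random(seed)
    happened = 0
    frequencies = []
    for trial in range(1, trials + 1):
        # The event "happens" when the random number is below the probability.
        if rng.random() < probability:
            happened += 1
        frequencies.append(happened / trial)
    return frequencies


freqs = relative_frequencies(0.4, 20000, seed=1)
print(freqs[9], freqs[99], freqs[-1])
```

Early values can wander far from 0.4, but with thousands of trials the last value should sit close to the probability, which is the law of large numbers at work.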
Chapter 7 Random Variables 53
will be returned, or the cumulative probability that a particular number or less will be returned on
a particular day.
Here is how to use Excel’s function “POISSON(value,mean,FALSE)” to find the probability that
a Poisson random variable is exactly equal to some value, and how to use
“POISSON(value,mean,TRUE)” to find the probability that a Poisson random variable is less
than or equal to some value. The terms TRUE and FALSE in the function refer to whether the
probability is cumulative or not. Here are the results:
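For readers who want to see what lies behind the two forms of the function, here is a Python sketch of the standard Poisson probability formula, with a cumulative flag in the same spirit as the TRUE/FALSE argument. The function name and the example mean of 2 are assumptions for illustration, not values from the textbook.

```python
import math


def poisson(value, mean, cumulative):
    """Poisson probability: exactly `value`, or `value` or fewer."""
    def exactly(k):
        return math.exp(-mean) * mean ** k / math.factorial(k)

    if cumulative:
        # Add up the probabilities of 0, 1, ..., value.
        return sum(exactly(k) for k in range(value + 1))
    return exactly(value)


print(poisson(3, 2.0, False))   # probability of exactly 3
print(poisson(3, 2.0, True))    # probability of 3 or fewer
```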
2. Insert random numbers by typing “=RAND()” in cell B3, just to the right of the first frame
number, hit ENTER, and then copy the result down the column to produce a column of
random numbers (this is quickly done by double-clicking the little fill handle at the lower
right corner of the selected cell B3).
3. To shuffle the population, first select both columns of numbers (the frame numbers and the
random numbers). For a large population, this is easily done by selecting the first frame
number (cell A3 here), holding Shift while you hit the right arrow → , hitting End, and
holding Shift while you hit the down arrow ↓ . Then use Data/Sort from Excel’s main
menu, being sure to sort by the random numbers.
58 Random Sampling Chapter 8
4. After the columns are sorted randomly, you may take the first three frame numbers to
obtain your random sample, which results in selection of items 7, 10, and 2 in this
example.
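The shuffle-and-take-the-top procedure above can be sketched in Python as follows. The population of ten frame numbers and the function name are hypothetical; Python's random generator stands in for RAND(), so the selected items will generally differ from the 7, 10, and 2 shown in the example.

```python
import random


def random_sample_by_shuffle(frame_numbers, sample_size, seed=None):
    """Attach a random number to each item, sort by it, keep the first few."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in frame_numbers]
    keyed.sort()                       # sort by the random numbers
    return [item for _, item in keyed[:sample_size]]


population = list(range(1, 11))        # hypothetical frame numbers 1 to 10
print(random_sample_by_shuffle(population, 3))
```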
Alternatively, you can compute the standard error all at once with the formula
“=STDEV(rangeName)/SQRT(COUNT(rangeName))”, where “rangeName” is the name of your
data.
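The same one-line formula translates directly into Python: the sample standard deviation divided by the square root of the sample size. The data values and the function name below are hypothetical.

```python
import math
import statistics


def standard_error(data):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(data) / math.sqrt(len(data))


measurements = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(standard_error(measurements))
```

Here statistics.stdev is the sample standard deviation (dividing by n - 1), matching the role of STDEV rather than STDEVP.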
60 Confidence Intervals Chapter 9
To use a confidence level other than 95%, you need only change the 0.95 in the TINV
function. For example, for a 99% confidence interval, you would use 0.99 in place of 0.95.
62 Hypothesis Testing Chapter 10
Next, consider the case of a one-sided test to see if the sample average is significantly smaller than
the reference value (that is, the research hypothesis claims that the population mean is smaller
than the reference value). In this case, the p-value is either =TDIST(ABS(t),n-1,1) or
=1-TDIST(ABS(t),n-1,1), depending on whether t is negative or positive, respectively. Using the t
statistic of 2.4395561 and sample size n = 15 for the paper thickness example, the one-sided
p-value is 0.9857, found as follows:
This example has been used to illustrate the calculations. Note that, in real life, you would not
compute both of these tests (significantly greater, significantly smaller) on the same data set
because you would have to choose the side you wished to test before performing the test.
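The "significantly smaller" p-value rule described above (use the one-tail probability when t is negative, and one minus it when t is positive) can be checked with a short Python sketch. It does not call Excel's TDIST; instead it integrates the t density numerically with Simpson's rule, which is an approach chosen here only to keep the example free of extra libraries. All function names are assumptions.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tail(t, df, steps=2000):
    """P(T > |t|): one half minus the area from 0 to |t| (Simpson's rule)."""
    b = abs(t)
    h = b / steps
    total = t_density(0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3


def p_value_smaller(t, n):
    """One-sided p-value for 'population mean is smaller than the reference'."""
    tail = one_tail(t, n - 1)
    return tail if t < 0 else 1 - tail


print(round(p_value_smaller(2.4395561, 15), 4))
```

With a positive t statistic the sample average is above the reference value, so the p-value for "smaller" comes out close to 1 and the result is not significant.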
Then you can find the standard error of the average difference, the t statistic, and the p-value from
these summaries. The conclusion is that there is a very highly significant difference between men's
and women's salaries (p < 0.001). Here are the Excel results:
68 Correlation and Regression Chapter 11
3. Choose XY (Scatter) from the list of chart types, and the first Chart sub-type (“Scatter.
Compares pairs of values”).
4. Continuing with Excel’s steps, you can create a scatterplot as an object in the worksheet.
Here is how the initial dialog box looks after you select the data and begin to insert a
chart, together with the finished chart in the worksheet.
5. In addition, if you don’t like the gray background in the chart, double-click on it and set
the Patterns in the Area to None. To eliminate the legend at the right in the chart, right-
click on it and clear. To eliminate gridlines, right-click on one and clear. To change the size
of the chart, drag a sizing handle (sizing handles appear in the corners and in the middle of the
sides when you click just inside the edge of the chart). To move the chart to a different
place in the worksheet, drag just inside the edge but not on a sizing handle. To add or
change titles, right-click just inside the chart, select Chart Options from the little pop-up
menu, and choose the Titles tab. To change the font size, right-click on the item (a title or
an axis) and choose Format from the little pop-up menu. To change the number format of
an axis, double-click on it and select Number. Here is one possible result:
[Figure: scatterplot. Vertical axis: Time (0 to 90). Horizontal axis: Pages (0 to 200).]
From these results, looking at the last table’s Coefficients and recognizing that “X variable 1”
refers to the X variable “Yesterday”, you can see that the least-squares prediction equation is
Today = 0.000398 + 0.111421 × Yesterday
Because the R² is 0.0132 or 1.32% (from the first table of Regression Statistics), it is clear that
knowing what the market did yesterday does not help you very much to predict what it
will do today.
To perform the t test, you may look at the t statistic (“t stat” for “X Variable 1” in the last table) of
0.732 and its p-value of 0.468. Because p > 0.05 the relationship between Yesterday’s and
Today’s stock market movements is not significant.
This is also clear from the 95% confidence interval for the regression coefficient, which extends
from -0.196226 to 0.419068 and includes the reference value 0. These numbers are found in the
last row of the last table under the headings “Lower 95%” and “Upper 95%”.
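As a small illustration of how these numbers are used, the Python sketch below plugs a value into the prediction equation and checks whether the confidence interval contains the reference value 0. The coefficients and interval endpoints are the ones quoted above; the function names and the sample input of 0.01 are assumptions for illustration.

```python
# Values quoted from the regression output discussed above.
INTERCEPT = 0.000398
SLOPE = 0.111421
CI_LOWER, CI_UPPER = -0.196226, 0.419068


def predict_today(yesterday):
    """Least-squares prediction: Today = intercept + slope * Yesterday."""
    return INTERCEPT + SLOPE * yesterday


def interval_contains(lower, upper, value):
    """True when the value lies inside the confidence interval."""
    return lower <= value <= upper


print(predict_today(0.01))                        # hypothetical input
print(interval_contains(CI_LOWER, CI_UPPER, 0))   # 0 inside: not significant
```

The interval is centered on the estimated slope, so its midpoint should reproduce the coefficient 0.111421 up to rounding.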
74 Multiple Regression Chapter 12
1. Look under the Tools menu for Data Analysis, and then select Regression. If you cannot
find Data Analysis under Excel’s Tools menu, select Add-Ins from the Tools menu and
make sure the Analysis ToolPak is checked. If the Analysis ToolPak was not installed
when Excel was installed on your computer, you will need to install it from the Excel CD-
ROM.
2. In the resulting dialog box, you may specify the range for the Y variable by selecting the
label at the top along with the column of numbers to be predicted by dragging the mouse
down the column starting at the label “Page” in cell D9 down to the Page Cost value for
the last magazine in cell D64. The X variables must be right next to each other, forming a
rectangular range of rows and columns. In this case the X variable range, including labels,
is from E9 (the label “Audience”) to G64 (the Income measure for the last magazine),
selected by dragging the mouse diagonally from one corner to the other. Here is the
resulting dialog box:
What to do if you do not want to use all of the X variables? For example, to leave one out
you should create a copy of the X variables (selecting them, using Edit/Copy, selecting a
cell in a different part of the worksheet, then using Edit/Paste), select the column of
data to be omitted, delete it with the Del key (this is why we use a copy!), select the
columns to its right by dragging the mouse diagonally across from one corner to the other,
then use Edit/Cut, move to the empty column, and use Edit/Paste to close the gap. You
now have a copy of the X data that omits the column you are not using.
3. Click “Labels” in this dialog box because we have included labels at the top of the data
columns. This was done to make the results easier to interpret (so that Excel can use the
names of the variables instead of just “X variable 2” for example).
4. Click “Output Range” in this dialog box and specify (by clicking the mouse or typing a cell
address) where in the worksheet you want the results to be placed, then click OK. The
result is not a pretty sight - it still needs to be tidied up because some cells cannot be read
because they are blocked by others and the numbers are not aligned nicely.
5. Now tidy it up and format the results. If there is more in a cell than you can see, select it
and use the menu command Format/Column/AutoFit Selection in order to make the
column wider so that you can see it all. To control the number of decimal places shown,
select the cell(s), then use Format/Cells, then under the Number tab you might choose
Number and then specify the number of decimal places. The last two columns have been
deleted because they contain no new information (they just repeated the columns before
them). Here are the results after tidying up:
The results in the first table of Regression Statistics include the R2 value of 0.787 (which tells you
that 78.7% of the variation in Page Costs can be explained by the X variables) and the standard
error of estimate Se of 21,578 (which tells you that Page Costs can be predicted to within about
this many dollars).
The ANOVA table includes the F test, whose p-value 3.81619E-17 is very small (the “E-17” tells
you to move the decimal point to the left 17 places, so actually p =
0.0000000000000000381619). In particular, p < 0.001 and the result is very highly significant.
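As a quick check on the “E” notation, here is a small Python sketch (outside Excel, for illustration only) that writes the same p-value out in ordinary decimal form and compares it with 0.001:

```python
# The p-value as Excel displays it, in scientific notation
p_value = 3.81619e-17

# Show 22 decimal places, enough to reach the last significant digit
as_decimal = f"{p_value:.22f}"
print(as_decimal)

# Very highly significant if p < 0.001
is_very_highly_significant = p_value < 0.001
print(is_very_highly_significant)
```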
The last table has the Coefficients, including the constant term of 4,042.799 and the regression
coefficients: 3.788 for Audience, -123.634 for Male, and 0.903 for Income. The Standard Error
column shows standard errors for each of these coefficients. Next are their t statistics and p-values
(note that Audience and Income are significant, but Male is not). Finally you have 95% confidence
intervals for the regression coefficients - for example, we are 95% sure that the effect of an
additional dollar of Income is to increase Page Costs somewhere between $0.161 and $1.645, on
average.
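To see what lies behind these tables, here is a minimal Python sketch (not Excel, and not necessarily how Excel does it internally) that fits a multiple regression by least squares and reports the constant term, the regression coefficients, R2, and the standard error of estimate. The function names and the small data set are made up for illustration; they are not the magazine data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(x_rows, y):
    """Return (coefficients, r_squared, std_error); coefficients[0] is the constant."""
    design = [[1.0] + list(row) for row in x_rows]  # column of 1s for the constant
    n, k = len(y), len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in design]
    mean_y = sum(y) / n
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_resid / ss_total
    std_error = (ss_resid / (n - k)) ** 0.5  # standard error of estimate
    return coefs, r_squared, std_error


# Hypothetical data: two X variables and one Y variable
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [3.1, 6.9, 7.2, 10.8, 11.1, 15.0]
coefs, r_squared, std_error = multiple_regression(x_rows, y)
print(coefs, r_squared, std_error)
```

The standard errors, t statistics, and confidence intervals in the last table require additional matrix calculations that this sketch leaves out.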
1. Look under the Tools menu for Data Analysis, and then select Correlation. In the resulting
dialog box, you may select the labels at the top of each column as part of the data range
(which must be data columns arranged right next to each other, forming a rectangular
range of rows and columns). Also click on “Labels in First Row” so that Excel can use the
variable names to help you understand the results. In this case the Input Range is from D9
(the label “Page”) to G64 (the Income measure for the last magazine), selected by
dragging the mouse diagonally from one corner to the other. Here is the resulting dialog
box:
2. Click on OK. You can see, for example, that the correlation between Page Costs and
Audience is the highest, with r = 0.872. The correlation between Audience and Income is
negative, with r = -0.353. Here are the results:
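The table of correlations can also be sketched outside Excel. The following Python example computes the correlation coefficient r for every pair of columns; the column names echo the example, but the numbers are invented for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r between two equal-length columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation of every column with every other column, keyed by label."""
    names = list(columns)
    return {a: {b: correlation(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical columns, labeled as in the "Labels in First Row" option
columns = {
    "Page": [10.0, 25.0, 18.0, 40.0, 33.0],
    "Audience": [2.0, 6.0, 4.5, 9.0, 8.0],
    "Income": [52.0, 40.0, 47.0, 38.0, 41.0],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, [round(r, 3) for r in row.values()])
```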
Chapter 14 Time Series 79
1. To find the moving average for a quarterly series like this one, remember that it starts with
the third row (so that we can average a full year’s worth of data, with a half-year before
and a half-year after). So we start in the third quarter (cell D6 in this case). Note that if we
go back two quarters and ahead two quarters there are two “Quarter 1” values, so they
must have weight 0.5 each so that quarters 1 through 4 are treated equally. The easiest
way to compute this weighted average is actually to average two overlapping full years’
worth of data: the four quarters of 1994 (cells C4:C7 here) with the full year beginning
one quarter later (cells C5:C8). This is why, in this case, you can use the formula
=AVERAGE(C4:C7,C5:C8)
in cell D6 for the first moving-average value. An easy way to enter the formula is to drag
down each four-quarter range instead of typing in its address. Here is how it looks so far:
If you have a monthly instead of a quarterly time series, then instead of the “quarter”
column with 1, 2, 3, 4, 1, 2, 3, ... you would have a “month” column with 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 1, 2, 3, ... and the moving average would start in the seventh row
instead of the third. The formula for the moving average would again be the average of
two overlapping full years’ worth of data: (1) the first 12 months and (2) the full year
beginning one month later with months 2 through 13. With monthly data the moving
average is also unavailable for the last six months.
2. Double click on the fill handle to copy this formula down the column, then select and
delete the last two entries of this column because the moving average is unavailable for the
last two quarters. (For monthly data, delete the last six entries). Here is the result so far:
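The moving-average calculation in steps 1 and 2 can be sketched in Python as follows. This is only an illustration of the “average two overlapping full years” idea; the function name and the sales figures are made up.

```python
def centered_moving_average(values, period=4):
    """Average two overlapping full years around each row.

    Use period=4 for quarterly data and period=12 for monthly data.
    Rows with no full year on both sides get None.
    """
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        first_year = values[i - half : i + half]            # like C4:C7 for row 6
        second_year = values[i - half + 1 : i + half + 1]   # like C5:C8 for row 6
        result[i] = (sum(first_year) + sum(second_year)) / (2 * period)
    return result


# Hypothetical quarterly sales for two years
sales = [12.0, 15.0, 21.0, 17.0, 13.0, 16.0, 23.0, 18.0]
print(centered_moving_average(sales))
```

As in the worksheet, the first two and last two quarterly rows have no moving average (six rows at each end for monthly data).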
3. Find the ratio-to-moving-average by dividing the Sales value by the Moving Average
value (in this case, place the formula =C6/D6 in cell E6), then double-click on the fill
handle to copy this formula down the column. Here is the result:
4. The seasonal index can be computed for all quarters, even when the moving average and
ratio are unavailable. The seasonal index for a given quarter (1, 2, 3, or 4) is the average
of all the ratios for that quarter, averaged over all the years that have a ratio for that
quarter. For example, the seasonal index for quarter 1 is the average of the ratio 1.03142
for quarter 1 in 1995 with the ratio 1.000796 for quarter 1 in 1996, and so forth through
quarter 1 of 2000. Here is a fairly easy way to compute the seasonal index column by
using the SUMIF(RANGE,CRITERIA,SUM_RANGE) function to sum the ratios for the
selected quarter, divided by the COUNTIF(RANGE,CRITERIA) function that counts
how many there are.
In this case the formula to put in cell F4 is
=SUMIF($B$6:$B$29,B4,$E$6:$E$29)/COUNTIF($B$6:$B$29,B4)
Note carefully the use of dollar signs in the cell addresses: references with $ will not
change when the formula is copied. The RANGE is $B$6:$B$29 in both functions
(SUMIF and COUNTIF), consisting of those values in the “Quarter” column for which
ratios are available, so that the first two and last two rows are excluded. The CRITERIA
in both functions (SUMIF and COUNTIF) is simply B4, which refers to the Quarter
number, 1, for the first row of data. No dollar signs are used here so that when the
formula is copied, the result will be for the appropriate quarter for that row. The
SUM_RANGE is $E$6:$E$29 for the SUMIF function, telling it to sum up the ratio
values for the specified quarter number, specifying only those rows for which ratios are
available.
After entering this formula into cell F4, drag the fill handle down the entire column (or use
copy and paste) to find all the seasonal values. Note that they repeat exactly from one year
to the next, for example, the quarter 1 seasonal index is always 0.9993252 for all years:
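The SUMIF/COUNTIF calculation above amounts to averaging the available ratios quarter by quarter. Here is a minimal Python sketch of that idea; the function name and the numbers are hypothetical, and rows without a moving average are marked with None.

```python
def seasonal_indices(quarters, sales, moving_avg):
    """Return (ratios, seasonal index for each row)."""
    # Ratio-to-moving-average, only where a moving average exists
    ratios = [s / m if m is not None else None for s, m in zip(sales, moving_avg)]
    totals, counts = {}, {}
    for q, r in zip(quarters, ratios):
        if r is not None:
            totals[q] = totals.get(q, 0.0) + r   # plays the role of SUMIF
            counts[q] = counts.get(q, 0) + 1     # plays the role of COUNTIF
    index_by_quarter = {q: totals[q] / counts[q] for q in totals}
    # Every row gets the index for its quarter, even rows with no ratio
    return ratios, [index_by_quarter[q] for q in quarters]


# Hypothetical two-year quarterly series
quarters = [1, 2, 3, 4, 1, 2, 3, 4]
sales = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0]
moving_avg = [None, None, 25.0, 25.0, 25.0, 25.0, None, None]
ratios, seasonal = seasonal_indices(quarters, sales, moving_avg)
print(seasonal)
```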
5. The seasonally adjusted values are found by dividing each Sales figure by its Seasonal
Index. In this case, the formula is =C4/F4. Enter the formula into the top cell, then copy
down the column, perhaps by double-clicking on the fill handle:
6. Before you can find the long-term trend, you need a “time period” column consisting of
the numbers 1, 2, 3, ... counting how many time periods have gone by. A quick way to do
this is to start with 1 and 2 in the first two rows (H4 and H5 in this example), select both
cells, then double-click the little fill handle in the lower right corner of the selected cells.
7. Use this column of time periods to predict the seasonally adjusted column (Y) from the
time period (X) using regression analysis. A quick way to do this is with the
FORECAST(X,KNOWN_Y’S,KNOWN_X’S) function, using the first time period value
for X, the entire seasonally-adjusted series with absolute $ cell addressing
as the KNOWN_Y’S, and the entire time period column with absolute $ cell
addressing as the KNOWN_X’S. In this case, entering the formula into cell I4 using
Insert/Function from the main menu looks like this (be careful to omit
$ for X but to use $ in the other two ranges):
8. Choose OK to see the resulting long-term trend value in the top cell, then double-click the
fill handle to copy the formula down the column:
9. To extend the trend beyond the series and find the seasonally-adjusted forecast values, the
quickest way is to select the last two rows of the time period and the trend columns (you
need two rows so that Excel will know to keep increasing the time period in the next step)
as follows:
and then to drag the little fill handle at the lower right corner of the selected range to drag
it down as many rows as you want. It’s like magic!
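The FORECAST function predicts Y at a given X from the least-squares line through the known values. A minimal Python sketch of the trend steps follows; the function mirrors FORECAST's argument order, and the seasonally adjusted values are invented for illustration.

```python
def forecast(x, known_ys, known_xs):
    """Predicted Y at x from the least-squares line through the known points."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(known_xs, known_ys))
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x


# Hypothetical seasonally adjusted values for time periods 1 through 6
periods = [1, 2, 3, 4, 5, 6]
adjusted = [20.1, 20.9, 22.2, 22.8, 24.1, 24.9]

# Trend for the existing rows (the known ranges stay fixed, like $ addresses)
trend = [forecast(t, adjusted, periods) for t in periods]

# Extending the trend: keep counting time periods past the end of the series
future_periods = [7, 8, 9, 10]
extended_trend = [forecast(t, adjusted, periods) for t in future_periods]
print(trend)
print(extended_trend)
```

Only the X argument changes from row to row, which is why the worksheet formula omits $ for X and uses $ for the two known ranges.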
10. To prepare to forecast by seasonalizing the trend, you will need to extend the columns for
year, quarter, and seasonal index (columns A, B, and F here). After extending columns A
and B, you may select the last seasonal index (cell F31 here) and drag the fill handle down
to extend it (if Excel has not already done this for you):
11. You are now ready to create the forecast values by multiplying the trend by the seasonal
index. In this example, enter the formula =I4*F4 into cell J4, then double-click the fill
handle (or copy and paste) to complete the forecast column. Congratulations! You are
done with the calculations!
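The two seasonal steps, dividing Sales by the Seasonal Index and later multiplying the trend by the Seasonal Index, are mirror images of each other. A short Python sketch, with hypothetical index and trend values:

```python
def seasonally_adjust(sales, quarters, index_by_quarter):
    """Divide each Sales figure by the seasonal index for its quarter."""
    return [s / index_by_quarter[q] for s, q in zip(sales, quarters)]


def seasonalize(trend, quarters, index_by_quarter):
    """Multiply each trend value by the seasonal index for its quarter."""
    return [t * index_by_quarter[q] for t, q in zip(trend, quarters)]


# Hypothetical seasonal indices; the same four values repeat every year
index_by_quarter = {1: 0.9, 2: 1.0, 3: 0.8, 4: 1.3}

# Hypothetical trend values, including rows beyond the end of the data
quarters = [1, 2, 3, 4, 1, 2]
trend = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]

forecast_values = seasonalize(trend, quarters, index_by_quarter)
print(forecast_values)
```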
2. To list the years along the horizontal axis, click Next >, choose the Series tab, click in the
“Category (X) axis labels:” portion of the dialog box and drag with the mouse down the
numbers in the Year column in the spreadsheet (in this example, cells A4:A36, excluding
the label at the top this time). The dialog box now looks like this:
3. To add the forecasts to this chart, click Add, then click in the Values area of the dialog
box, then drag with the mouse down the Forecast values in the worksheet (just the
numbers). Next click in the Name area of the dialog box, then click on the cell with the
label “Forecast” (in cell J3 here). Your dialog box now looks like this:
4. Click Next >, make any changes you like, then click Finish to place the chart into the
worksheet. After resizing the chart and double-clicking on the gray background to make it
white, the chart looks like this:
[Chart: Sales and Forecast plotted by year, 1994 through 2002; vertical axis from 0 to 45]
96 ANOVA Chapter 15
2. In the dialog box that appears, click in “Input Range” and select your data including
labels at the top, being sure to extend down to the last row even if you extend past the
end of some data columns. Excel requires that your variables be next to one another so
that your Input range is a rectangle. Click the check box “Labels in First Row” so that
Excel will recognize the names of the columns. Click the option button to the left of “Output Range”,
click in the box to the right of “Output Range”, and then click in a cell in the worksheet where Excel can
put the results. So far, here is how it looks:
3. Click OK to see the results. In this case the p-value of 0.005 tells you that the mean
quality scores of these three suppliers are highly significantly different from one another (p
< 0.01). That is, you may conclude that there are supplier differences. Also shown are the
average quality for each supplier (82.056, 80.667, and 87.684) and each supplier's
variance. You also find the between-sample variability of 269.081 and the within-sample
variability of 45.631 under the MS column of the ANOVA table (MS stands for Mean
Square).
Here are the results, after tidying up by adjusting column widths (try selecting cells that
are not displayed properly, then using Format/Column/AutoFit Selection) and by
formatting most cells to show three decimal places (using Format/Cells, selecting the
Number tab, then using Category Number with 3 decimal places for these cells).
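The F statistic in the ANOVA table is the between-sample mean square divided by the within-sample mean square (here 269.081/45.631, or about 5.9). A minimal Python sketch of where those two numbers come from is given below; the function name and the quality scores are made up, and the samples may have different sizes.

```python
def one_way_anova(groups):
    """Return (between_ms, within_ms, f_statistic) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability of the sample averages around the grand average
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the data around their own sample averages
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    between_ms = ss_between / (k - 1)
    within_ms = ss_within / (n_total - k)
    return between_ms, within_ms, between_ms / within_ms


# Hypothetical quality scores for three suppliers
supplier_scores = [
    [82.0, 79.0, 85.0, 80.0],
    [78.0, 81.0, 83.0],
    [88.0, 90.0, 86.0, 87.0, 89.0],
]
between_ms, within_ms, f_statistic = one_way_anova(supplier_scores)
print(between_ms, within_ms, f_statistic)
```

The p-value then comes from the F distribution, which this sketch does not compute.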
4. To find the suppliers’ standard deviations, you may take the square root of each variance,
using the SQRT function as follows:
2. In the dialog box that appears, click in “Input Range” and select your data including
labels at the top and on the sides. Excel requires that the data be arranged in a table as
shown below. In this case there are 5 observations for each combination of shift and
supplier, so the “Rows per sample” is set at 5. Click the option button to the left of “Output Range”,
click in the box to the right of “Output Range”, and then click in a cell in the worksheet where Excel can put
the results. So far, here is how it looks:
3. Click OK to see the results, as shown below. First you see summary statistics for each
combination of shift and supplier. For example, the average quality for Shift 1 and Supplier
1 is 77.062; the average for Supplier 1 is 82.417 (to the right in the first table, for Supplier
1, under “Total”); and the average for Shift 1 is 80.076 (below, in the table headed “Total”,
under the column headed Shift 1 at the very top).
In the ANOVA table are the results of the hypothesis tests, including a p-value of 0.720 for
testing whether the suppliers have equal means or not, a p-value listed as 0.000 for testing
whether the shifts have equal means or not, and a p-value of 0.014 for the interaction of
shift and supplier.
Here are the results, after tidying up by adjusting column widths (try selecting cells that
are not displayed properly, then using Format/Column/AutoFit Selection) and by
formatting most cells to show three decimal places (using Format/Cells, selecting the
Number tab, then using Category Number with 3 decimal places for these cells).
Chapter 16 Nonparametrics 103
1. Begin by listing both groups of numbers in a single column, with labels in the column to its
left to identify the group of each number. Then, with any data cell selected, use the menu
command Data/Sort to sort both columns by data value. Here is the Data/Sort dialog box:
2. Now find the rank of each data value, being careful to average any ties. To do this, create
a column (headed “1, 2, 3 ...” below) consisting of the initial ranks (before averaging) of
1, 2, 3, and so forth. Then create a column of ranks with tie averaging by using the formula
=SUMIF(DataRange,DataValue,123Range)/COUNTIF(DataRange,DataValue), being
careful to use absolute $ addressing for DataRange and 123Range but not for DataValue.
Here is the result after copying that formula down the column (for example, by double-
clicking on the fill handle after entering the first formula). Note that the averaged rank of
18.5 is used for both income values of 57,000.
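The tie-averaging formula can be illustrated with a short Python sketch. For each data value it sums the initial ranks of all rows with that value and divides by how many such rows there are. The function name and the data are hypothetical.

```python
def average_ranks(sorted_values):
    """Ranks with ties averaged; the values must already be sorted."""
    initial_ranks = range(1, len(sorted_values) + 1)  # the "1, 2, 3 ..." column
    ranks = []
    for value in sorted_values:
        # Initial ranks of every row holding this same value
        tied = [r for r, v in zip(initial_ranks, sorted_values) if v == value]
        ranks.append(sum(tied) / len(tied))  # SUMIF divided by COUNTIF
    return ranks


# Hypothetical sorted incomes with one tie
incomes = [31000, 42000, 57000, 57000, 64000]
print(average_ranks(incomes))
```

In this made-up list, the two tied values occupy initial ranks 3 and 4, so each receives the averaged rank 3.5.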
Chapter 16 Nonparametrics 107
3. To find the average rank for each group, you may again use the SUMIF and COUNTIF
functions, this time as
=SUMIF(GroupLabelRange,"Fixed",RanksRange)/COUNTIF(GroupLabelRange,"Fixed")
for the fixed-rate mortgages, changing “Fixed” to “Variable” for the variable-rate
mortgages. Here are the results:
4. Now find the average difference in ranks by subtracting these average ranks. Find the
standard error by using the sample size for each group (16 and 14, here). Divide the
average difference in ranks by the standard error to find the test statistic. Finally, find the
p-value using the function
=2*(1-NORMSDIST(ABS(TestStatistic)))
The results are as follows. Note that these two groups are not significantly different from
one another because p > 0.05.
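Step 4 can be sketched in Python as shown below. The standard error used here, (n1+n2) times the square root of (n1+n2+1)/(12 n1 n2), is the usual large-sample formula for the difference between two average ranks; it is stated as an assumption, since the formula itself is not spelled out above. The average ranks in the example are invented.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Same role as Excel's NORMSDIST."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def rank_difference_test(avg_rank_1, avg_rank_2, n1, n2):
    """Return (std_error, test_statistic, two-sided p_value)."""
    n = n1 + n2
    # Assumed large-sample standard error of the difference in average ranks
    std_error = n * sqrt((n + 1) / (12 * n1 * n2))
    test_statistic = (avg_rank_2 - avg_rank_1) / std_error
    p_value = 2 * (1 - standard_normal_cdf(abs(test_statistic)))
    return std_error, test_statistic, p_value


# Sample sizes from the example (16 and 14); the average ranks are hypothetical
std_error, test_statistic, p_value = rank_difference_test(14.0, 17.2, 16, 14)
print(std_error, test_statistic, p_value)
```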
110 Chi-Squared Analysis Chapter 17
2. The results are shown below: first the original table of counts, next the table of expected
counts, and finally the CHITEST function, which uses both the original table and the table
of expected counts (but not the totals). The resulting CHITEST p-value is 3.07823E-15,
which represents the very small number 0.00000000000000307823 because the scientific
notation “E-15” tells you to move the decimal point 15 places to the left. Clearly the result
is very highly significant because this p-value is less than 0.001.
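As an illustration of the expected-counts table, the Python sketch below assumes a test of independence, where each expected count is the row total times the column total divided by the grand total. It then computes the chi-squared statistic that underlies the p-value. The function names and the counts are hypothetical, and the sketch stops short of the p-value itself.

```python
def expected_counts(observed):
    """Expected table under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells (totals excluded)."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical 2-by-3 table of counts (no totals)
observed = [[30, 45, 25], [50, 35, 15]]
print(expected_counts(observed))
print(chi_squared_statistic(observed))
```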
Chapter 18 Quality Control 113
To use Excel to draw an R chart for the detergent data, proceed as for the XBar chart, but use the
range values R for the first column, their average RBar for the second column, and the
appropriate lower and upper control limits D3*RBar = 0 and D4*RBar = 0.556 for the third and
fourth columns. Here is the R chart in Excel:
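The four columns for the R chart can be sketched in Python as follows. The multipliers D3 and D4 come from a standard control-chart table and depend on the sample size; the values used below (0 and 2.114, the usual table entries for samples of size 5) and the range values are assumptions for illustration, not the detergent data.

```python
def r_chart_columns(ranges, d3, d4):
    """Return rows of (R, RBar, lower control limit, upper control limit)."""
    r_bar = sum(ranges) / len(ranges)
    lower = d3 * r_bar
    upper = d4 * r_bar
    # RBar and the limits are constant, so they repeat on every row
    return [(r, r_bar, lower, upper) for r in ranges]


# Hypothetical sample ranges; d3 and d4 assume samples of size 5
ranges = [0.21, 0.30, 0.18, 0.35, 0.26]
for row in r_chart_columns(ranges, d3=0.0, d4=2.114):
    print(row)
```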
116 Excel Range Names Appendix
• Population
• State_Taxes
Table 4.3.5. Percent Change in Housing Values over Five Years for U.S. Regions.
• Percent_Change
Table 4.3.6. Revenues for selected Fortune 500 companies.
• Revenues
Table 4.3.7. Percent increases of initial public stock offerings.
• Percent_Increase
Problem 4.16. Paper Mill Problems.
• Problem
Table 4.3.8. Home Mortgage Loan Fees
• Fee
Problem 4.23. Strength of Cotton Yarn.
• Strength
Problem 4.24. Factory Inventory Level.
• Inventory
Problem 4.25. Your Products' Share.
• Share
Problem 4.26. Monthly Sales.
• Monthly_Sales
Table 4.3.9. Changing Value of the Dollar.
• Change
Table 3.9.1. Yields of Municipal Bonds.
• Value
• Yield
Table 3.9.2. Market Response to Stock Buy-Backs.
• Price_Change
Table 3.9.4. CREF'S Investments.
• CREF_Value
Table 4.3.10. Length in minutes for selected films from a video library.
• Time
Table 3.9.6. Hospital Charges for Heart Failure and Shock.
• Hospital_Charges
Table 3.9.7. CEO Compensation for Food Processing Firms.
• CEO_Compensation
Table 3.9.10. Cost of Traditional Funeral Service.
• Funeral_Cost
Table 4.3.11. Sales of Some 'Light' Foods.
• Food_Sales
Table 2.6.7. Closing Price and Monthly Change for DJIA Firms.
• DJIA_Close
• DJIA_Change
Table 2.6.8. Daily DJIA for January 2002.
• DJIA_Net_Change
• DJIA_Percent_Change
Case.
• Chairs
• Tables
• Bookshelves
• Cabinets
CHAPTER 5, RANGE NAMES.
Table 5.1.1. Finding The Deviations From The Average.
• Dart_Returns
Problem 5.6. Number of Executives for Seattle Firms.
• Executives
Problem 5.23. Airline Ticket Prices
• Ticket_Cost
Problem 5.24. Productivity Measures.
• Productivity
Table 3.9.1. Yields of Municipal Bonds.
• Yield
Table 3.9.2. Market Response to Stock Buy-Backs.
• Price_Change
Table 3.9.4. CREF'S Investments.
• Market_Value
Table 2.6.8. Daily DJIA for January 2002.
• DJIA_Net_Change
• DJIA_Percent_Change
Case.
• Part_Size
CHAPTER 7, RANGE NAMES.
Table 4.3.1. Last Month's Sales.
• Sales
Problem 9.40. Strength of Cotton Yarn.
• Strength
Table 5.5.4. Weights for Two Samples of Candy Bars.
• Before
• After
Problem 9.44. Quality scores for agricultural produce.
• Quality
Problem 9.45. Caffeine in Coffee.
• Caffeine
Case.
• Order_Amount
CHAPTER 10, RANGE NAMES.
Table 4.3.7. Percent increases of initial public stock offerings.
• Percent_Increase
Problem 10.21. Weight of Frozen Foods.
• Weight_Food
Problem 10.22. Prices.
• Price
Problem 10.23. Calorie Content.
• Calories
Table 10.7.2. Store Returns.
• Returned
Problem 10.25. Satisfaction Scores.
• Satisfaction
Problem 10.26. Pollutant Levels.
• Pollution
Problem 10.27. Component Weights.
• Weight_Component
Table 10.7.3. Performance of Socially Aware Funds.
• ROR
Table 10.7.8. Monthly Daycare Rates.
• Laurelhurst
• Other_Areas
Table 10.7.9. New Product Preferences.
• Milwaukee
• Green_Bay
• Service
• Other
Table 16.4.2. Aerospace Firm Profits.
• Aerospace_Profit
Table 16.4.3. Relaxation Scores.
• Before
• After
Table 16.4.4. Stress Levels.
• True_Answer
• False_Answer
Table 16.4.5. Gender Salary Data.
• Women
• Men
Table 16.4.6. Reliability of Products Under Abuse.
• Today
• Yesterday
• Yours
• Competitor
Table 17.4.1. Vehicle Desired: This week's count and last year's percentage.
• This_Count
• Last_Percent
Table 17.4.2. Incoming Telephone Calls.
• Phone_Count
• Phone_Percent
Table 17.4.3. Survey of Future Business Conditions.
• Managers
• Employees
Table 17.4.4. Survey on the Chances of a Stock Market Crash.
• Stockholders
Appendix A Data Files and Variable Names 129
• Level
APPENDIX B, RANGE NAMES.
Appendix B. Donations Database.
Note: “_D0” indicates 19,011 non-donors, while “_D1” indicates 989 donors, out of 20,000 overall.
• Age
• Age_D0
• Age_D1
• Age55_59
• Age55_59_D0
• Age55_59_D1
• Age60_64
• Age60_64_D0
• Age60_64_D1
• AvgGift
• AvgGift_D0
• AvgGift_D1
• Cars
• Cars_D0
• Cars_D1
• CatalogShopper
• CatalogShopper_D0
• CatalogShopper_D1
• Clerical
• Clerical_D0
• Clerical_D1
• Donation
• Donation_D0
• Donation_D1
• Farmers
• Farmers_D0
• Farmers_D1
• Gifts
• Gifts_D0
• Gifts_D1
• HomePhone
• HomePhone_D0
• HomePhone_D1
• Lifetime
• Lifetime_D0
• Lifetime_D1
• MajorDonor
• MajorDonor_D0
• MajorDonor_D1
• MedHouseInc
• MedHouseInc_D0
• MedHouseInc_D1
• OwnerOccupied
• OwnerOccupied_D0
• OwnerOccupied_D1
• PCOwner
• PCOwner_D0
• PCOwner_D1
• PerCapIncome
• PerCapIncome_D0
• PerCapIncome_D1
• Professional
• Professional_D0
• Professional_D1
• Promotions
• Promotions_D0
• Promotions_D1
• RecentGifts
• RecentGifts_D0
• RecentGifts_D1
• Sales
• Sales_D0
• Sales_D1
• School
• School_D0
• School_D1
• SelfEmployed
• SelfEmployed_D0
• SelfEmployed_D1
• Technical
• Technical_D0
• Technical_D1
• YearsSinceFirst
• YearsSinceFirst_D0
• YearsSinceFirst_D1
• YearsSinceLast
• YearsSinceLast_D0
• YearsSinceLast_D1