
Skyline Technologies presents

StatPad™

Quick and Easy Data Analysis Using Excel

[Cover chart: Amazon Revenue ($billions) over Time, 2004 to 2014, showing Revenue, Trend, and Forecast]

Copyright 1988, 1997, 2000, 2003, 2011 by Skyline Technologies, Inc.


StatPad is a trademark of Skyline Technologies, Inc.
Excel is a registered trademark of Microsoft Corporation.

Table of Contents
What is StatPad?
How to Install StatPad
How to Use StatPad
Overview of StatPad Features
One-Sample Analysis
  Summaries
  Histogram
  Histogram (With Customized Bin Width and Landmark)
  Box Plot
  Cumulative Distribution
  Confidence Interval
  Confidence Interval (One-Sided, 99%)
  Hypothesis Test
  Hypothesis Test (One-Sided)
  Percentile
  Percentile Ranking
Sampling
  Random Sample Without Replacement
  Random Sample With Replacement
  Uniform Distribution
  Normal Distribution
  Binomial Distribution
  Binomial Percentages
Probability Calculations
  Normal Probability (Greater Than)
  Normal Probability (Between)
  Binomial Probability (Equal to)
  Binomial Probability (This or Less)
  Binomial Percent (Equal to)
  Binomial Percent (Between)
  Poisson Probability (Equal to)
  Poisson Probability (This or Less)
  Exponential Probability (This or More)
  Exponential Probability (Between)
  Discrete Probability
Two-Sample Analysis
  Summaries
  Histograms
  Box Plots
  Confidence Interval
  Hypothesis Test
Many-Sample Analysis
  Summaries
  Histograms
  Box Plots
  F Test for One-Way ANOVA
  Mean Differences
Bivariate Analysis
  Scatterplot
  Scatterplot with Least-Squares Line
  Correlation
  Correlation with Test
  Regression
  Predicted and Residuals
  Univariate Summaries
  Histograms
  Box Plots
Multivariate Analysis and Multiple Regression
  Scatterplots
  Correlations
  Multiple Regression
  Predicted and Residuals
  Diagnostic Plot
  Univariate Summaries
  Histograms
  Box Plots
Time-Series Analysis
  Trend-Seasonal
  Forecast with Series
  Moving Average (Smooth)
  Seasonal Index
  Seasonally Adjusted Series
  Long-Term Trend
  Seasonalized Trend
  A Combination: Data Series With Long-Term Trend and Forecast
  Numeric Output
Quality Control
  X-Bar, R Charts (No Standard Given)
  X-Bar, R Charts (Standard Given)
  Percentage or Count Chart (No Standard Given)
  Percentage or Count Chart (Standard Given)

What is StatPad?
Welcome to StatPad¹, a software system designed for people who wish to perform statistical
analysis within their Microsoft Excel² spreadsheets. StatPad was designed to make
statistical analysis as accessible, painless, and easy to understand as possible by bringing basic
statistical analysis and its interpretation into the environment where business and other data are
often found: an Excel spreadsheet. Whenever possible, the analysis is guided by
choices from a dialog box that adapts itself automatically to your situation. The results,
consisting of charts, explanatory text, and computations, then become part of your worksheet.
StatPad covers the main areas of basic statistics: design using a random sample, exploration
through graphic representations of data, estimation with summaries and confidence intervals
(both one- and two-sided at various confidence levels), hypothesis testing, probability
calculations (normal, binomial, Poisson, exponential, and discrete), multiple regression analysis,
trend-seasonal time series analysis, and statistical quality control charts.
Here's how to get started if you are in a hurry: after you open the file STATPAD.XLA, you will
find StatPad listed under Excel's Add-Ins Ribbon (or the Tools menu in older versions of Excel),
ready for you to select. When selected, StatPad greets you with its main dialog box, ready for
analysis.

¹ StatPad is a trademark of Skyline Technologies, Inc.
² Excel is a registered trademark of Microsoft Corporation.

How to Install StatPad


All you need in order to run StatPad is a computer running Microsoft Excel for Windows. There
are two ways to install StatPad, depending upon whether or not you want StatPad to be there
automatically whenever you work in Excel. Please begin by copying the file STATPAD.XLA to a
folder on your computer.

If you wish StatPad to be available automatically when you run Excel:


1. In Excel, choose File / Options, select Add-Ins at the left, wait a moment, then choose
"Go" near the bottom to manage Excel Add-Ins (Excel 2007 users start by clicking the
Office Button at the top left and choosing Excel Options at the bottom, then continue
by selecting Add-Ins at the left and choosing "Go").
2. Browse to the folder where you put the file STATPAD.XLA, select the file, and click OK.
3. Be sure the StatPad entry is checked in the list of add-ins, then choose OK.
4. StatPad will be available in the Add-Ins Ribbon near the top (or Tools menu for older
versions of Excel).

If you wish to load StatPad manually each time you open Excel:
Either double-click the file STATPAD.XLA or use Excel's File / Open menu commands to
open this file from its folder on your computer. Choose Enable Macros if necessary.
StatPad will then be available under Excel's Add-Ins Ribbon (or the Tools menu
in older versions of Excel). StatPad will remain available until you close Excel.

If you need to change Excel's macro security level, you will find this at File / Options /
Trust Center / Trust Center Settings / Add-Ins.

How to Use StatPad


Here's how to use StatPad:
1. Get into Excel and bring your data (if any) into the worksheet.
2. If StatPad has already been installed, simply select StatPad from Excel's Add-Ins Ribbon
near the top of the screen to begin statistical analysis.
If StatPad has not yet been installed, either open the file STATPAD.XLA using Excel's File
Open menu command near the top of the screen or read the previous section, "How to
Install StatPad," to see how to make StatPad available whenever you are in Excel.
3. You will see StatPad's main dialog box, ready to guide you through the analysis:

4. Select a situation from the list near the top left (One Sample, Sampling, Probability, Two
Sample, Many Sample, Bivariate, Multivariate, Time Series, or Quality Control).
5. Select the analysis you want from the list near the top right. Note that this analysis list
changes automatically, depending on the situation you choose. For a One Sample
situation, the analysis choices are Summaries, Histogram, etc. But if you select
Probability instead, the analysis choices instantly change to Normal Probability, Binomial
Probability, Binomial Percent, and so on.
6. Give StatPad the additional information it needs. StatPad will automatically change to
show you what is needed, so you may fill in the blanks as they appear. For One Sample,
Summaries, you need to give StatPad a data set name and an output range. For One
Sample, Confidence Interval, an edit box appears automatically so that you can tell
StatPad which confidence level you wish (you may also choose a one-sided interval).
Here's how the main dialog box changes:

For a multiple regression analysis, StatPad's main dialog box changes again (automatically!),
allowing you to select the X variables (for example, income, percent male, and
readership) used to explain the Y variable (for example, the cost of a full-page color
magazine ad).


7. Here's how to select your data set(s) from the list(s). StatPad's lists include each Excel
range name that identifies a single column of numbers.³ When you name your data with
StatPad, the name also becomes an Excel range name.
a. If just one data set is needed (e.g., for one-sample analysis), you may choose one of
the following:
i. Click on its name, in the list.
or
ii. Type its name into the edit-box, just above the list.
or
iii. Click on the edit-box, just above the list, and then drag in the worksheet with the
mouse to identify your column of numbers. This is useful for a quick analysis
when you do not care to use a name to identify the data.
b. If more than one data set can be specified (e.g., many-sample analysis, or the X
variables for a multiple regression), you may choose one of the following:
i. Click on each name that you wish to select, scrolling up and down as needed. If
you click again on a selected name, it is unselected (be careful not to click quickly
twice on the same name; Excel will interpret this as a double-click and StatPad
will immediately begin the analysis).
or
ii. Move through the list using the cursor (arrow) keys, selecting and unselecting by
hitting the spacebar.
8. If your data are in the worksheet but are not offered to you as a choice⁴ in StatPad's lists,
here's how to proceed:
a. Click on StatPad's Add Data button (at the right, just above the middle of StatPad's
main dialog box) to put a data set name into the list. You then see the following
dialog box, where you may drag with the mouse to select the data (one column of
numbers) and specify the name you want. This name will then appear in StatPad's
lists along with the other data sets.

³ Here's a quick way to find out the name (if any) associated with a list of numbers: highlight the list (drag with the
mouse), then look for the name in Excel's Name Box near the top left corner of the worksheet. StatPad limits the size
of each list to a maximum of 65,000 numbers.
⁴ If you have used Excel to name a column of numbers (e.g., with Excel's Insert / Name / Define menu items), this name
will appear automatically in StatPad's list. When you name a column of numbers within StatPad, this name also
becomes an Excel range name for your data. Names can be deleted using Excel's Insert / Name / Define / Delete menu
items.

Here's how the screen might look after you (1) click in the Range box of the above
dialog box, (2) highlight your data in the worksheet, (3) click in the Name box of the
dialog box, and (4) type in the name ("Prices" for this example, but please don't use
spaces or special characters):

b. Alternatively, you may type a name for the data set into the edit box in
StatPad's main dialog box, even if that name is not proposed for you. This can be
done whenever only one data set can be used for the chosen situation (but please don't
use spaces or special characters in the name). Once you hit Enter, click on Do It, or
double-click to begin the analysis, StatPad will ask you to select the column of
numbers you want, using the following dialog box. After this, the name will
automatically show up in StatPad's data set lists.


9. Use the Output Range box at the lower right of the main dialog box to tell StatPad where
to put the results.
a. If you've asked for a chart:
i. If you provide a single cell as the Output Range, then StatPad will place a chart of
the default size with upper-left corner at this cell.
ii. If you provide a rectangular range of cells as the Output Range, then StatPad will
make the chart the same size as your range.
b. If you've asked for numbers and text:
i. If there is enough room without erasing any of your data, StatPad will place the
upper-left cell of the output at the Output Range you specified.
ii. If your results would overwrite any of your data, StatPad will give you the option
of either specifying a different Output Range, or (use caution!) going ahead and
erasing some of your data to make room for the results if you wish.
10. After StatPad performs the analysis you requested (or asks for clarification, if needed),
you will again find the StatPad main dialog box on your screen, ready for further analysis.
You may either continue your analysis with StatPad, or leave StatPad (select Cancel or
hit the Esc key) to return control to Excel and your worksheet.
11. Because StatPad's results are part of your Excel spreadsheet, you can format them after
leaving the StatPad dialog box by hitting the Esc key or selecting Cancel.
a. You can select and format numbers in individual cells as you ordinarily would in
Excel (for example, using the Number Group of the Home Ribbon). For example, you
can format with dollar signs, set the number of decimal places, format as percentages,
etc.
b. You can customize StatPad's charts as you would any Excel chart. For example,
you might select the chart and then use the Chart Tools Ribbons (Design, Layout, and
Format) at the top of the Excel window. Another method is to double-click the
part of the chart you wish to change, for example the x axis, to bring up the relevant
formatting options. You might then set the scale under Axis Options (e.g., to
change the minimum and/or maximum) or select Number (e.g., to change the number
formatting).

12. You can copy StatPad's results to your word processor after leaving the StatPad dialog
box by hitting the Esc key or selecting Cancel.
a. To copy text and numbers to your word processor, proceed as follows:
i. Highlight your cell(s) and choose Copy from the Clipboard Group of the Home
Ribbon.
ii. Activate your word processor and move the cursor to where you want the results
to go.
iii. Depending upon your word processor, you may wish to paste as unformatted text.
The text then becomes part of the text document and you may format it as you
like. For example, with Microsoft Word 2010, you might click on the word
"Paste" in the Clipboard Group of the Home Ribbon, then choose Paste Special
from the Paste Options, to obtain the Unformatted Text choice.
b. To copy charts to your word processor, proceed as follows:
i. Click on the edge of a chart (just one at a time) to select it, then choose Copy from
the Clipboard Group of Excel's Home Ribbon.
ii. Activate your word processor and move the cursor to where you want the chart to
go.
iii. Depending upon your word processor, you may wish to paste as a Picture
(instead of as an Excel object). The chart then becomes part of the text document
and you can then place and size it using your word processor's
commands. For example, with Microsoft Word 2010, you might click on the
word "Paste" in the Clipboard Group of the Home Ribbon, then choose Paste
Special from the Paste Options, to obtain the Picture (Enhanced Metafile)
choice.
13. For more information about statistical analysis, its applications and interpretation, please
consult a book such as Practical Business Statistics by Andrew F. Siegel (Elsevier /
Academic Press, sixth edition, 2012).

Overview of StatPad Features


StatPad's statistical analyses are grouped into the following situations:
One Sample
Sampling
Probability
Two Sample
Many Sample
Bivariate
Multivariate
Time Series
Quality Control
These situations are presented in a list at the left in StatPads main dialog box. When you select a
situation, the appropriate analyses are available in a list to the right in this dialog box. When you
select a situation and analysis, an explanation also appears in the dialog box and the dialog box
changes to allow you to specify what is needed for the analysis (e.g., a confidence level). Here is
a list of the situations, analyses, and explanations available within StatPad. More details about
each one, with an example, are given on the pages that follow.

One Sample
Summaries

Compute statistical summaries for the data: count, average or mean, median,
smallest, largest, quartiles, standard deviation, and standard error.

Histogram

Draw a histogram to explore the data, showing the shape of the distribution,
typical values, variability, and outliers. Data are concentrated where the
histogram bars are high. Check 'Customize' to specify optional bin width and
landmark point.

Box Plot

Draw a box plot to explore the data, showing the 5-number summary (smallest,
lower quartile, median, upper quartile, and largest). In the ordinary box plot, a
line extends from the box on each side to the most extreme value. Check
'Detailed box plot' to indicate outliers separately and have the lines extend from
the box on each side to the most extreme value (adjacent value) that is not an
outlier.

Cumulative Distribution

Draw a cumulative distribution function for the data, showing the percentage of
data values less than each given number. This shows you the percentiles.


Confidence Interval

Compute a confidence interval for the population mean. This is statistical
inference about the population, based on random sampling. Two-sided or one-sided
interval, with your chosen confidence level.

Hypothesis Test

Test the null hypothesis that the population mean is equal to a given reference
value. This is statistical inference about the population, based on random
sampling. Two-sided or one-sided testing (Student's t test) is used.

Percentile

Given a percentage, find the percentile value. This data value has approximately
this percentage of the data values smaller than it.

Percentile Ranking

Find the percentage ranking for a given value. This is the approximate
percentage of data values that are less than the given value.
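The One Sample summaries, confidence interval, t statistic, percentile, and percentile ranking described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not StatPad's own code: the quartile and percentile conventions (`method="inclusive"`, linear interpolation) are assumed and may differ slightly from StatPad's, the helper names and data are hypothetical, and the t critical value is supplied from a t table rather than computed. The quartiles, together with the smallest and largest values, make up the 5-number summary drawn in a box plot.

```python
import math
import statistics


def summaries(data):
    """Count, mean, median, extremes, quartiles, standard deviation, standard error."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")  # assumed quartile rule
    return {
        "count": n,
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "smallest": min(data),
        "largest": max(data),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),
    }


def confidence_interval(data, t_crit):
    """Two-sided interval: mean +/- t_crit * standard error (t_crit from a t table, df = n - 1)."""
    s = summaries(data)
    half_width = t_crit * s["std_error"]
    return s["mean"] - half_width, s["mean"] + half_width


def t_statistic(data, reference):
    """Student's t statistic for the null hypothesis that the population mean equals `reference`."""
    s = summaries(data)
    return (s["mean"] - reference) / s["std_error"]


def percentile(data, pct):
    """Value with roughly `pct` percent of the data below it (linear interpolation, assumed)."""
    xs = sorted(data)
    pos = (len(xs) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def percentile_ranking(data, value):
    """Percentage of data values strictly less than `value`."""
    return 100.0 * sum(x < value for x in data) / len(data)


prices = [12, 15, 11, 18, 14, 16, 13, 17, 15, 19]  # hypothetical data
s = summaries(prices)
low, high = confidence_interval(prices, t_crit=2.262)  # 95% two-sided, df = 9, from a t table
t = t_statistic(prices, reference=13)
print(s["mean"], round(low, 2), round(high, 2), round(t, 2))
print(percentile(prices, 25), percentile_ranking(prices, 15))
```

For these hypothetical prices the mean is 15, the 95% interval runs from about 13.15 to 16.85, and t is about 2.45 against a reference value of 13; 40% of the values lie below 15.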

Sampling
Sample Without Replacement

Select a random sample from a larger population, without replacement so that
no item can be selected more than once. All population items are equally likely
to appear in the sample, and they are chosen independently of one another.

Sample With Replacement

Select a random sample from a larger population, with replacement so that an
item may be selected more than once. All population items are equally likely to
appear in the sample, and they are chosen independently of one another.

Uniform Distribution

Select a random sample from a uniform distribution, where all values are
equally likely between the smallest and largest possible value. By specifying a
name, you will be able to easily use the result later.

Normal Distribution

Select a random sample from a normal distribution, given the mean and
standard deviation. By specifying a name, you will be able to easily use the
result later.

Binomial Distribution

Select a random sample from a binomial distribution (the number of
occurrences), given the number of trials and the probability of occurrence. By
specifying a name, you will be able to easily use the result later.

Binomial Percentages

Select a random sample of binomial percentages, given the number of trials and
the probability of occurrence. By specifying a name, you will be able to easily
use the result later.
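The Sampling choices above map onto a few calls in Python's `random` module. This sketch is an illustration under assumptions, not StatPad's generator: the seed, helper names, and population are hypothetical, and each binomial value is simulated as a count of independent successes, which may not be how StatPad draws them.

```python
import random


def sample_without_replacement(population, size, rng):
    # No item can appear more than once; each item is equally likely to be chosen.
    return rng.sample(population, size)


def sample_with_replacement(population, size, rng):
    # Each draw is made from the full population, so an item may appear more than once.
    return rng.choices(population, k=size)


def binomial_counts(trials, prob, size, rng):
    # Number of occurrences in `trials` independent trials, each with probability `prob`.
    return [sum(rng.random() < prob for _ in range(trials)) for _ in range(size)]


def binomial_percentages(trials, prob, size, rng):
    # Binomial counts divided by the number of trials, expressed as percentages.
    return [100.0 * c / trials for c in binomial_counts(trials, prob, size, rng)]


rng = random.Random(2011)        # fixed seed so the example is repeatable
customers = list(range(1, 101))  # hypothetical population of 100 customer IDs
print(sample_without_replacement(customers, 5, rng))
print(sample_with_replacement(customers, 5, rng))
print([round(rng.uniform(10, 20), 2) for _ in range(3)])  # uniform between 10 and 20
print([round(rng.gauss(100, 15), 1) for _ in range(3)])   # normal, mean 100, SD 15
print(binomial_counts(trials=20, prob=0.3, size=3, rng=rng))
print(binomial_percentages(trials=20, prob=0.3, size=3, rng=rng))
```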


Probability
Normal Probability

Probabilities for a normal distribution (the symmetric bell-shaped curve), given a
mean and a positive standard deviation.

Binomial Probability

Probabilities for a binomial distribution: the number of occurrences out of a
given number of independent trials with a given probability.

Binomial Percent

Probabilities for a binomial percentage, given the number of independent trials
and the probability for each trial.

Poisson Probability

Probabilities for a Poisson distribution: the number of random occurrences
where the rate is fixed, given the mean number. For example, the number of
orders you will receive next week, if orders occur at a constant rate with an
average of 5 per week.

Exponential Probability

Probabilities for an exponential distribution: a highly skewed distribution with
no memory, given the mean. For example, the length of a telephone call or the
time until the next customer arrives, where the mean is 9 minutes.

Discrete Probability

Mean (expected value) and standard deviation for a discrete random variable,
given a set of values and their associated probabilities.
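The probability calculations above follow standard formulas, sketched here in Python 3.8 or later (for `statistics.NormalDist` and `math.comb`). The function names are hypothetical, and results may differ from StatPad's in the last decimal places. A binomial percent of 100k/n has the same probability as a count of k out of n, so `binomial_equal` also covers the Binomial Percent choices.

```python
import math
from statistics import NormalDist


def normal_greater_than(x, mean, sd):
    return 1.0 - NormalDist(mean, sd).cdf(x)


def normal_between(a, b, mean, sd):
    dist = NormalDist(mean, sd)
    return dist.cdf(b) - dist.cdf(a)


def binomial_equal(k, n, p):
    # C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_this_or_less(k, n, p):
    return sum(binomial_equal(i, n, p) for i in range(k + 1))


def poisson_equal(k, mean):
    # e^(-mean) mean^k / k!
    return math.exp(-mean) * mean**k / math.factorial(k)


def poisson_this_or_less(k, mean):
    return sum(poisson_equal(i, mean) for i in range(k + 1))


def exponential_this_or_more(x, mean):
    # P(X >= x) = e^(-x / mean)
    return math.exp(-x / mean)


def exponential_between(a, b, mean):
    return math.exp(-a / mean) - math.exp(-b / mean)


def discrete_mean_sd(values, probs):
    # Expected value and standard deviation of a discrete random variable.
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, math.sqrt(var)


print(round(normal_greater_than(130, 100, 15), 4))
print(round(poisson_equal(5, 5), 4))            # exactly 5 orders when the weekly mean is 5
print(round(exponential_this_or_more(9, 9), 4))  # a call lasting 9 minutes or more, mean 9
print(discrete_mean_sd([0, 1, 2], [0.25, 0.5, 0.25]))
```

For the examples in the text, the chance of exactly 5 orders in a week averaging 5 is about 0.175, and the chance that a call with a 9-minute mean lasts 9 minutes or more is e^(-1), about 0.368.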

Two Samples
Summaries

Compute univariate summaries for each data set. Also find the average
difference and its standard error. If sample sizes are identical, you may indicate
that a pair of measurements was made on each item.

Histograms

Draw a histogram for each data set, for data exploration.

Box Plots

Draw a box plot for each data set, for data exploration, using the same scale for
comparison. In the ordinary box plot, a line extends from the box on each side
to the most extreme value. Check 'Detailed box plot' to indicate outliers
separately and have the lines extend from the box on each side to the most
extreme value (adjacent value) that is not an outlier.

Confidence Interval

Compute a confidence interval for the population mean difference. This is
statistical inference. Two-sided interval, with chosen confidence level. If
sample sizes are identical, you may indicate that a pair of measurements was
made on each item.

Hypothesis Test

Test the null hypothesis that the population mean difference is zero. This is
statistical inference. Two-sided testing using Student's t test. If sample sizes are
identical, you may indicate that a pair of measurements was made on each item.
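The Two Samples average difference and its standard error can be sketched as follows. For paired data, the differences are analyzed as one sample. For independent samples this sketch assumes the unpooled standard error sqrt(s1²/n1 + s2²/n2); StatPad may use a different formula or degrees of freedom. The function names and data are hypothetical. A confidence interval is the difference plus or minus a t table value times the standard error, and the t statistic is the difference divided by its standard error.

```python
import math
import statistics


def paired_difference(before, after):
    """Average difference and its standard error when each item was measured twice."""
    diffs = [b - a for a, b in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs), se


def unpaired_difference(sample1, sample2):
    """Average difference and an (assumed, unpooled) standard error for independent samples."""
    diff = statistics.mean(sample2) - statistics.mean(sample1)
    se = math.sqrt(statistics.variance(sample1) / len(sample1)
                   + statistics.variance(sample2) / len(sample2))
    return diff, se


before = [10, 12, 9, 11, 13]  # hypothetical paired measurements
after = [12, 13, 10, 14, 14]
d, se = paired_difference(before, after)
t = d / se  # compare with a t table value, df = n - 1 for paired data
print(d, round(se, 4), round(t, 2))
```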


Many Samples
Summaries

Select as many data sets as you wish. Compute univariate summaries for each.

Histograms

Draw a histogram for each sample, for data exploration.

Box Plots

Draw a box plot for each sample, for data exploration, using the same scale for
comparison. In the ordinary box plot, a line extends from the box on each side
to the most extreme value. Check 'Detailed box plot' to indicate outliers
separately and have the lines extend from the box on each side to the most
extreme value (adjacent value) that is not an outlier.

F Test

One-way analysis of variance (ANOVA). Test the null hypothesis that the
population means are all identical. This is statistical inference.

Mean Differences

Confidence intervals and hypothesis tests for the difference of each pair of
population means (least-significant-difference test). This is statistical inference.
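The F Test and Mean Differences analyses rest on the one-way ANOVA sums of squares. This sketch computes the F statistic and least-significant-difference intervals for each pair of means; the function names and data are hypothetical, the t critical value comes from a t table (2.447 is the two-sided 95% value for 6 degrees of freedom), and StatPad's output layout and exact computations may differ.

```python
import itertools
import math
import statistics


def one_way_anova(groups):
    """F statistic, its degrees of freedom, and the within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f = (ss_between / df_between) / ms_within
    return f, df_between, df_within, ms_within


def lsd_intervals(groups, t_crit):
    """LSD confidence intervals for each pair of population means.
    t_crit is a two-sided t table value with the within-group degrees of freedom."""
    _, _, _, ms_within = one_way_anova(groups)
    results = []
    for i, j in itertools.combinations(range(len(groups)), 2):
        diff = statistics.mean(groups[j]) - statistics.mean(groups[i])
        half = t_crit * math.sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
        results.append((i, j, diff, diff - half, diff + half))
    return results


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical samples
f, df1, df2, mse = one_way_anova(groups)
print(f, df1, df2)  # compare f with an F table value for (df1, df2)
for i, j, diff, low, high in lsd_intervals(groups, t_crit=2.447):
    print(i, j, diff, round(low, 2), round(high, 2))
```

An interval that excludes zero suggests the corresponding pair of population means differs.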

Bivariate
Scatterplot

Draw a scatterplot to explore the relationship between two variables.

Scatterplot with Line

Draw a scatterplot with least-squares line to explore the relationship between
two variables.

Correlation

Find the strength of the relationship between two variables as a pure number,
where 1 indicates a perfect increasing relationship, -1 a perfect decreasing
relationship, and 0 no linear relationship.

Correlation with Test

Find and test the strength of the relationship between two variables. This is
statistical inference.

Regression

Predict the dependent Y variable from the independent X variable using a
straight-line relationship.

Predicted and Residuals

Predicted values of Y based on X, the residual difference (Actual Y - Predicted
Y), and the standardized residuals.

Univariate Summaries

Compute univariate summaries for each variable.

Histograms

Draw a histogram for each variable, for data exploration.

Box Plots

Draw a box plot for each variable, for data exploration. In the ordinary box plot,
a line extends from the box on each side to the most extreme value. Check
'Detailed box plot' to indicate outliers separately and have the lines extend from
the box on each side to the most extreme value (adjacent value) that is not an
outlier.

Multivariate
Scatterplots

Select as many X variables as you wish, but just one Y variable. Draw
scatterplots for all pairs of variables to explore their relationships.

Correlations

Find the strength of the relationship between pairs of variables as a matrix of
correlation coefficients (1 is perfect positive correlation, -1 is perfect negative
correlation, and 0 suggests no linear relationship).

Regression

Predict the dependent Y variable from the independent X variables using a linear relationship.

Predicted and
Residuals

Predicted values of Y based on the X variables, the residual differences
(Actual Y - Predicted Y), and the standardized residuals.

Diagnostic
Plot

Look for problems in the regression linear model, such as unequal variability or
nonlinearity.

Univariate
Summaries

Compute univariate summaries for each variable.

Histograms

Draw a histogram of each variable, for data exploration.

Box Plots

Draw a box plot of each variable, for data exploration. In the ordinary box plot,
a line extends from the box on each side to the most extreme value. Check
Detailed box plot to indicate outliers separately and have the lines extend from
the box on each side to the most extreme value (adjacent value) that is not an
outlier.

Time Series
Trend-Seasonal

A decomposition into (1) long-term trend (linear or exponential), (2) repeating
seasonal component (monthly or quarterly), (3) wandering cyclic component,
and (4) irregular component, used for seasonal adjustment and forecasting. Time
must increase down the data column.


Quality Control
X-Bar, R
Charts

Chart the averages and the ranges of your data to see if this process is in or out
of control. Choose a subgroup size from 2 to 25. You may specify a standard if
one is available.

Pct, Count
Chart

Chart the percents or counts to see if this process is in or out of control. Your
data may be either counts or percentages (counts divided by the sample size).
You may specify a standard if one is available.
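As a rough illustration of the Correlation, Regression, and Predicted and Residuals entries above, here is a minimal Python sketch of the textbook formulas for the Pearson correlation coefficient and the least-squares line. The function names and the small data set are hypothetical, and StatPad's own output (for example, its standardized residuals, which this sketch omits) may be computed differently.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def predicted_and_residuals(xs, ys):
    """Predicted Y values and residuals (Actual Y - Predicted Y)."""
    a, b = least_squares(xs, ys)
    predicted = [a + b * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    return predicted, residuals

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(round(correlation(x, y), 4))
print(least_squares(x, y))
```

Least-squares residuals always sum to (essentially) zero, which is a handy sanity check on any implementation.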


One-Sample Analysis
Summaries
Summaries are used to give you selected numbers that represent and describe your data set.
StatPad's summaries (below) for Quality scores show how many data values there are
(n = 50), typically how high the scores are ($\bar{X} = \sum_{i=1}^{n} X_i / n = 90.78$), and
about how far individual scores are from the mean
($S = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1)} = 7.56$). The quartiles are about 1/4 of
the way in from each end (highest and lowest), while the median is 1/2 of the way in. The
standard error of the average is $S_{\bar{X}} = S / \sqrt{n} = 7.56 / \sqrt{50} \approx 1.069$.
To compute the summaries using StatPad's main dialog box, select One Sample as the
situation and Summaries as the analysis. Select your data from the list (or use Add Data if your
column of numbers is in the worksheet but is not in the list), check the Output Range to be sure
that is where you want the results to appear, and then select Do It (or hit the Enter key).

Quality    Summaries
50         Count n
90.78      Mean or average
7.56       Standard deviation (variability of individuals)
72         Smallest
86         Lower quartile
93         Median
97         Upper quartile
99         Largest
1.069      Standard error (variability of sample average, if random sample)
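The summaries in the table above can be sketched in a few lines of Python using only the standard library. This is an illustration, not StatPad's code: the function name `summaries` is hypothetical, and because the manual does not say which quartile rule StatPad uses, the sketch assumes the default ("exclusive") method of `statistics.quantiles`; other rules can give slightly different quartiles.

```python
import math
import statistics

def summaries(data):
    """One-sample summaries similar to those listed in the table above."""
    xs = sorted(data)
    n = len(xs)
    sd = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    # Quartile rule is an assumption: statistics.quantiles' default method.
    q1, median, q3 = statistics.quantiles(xs, n=4)
    return {
        "count": n,
        "mean": sum(xs) / n,
        "sd": sd,
        "smallest": xs[0],
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "largest": xs[-1],
        "standard error": sd / math.sqrt(n),  # S / sqrt(n)
    }

# Hypothetical data (the Quality scores themselves are not listed here)
print(summaries([2, 4, 4, 4, 5, 5, 7, 9]))
```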


Histogram
The histogram is used to visually explore
a data set. The data axis is horizontal, and
the bars show how many data values are
within each interval. Data are concentrated
where bars are tall. You can see typical
value, variability, and distribution shape.
StatPad's histogram (below) shows that
the Quality scores fall within the interval
from about 70 to 100. They are skewed with a
long tail towards lower values, being more
concentrated in the higher end of the range.
To create a histogram using StatPad's
main dialog box, select One Sample as the
situation and Histogram as the analysis. Select your data from the list (or use Add Data if your
column of numbers is in the worksheet but is not in the list), check the Output Range, and then
select Do It.


StatPad chooses a default bin width and landmark (which could be a left or right endpoint
of the histogram, or any bin boundary) for the histogram bars. These can be changed using the
Customize check-box (see next item). Note that Excel (not StatPad) chooses the minimum and
maximum of the horizontal scale. You can change these (as was done for the chart below) by
leaving StatPad (press the Esc key or select Cancel) and then double-clicking the axis to find
Minimum and Maximum under Axis Options.

[Figure: Histogram of the Quality scores (vertical axis: Frequency; horizontal axis: Quality, 60 to 100)]


Histogram (With Customized Bin Width and Landmark)


There are often several reasonable
choices for how wide to make the histogram
bars and where to place them left-to-right.
StatPad can choose a default bin width and
landmark for the histogram bars, or you can
specify customized values.
In the customized histogram below, the
bin width has been decreased to 1 to show
more detail (StatPad's default bin width for
this data set was 5).
To create a customized histogram using
StatPad's main dialog box, select One
Sample as the situation and Histogram as the
analysis. Select your data from the list (or use Add Data if your column of numbers is in the
worksheet but is not in the list). When you click on Customize, two edit-boxes appear: for Bin
Width and for the optional Landmark. You may then click on each and type the value you wish.
The Landmark setting shifts the bars left or right so that they align on the specified value. Then
check the Output Range, and then select Do It.

[Figure: Customized histogram of the Quality scores with bin width 1 (vertical axis: Frequency; horizontal axis: Quality, 60 to 100)]
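To see how a bin width and a landmark together determine where the histogram bars fall, here is a minimal Python sketch. It assumes half-open bins [left, left + width) aligned so that the landmark is a bin boundary; StatPad's exact edge convention is not documented here, and the function name `histogram_counts` and the scores are hypothetical.

```python
import math
from collections import Counter

def histogram_counts(data, bin_width, landmark=0.0):
    """Count values in equal-width bins aligned so `landmark` is a bin boundary.

    Bins are assumed to be half-open [left, left + bin_width).
    Returns (left, right, count) tuples, including any empty bins in between.
    """
    # Bin index of each value, counted in bin widths from the landmark.
    counts = Counter(math.floor((x - landmark) / bin_width) for x in data)
    first, last = min(counts), max(counts)
    return [
        (landmark + k * bin_width, landmark + (k + 1) * bin_width, counts[k])
        for k in range(first, last + 1)
    ]

# Hypothetical scores; changing the landmark shifts the bars left or right.
scores = [72, 81, 86, 88, 93, 95, 97, 99]
print(histogram_counts(scores, 5))
print(histogram_counts(scores, 5, landmark=2.5))
```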


Box Plot
The box plot is used to quickly and
visually explore a data set; it shows you a
central box defined by the quartiles, with the
median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
In StatPad's box plot for Sensitivity
(below left) you see that the middle half of the
data extends from about 60 to 100, with the
median at about 80. The line at the right
extends to the largest at about 180.
StatPad's detailed box plot (below right) shows outliers separately, revealing that the
largest value, at about 180, is an outlier.
To display a box plot using StatPad's main dialog box, select One Sample as the situation
and Box Plot as the analysis. Select your data from the list (or use Add Data if your column of
numbers is in the worksheet but is not in the list). Click on Detailed box plot if you wish outliers
to be displayed separately. Then check the Output Range, and then select Do It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile.

[Figure: Box plot (left) and detailed box plot (right) of Sensitivity; horizontal axis from 50 to 200]
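The outlier rule stated above (more than 1.5 times the interquartile range beyond either quartile) can be sketched in Python as follows. The quartile method is an assumption (the default of `statistics.quantiles`), so results near the fences may differ from StatPad's; `box_plot_summary` is a hypothetical name.

```python
import statistics

def box_plot_summary(data):
    """Quartiles, 1.5 x IQR outliers, and adjacent values for a detailed box plot."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4)  # quartile rule is an assumption
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": outliers,
        # Adjacent values: the most extreme values that are not outliers.
        "adjacent": (inside[0], inside[-1]),
    }

print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```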


Cumulative Distribution
The cumulative distribution function is
used to show you the percentiles of the data.
Percentages are shown vertically (from 0 to
100%) and data values are horizontal. The
chart shows the percentage of the data values
(vertical scale) that are less than or equal to the
given value (horizontal scale).
In StatPad's cumulative distribution
function for Quality (below) you can see that
about 10% of the Quality scores are less than
or equal to 80, about 25% of the Quality
scores are less than or equal to 85, and that
about a third are scores of 90 or less.
To compute a cumulative distribution function using StatPad's main dialog box, select One
Sample as the situation and Cumulative Distribution as the analysis. Select your data from the
list (or use Add Data if your column of numbers is in the worksheet but is not in the list), check
the Output Range, and then select Do It.

[Figure: Cumulative distribution of the Quality scores (vertical axis: Cumulative Percent, 0% to 100%; horizontal axis: Quality, 60 to 100)]
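The cumulative percentage described above is simply the share of data values at or below a given value. A minimal Python sketch (the function names are hypothetical) that also produces the points one would plot:

```python
def cumulative_percent(data, value):
    """Percentage of data values less than or equal to `value`."""
    return 100.0 * sum(1 for x in data if x <= value) / len(data)

def cdf_points(data):
    """(value, cumulative percent) pairs at each distinct data value, for plotting."""
    return [(v, cumulative_percent(data, v)) for v in sorted(set(data))]

print(cdf_points([1, 2, 2, 3]))
```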


Confidence Interval
A confidence interval for the mean
includes the unknown population mean with
known confidence, e.g., 95%. Random
sampling from a normal population is
assumed.
StatPad's two-sided 95% confidence
interval results for Quality (below) tell you
that the bounds of the interval are 88.63 and
92.93.
To compute a confidence interval using
StatPad's main dialog box, select One
Sample as the situation and Confidence
Interval as the analysis. Select your data
from the list (or use Add Data if your column of numbers is in the worksheet but is not in the
list), check the Output Range, and then select Do It. You may also change the Confidence level
(from the default 95%) or select a one-sided interval instead of a two-sided interval (see next
item).

Confidence interval for Quality:


We are 95% sure that the
population mean for Quality
is somewhere between
88.63 and 92.93
(assuming a random sample from a normal population).
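The interval above follows the usual formula: average plus or minus a t-table value times the standard error. Python's standard library has no t distribution, so this sketch takes the critical value as an input; 2.0096 is the two-sided 95% value for 49 degrees of freedom (n = 50). With SciPy installed, `scipy.stats.t.ppf(0.975, 49)` returns it. The function name is hypothetical, and the sketch starts from the rounded summaries rather than the raw Quality data.

```python
import math

def mean_confidence_interval(mean, sd, n, t_crit):
    """Two-sided interval: mean +/- t_crit * sd / sqrt(n).

    t_crit is the t-table value for n - 1 degrees of freedom.
    """
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

low, high = mean_confidence_interval(90.78, 7.56, 50, t_crit=2.0096)
print(f"{low:.2f} to {high:.2f}")  # 88.63 to 92.93
```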


Confidence Interval (One-Sided, 99%)


The one-sided interval says, with
specified confidence, that the unknown
population mean is either at least ... (for
an upper confidence interval) or no more
than ... (for a lower confidence interval).
You should decide whether to use a one-sided
or two-sided confidence interval before you
look at the data. You should not use both
upper and lower one-sided confidence
intervals on the same data set; either use a
two-sided interval, or choose just one side for
a one-sided confidence interval. If in doubt,
use a two-sided confidence interval.
StatPad's one-sided upper 99% confidence interval for Quality (below) shows you that the bound is 88.21.
To compute a one-sided 99% confidence interval using StatPad's main dialog box, select
One Sample as the situation and Confidence Interval as the analysis. Select your data from the
list (or use Add Data if your column of numbers is in the worksheet but is not in the list), click on
the 1-sided box of your choice, set the Confidence Level to 99%, check the Output Range, and
then select Do It.

One-sided upper confidence interval for Quality:


We are 99% sure that the
population mean for Quality
is at least
88.21
(assuming a random sample from a normal population).
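For the one-sided interval above, all of the error allowance goes on one side, so the bound uses the one-sided t value (about 2.4049 for 99% and 49 degrees of freedom, supplied here by hand). A minimal sketch with a hypothetical function name, starting from the rounded summaries:

```python
import math

def one_sided_lower_bound(mean, sd, n, t_crit):
    """Bound for an interval of the form 'the population mean is at least ...'."""
    return mean - t_crit * sd / math.sqrt(n)

print(f"{one_sided_lower_bound(90.78, 7.56, 50, 2.4049):.2f}")  # 88.21
```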


Hypothesis Test
A hypothesis test is used to decide, based
on data, whether or not the unobservable
population mean could reasonably be equal
to a given reference value. Because the
sample average represents (with statistical
error) the unknown population mean, the
result is often stated in terms of a significant
(or nonsignificant) difference between the
sample average and the reference value, both
of which are known. Random sampling from
a normal population is assumed.
StatPad's hypothesis test results for
Quality (below) show a very highly
significant difference between the reference
value (given here as 87.5) and the observed average Quality score of 90.78. Results include the t
value, the p value, the practical interpretation of the results, and a formal statement of the null
hypothesis being tested.
To perform a hypothesis test using StatPad's main dialog box, select One Sample as the
situation and Hypothesis Test as the analysis. Select your data from the list (or use Add Data if
your column of numbers is in the worksheet but is not in the list), specify the Reference Value,
check the Output Range, and then select Do It. Optionally, you may specify a one-sided test
(upper or lower); see next item.
The p value says that, if the population mean had been equal to the reference value, then p is
the probability of observing such a large (or larger) difference between the sample average and
the reference value. Smaller p values indicate greater significance: a difference this large would
rarely occur by chance alone if the null hypothesis were true.

Hypothesis test for Quality:
t = 3.07
p = 0.00350
The sample average
90.78
is highly significantly different (p<0.01)
from the reference value
87.5
We have REJECTED the null hypothesis
that claims that the population mean equals 87.5
and have instead ACCEPTED the research hypothesis
(assuming a random sample from a normal population).
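The t value reported above is the difference between the average and the reference value, measured in standard errors. This sketch (hypothetical function name, rounded summaries as inputs) computes t and compares it with the two-sided 1% t-table value for 49 degrees of freedom, about 2.680. Computing the exact p value needs the t distribution, for example `2 * scipy.stats.t.sf(abs(t), 49)` if SciPy is available.

```python
import math

def one_sample_t(mean, sd, n, reference):
    """t statistic for testing whether the population mean equals `reference`."""
    return (mean - reference) / (sd / math.sqrt(n))

t = one_sample_t(90.78, 7.56, 50, reference=87.5)
# Two-sided test at the 0.01 level, 49 degrees of freedom: critical value about 2.680.
print(f"t = {t:.2f}, significant at 0.01: {abs(t) > 2.680}")
```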


Hypothesis Test (One-Sided)


A one-sided upper hypothesis test can
decide only whether the sample average is
significantly larger than the reference value.
A one-sided lower hypothesis test can decide
only whether the sample average is
significantly less than the reference value.
You should decide whether to use a one-sided
or two-sided test before you look at the data.
You should not use both upper and lower
one-sided hypothesis tests on the same data
set; either use a two-sided test, or choose
just one side for a one-sided hypothesis test.
If in doubt, use a two-sided test.
StatPad's one-sided upper hypothesis
test results for Quality (below) show that the scores are significantly larger, on average, than the
reference value (given here as 87.5). Results include the t value, the p value, the practical
interpretation of the results, and a formal statement of the null hypothesis being tested.
To perform a one-sided hypothesis test using StatPad's main dialog box, select One Sample
as the situation and Hypothesis Test as the analysis. Select your data from the list (or use Add
Data if your column of numbers is in the worksheet but is not in the list), click on the 1-sided box
of your choice, specify the Reference Value, check the Output Range, and then select Do It.

One-sided hypothesis test for Quality:


t = 3.07
p = 0.00175
The sample average
90.78
is highly significantly larger (p<0.01)
than the reference value
87.5
We have REJECTED the null hypothesis
that claims that the population mean is less than or equal to 87.5
and have instead ACCEPTED the research hypothesis
(assuming a random sample from a normal population).
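A one-sided upper test uses the same t statistic but rejects only when t is large and positive. In this sketch the one-sided 1% critical value for 49 degrees of freedom (about 2.4049) is passed in by hand; the function name is hypothetical. The one-sided p value is half the two-sided one when t is in the tested direction, which matches the 0.00175 versus 0.00350 shown in the two results.

```python
import math

def one_sided_upper_test(mean, sd, n, reference, t_crit):
    """Return (t, reject) for the null hypothesis: population mean <= reference."""
    t = (mean - reference) / (sd / math.sqrt(n))
    return t, t > t_crit

t, reject = one_sided_upper_test(90.78, 7.56, 50, 87.5, t_crit=2.4049)  # 0.01 level
print(f"t = {t:.2f}, reject: {reject}")
```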


Percentile
Percentiles are landmarks in the data
that are a known percentage (of the data
values) from smallest to largest. The smallest
data value is the 0th percentile, the largest is
the 100th percentile, the median is the 50th
percentile, and so forth.
In StatPad's percentile calculation
(below) the 85th percentile for the Quality
scores is found to be a score of 98. That is,
the score 98 is about 85% of the way (in the
ordered list of scores) from the smallest to the
largest score.
To find a percentile using StatPad's main
dialog box, select One Sample as the situation and Percentile as the analysis. Select your data
from the list (or use Add Data if your column of numbers is in the worksheet but is not in the
list), provide the Percentage for which you would like the percentile, check the Output Range,
and then select Do It.

For Quality:
85th percentile
is 98
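The manual does not say exactly how StatPad interpolates between data values, so this sketch assumes linear interpolation at position (n - 1) x pct / 100 in the sorted data. That is the same convention as Excel's PERCENTILE.INC function; the function name `percentile` and the data are hypothetical.

```python
def percentile(data, pct):
    """Value that is pct% of the way from the smallest to the largest data value."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    xs = sorted(data)
    position = (len(xs) - 1) * pct / 100  # 0-based position in the sorted list
    lower = int(position)
    fraction = position - lower
    if lower + 1 < len(xs):
        # Interpolate linearly between the two neighboring sorted values.
        return xs[lower] + fraction * (xs[lower + 1] - xs[lower])
    return xs[lower]

print(percentile([10, 20, 30, 40, 50], 85))
```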


Percentile Ranking
The percentile ranking of a given data
value tells you how far along (as a percentage)
this value lies in the list of data values,
ordered from smallest to largest.
In StatPad's percentile ranking calculation
(below) the Quality score 87.5 is found to be
30% of the way from smallest to largest.
To find a percentile ranking using
StatPad's main dialog box, select One
Sample as the situation and Percentile
Ranking as the analysis. Select your data
from the list (or use Add Data if your column
of numbers is in the worksheet but is not in
the list), provide the data Value for which you would like the percentile ranking, check the
Output Range, and then select Do It.

For Quality:
87.5 is the
30th percentile
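A percentile ranking runs the percentile calculation in reverse: given a value, find how far along the sorted data it sits. This sketch assumes linear interpolation between neighboring values (0% at the smallest, 100% at the largest) and uses the lowest position when the value is tied in the data; StatPad's exact rule may differ, and `percentile_rank` is a hypothetical name.

```python
from bisect import bisect_left

def percentile_rank(data, value):
    """Percent of the way (0 = smallest, 100 = largest) that `value` lies in the sorted data."""
    xs = sorted(data)
    n = len(xs)
    if n < 2 or not xs[0] <= value <= xs[-1]:
        raise ValueError("need at least two values, and value must lie within the data range")
    i = bisect_left(xs, value)  # first index with xs[i] >= value
    if xs[i] == value:
        position = i  # value occurs in the data: use its lowest position
    else:
        # value lies strictly between xs[i - 1] and xs[i]
        position = (i - 1) + (value - xs[i - 1]) / (xs[i] - xs[i - 1])
    return 100 * position / (n - 1)

print(percentile_rank([10, 20, 30, 40, 50], 44))
```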


Sampling
Random Sample Without Replacement
A random sample without replacement
is chosen from a population so that (1) all
population units are equally likely to be
chosen, (2) units are selected independently
of one another, and (3) once a unit is chosen,
it cannot be chosen again. All sampled units
are different when sampling without
replacement.
StatPad's results (below) show a sample
of 5 selected at random (without replacement)
from a population of size 100. The selected
items (in order) are 19, 25, 59, 67, and 89.
This list of five numbers has also been given a
name (firstSample was chosen here) which
will appear in StatPad's lists of data sets.
To select a random sample without replacement using StatPad's main dialog box, select
Sampling as the situation and Sample Without Replacement as the analysis. Specify a
Population Size and a Sample Size. Provide an optional name for the resulting data in case you
plan to refer to it later, check the Output Range, and then select Do It.

Random sample of size 5


from population numbered from 1 to 100
chosen without replacement:
firstSample
19
25
59
67
89
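Python's `random.sample` draws without replacement, so every selected unit is different. This sketch (hypothetical function name) numbers the population from 1 to N and sorts the result so it reads in order, like the list above; the optional seed only makes runs repeatable.

```python
import random

def sample_without_replacement(population_size, sample_size, seed=None):
    """Distinct unit numbers from 1..population_size, listed in increasing order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

print(sample_without_replacement(100, 5))
```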


Random Sample With Replacement


A random sample with replacement is
chosen from a population so that (1) all
population units are equally likely to be
chosen, (2) units are selected independently
of one another, and (3) once a unit is chosen,
it is replaced so that it may be chosen again.
The sampled units may or may not all be
different when sampling with replacement.
The StatPad results below show a sample
of 5 selected at random (with replacement)
from a population of size 100. The selected
items (in order) are 43, 51, 55, 55, and 82. Note
that an item (55) was chosen twice. This can
happen when sampling with replacement. This
list of five numbers has also been given a name (secondSample was chosen here) which will
appear in StatPad's lists of data sets.
To select a random sample with replacement using StatPad's main dialog box, select
Sampling as the situation and Sample With Replacement as the analysis. Specify a Population
Size and a Sample Size. Provide an optional name for the resulting data in case you plan to
refer to it later, check the Output Range, and then select Do It.

Random sample of size 5


from population numbered from 1 to 100
chosen with replacement:
secondSample
43
51
55
55
82
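Sampling with replacement corresponds to `random.choices`, which draws each unit independently, so repeats such as the two 55s above can occur. A minimal sketch with a hypothetical function name, sorted for display:

```python
import random

def sample_with_replacement(population_size, sample_size, seed=None):
    """Unit numbers from 1..population_size drawn independently (repeats allowed)."""
    rng = random.Random(seed)
    return sorted(rng.choices(range(1, population_size + 1), k=sample_size))

print(sample_with_replacement(100, 5))
```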


Uniform Distribution
A uniform distribution generates
numbers, chosen independently of one
another, that are equally likely to fall
anywhere within a specified interval.
In StatPad's results (below) five numbers
were selected uniformly from 35 to 45. This
list of five numbers has also been given a
name (uniformSample was chosen here)
which will appear in StatPad's lists of data
sets.
To select a uniform sample using
StatPad's main dialog box, select Sampling
as the situation and Uniform Distribution as
the analysis. Specify the Smallest and Largest values of the distribution. Specify the Sample
Size. Provide an optional name for the resulting data in case you plan to refer to it later, check
the Output Range, and then select Do It.

Random sample of size 5


selected from a uniform distribution from 35 to 45:
uniformSample
37.28
41.19
39.90
41.81
43.87
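A uniform sample like the one above can be sketched with `random.uniform`, which returns a number between the two endpoints. The function name is hypothetical, and rounding to two decimals is done only for display.

```python
import random

def uniform_sample(smallest, largest, sample_size, seed=None):
    """Independent values equally likely to fall anywhere from smallest to largest."""
    rng = random.Random(seed)
    return [rng.uniform(smallest, largest) for _ in range(sample_size)]

print([round(v, 2) for v in uniform_sample(35, 45, 5)])
```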


Normal Distribution
A normal distribution generates
numbers, chosen independently of one
another, that follow a bell-shaped
distribution, with values most likely to fall
near the mean and the width of the bell
defined by the standard deviation (Std dev).
Observations fall within one standard
deviation of the mean about 68% of the time.
In StatPad's results (below) five numbers
were selected from a normal distribution with
mean 65 and standard deviation 20. This list
of five numbers has also been given a name
(simulatedScores was chosen here) which
will appear in StatPad's lists of data sets.
To select a normal sample using StatPad's main dialog box, select Sampling as the
situation and Normal Distribution as the analysis. Specify the Mean and Standard Deviation
(Std dev) values of the distribution. Specify the Sample Size. Provide an optional name for the
resulting data in case you plan to refer to it later, check the Output Range, and then select Do It.

Random sample of size 5


selected from a normal distribution with mean 65 and standard deviation 20:
simulatedScores
49.78
69.27
58.02
88.10
63.90
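Normal samples can be sketched with `random.gauss`. The second part of this example checks the "about 68% within one standard deviation" statement above on a large simulated sample; the function name and the seed are arbitrary choices for illustration.

```python
import random

def normal_sample(mean, sd, sample_size, seed=None):
    """Independent values from a normal distribution with the given mean and SD."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(sample_size)]

print([round(v, 2) for v in normal_sample(65, 20, 5)])

# Share of a large sample within one SD of the mean (from 45 to 85): close to 0.68.
big = normal_sample(65, 20, 10_000, seed=42)
share = sum(45 <= x <= 85 for x in big) / len(big)
print(round(share, 3))
```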


Binomial Distribution
A binomial distribution is used to
describe the number of times an event
happens out of n trials, where each trial was
performed independently with a fixed
probability.
In StatPad's results (below) five numbers
are selected from a binomial distribution with
10 trials each having probability 0.5 of
success. In the first of the five samples, there
were 4 out of 10 successes. In the second
sample, 6 of 10 were successful.
To select a binomial sample using
StatPad's main dialog box, select Sampling
as the situation and Binomial Distribution as the analysis. Specify the Number n of trials and
the Probability of each trial. Specify the Sample Size. Provide an optional name for the resulting
data in case you plan to refer to it later, check the Output Range, and then select Do It.

Random sample of size 5 selected from a binomial distribution


representing the number of successes in 10 trials, each with probability 0.5:
4
6
6
5
6
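Following the definition above (independent trials, each with a fixed probability), one binomial value is just the number of successes in n simulated trials. This sketch counts them directly rather than relying on a library binomial generator; the function name is hypothetical.

```python
import random

def binomial_sample(n_trials, prob, sample_size, seed=None):
    """Each value is the number of successes in n_trials independent trials."""
    rng = random.Random(seed)
    # rng.random() < prob is True with probability prob; True counts as 1.
    return [sum(rng.random() < prob for _ in range(n_trials)) for _ in range(sample_size)]

print(binomial_sample(10, 0.5, 5))
```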


Binomial Percentages
Binomial percentages describe the
percent or proportion of the time an event
happens out of n trials, where each trial was
performed independently with a fixed
probability.
In StatPad's results (below) five binomial
percents were selected from a distribution
with 10 trials each having probability 0.5 of
success. In the first of the five samples, 0.3 or
30% of the 10 trials were successful. In the
second sample, 60% of the 10 were
successful.
To select a sample of binomial
percentages using StatPad's main dialog box, select Sampling as the situation and Binomial
Percentages as the analysis. Specify the Number n of trials and the Probability of each trial.
Specify the Sample Size. Provide an optional name for the resulting data in case you plan to
refer to it later, check the Output Range, and then select Do It.

Random sample of size 5 selected from a binomial distribution


representing the percentage of successes in 10 trials, each with probability 0.5:
0.3
0.6
0.4
0.8
0.3
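A binomial percentage is the success count divided by the number of trials, reported as a proportion such as 0.3, as in the output above. A minimal sketch with a hypothetical function name:

```python
import random

def binomial_percent_sample(n_trials, prob, sample_size, seed=None):
    """Each value is the proportion of successes in n_trials independent trials."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < prob for _ in range(n_trials)) / n_trials
        for _ in range(sample_size)
    ]

print(binomial_percent_sample(10, 0.5, 5))
```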


Probability Calculations
Normal Probability (Greater Than)
A normal distribution generates numbers
according to a bell-shaped distribution, with
values most likely to fall near the mean and
the width of the bell defined by the standard
deviation. Observations fall within one
standard deviation of the mean about 68% of
the time. Probabilities for a normal
distribution are given by the area under the
bell-shaped curve.
StatPad's result (below) shows the
probability (0.401) that the specified normal
distribution (with mean 75 and standard
deviation 20) is greater than the given value
(80).
To find a normal probability using StatPad's main dialog box, select Probability as the
situation and Normal Probability as the analysis. Choose the type of probability you want
(Greater than, Less than, Between, or Not between), then give the Value(s) requested. Specify the
Mean and Standard Deviation of the normal distribution. Check the Output Range, and then
select Do It.

The probability that a normal random variable


with mean 75 and standard deviation 20
is greater than 80 is:
0.401
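The area under the normal curve can be computed with the error function from Python's `math` module, using the standard identity P(X <= x) = (1 + erf((x - mean) / (sd * sqrt(2)))) / 2. This sketch (hypothetical function names) reproduces the result above:

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def normal_greater_than(value, mean, sd):
    """P(X > value): the area under the curve to the right of value."""
    return 1 - normal_cdf(value, mean, sd)

print(f"{normal_greater_than(80, 75, 20):.3f}")  # 0.401
```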


Normal Probability (Between)


When you ask StatPad to find the
probability of being between (or not
between), the dialog box changes to allow
you to specify the two values, lower and
upper.
StatPad's result (below) shows the
probability (0.175) that the specified normal
distribution (with mean 75 and standard
deviation 20) is between the two given values
(80 and 90).
To find a normal probability using
StatPad's main dialog box, select Probability
as the situation and Normal Probability as
the analysis. Choose the type of probability you want (Greater than, Less than, Between, or Not
between), then give the Value(s) requested. Specify the Mean and Standard Deviation of the
normal distribution. Check the Output Range, and then select Do It.

The probability that a normal random variable


with mean 75 and standard deviation 20
is between 80 and 90 is:
0.175
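A "between" probability is the difference of two cumulative probabilities, and "not between" is its complement. A self-contained sketch with hypothetical function names:

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def normal_between(lower, upper, mean, sd):
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)

def normal_not_between(lower, upper, mean, sd):
    return 1 - normal_between(lower, upper, mean, sd)

print(f"{normal_between(80, 90, 75, 20):.3f}")  # 0.175
```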


Binomial Probability (Equal to)


A binomial distribution describes the
number of times an event happens out of n
trials, where each trial was performed
independently with a fixed probability.
StatPad's result (below) shows the
probability (0.205) that a specified binomial
distribution is exactly equal to 4. That is, the
probability is 0.205 of observing exactly 4
successes out of 10 independent trials with
probability 0.5 for each trial.
To find a binomial probability using
StatPad's main dialog box, select Probability
as the situation and Binomial Probability as
the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or
Between), then give the Value(s) requested. Specify the Number n of trials and the Probability
for each trial of the binomial distribution. Check the Output Range, and then select Do It.
If you specify an Equal to value that is not a whole number, StatPad correctly reports the
resulting probability as zero because a binomial random variable gives the number of successes
(which must be a whole number).

The probability that a binomial random variable


with 10 repeated trials, each with probability 0.5
is equal to 4 is:
0.205
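The binomial probability of exactly k successes is C(n, k) p^k (1 - p)^(n - k). This sketch (hypothetical name; `math.comb` needs Python 3.8 or later) also mirrors the rule above that a value which is not a whole number has probability zero:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k); zero unless k is a whole number from 0 to n."""
    if k != int(k) or not 0 <= k <= n:
        return 0.0
    k = int(k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"{binomial_pmf(4, 10, 0.5):.3f}")  # 0.205
```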


Binomial Probability (This or Less)


You can also ask StatPad to find the
probability that a binomial random variable
is this value or more, this value or
less, or between two values.
StatPad's result (below) shows the
probability (0.377) that a specified binomial
distribution is 4 or less. That is, the
probability is 0.377 of observing exactly 0, 1,
2, 3, or 4 successes out of 10 independent
trials with probability 0.5 for each trial.
To find a binomial probability using
StatPad's main dialog box, select Probability
as the situation and Binomial Probability as
the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or
Between), then give the Value(s) requested. Specify the Number n of trials and the Probability
for each trial of the binomial distribution. Check the Output Range, and then select Do It.

The probability that a binomial random variable


with 10 repeated trials, each with probability 0.5
is 4 or less is:
0.377
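"This or less" and "This or more" probabilities are sums of the exact-value probabilities, as the 0, 1, 2, 3, or 4 listing above suggests. A minimal sketch with hypothetical names, assuming k is a whole number from 0 to n:

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_at_most(k, n, p):
    """P(X <= k): add the probabilities of 0, 1, ..., k successes."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

def binomial_at_least(k, n, p):
    """P(X >= k): add the probabilities of k, k + 1, ..., n successes."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))

print(f"{binomial_at_most(4, 10, 0.5):.3f}")  # 0.377
```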


Binomial Percent (Equal to)


A binomial percentage describes the
percent or proportion of the time an event
happens out of n trials, where each trial was
performed independently with a fixed
probability.
StatPad's result (below) shows the
probability (0.117) that a specified binomial
percentage distribution is exactly equal to
70%. That is, the probability is 0.117 of
observing exactly 70% successes out of 10
independent trials (this would be 7 successes
out of the 10 trials) with probability 0.5 for
each trial.
To find a probability for a binomial percent using StatPad's main dialog box, select
Probability as the situation and Binomial Percent as the analysis. Choose the type of probability
you want (Equal to, This or more, This or less, or Between), then give the Value(s) requested as
a percentage. Specify the Number n of trials and the Probability for each trial of the binomial
distribution. Check the Output Range, and then select Do It.

The probability that a binomial percentage


with 10 repeated trials, each with probability 0.5
is equal to 70% is:
0.117
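A binomial percent probability converts the percentage back to a count (70% of 10 trials is 7 successes) and then uses the ordinary binomial formula. By analogy with the whole-number rule for counts, this sketch assumes a percentage that does not correspond to a whole count has probability zero; the function name is hypothetical.

```python
import math

def binomial_percent_prob(percent, n, p):
    """P(successes / n equals percent%); zero if percent% of n is not a whole count (assumption)."""
    k = percent * n / 100
    if abs(k - round(k)) > 1e-9 or not 0 <= k <= n:
        return 0.0
    k = round(k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

print(f"{binomial_percent_prob(70, 10, 0.5):.3f}")  # 0.117
```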


Binomial Percent (Between)


You can also ask StatPad to find the
probability that a binomial percent is this
value or more, this value or less, or
between two values.
StatPad's result (below) shows the
probability (0.171) that a specified binomial
percentage distribution is between 70% and
90%. That is, the probability is 0.171 of
observing between 70% and 90% successes
out of 10 independent trials with probability
0.5 for each trial (which, in this case,
corresponds to 7, 8, or 9 occurrences
representing 70%, 80%, or 90% successes).
To find a probability for a binomial percent using StatPad's main dialog box, select
Probability as the situation and Binomial Percent as the analysis. Choose the type of probability
you want (Equal to, This or more, This or less, or Between), then give the Value(s) requested as
a percentage. Specify the Number n of trials and the Probability for each trial of the binomial
distribution. Check the Output Range, and then select Do It.

The probability that a binomial percentage


with 10 repeated trials, each with probability 0.5
is between 70% and 90% is:
0.171
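As the example above explains, "between 70% and 90%" with 10 trials means 7, 8, or 9 successes. This sketch (hypothetical name) loops over every possible count, keeps those whose percentage falls in the range (endpoints included), and adds their binomial probabilities:

```python
import math

def binomial_percent_between(low_pct, high_pct, n, p):
    """P(low_pct% <= successes / n <= high_pct%), summed over whole counts k."""
    total = 0.0
    for k in range(n + 1):
        pct = 100 * k / n
        if low_pct - 1e-9 <= pct <= high_pct + 1e-9:  # small tolerance for rounding
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

print(f"{binomial_percent_between(70, 90, 10, 0.5):.3f}")  # 0.171
```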


Poisson Probability (Equal to)


A Poisson distribution describes the
number of times an event happens, where the
event happens independently at a fixed mean
rate.
StatPad's result (below) shows the
probability (0.0337) that a specified Poisson
distribution is exactly equal to 1. That is, the
probability is 0.0337 of observing exactly 1
occurrence of the event when we expect on
average to see 5 occurrences. The probability
is small because we expect many more (5
occurrences), on average, but may
occasionally (about 3% of the time) see just
one.
To find a Poisson probability using StatPad's main dialog box, select Probability as the
situation and Poisson Probability as the analysis. Choose the type of probability you want
(Equal to, This or more, This or less, or Between), then specify the whole-number Value(s) and
the Mean rate of occurrence (which is not required to be a whole number), check the Output
Range, and then select Do It.

The probability that a Poisson random variable


with mean 5
is equal to 1 is:
0.0337
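The Poisson probability of exactly k occurrences is e^(-mean) mean^k / k!, where the mean need not be a whole number. A minimal sketch with a hypothetical function name:

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson count with the given (possibly fractional) mean rate."""
    return math.exp(-mean) * mean**k / math.factorial(k)

print(f"{poisson_pmf(1, 5):.4f}")  # 0.0337
```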


Poisson Probability (This or Less)


You can also ask StatPad to find the
probability that a Poisson random variable is
this value or more, this value or less,
or between two values.
StatPad's result (below) shows the
probability (0.265) that the specified Poisson
distribution is 3 or less. That is, the
probability is 0.265 of observing exactly 0, 1,
2, or 3 occurrences when we expect on
average to see 5 occurrences.
To find a Poisson probability using
StatPad's main dialog box, select Probability
as the situation and Poisson Probability as
the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or
Between), then give the whole-number Value(s) requested. Specify the Mean rate of occurrence
of the Poisson distribution (which is not required to be a whole number), check the Output
Range, and then select Do It.

The probability that a Poisson random variable


with mean 5
is 3 or less is:
0.265
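"3 or less" for a Poisson variable is the sum of the probabilities of 0, 1, 2, and 3 occurrences. A self-contained sketch (hypothetical name):

```python
import math

def poisson_at_most(k, mean):
    """P(X <= k): sum of Poisson probabilities for 0, 1, ..., k occurrences."""
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k + 1))

print(f"{poisson_at_most(3, 5):.3f}")  # 0.265
```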


Exponential Probability (This or More)


The exponential distribution is a skewed
continuous distribution that is often used to
model the amount of time until a task is
completed or until an event happens. The
distribution is specified by giving its mean,
which is not required to be a whole number.
StatPad's result (below) shows the
probability (0.768) that the specified
exponential distribution is 2.38 or more. That
is, the probability is 0.768 of observing 2.38
or more when we expect 9 on average.
To find an exponential probability using
StatPad's main dialog box, select Probability
as the situation and Exponential Probability as the analysis. Choose the type of probability you
want (This or more, This or less, Between, or Not between), then specify the Value(s) and the
Mean, check the Output Range, and then select Do It.

The probability that an exponential random variable


with mean 9
is 2.38 or more is:
0.768
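For an exponential distribution with a given mean, the probability of a value at or above x has the closed form exp(-x / mean), so no summation or integration is needed. A minimal sketch with a hypothetical function name:

```python
import math

def exponential_at_least(value, mean):
    """P(X >= value) = exp(-value / mean) for an exponential random variable."""
    return math.exp(-value / mean) if value > 0 else 1.0

print(f"{exponential_at_least(2.38, 9):.3f}")  # 0.768
```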


Exponential Probability (Between)


You can also ask StatPad to find the
probability that an exponential random
variable is this value or less, between
two values, or not between two values.
StatPad's result (below) shows the
probability (0.243) that the specified
exponential distribution is between 5.2 and
10.3. That is, the probability is 0.243 of
observing a value between 5.2 and 10.3 when
we expect 9 on average.
To find an exponential probability using
StatPads main dialog box, select Probability
as the situation and Exponential Probability
as the analysis. Choose the type of probability you want (This or more, This or less, Between, or
Not between), then specify the Value(s) and the Mean, check the Output Range, and then select
Do It.

The probability that an exponential random variable
with mean 9
is between 5.2 and 10.3 is:
0.243
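For Between, the probability is the difference of two cumulative probabilities, (1 - e^(-b/mean)) - (1 - e^(-a/mean)); Not between is its complement. A minimal Python sketch assuming the same exponential model (the function names are our own):

```python
import math

def exp_cdf(x: float, mean: float) -> float:
    """P(X <= x) for an exponential random variable with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def exp_between(a: float, b: float, mean: float) -> float:
    """P(a <= X <= b); single points carry zero probability for a continuous variable."""
    return exp_cdf(b, mean) - exp_cdf(a, mean)

p = exp_between(5.2, 10.3, 9.0)
print(f"{p:.3f}")      # 0.243
print(f"{1 - p:.3f}")  # 0.757, the "Not between" probability
```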


Discrete Probability
A discrete probability distribution is
characterized by two lists: a list of values and
a list of probabilities (where the probabilities
must add up to 1). StatPad computes the
Expected Value (also called the Mean) as the
weighted average of the values (using
probabilities as the weights) and also
computes the standard deviation, once you
specify these two columns of numbers.
StatPad's results for a situation with
three possibilities are shown below, where the
probability is 0.2 that profit is 3
($thousands), the probability is 0.5 that profit
is 5, and the probability is 0.3 that profit is 8.
These are specified as two separate columns of numbers, each with its name (Profit is a
column containing 3, 5, and 8, while ProbabilityOfProfit is a column containing 0.2, 0.5, and
0.3, which properly add up to 1). We see from the results below that the expected value is $5.5
thousand and the standard deviation (measuring the risk of this situation) is $1.8 thousand for
this discrete random variable.
To compute the mean and standard deviation for a discrete random variable using StatPad's
main dialog box, select Probability as the situation and Discrete Probability as the analysis.
Select one column from each of the two lists (or use Add Data if your columns of numbers are in the
worksheet but are not in the lists), being sure to correctly specify which one contains the values
and which one contains the probabilities. Next check the Output Range and then select Do It.

For the discrete random variable with values in Profit
and probabilities in ProbabilityOfProfit, we have:
5.50   Mean (or expected value)
1.80   Standard Deviation
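The expected value and standard deviation described above are the probability-weighted average of the values and the square root of the probability-weighted average squared deviation from that mean. A minimal Python sketch of that calculation (the function name is hypothetical) reproduces the profit example:

```python
import math

def discrete_mean_sd(values, probs):
    """Mean (expected value) and standard deviation of a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, math.sqrt(variance)

profit = [3, 5, 8]  # $thousands
probability_of_profit = [0.2, 0.5, 0.3]
mean, sd = discrete_mean_sd(profit, probability_of_profit)
print(f"{mean:.2f} {sd:.2f}")  # 5.50 1.80
```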


Two-Sample Analysis
Summaries
Summaries are used to give you selected
numbers that represent and describe your
data sets. When you have two samples,
StatPad first reports summaries for each
sample separately, then gives the average
difference and the standard error of the
average difference, indicating the sampling
variability of the average difference. Note
that the two samples are assumed to have the
same measurement units (e.g., dollars).
StatPad's two-sample summaries (below)
are shown for the results of a survey sent to
customers in the East and to those in the
West.
To compute summaries for two samples using StatPad's main dialog box, select Two
Samples as the situation and Summaries as the analysis. Select a data set from each list (or use
Add Data if your columns of numbers are in the worksheet but are not in the lists). You may
(optionally) click on Paired to specify that the data sets have a natural pairing if the counts are
equal for the two data sets. Next check the Output Range and then select Do It.
The Paired check-box only affects the standard error of the difference. For a paired
situation, StatPad gives the ordinary standard error for the paired differences. For an unpaired
situation, StatPad uses the large-sample formula $\sqrt{S_1^2/n_1 + S_2^2/n_2}$ if both counts are at least 30.
Otherwise, StatPad uses the small-sample formula (assuming equal population variabilities)
$\sqrt{\left[(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2\right]\left(1/n_1 + 1/n_2\right)/(n_1 + n_2 - 2)}$.
                                                        East      West
Count n                                                   17        19
Mean or average                                        1,834     2,390
Standard deviation (variability of individuals)          661       761
Smallest                                                 752       836
Lower quartile                                         1,295     2,004
Median                                                 1,931     2,294
Upper quartile                                         2,426     2,853
Largest                                                2,975     4,085
Standard error (variability of sample average,
  if random sample)                                      160       175

Average difference, West - East                          557
Standard error of the difference                         239
  using the small-sample unpaired formula,
  which assumes equal population variabilities.
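The two unpaired standard-error formulas can be sketched in Python as follows. This is an illustration under stated assumptions, not StatPad's code: it works from the rounded counts and standard deviations in the table rather than raw data, covers only the unpaired case, and the function name `se_diff_unpaired` is our own.

```python
import math

def se_diff_unpaired(n1, s1, n2, s2):
    """Standard error of the difference between two unpaired sample averages."""
    if n1 >= 30 and n2 >= 30:
        # Large-sample formula.
        return math.sqrt(s1**2 / n1 + s2**2 / n2)
    # Small-sample formula: pool the two variances (assumes equal population variabilities).
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# East: 17 responses, SD 661; West: 19 responses, SD 761 (both counts below 30).
se = se_diff_unpaired(17, 661, 19, 761)
print(round(se))  # 239
```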


Histograms
Histograms are used to visually explore
data sets. The data axis is horizontal, and the
bars show how many data values are within
each interval. Data are concentrated where
bars are tall. You can see typical value,
variability, and distribution shape.
StatPad's histograms are shown below
for the East and West survey data, one
histogram for each data set.
To create histograms for two samples
using StatPad's main dialog box, select Two
Samples as the situation and Histograms as
the analysis. Select a data set from each list
(or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next
check the Output Range and then select Do It.
StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
change these, use the Customize check-box found under One Sample, Histogram. Note that
Excel (not StatPad) chooses the minimum and maximum of the horizontal scale. To change them,
leave StatPad (press the Esc key or select Cancel), then double-click the axis and set the
Minimum and Maximum under Axis Options.

[Figure: histograms of the East and West survey data, one panel each; vertical axis Frequency, horizontal axes East and West]
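A histogram counts how many values fall in each interval of a given bin width. The sketch below assumes, for illustration only, that the landmark is one bin edge and that each interval includes its left endpoint; StatPad's exact convention for landmarks and interval endpoints is not described here. The function name and the data are hypothetical.

```python
import math
from collections import Counter

def bin_counts(data, width, landmark=0.0):
    """Count values in bins [landmark + k*width, landmark + (k+1)*width), keyed by left edge."""
    counts = Counter(math.floor((x - landmark) / width) for x in data)
    return {landmark + k * width: counts[k] for k in sorted(counts)}

# Hypothetical values, not the survey responses.
print(bin_counts([1, 2, 2.5, 7, 9.9, 10], width=5))  # {0.0: 3, 5.0: 2, 10.0: 1}
```

Changing the landmark shifts every bin edge by the same amount, which is why it can change the histogram's appearance without changing the bin width.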


Box Plots
Box plots are used to visually explore
and compare data sets; they show you a
central box defined by the quartiles, with the
median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
StatPad's detailed box plots are shown
below, on the same scale, for the East and
West survey data. Note that the western
values are generally somewhat higher,
although there is considerable overlap. There are no outliers.
To create box plots for two samples using StatPad's main dialog box, select Two Samples
as the situation and Box Plots as the analysis. Select a data set from each list (or use Add Data
if your columns of numbers are in the worksheet but are not in the lists). Click on Detailed box
plot if you wish outliers to be displayed separately. Next check the Output Range and then select
Do It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile.

[Figure: detailed box plots on a common horizontal scale (1000 to 5000); East (bottom), West (top)]


Confidence Interval
A two-sample confidence interval for the
population mean difference includes this
unknown population mean difference with
known confidence, e.g., 95%, when random
sampling is used and normal distributions are
assumed.
StatPad's 95% confidence interval
results (below) for the mean difference, West
minus East, tell you that the bounds of the
interval are 71.16 and 1,042.42.
To compute a two-sample confidence
interval using StatPad's main dialog box,
select Two Samples as the situation and Confidence Interval as the analysis. Select a data set
from each list (or use Add Data if your columns of numbers are in the worksheet but are not in
the lists). You may (optionally) change the Confidence level (from the default 95%). You may
also (optionally) click on Paired to specify that the data sets have a natural pairing if the counts
are equal for the two data sets. Next check the Output Range and then select Do It.
The two-sample confidence interval is based on the standard error of the difference,
described previously under Two Sample, Summaries. If unpaired, random sampling from each of
two normal populations is assumed (also assuming equal population variabilities if the small-sample
standard error is used). If paired, random sampling from a normal population is
assumed for the differences formed from the two measurements on each unit sampled.

Confidence interval for the difference:
West - East
We are 95% sure that the
population mean difference is between
71.16 and 1,042.42
using the small-sample unpaired standard error,
which assumes equal population variabilities, and also
assuming random samples from normal populations.
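The interval above is the average difference plus or minus a t critical value times the standard error of the difference. A minimal Python sketch from the rounded summaries, assuming the small-sample unpaired formula with 17 + 19 - 2 = 34 degrees of freedom; the t value 2.032 is taken from a standard t table rather than computed, and the helper name `pooled_se` is our own. Because the inputs are rounded, the bounds agree with StatPad's only to within about one unit.

```python
import math

def pooled_se(n1, s1, n2, s2):
    """Small-sample unpaired standard error of the difference (equal variabilities assumed)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

avg_diff = 557.0                   # West - East, as reported
se = pooled_se(17, 661, 19, 761)   # East and West counts and standard deviations
t_crit = 2.032                     # two-sided 95% t value for 34 df (from a t table)
lower, upper = avg_diff - t_crit * se, avg_diff + t_crit * se
print(f"{lower:.2f} to {upper:.2f}")  # close to StatPad's 71.16 and 1,042.42
```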


Hypothesis Test
A two-sample hypothesis test is used to
decide, based on data, whether or not the
unobservable population means could
reasonably be equal to each other. Because
the sample averages represent (with
statistical error) their respective unknown
population means, the result is often stated in
terms of a significant (or nonsignificant)
difference between the sample averages, both
of which are known.
StatPad's two-sample hypothesis test
results for the East and West survey (below)
show a significant difference between the two
regions (East and West) on average. Results
include the t value, the p value, the practical interpretation of the results, and a formal statement
of the null hypothesis being tested.
To perform a two-sample hypothesis test using StatPad's main dialog box, select Two
Samples as the situation and Hypothesis Test as the analysis. Select a data set from each list (or
use Add Data if your columns of numbers are in the worksheet but are not in the lists). You may
(optionally) click on Paired to specify that the data sets have a natural pairing if the counts are
equal for the two data sets. Next check the Output Range and then select Do It.
The two-sample hypothesis test is based on the standard error of the difference, described
previously under Two Sample, Summaries. If unpaired, random sampling from each of two
normal populations is assumed (also assuming equal population variabilities if the small-sample
standard error is used). If paired, random sampling from a normal population is assumed for the
differences formed from the two measurements on each unit sampled.
The p value says that, if the population means had been equal to each other, then p is the
probability of observing such a large (or larger) difference between the sample averages.
Smaller p values indicate significance because rare events are unlikely.
Hypothesis test for East and West:
t = 2.33
p = 0.026
The sample averages
1,834 and 2,390
are significantly different (p<0.05).
We have REJECTED the null hypothesis
that claims that the population means are equal
and have instead ACCEPTED the research hypothesis
using the small-sample unpaired standard error,
which assumes equal variabilities, and assuming
random samples from normal populations.
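The t value is the average difference divided by its standard error, and the two-sided p value is the chance that a t random variable with 34 degrees of freedom lands at least that far from zero. Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule; that numerical approach is our choice for illustration, not necessarily how StatPad or Excel computes p. The inputs are the reported average difference (557) and standard error (239), and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """2 * P(T > |t|), using Simpson's rule for the area under the density on [0, |t|]."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3

t = 557.0 / 239.0          # average difference / standard error of the difference
p = two_sided_p(t, df=34)  # 17 + 19 - 2 degrees of freedom
print(f"t = {t:.2f}, p = {p:.3f}")
```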


Many-Sample Analysis
Summaries
Summaries are used to give you selected
numbers that represent and describe your
data sets. When you have many samples,
StatPad reports summaries for each sample
separately.
StatPad's many-sample summaries
(below) are shown for the quality scores of
four suppliers (defining four samples). For
example, supplier B had 35 scores listed, with
an average of 85.14.
To compute summaries for many samples
using StatPad's main dialog box, select Many
Samples as the situation and Summaries as
the analysis. Select your data sets from the list (or use Add Data if your columns of numbers are
in the worksheet but are not in the list). Next check the Output Range and then select Do It.

SupplierA  SupplierB  SupplierC  SupplierD  Summaries
       20         35         15         25  Count n
    90.97      85.14      76.00      89.35  Mean or average
     4.68       4.84       3.73       5.05  Standard deviation (variability of individuals)
    82.41      74.31      71.11      75.97  Smallest
    88.57      81.61      73.10      86.77  Lower quartile
    90.13      84.97      75.22      88.80  Median
    93.00      89.05      79.56      93.24  Upper quartile
    99.63      94.61      82.44      98.54  Largest
     1.05       0.82       0.96       1.01  Standard error (variability of sample average, if
                                            random sample)
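Each standard error in the last row of the table is the standard deviation divided by the square root of the count. A quick Python check of that relationship, using the numbers from the table:

```python
import math

# (count n, standard deviation) for each supplier, from the summaries table.
suppliers = {
    "SupplierA": (20, 4.68),
    "SupplierB": (35, 4.84),
    "SupplierC": (15, 3.73),
    "SupplierD": (25, 5.05),
}

# Standard error of a sample average: standard deviation / sqrt(n).
std_errors = {name: round(sd / math.sqrt(n), 2) for name, (n, sd) in suppliers.items()}
print(std_errors)  # {'SupplierA': 1.05, 'SupplierB': 0.82, 'SupplierC': 0.96, 'SupplierD': 1.01}
```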


Histograms
Histograms are used to visually explore
data sets. The data axis is horizontal, and the
bars show how many data values are within
each interval. Data are concentrated where
bars are tall. You can see typical value,
variability, and distribution shape.
StatPad's many-sample histograms
(below) are shown for the quality scores of
the four suppliers. Some of the horizontal
scales have been changed using Excel chart
commands (see below) because Excel's
choice did not show enough detail.
To create histograms for many samples
using StatPad's main dialog box, select Many Samples as the situation and Histograms as the
analysis. Select your data sets from the list (or use Add Data if your columns of numbers are in
the worksheet but are not in the list). Next check the Output Range and then select Do It.
StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
change these, use the Customize check-box found under One Sample, Histogram. Note that
Excel (not StatPad) chooses the minimum and maximum of the horizontal scale. To change them,
leave StatPad (press the Esc key or select Cancel), then double-click the axis and set the
Minimum and Maximum under Axis Options.
[Figure: histograms of quality scores for SupplierA, SupplierB, SupplierC, and SupplierD, one panel each; vertical axes Frequency]


Box Plots
Box plots are used to visually explore
and quickly compare data sets; they show you
a central box defined by the quartiles, with
the median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
StatPad's many-sample detailed box
plots (below) are shown for the quality scores
of the four suppliers, arranged on the same
scale for easy comparison. There is one box
plot for each supplier. Suppliers A and D seem to have the highest scores overall, while supplier
C has the lowest. Supplier D has a low outlier score. The horizontal scale was changed using
Excel chart commands (see below) because Excel's choice did not show enough detail.
To create box plots for many samples using StatPad's main dialog box, select Many
Samples as the situation and Box Plots as the analysis. Select your data sets from the list (or use
Add Data if your columns of numbers are in the worksheet but are not in the list). Click on
Detailed box plot if you wish outliers to be displayed separately. Next check the Output Range
and then select Do It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile. Note that Excel (not StatPad) chooses the minimum and maximum of the horizontal
scale. To change them, leave StatPad (press the Esc key or select Cancel), then double-click the
axis and set the Minimum and Maximum under Axis Options.

[Figure: detailed box plots of quality scores on a common horizontal scale (70 to 100); bottom to top: SupplierA, SupplierB, SupplierC, SupplierD]
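The outlier rule (more than 1.5 times the interquartile range beyond either quartile) can be applied to the quartiles, smallest, and largest values from the many-sample summaries. With summaries alone, this sketch can only tell whether the most extreme value on each side is an outlier, not how many outliers there are. It also takes StatPad's reported quartiles as given, since quartile conventions vary between programs. The function name is our own. Only Supplier D's smallest score is flagged, consistent with the box plots.

```python
# (lower quartile, upper quartile, smallest, largest) from the many-sample summaries.
summaries = {
    "SupplierA": (88.57, 93.00, 82.41, 99.63),
    "SupplierB": (81.61, 89.05, 74.31, 94.61),
    "SupplierC": (73.10, 79.56, 71.11, 82.44),
    "SupplierD": (86.77, 93.24, 75.97, 98.54),
}

def outlier_fences(lower_q, upper_q):
    """Values outside these limits are more than 1.5 IQRs beyond the nearer quartile."""
    iqr = upper_q - lower_q
    return lower_q - 1.5 * iqr, upper_q + 1.5 * iqr

# For each supplier: (is the smallest value a low outlier?, is the largest a high outlier?)
flagged = {}
for name, (lq, uq, smallest, largest) in summaries.items():
    low, high = outlier_fences(lq, uq)
    flagged[name] = (smallest < low, largest > high)
print(flagged)
```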


F Test for One-Way ANOVA


The F test for one-way ANOVA (analysis
of variance) is used to decide, based on data,
whether or not the unobservable population
means could reasonably be equal to each
other. Because the sample averages represent
(with statistical error) their respective
unknown population means, the result is often
stated in terms of a significant (or
nonsignificant) difference among the sample
averages, all of which are known. It is
assumed that samples are drawn randomly
from normal populations with equal
variabilities.
StatPad's F test results for the four
supplier quality scores (below) show very highly significant quality differences, on average,
from one supplier to another. Results include the F statistic, the p value, the practical
interpretation of the results, and a formal statement of the null hypothesis being tested. Excel
listed the p value (p = 5.46E-15) in scientific notation; it is actually very small (p =
0.00000000000000546, with the decimal point moved 15 places to the left). You may reformat
StatPad's results, for example using the Number Area of the Home Ribbon (after leaving the
StatPad dialog box by hitting the Esc key or selecting Cancel).
To perform a many-sample hypothesis test using StatPad's main dialog box, select Many
Samples as the situation and F Test as the analysis. Select your data sets from the list (or use
Add Data if your columns of numbers are in the worksheet but are not in the list). Next check the
Output Range and then select Do It.
The p value says that, if the population means had all been equal to each other, then p is the
probability of observing such large (or larger) differences among the sample averages. Smaller p
values indicate significance because rare events are unlikely.

One-way ANOVA (Analysis of Variance) Hypothesis test for SupplierA,
SupplierB, SupplierC, SupplierD
F = 34.519
DF = 3 and 91
p = 5.46E-15
The sample averages
are very highly significantly different (p<0.001)
We have REJECTED the null hypothesis
that claims that the population means are all equal
and have instead ACCEPTED the research hypothesis
(assuming random samples from normal populations
with equal variability).
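The F statistic compares the variability between the supplier averages with the variability within suppliers, each divided by its degrees of freedom (3 and 91 here). A minimal Python sketch computing it from the summaries table (counts, averages, standard deviations) rather than the raw scores; because the inputs are rounded, F comes out close to, but not exactly, StatPad's 34.519. The function name is our own.

```python
# (count n, average, standard deviation) for each supplier, from the summaries table.
groups = [(20, 90.97, 4.68), (35, 85.14, 4.84), (15, 76.00, 3.73), (25, 89.35, 5.05)]

def one_way_anova_f(groups):
    """F statistic and degrees of freedom from per-group (n, mean, sd) summaries."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)
    df_between, df_within = k - 1, total_n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova_f(groups)
print(f"F = {f:.2f}, DF = {df1} and {df2}")
```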


Mean Differences
If your F test is significant, indicating
that there are significant differences among
the averages, you may be interested in
learning which pairs in particular show
differences. The least-significant-difference
test can be used to provide a hypothesis test
and a confidence interval for each pair of
data sets. It is assumed that samples are
drawn randomly from normal populations
with equal variabilities.
StatPad's many-sample mean-differences
results for the four supplier quality scores
(below) show that all pairs of suppliers show
very highly significant differences (p<0.001)
with the exception of suppliers A and D. This corresponds well to the visual impression from
the box plots created earlier. Note that 99% confidence intervals were used, and that with four
suppliers there are six pairs of suppliers.
To perform a many-sample mean-difference analysis using StatPad's main dialog box, select
Many Samples as the situation and Mean Differences as the analysis. Select your data sets from
the list (or use Add Data if your columns of numbers are in the worksheet but are not in the list).
Specify the confidence level to be used in computing the confidence intervals for the mean
differences. Next check the Output Range and then select Do It.

Sample 1   Sample 2   Avg Diff  99% LowerCI  99% UpperCI      t         p  Significant?
SupplierA  SupplierB     -5.82        -9.30        -2.35  -4.41  2.83E-05  Yes (p<0.001)
SupplierA  SupplierC    -14.96       -19.19       -10.73  -9.30  7.46E-15  Yes (p<0.001)
SupplierA  SupplierD     -1.62        -5.34         2.10  -1.15  0.255012  No (p>0.05)
SupplierB  SupplierC     -9.14       -12.96        -5.31  -6.29   1.1E-08  Yes (p<0.001)
SupplierB  SupplierD      4.20         0.96         7.45   3.41  0.000976  Yes (p<0.001)
SupplierC  SupplierD     13.34         9.30        17.39   8.67  1.52E-13  Yes (p<0.001)
(Avg Diff is the Sample 2 average minus the Sample 1 average.)
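Each least-significant-difference comparison uses the pooled within-supplier variance from the one-way ANOVA (with 91 degrees of freedom) and the two counts involved. A minimal Python sketch from the rounded summaries: differences are computed as Sample 2 minus Sample 1 to match the sign pattern in the table, and the 99% t value (2.631 for 91 degrees of freedom) is taken from a t table rather than computed. Results match StatPad's table to within rounding.

```python
import math
from itertools import combinations

# (count n, average, standard deviation) for each supplier, from the summaries table.
groups = {
    "SupplierA": (20, 90.97, 4.68),
    "SupplierB": (35, 85.14, 4.84),
    "SupplierC": (15, 76.00, 3.73),
    "SupplierD": (25, 89.35, 5.05),
}
total_n = sum(n for n, _, _ in groups.values())
df_within = total_n - len(groups)  # 91
# Pooled within-group variance (the ANOVA mean square within).
ms_within = sum((n - 1) * s ** 2 for n, _, s in groups.values()) / df_within
t_crit = 2.631  # two-sided 99% t value for 91 df, from a t table

results = {}
for (name1, (n1, m1, _)), (name2, (n2, m2, _)) in combinations(groups.items(), 2):
    diff = m2 - m1  # Sample 2 minus Sample 1
    se = math.sqrt(ms_within * (1 / n1 + 1 / n2))
    results[(name1, name2)] = (diff, diff - t_crit * se, diff + t_crit * se, diff / se)

for (name1, name2), (diff, lower, upper, t) in results.items():
    print(f"{name1} {name2} {diff:7.2f} {lower:7.2f} {upper:7.2f} {t:6.2f}")
```

An interval that contains zero (here only A and D) corresponds to a nonsignificant difference at the 1% level.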


Bivariate Analysis
Scatterplot
A scatterplot is used to visually explore a
bivariate data set, showing the distribution of
two measurements (X and Y) that describe
each item in a sample. The X axis is
horizontal and Y is vertical. Each item is
represented by one point in the scatterplot.
You can see if there is a linear or nonlinear
relationship, if the variability is equal or not,
if there is clustering or if outliers are present.
StatPad's scatterplot of coupon rate (X)
and bid price (Y) for a group of tax-exempt
bonds is shown below. There is a strong
linear (straight-line) increasing relationship:
bonds that pay a higher coupon are worth
more. One bond stands out (an outlier?) with a lower coupon and price than the others.
To create a scatterplot using StatPad's main dialog box, select Bivariate as the situation
and Scatterplot as the analysis. Select a data set from each list (or use Add Data if your columns
of numbers are in the worksheet but are not in the lists). Your X variable will be on the
horizontal axis, with Y on the vertical axis. Next check the Output Range and then select Do It.

[Figure: scatterplot of price (vertical axis) against coupon (horizontal axis)]


Scatterplot with Least-Squares Line


The least-squares line summarizes the
relationship between the two variables in a
bivariate data set. The line gives the best
prediction or explanation of Y for each value
of X.
StatPad's scatterplot with least-squares
line for the coupon rates (X) and bid prices
(Y) of tax-exempt bonds is shown below. The
impression of a strong linear (straight-line)
increasing relationship is confirmed by the
line.
To create a scatterplot with least-squares
line using StatPad's main dialog box, select
Bivariate as the situation and Scatterplot with Line as the analysis. Select a data set from each
list (or use Add Data if your columns of numbers are in the worksheet but are not in the lists).
Your X variable will be on the horizontal axis, with Y on the vertical axis. Next check the Output
Range and then select Do It.

[Figure: scatterplot of price (vertical axis) against coupon (horizontal axis), with the least-squares line]
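The least-squares slope and intercept can also be recovered from summary statistics, using the standard identities slope = r * (SD of Y) / (SD of X) and intercept = (average of Y) - slope * (average of X). The sketch below plugs in the bond figures reported elsewhere in this chapter (correlation 0.945; coupon average 5.814 and standard deviation 0.332; price average 99.081 and standard deviation 3.072). Because those inputs are rounded, the results agree with StatPad's line (price = 48.326 + 8.730 coupon) only approximately. The function name is our own.

```python
def least_squares_from_summaries(r, mean_x, sd_x, mean_y, sd_y):
    """Intercept and slope of the least-squares line predicting Y from X."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = least_squares_from_summaries(0.945, 5.814, 0.332, 99.081, 3.072)
print(f"price = {intercept:.2f} + {slope:.2f} coupon")
```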


Correlation
The correlation between two variables
indicates the strength of their relationship as
a pure number. A perfect linear relationship
(i.e., all points exactly along a straight line)
has correlation either 1 or -1 depending on
whether it is increasing or decreasing. If
there is no relationship, the correlation will
be close to 0 (although there can be a
nonlinear relationship with correlation 0). If
all points fall on a horizontal or vertical line,
the correlation is undefined.
StatPad finds the correlation between
coupon payment and bond price to be 0.945
(below). This is a strong correlation, close to
1, summarizing the strong increasing relationship visible in the scatterplot.
To compute a correlation using StatPad's main dialog box, select Bivariate as the situation
and Correlation as the analysis. Select a data set from each list (or use Add Data if your
columns of numbers are in the worksheet but are not in the lists). Next check the Output Range
and then select Do It.

0.945 Correlation between coupon and price
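The correlation is the sum of cross-products of deviations from the two averages, divided by the square root of the product of the two sums of squared deviations. A minimal Python sketch on hypothetical points (not the bond data) illustrates the special cases mentioned above: exactly 1 or -1 for increasing or decreasing straight lines, 0 for a symmetric U-shaped (nonlinear) relationship, and undefined (an error here) when one variable is constant. The function name is our own.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points.
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0  (increasing straight line)
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (decreasing straight line)
print(correlation([-1, 0, 1], [1, 0, 1]))       # 0.0  (U-shaped, yet perfectly related)
```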


Correlation with Test


Suppose you find a correlation value that
is not close to zero, suggesting a relationship
between the two variables. Could this just be
random coincidence, an artifact of random
sampling? Or is the apparent relationship
larger than would ordinarily occur if there
were no relationship in the population? This
can be decided using a hypothesis test. It is
assumed that you have a random sample from
a bivariate normal population.
StatPad finds a very highly significant
correlation between coupon payment and
bond price (shown below), confirming our
impression of a strong relationship. Because
the population correlation would be zero if there were no relationship, rejecting this hypothesis
leads to the conclusion of significant correlation or significant association. StatPad's results
include the correlation, the t value, the p value, the practical interpretation of the results, and a
formal statement of the null hypothesis being tested. When testing a correlation, the test result is
the same as the t test for the X variable's coefficient in a regression analysis.
To compute and test a correlation using StatPad's main dialog box, select Bivariate as the
situation and Correlation with Test as the analysis. Select a data set from each list (or use Add
Data if your columns of numbers are in the worksheet but are not in the lists). Next check the
Output Range and then select Do It.

0.945      Correlation between coupon and price
12.23      t
3.71E-10   p
The correlation
is very highly significantly different (p<0.001)
from the reference value zero.
We have REJECTED the null hypothesis
that claims that the population correlation is zero
and have instead ACCEPTED the research hypothesis
(assuming a random sample from a bivariate normal population).
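The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. Plugging in the rounded correlation 0.945 and the 20 bonds gives a value close to StatPad's 12.23 (the small gap comes from rounding r). Squaring it gives approximately the F statistic reported in the regression output. A minimal Python sketch (the function name is our own):

```python
import math

def correlation_t(r, n):
    """t statistic for testing that the population correlation is zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = correlation_t(0.945, 20)
print(f"t = {t:.2f}, t squared = {t * t:.1f}")
```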


Regression
Regression is used to predict or explain
the Y variable from the X variable, using the
least-squares line. The regression line
summarizes the form of the relationship and
can be used to predict Y for a new value of X.
StatPad's regression analysis of the bond
data (below) shows that prices are
approximately $48.326 plus 8.730 times the
coupon value. Results initially include the R-squared
value, the standard error of estimate, the
number of observations, the F statistic, and
the p value. The regression table gives
confidence intervals, standard errors, t, and p
values for the constant term and the
regression coefficient for coupon (the X variable). The practical interpretation of these results
then follows.
To perform a regression analysis using StatPad's main dialog box, select Bivariate as the
situation and Regression as the analysis. Select a data set from each list (or use Add Data if
your columns of numbers are in the worksheet but are not in the lists). You may optionally
change the Confidence level (from the default 95%). Next check the Output Range and then
select Do It.
Regression analysis to predict price from coupon.
The prediction equation is:
price = 48.326 + 8.730 coupon

   0.893   R squared
   1.034   Standard error of estimate
      20   Number of observations
 149.577   F statistic
3.71E-10   p value

             Coeff   95% LowerCI   95% UpperCI   StdErr        t          p   Significant?
Constant    48.326        39.594        57.058    4.156   11.627   8.38E-10   Yes (p<0.001)
coupon       8.730         7.230        10.230    0.714   12.230   3.71E-10   Yes (p<0.001)

The R-squared value, 89.3%, indicates the proportion of the variance of price
that is explained by the regression model.
Thus coupon explains
a very highly significant proportion of the variation in price, based on the F test (p<0.001).
The standard error of estimate, 1.034, indicates the typical size
of errors made in predicting price using the regression model.
We estimate that:
8.730 is the increase in price associated with an increase in coupon of 1 unit. This is very highly
significant (p<0.001).
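With a single X variable, R squared is the square of the correlation, and the standard error of estimate can be recovered from the standard deviation of Y, because the residual sum of squares is (1 - R squared) times the total sum of squares, divided here by n - 2. A minimal Python sketch from the rounded bond summaries (correlation 0.945, price standard deviation 3.072, 20 bonds); results agree with StatPad's 0.893 and 1.034 to within rounding. The function name is our own.

```python
import math

def regression_fit_summaries(r, sd_y, n):
    """R squared and standard error of estimate for a regression with one X variable."""
    r_squared = r * r
    # sd_y uses n - 1 degrees of freedom; the standard error of estimate uses n - 2.
    se_estimate = sd_y * math.sqrt((1 - r_squared) * (n - 1) / (n - 2))
    return r_squared, se_estimate

r_squared, se_estimate = regression_fit_summaries(0.945, 3.072, 20)
print(f"R squared = {r_squared:.3f}, standard error of estimate = {se_estimate:.3f}")
```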


Predicted and Residuals


The predicted value indicates the value
of Y expected, on average, for a given X
value. The least-squares line can be used to
compute a predicted Y value corresponding
to each X value in the data set. This
represents the height of the line at each X
value in the scatterplot with least-squares
line.
The residual indicates by how much each
Y value is above (or below, if negative
residual) the expected value according to the
least-squares line. The standardized residual
measures this difference relative to the
standard error of estimate and may be
interpreted as a number of standard deviations, so that (for a normal distribution) you would
expect about 68% of the standardized residuals to be between -1 and 1 and about
95% to be between -2 and 2, making it easy to spot unusual observations that are not close to
the line.
Predicted and residual values from StatPad's regression analysis of the bond data are
shown below.
To find predicted and residual values using StatPad's main dialog box, select Bivariate as
the situation and Predicted and Residuals as the analysis. Select a data set from each list (or use
Add Data if your columns of numbers are in the worksheet but are not in the lists). Next check
the Output Range and then select Do It.
Predicting price from coupon:
Predicted   Residuals   Std Resid
  101.799      -0.799      -0.772
  102.453      -0.328      -0.317
  101.580      -1.080      -1.044
   98.525      -1.400      -1.353
  100.707      -0.707      -0.684
  100.707       0.168       0.162
   96.342       1.033       0.999
   97.433       1.067       1.031
   97.215       0.035       0.034
   98.088       0.162       0.156
   96.342       1.158       1.119
   97.433       1.317       1.273
   96.342       1.033       0.999
   99.616       0.634       0.613
  102.453       0.297       0.287
  102.453       0.297       0.287
   91.977      -2.727      -2.637
   96.342       0.408       0.394
  102.017      -0.017      -0.016
  101.799      -0.549      -0.530
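Each standardized residual is the residual divided by the standard error of estimate (1.034). The sketch recomputes them from the rounded residuals in the table, so a few values differ from StatPad's Std Resid column in the last digit, and it lists observations more than 2 standard errors from the line. Only the 17th bond is flagged; its predicted value (91.977) is the lowest in the list, which suggests it is the low-coupon bond that stands out in the scatterplot.

```python
# Residuals from StatPad's output, in the order listed.
residuals = [-0.799, -0.328, -1.080, -1.400, -0.707, 0.168, 1.033, 1.067, 0.035, 0.162,
             1.158, 1.317, 1.033, 0.634, 0.297, 0.297, -2.727, 0.408, -0.017, -0.549]
se_estimate = 1.034  # standard error of estimate from the regression output

std_resid = [round(e / se_estimate, 3) for e in residuals]
# 1-based positions of observations more than 2 standard errors from the line.
unusual = [i + 1 for i, z in enumerate(std_resid) if abs(z) > 2]
print(unusual)  # [17]
```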


Univariate Summaries
Summaries are used to give you selected
numbers that represent and describe your
data sets. When you have bivariate data, you
can find summaries separately for each
variable.
StatPad's univariate summaries for
bivariate data (below) are shown for the
coupon rates and prices of bonds.
To compute univariate summaries for
bivariate data using StatPad's main dialog
box, select Bivariate as the situation and
Univariate Summaries as the analysis. Select
a data set from each list (or use Add Data if
your columns of numbers are in the worksheet but are not in the lists). Next check the Output
Range and then select Do It.

coupon     price   Summaries
    20        20   Count n
 5.814    99.081   Mean or average
 0.332     3.072   Standard deviation (variability of individuals)
 5.000    89.250   Smallest
 5.550    97.375   Lower quartile
 5.813    99.375   Median
 6.125   101.125   Upper quartile
 6.200   102.750   Largest
 0.074     0.687   Standard error (variability of sample average, if random sample)


Histograms
Histograms are used to visually explore
data sets. The data axis is horizontal, and the
bars show how many data values are within
each interval. Data are concentrated where
bars are tall. You can see typical value,
variability, and distribution shape.
StatPad's histograms are shown below
for the coupon rates and prices of bonds, one
histogram for each variable.
To create histograms for bivariate data
using StatPad's main dialog box, select
Bivariate as the situation and Histograms as
the analysis. Select a data set from each list
(or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next
check the Output Range and then select Do It.
StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
change these, use the Customize check-box found under One Sample, Histogram. Note that
Excel (not StatPad) chooses the minimum and maximum of the horizontal scale. To change them,
leave StatPad (press the Esc key or select Cancel), then double-click the axis and set the
Minimum and Maximum under Axis Options.

[Figure: histograms of coupon and price, one panel each; vertical axes Frequency]


Box Plots
Box plots are used to visually explore
and compare data sets; they show you a
central box defined by the quartiles, with the
median indicated within the box. In the
ordinary box plot, a line extends from the box
on each side to the most extreme value. In the
detailed box plot, outliers are indicated
separately and these lines extend from the
box on each side to the most extreme value
(adjacent value) that is not an outlier.
StatPad's detailed box plots are shown
below, on separate scales, for the coupon
rates and prices of bonds. There are no
coupon outliers, but there is one low price
outlier.
To create box plots for bivariate data using StatPad's main dialog box, select Bivariate as
the situation and Box Plots as the analysis. Select a data set from each list (or use Add Data if
your columns of numbers are in the worksheet but are not in the lists). Click on Detailed box plot
if you wish outliers to be displayed separately. Next check the Output Range and then select Do
It.
Outliers are defined as data values more than 1.5 times the interquartile range away from
either quartile. With bivariate data, StatPad creates box plots separately for each variable
because bivariate data are often measured in different units. To see box plots on the same scale,
select Two Sample, Box Plots from StatPad's main dialog box.

[Figure: detailed box plots of coupon and price, each on its own horizontal scale]


Multivariate Analysis
and Multiple Regression
Scatterplots
With multivariate data, scatterplots can
be drawn for each pair of variables. A
scatterplot is used to visually explore a
bivariate data set, showing the distribution of
the two measurements.
StatPad's results show scatterplots of all
pairs of variables (below) for a mail-order
firm's multivariate data set consisting of
information on recently received catalog
orders with questionnaires attached: order
size, income, education, and region (East or
West). Scatterplots involving region (West)
look different because it is coded as an
indicator variable with 1 = West and 0 = East.

[Figure: Scatterplots of Order against Education, Income, and West]

To create scatterplots for multivariate data using StatPad's main dialog box, select Multivariate as the situation and Scatterplots as the analysis. Choose one of your data sets to be the Y variable, and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next, check the Output Range and then select Do It.

[Figure, continued: Scatterplots of the remaining pairs among Education, Income, and West]


Correlations
With multivariate data, the correlation can be calculated for each pair of variables, and these can be displayed in a table (the correlation matrix). The correlation between two variables indicates the strength and direction of their linear relationship as a pure number between -1 and 1, with no units.
StatPad's correlation matrix for the catalog-order data is shown below. Note that the diagonal values are all 1 because each variable is perfectly correlated with itself. The highest correlations are Order with West (0.700), Education with Income (0.607), and Order with Education (0.564), corresponding to the scatterplots with the clearest tilt.
To find the correlations for multivariate data using StatPad's main dialog box, select Multivariate as the situation and Correlations as the analysis. Choose one of your data sets to be the Y variable, and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next, check the Output Range and then select Do It.
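As a sketch, assuming the usual Pearson (product-moment) correlation, the Python below builds a correlation matrix from named columns. The names `pearson` and `correlation_matrix` are hypothetical, not part of StatPad. Because the individual Education, Income and West values are not listed in this guide, the example uses three columns that can be recovered from the Predicted and Residuals table: Order (predicted plus residual), Predicted, and Residuals.

```python
import math


def pearson(x, y):
    """Pearson (product-moment) correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every (name_i, name_j) pair of named columns to its correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Predicted values and residuals as printed in the Predicted and Residuals table.
predicted = [64.617, 65.861, 26.237, 35.381, 42.053, 57.979, 66.226, 17.267,
             21.943, 60.578, 43.925, 41.764, 37.135, 13.781, 61.976, 36.279]
residuals = [-8.617, -6.861, -5.237, 6.619, -5.053, 35.021, -6.226, 14.733,
             -2.943, -6.578, 3.075, -15.764, -0.135, -3.781, 9.024, -7.279]
order = [p + r for p, r in zip(predicted, residuals)]  # actual Order values

corr = correlation_matrix({"Order": order, "Predicted": predicted, "Residuals": residuals})
names = ["Order", "Predicted", "Residuals"]
for a in names:
    print(f"{a:<10}" + "".join(f"{corr[(a, b)]:>11.3f}" for b in names))
```

For a least-squares fit, the squared correlation between Order and its predicted values equals R squared (0.690 in the regression output), and the residuals are uncorrelated with the predicted values. These two entries therefore make a quick sanity check.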

Correlation       Order  Education     Income       West
Order             1.000      0.564      0.158      0.700
Education         0.564      1.000      0.607      0.207
Income            0.158      0.607      1.000      0.011
West              0.700      0.207      0.011      1.000


Multiple Regression
Multiple regression is used to predict or explain the Y variable from two or more X variables, using the best (least-squares) linear relationship. The prediction equation summarizes the form of the relationship and can be used to predict Y given new values for each of the X variables.
StatPad's regression analysis of the catalog-order data (below) shows the prediction equation, the R-squared value, the standard error of estimate, the number of observations, the F statistic, and the p value. The regression table gives confidence intervals, standard errors, t statistics, and p values for the constant term and the regression coefficients (one line per X variable). A practical interpretation of these results follows the table.
To perform a multiple regression analysis using StatPad's main dialog box, select Multivariate as the situation and Regression as the analysis. Choose one of your data sets to be the Y variable (to be predicted), and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). You may optionally change the Confidence level (from the default 95%). Next, check the Output Range and then select Do It.
Multiple regression analysis to predict Order from Education, Income and West.
The prediction equation is:
Order = -3.636 + 3.356 Education - 0.0002 Income + 24.595 West

R squared                    0.690
Standard error of estimate  13.413
Number of observations          16
F statistic                  8.898
p value                      0.002

              Coeff  95% LowerCI  95% UpperCI    StdErr       t      p  Significant?
Constant     -3.636      -39.557       32.285    16.487  -0.221  0.829  No (p>0.05)
Education     3.356        0.532        6.180     1.296   2.589  0.024  Yes (p<0.05)
Income     -0.00020     -0.00072      0.00033   0.00024  -0.809  0.434  No (p>0.05)
West         24.595        9.301       39.889     7.019   3.504  0.004  Yes (p<0.01)
The R-squared value, 69.0%, indicates the proportion of the variance of Order that is explained by the regression model.
Education, Income and West together explain a highly significant proportion of the variation in Order, based on the F test (p<0.01).
The standard error of estimate, 13.413, indicates the typical size of errors made in predicting Order using the regression model.
Holding the other X variables constant, we estimate that:
3.356 is the increase in Order associated with an increase in Education of 1 unit. This is significant (p<0.05).
-0.00020 is the increase in Order associated with an increase in Income of 1 unit (that is, a slight decrease). This is not significant (p>0.05).
24.595 is the increase in Order associated with an increase in West of 1 unit (that is, for a West customer compared with an East customer). This is highly significant (p<0.01).
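The regression output can be checked by hand. This Python sketch evaluates the printed prediction equation for a hypothetical new customer, recomputes each t statistic as coefficient divided by standard error, and recomputes the overall F statistic from R squared as F = (R²/k) / ((1 - R²)/(n - k - 1)) with k = 3 X variables and n = 16 observations. The function names are illustrative only. Because the printed coefficients are rounded, the results agree with the table to about two or three decimals. Income's t comes out near -0.83 rather than -0.809 because its coefficient and standard error are shown to only two significant digits.

```python
# Coefficients and standard errors as printed (rounded) in the regression table.
COEF = {"Constant": -3.636, "Education": 3.356, "Income": -0.00020, "West": 24.595}
STD_ERR = {"Constant": 16.487, "Education": 1.296, "Income": 0.00024, "West": 7.019}


def predict_order(education, income, west):
    """Evaluate the prediction equation for one (hypothetical) customer."""
    return (COEF["Constant"] + COEF["Education"] * education
            + COEF["Income"] * income + COEF["West"] * west)


def t_statistics():
    """t statistic for each term: coefficient divided by its standard error."""
    return {name: COEF[name] / STD_ERR[name] for name in COEF}


def f_statistic(r_squared, n, k):
    """Overall F statistic from R squared, n observations and k X variables."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# Hypothetical customer: Education 16, Income $70,000, in the West (West = 1).
print(round(predict_order(16, 70_000, 1), 3))
print({name: round(t, 3) for name, t in t_statistics().items()})
print(round(f_statistic(0.690, 16, 3), 3))
```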


Predicted and Residuals


The predicted value indicates the value expected for Y, on average, given a value for each of the X variables. The residual indicates by how much each Y value is above (or below, if the residual is negative) the expected value according to the prediction equation. The standardized residual measures this difference relative to the standard error of estimate and may be interpreted as a number of standard deviations, making it easy to spot unusual observations that lie far from their predicted values.
Predicted and residual values from StatPad's regression analysis of the catalog-order data are shown below.
To find predicted and residual values using StatPad's main dialog box, select Multivariate as the situation and Predicted and Residuals as the analysis. Choose one of your data sets to be the Y variable (to be predicted), and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next, check the Output Range and then select Do It.
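To show how standardized residuals single out unusual observations, this Python sketch recomputes them from the printed predicted values and the standard error of estimate (13.413). The actual Order values are recovered as predicted plus residual from the table below, and they come out as whole numbers. The cutoff of 2 standard deviations used to flag observations is an assumption for illustration, since the guide does not state one. The function names are hypothetical.

```python
SE_ESTIMATE = 13.413  # standard error of estimate from the regression output

# Predicted Order values as printed, and actual Order values recovered as
# predicted + residual from the same table.
predicted = [64.617, 65.861, 26.237, 35.381, 42.053, 57.979, 66.226, 17.267,
             21.943, 60.578, 43.925, 41.764, 37.135, 13.781, 61.976, 36.279]
order = [56, 59, 21, 42, 37, 93, 60, 32, 19, 54, 47, 26, 37, 10, 71, 29]


def residual_table(actual, fitted, se):
    """Return (residual, standardized residual) for each observation."""
    return [(y - f, (y - f) / se) for y, f in zip(actual, fitted)]


def unusual(actual, fitted, se, cutoff=2.0):
    """1-based positions whose standardized residual exceeds cutoff in size."""
    table = residual_table(actual, fitted, se)
    return [i + 1 for i, (_, z) in enumerate(table) if abs(z) > cutoff]


print(unusual(order, predicted, SE_ESTIMATE))  # [6]
```

Only the sixth observation (standardized residual 2.611) exceeds the assumed cutoff.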

Predicting Order from Education, Income and West:

Predicted  Residuals  Std Resid
   64.617     -8.617     -0.642
   65.861     -6.861     -0.512
   26.237     -5.237     -0.390
   35.381      6.619      0.493
   42.053     -5.053     -0.377
   57.979     35.021      2.611
   66.226     -6.226     -0.464
   17.267     14.733      1.098
   21.943     -2.943     -0.219
   60.578     -6.578     -0.490
   43.925      3.075      0.229
   41.764    -15.764     -1.175
   37.135     -0.135     -0.010
   13.781     -3.781     -0.282
   61.976      9.024      0.673
   36.279     -7.279     -0.543


Diagnostic Plot
The diagnostic plot is used to look for potential problems in multiple regression. It is a scatterplot of the residuals (vertically) against the predicted values (horizontally). Any useful structure that the regression equation has failed to capture will be found in the residuals. Consequently, if you see structure in the diagnostic plot (a curve, outliers, unequal variability), it suggests that the multiple regression analysis is not capturing all of the available structure in the data.
StatPad's diagnostic plot for the catalog-order data is shown below. No problems are evident: it looks like random scatter without tilt or curvature, suggesting that multiple regression has already extracted the available structure from the data.


To create a diagnostic plot using StatPad's main dialog box, select Multivariate as the situation and Diagnostic Plot as the analysis. Choose one of your data sets to be the Y variable (to be predicted), and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next, check the Output Range and then select Do It.

[Figure: Diagnostic plot of Residual Order Values (vertical) against Order Values Predicted from Education, Income and West (horizontal)]


Univariate Summaries
Summaries are used to give you selected numbers that represent and describe your data sets. When you have multivariate data, you can find summaries separately for each variable.
StatPad's univariate summaries for multivariate data are shown below for the catalog-order data.
To compute univariate summaries for multivariate data using StatPad's main dialog box, select Multivariate as the situation and Univariate Summaries as the analysis. Choose one of your data sets to be the Y variable, and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next, check the Output Range and then select Do It.
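The Order summaries can be reproduced with Python's standard `statistics` module, using the Order values recovered from the Predicted and Residuals table (predicted plus residual). The guide does not say how StatPad computes quartiles. Taking the median of the lower and upper halves reproduces the printed 27.5 and 57.5 for these 16 values, but the handling of odd sample sizes below is an assumption. The standard error is the standard deviation divided by the square root of n. The function names are hypothetical.

```python
import math
import statistics

# Order values from the catalog-order data (recovered as predicted + residual).
order = [56, 59, 21, 42, 37, 93, 60, 32, 19, 54, 47, 26, 37, 10, 71, 29]


def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves.
    Assumption: for odd n the middle value is left out of both halves."""
    xs = sorted(values)
    half = len(xs) // 2
    return statistics.median(xs[:half]), statistics.median(xs[-half:])


def summaries(values):
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    lower_q, upper_q = quartiles(values)
    return {
        "Count n": n,
        "Mean or average": statistics.mean(values),
        "Standard deviation": sd,
        "Smallest": min(values),
        "Lower quartile": lower_q,
        "Median": statistics.median(values),
        "Upper quartile": upper_q,
        "Largest": max(values),
        "Standard error": sd / math.sqrt(n),
    }


for label, value in summaries(order).items():
    print(f"{label:<20}{value:10.3f}")
```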

   Order  Education     Income    West  Summaries
      16         16         16      16  Count n
  43.313     15.063    $73,558   0.438  Mean or average
  21.543      3.492    $18,360   0.512  Standard deviation (variability of individuals)
    10.0        9.0    $41,000       0  Smallest
    27.5       12.5    $65,418       0  Lower quartile
    39.5       16.0    $70,519       0  Median
    57.5       17.0    $79,839       1  Upper quartile
    93.0       21.0   $117,370       1  Largest
   5.386      0.873     $4,590   0.128  Standard error (variability of sample average, if random sample)


Histograms
Histograms are used to visually explore data sets. Data are concentrated where bars are tall.
StatPad's histograms are shown below for the catalog-order data, one histogram for each variable. The histogram for West reflects the fact that each data value is either 0 or 1.
To create histograms for multivariate data using StatPad's main dialog box, select Multivariate as the situation and Histograms as the analysis. Choose one of your data sets to be the Y variable, and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next, check the Output Range and then select Do It.
StatPad chooses a default bin width and landmark for the histogram bars. If you wish to change these, use the Customize check-box found under One Sample, Histogram. Note that Excel (not StatPad) chooses the minimum and maximum of the horizontal scale. To change them, leave StatPad (press the Esc key or select Cancel), then double-click on the axis and set Minimum and Maximum under Axis Options.
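A bin width and a landmark together fix where the bar edges of a histogram fall. As a sketch, assuming bins that include their left edge and exclude their right edge, the Python below counts the Order values (recovered from the Predicted and Residuals table) in bins of a hypothetical width of 20 with the landmark at 0. StatPad's own default width is not stated here, and `histogram_counts` is a hypothetical name.

```python
import math
from collections import Counter

order = [56, 59, 21, 42, 37, 93, 60, 32, 19, 54, 47, 26, 37, 10, 71, 29]


def histogram_counts(values, width, landmark=0):
    """Count values in bins [landmark + k*width, landmark + (k+1)*width).
    The landmark fixes where bin edges fall; returns (bin_start, count) pairs,
    including empty bins between the smallest and largest occupied bins."""
    bins = Counter(math.floor((v - landmark) / width) for v in values)
    first, last = min(bins), max(bins)
    return [(landmark + k * width, bins[k]) for k in range(first, last + 1)]


for start, count in histogram_counts(order, width=20):
    print(f"{start:>4} to under {start + 20:<4} {'#' * count}")
```

Moving the landmark shifts every bar edge without changing the width, which is why the same data can produce differently shaped histograms.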

[Figure: Histograms (Frequency) of Order, Education, Income, and West]


Box Plots
Box plots are used to visually explore and compare data sets. They show you a central box defined by the quartiles, with the median indicated within the box. In the ordinary box plot, a line extends from the box on each side to the most extreme value. In the detailed box plot, outliers are indicated separately and these lines extend from the box on each side to the most extreme value (adjacent value) that is not an outlier.
StatPad's detailed box plots are shown below, on separate scales, for each variable in the catalog-order data set. The box plot for West reflects the fact that each data value is either 0 or 1.
To create box plots for multivariate data using StatPad's main dialog box, select Multivariate as the situation and Box Plots as the analysis. Choose one of your data sets to be the Y variable, and select the others from the list of X variables (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Click on Detailed box plot if you wish outliers to be displayed separately. Next, check the Output Range and then select Do It.
Outliers are defined as data values more than 1.5 times the interquartile range below the lower quartile or above the upper quartile. With multivariate data, StatPad creates box plots separately for each variable because multivariate data are often measured in different units. To see box plots on the same scale, select Many Sample, Box Plots from StatPad's main dialog box instead.
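The outlier rule can be sketched in Python as follows, again using the Order values recovered from the Predicted and Residuals table. The quartile rule (median of each half) is an assumption that happens to match StatPad's printed quartiles for these data, and the function name is hypothetical. Adjacent values are the most extreme data values that are not outliers, which is where the lines of a detailed box plot end.

```python
import statistics

order = [56, 59, 21, 42, 37, 93, 60, 32, 19, 54, 47, 26, 37, 10, 71, 29]


def box_plot_summary(values):
    """Quartiles, adjacent values and outliers for a detailed box plot."""
    xs = sorted(values)
    half = len(xs) // 2
    q1 = statistics.median(xs[:half])   # assumed rule: median of the lower half
    q3 = statistics.median(xs[-half:])  # ... and of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "quartiles": (q1, statistics.median(xs), q3),
        "adjacent": (inside[0], inside[-1]),  # where the box-plot lines end
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


print(box_plot_summary(order))
print(box_plot_summary(order + [150]))  # with one hypothetical extra value
```

For Order, the fences are 27.5 - 1.5 × 30 = -17.5 and 57.5 + 1.5 × 30 = 102.5, so there are no outliers and the lines run from 10 to 93. Appending a hypothetical value of 150 moves the upper quartile to 59.5, and 150 lies beyond the new upper fence of 107.5, so it is shown as an outlier.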

[Figure: Detailed box plots of Order, Education, Income, and West]


Time-Series Analysis
Trend-Seasonal
Trend-seasonal analysis of time series provides understanding and forecasting of data that show a repeating pattern, often quarterly or monthly throughout the year. There are four components: the trend (long-term, a straight line), the seasonal variation (repeating each year), the cyclic variation (medium-term wandering) and the irregular component (randomness).
StatPad performs trend-seasonal analysis of quarterly and monthly time-series data. Its trend-seasonal dialog box lets you choose a combination of components to chart or display, and then returns so that you can continue analyzing the same time-series data set.
We will be working with sales numbers for a gift shop that tends to have higher sales in the
fourth quarter due to the holiday season. Data are quarterly from 2007 through 2010, starting in
the first quarter.
To begin trend-seasonal analysis using StatPad's main dialog box, select Time Series as the situation and Trend-Seasonal as the analysis. Please note that StatPad assumes that time increases as you move down your column of data. Select your data from the list (or use Add Data if your column of numbers is in the worksheet but is not in the list), type in the starting year, click to select Quarterly or Monthly, select the Initial Quarter or Initial Month, check the Output Range, and then select Do It. You will then see the trend-seasonal dialog box (below).


Forecast with Series


The forecast, using trend-seasonal analysis, consists of the long-term trend overlaid with the pattern of seasonal variation.
In the chart below, StatPad shows the sales data series together with a quarterly 3-year-ahead forecast. Note how the forecast picks up the trend and the yearly pattern, but without the randomness of real data.
To display a chart with the data series and forecast using StatPad's trend-seasonal dialog box, be sure that Data Series and Forecast are selected. You may choose the number of years to forecast by clicking up or down. Select Graph, check the Output Range, and then select Do It.

[Figure: Sales and Forecast against Time, 2007 to 2014]


Moving Average (Smooth)


A moving average is a new series created at each time point by taking the average of a year's worth of data: from a half-year before to a half-year after. This has the effect of placing a smooth curve through the original series that shows you the long-term trend and the medium-term cyclic component.
In the chart below, StatPad shows the sales data series together with its moving average. The moving average is unavailable at the start and end of the series because it needs a half-year of data on each side.
To display a chart of the data series with its moving average using StatPad's trend-seasonal dialog box, be sure that Data Series and Moving Average (Smooth) are selected. Select Graph, check the Output Range, and then select Do It.
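The moving average can be sketched in Python as a centered moving average. With quarterly data, a year's worth of data from a half-year before to a half-year after spans five quarters, with the two end quarters given half weight. The guide does not spell out the weighting, but this choice reproduces the printed Smooth values; for example, (257/2 + 308 + 428 + 850 + 304/2)/4 = 466.6 for 2007 QIII. The function name is hypothetical.

```python
SALES = [257, 308, 428, 850, 304, 431, 479, 831,     # 2007 QI to 2008 QIV
         318, 352, 564, 1255, 398, 472, 745, 1015]   # 2009 QI to 2010 QIV


def centered_moving_average(series, period=4):
    """Average one year (period observations) centered on each time point.

    For an even period the window has period + 1 points and the two end
    points get half weight, so it runs from half a year before to half a
    year after. Points without a full window on both sides are None."""
    half = period // 2
    smooth = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            total = sum(window[1:-1]) + (window[0] + window[-1]) / 2
        else:
            total = sum(window)
        smooth[t] = total / period
    return smooth


print([None if v is None else round(v, 2) for v in centered_moving_average(SALES)])
```

For monthly data the same function would be called with `period=12`, giving a 13-month window with half-weighted ends.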

[Figure: Sales and Smooth (moving average) against Time, 2007 to 2011]


Seasonal Index
The seasonal index shows you the repeating yearly pattern, centered near the value 1 (or 100%). A period that is typically higher than the rest of the year will have a seasonal index larger than 1. Seasonal adjustment is done by dividing each data value by the appropriate seasonal index (for its month or quarter). Forecasts are obtained by multiplying the trend by the seasonal index.
In the chart below, StatPad shows the seasonal index. It is best to plot the seasonal index alone, without any of the others, because its values are near 1 and may be obscured by the scale of the series itself.
To display the seasonal index values using StatPad's trend-seasonal dialog box, be sure that Seasonal Index is selected. Select Graph, check the Output Range, and then select Do It. It is best not to choose any other items when charting the seasonal index.
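Seasonal indices are commonly computed by the ratio-to-moving-average method: divide each value by its moving average, then average these ratios separately for each quarter. The guide does not document StatPad's exact procedure. Applied to the printed Smooth column, though, this Python sketch gives 0.599, 0.715, 0.914 and 1.766 to three decimals, matching the Seasonal column without any rescaling. The function name is hypothetical, and the series is assumed to start in the first quarter.

```python
SALES = [257, 308, 428, 850, 304, 431, 479, 831,
         318, 352, 564, 1255, 398, 472, 745, 1015]
# Moving average (Smooth) as printed; unavailable for the first and last two quarters.
SMOOTH = [None, None, 466.6, 487.9, 509.6, 513.6, 513.0, 504.9,
          505.6, 569.3, 632.3, 657.3, 694.9, 687.5, None, None]


def seasonal_indices(series, smooth, period=4):
    """Average the ratio data / moving average separately for each season.
    Assumes the series starts in the first quarter (or month)."""
    ratios = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, smooth)):
        if m is not None:
            ratios[t % period].append(y / m)
    return [sum(r) / len(r) for r in ratios]


print([round(i, 3) for i in seasonal_indices(SALES, SMOOTH)])
```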

[Figure: Seasonal index for Sales against Time, 2007 to 2011]


Seasonally Adjusted Series


The seasonally adjusted series is found by dividing each value in the original series by the appropriate seasonal index (for its month or quarter). This has the effect of removing the expected yearly pattern from the data, so that you can see whether values have gone up or down relative to what you would have expected at that time of year. The seasonally adjusted series is more variable than the moving average because it also contains the irregular random component.
In the chart below, StatPad shows the sales data series together with its seasonally adjusted series. Note that at the end the series itself is up but the seasonally adjusted series is down. This is because the actual increase was less than ordinarily expected at this time of year.
To display a chart of the data series with its seasonally adjusted series using StatPad's trend-seasonal dialog box, be sure that Data Series and Seasonally Adjusted Series are selected. Select Graph, check the Output Range, and then select Do It.
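Seasonal adjustment is one division per value. This Python sketch uses the printed (rounded) seasonal indices, so its results differ from the printed Seasonally Adjusted column by up to about 0.5. The function and parameter names are hypothetical.

```python
SALES = [257, 308, 428, 850, 304, 431, 479, 831,
         318, 352, 564, 1255, 398, 472, 745, 1015]
SEASONAL = [0.599, 0.715, 0.914, 1.766]  # printed indices for QI, QII, QIII, QIV


def seasonally_adjust(series, indices, first_season=0):
    """Divide each value by the seasonal index for its quarter (or month).
    first_season is the 0-based season of the first observation (0 = QI)."""
    period = len(indices)
    return [y / indices[(first_season + t) % period] for t, y in enumerate(series)]


adjusted = seasonally_adjust(SALES, SEASONAL)
print([round(a, 1) for a in adjusted])
```

The last two quarters illustrate the point made above: sales rise from 745 to 1,015, but the adjusted values fall from about 815 to about 575, because a fourth quarter is ordinarily expected to be much higher than a third.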

[Figure: Sales and Seasonally Adjusted series against Time, 2007 to 2011]


Long-Term Trend
The long-term trend summarizes the basic behavior of the series as a line or very smooth curve. It is often found by linear regression (or exponential curve-fitting).
In the chart below, StatPad shows the sales data series together with its long-term trend.
To display a chart of the data series with its long-term trend using StatPad's trend-seasonal dialog box, be sure that Data Series and Long-Term Trend are selected. Select Graph, check the Output Range, and then select Do It.
StatPad finds the long-term trend by fitting the best straight line using regression.
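The guide does not say which series the regression line is fitted to. Fitting a least-squares line against time to the printed Seasonally Adjusted column reproduces the printed Trend column (424.4 for 2007 QI, rising about 18.2 per quarter). A line fitted to the raw Sales values would instead rise about 29.3 per quarter, so this Python sketch assumes the seasonally adjusted series is used. The names are hypothetical.

```python
# Seasonally adjusted values as printed, 2007 QI (t = 1) to 2010 QIV (t = 16).
ADJUSTED = [428.8, 431.0, 468.1, 481.3, 507.2, 603.1, 523.9, 470.6,
            530.5, 492.5, 616.8, 710.7, 664.0, 660.4, 814.8, 574.8]


def fit_line(ys):
    """Least-squares intercept a and slope b for y = a + b*t, t = 1..len(ys)."""
    n = len(ys)
    ts = list(range(1, n + 1))
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sxy / sxx
    return y_bar - b * t_bar, b


a, b = fit_line(ADJUSTED)
trend = [a + b * t for t in range(1, 29)]  # 16 data quarters plus 3 years ahead
print(round(a, 2), round(b, 3))
print([round(v, 1) for v in trend])
```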

[Figure: Sales and Trend against Time, 2007 to 2011]


Seasonalized Trend
The seasonalized trend is found by multiplying each long-term trend value by the appropriate seasonal index value (for its month or quarter).
In the chart below, StatPad shows the sales data series together with its seasonalized trend.
To display a chart of the data series with its seasonalized trend using StatPad's trend-seasonal dialog box, be sure that Data Series and Seasonalized Trend are selected. Select Graph, check the Output Range, and then select Do It.
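Seasonalized trend values and forecasts come from the same multiplication: the trend times the seasonal index for the quarter. Continuing the trend line past the end of the data (2010 QIV) gives the forecast. As a sketch, this Python code rebuilds the trend line from its printed first and last values rather than refitting it. It therefore matches the printed Seasonalized Trend and Forecast columns only to within rounding (well under 1 unit). The names are hypothetical.

```python
SEASONAL = [0.599, 0.715, 0.914, 1.766]  # printed indices for QI..QIV
TREND_FIRST, TREND_LAST = 424.4, 916.7   # printed trend at t = 1 and t = 28
SLOPE = (TREND_LAST - TREND_FIRST) / 27  # per-quarter increase


def trend(t):
    """Long-term trend at quarter t (t = 1 is 2007 QI)."""
    return TREND_FIRST + SLOPE * (t - 1)


def seasonalized_trend(t):
    """Trend multiplied by the seasonal index for t's quarter."""
    return trend(t) * SEASONAL[(t - 1) % 4]


fitted = [seasonalized_trend(t) for t in range(1, 17)]     # 2007 to 2010
forecast = [seasonalized_trend(t) for t in range(17, 29)]  # 2011 to 2013
print([round(v, 1) for v in forecast])
```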

[Figure: Sales and Seasonalized Trend against Time, 2007 to 2011]


A Combination: Data Series With Long-Term Trend and Forecast


You may wish to create your own combination of these components, and StatPad gives you complete freedom to do this.
In the chart below, StatPad shows the sales data series together with its long-term trend and a three-year forecast.
To display your combination chart using StatPad's trend-seasonal dialog box, make your selections, choose Graph, check the Output Range, and then select Do It.

[Figure: Sales, Trend, and Forecast against Time, 2007 to 2014]


Numeric Output
You may need numbers as well as charts for trend-seasonal analysis. All of the options available for charting are also available in StatPad for numeric output.
All options have been selected here (including a three-year forecast) for StatPad's numeric output, shown below.
To display your numeric-output combination using StatPad's trend-seasonal dialog box, make your selections, select Numbers, check the Output Range, and then select Do It.

Year  Quarter    Sales   Smooth  Seasonal  Seasonally Adjusted    Trend  Seasonalized Trend  Forecast
2007  QI           257              0.599                428.8    424.4               254.4
2007  QII          308              0.715                431.0    442.6               316.3
2007  QIII         428    466.6     0.914                468.1    460.9               421.4
2007  QIV          850    487.9     1.766                481.3    479.1               846.0
2008  QI           304    509.6     0.599                507.2    497.3               298.1
2008  QII          431    513.6     0.715                603.1    515.6               368.5
2008  QIII         479    513.0     0.914                523.9    533.8               488.1
2008  QIV          831    504.9     1.766                470.6    552.0               974.8
2009  QI           318    505.6     0.599                530.5    570.3               341.8
2009  QII          352    569.3     0.715                492.5    588.5               420.6
2009  QIII         564    632.3     0.914                616.8    606.7               554.8
2009  QIV        1,255    657.3     1.766                710.7    625.0             1,103.6
2010  QI           398    694.9     0.599                664.0    643.2               385.5
2010  QII          472    687.5     0.715                660.4    661.4               472.7
2010  QIII         745              0.914                814.8    679.7               621.5
2010  QIV        1,015              1.766                574.8    697.9             1,232.4
2011  QI                            0.599                         716.1                         429.3
2011  QII                           0.715                         734.4                         524.8
2011  QIII                          0.914                         752.6                         688.1
2011  QIV                           1.766                         770.8                       1,361.2
2012  QI                            0.599                         789.1                         473.0
2012  QII                           0.715                         807.3                         577.0
2012  QIII                          0.914                         825.5                         754.8
2012  QIV                           1.766                         843.8                       1,490.0
2013  QI                            0.599                         862.0                         516.7
2013  QII                           0.715                         880.3                         629.1
2013  QIII                          0.914                         898.5                         821.5
2013  QIV                           1.766                         916.7                       1,618.8


Quality Control
X-Bar, R Charts (No Standard Given)
Control charts, such as X-bar (X̄) and R charts, show you whether your process is in or out of control. A series of measurements is divided into subgroups of a fixed size, e.g., five at a time. The average and range (largest minus smallest) are computed for each subgroup. Each is plotted together with a central line and control limits (upper and lower). If the series (averages or ranges) goes outside the control limits, it indicates that the process is not in control. However, even if the series remains within the limits, it can still be out of control, e.g., if there is a trend that will clearly soon break out of the limits.
StatPad's X-bar and R charts for the light intensity of laser units (below) show a process in control. The averages and ranges move seemingly at random within the control limits. The range chart comes very close to the upper control limit at group 13, but this by itself is not a problem.
The input data consist of a column of 125 individual measurements. StatPad groups and averages them 5 at a time, as requested. The control limits are computed based only on the data values because no standard was given.
To create X-bar and R charts using StatPad's main dialog box, select Quality Control as the situation and X-Bar, R Charts as the analysis. Select your data from the list (or use Add Data if your column of numbers is in the worksheet but is not in the list). You may optionally change the Subgroup Size from the default (5) to any whole number from 2 to 25. Then check the Output Range and select Do It. You may, optionally, specify standards for the process mean and standard deviation (see next item).
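The no-standard limits follow the usual Shewhart formulas. The X-bar chart has its center line at the average of the subgroup averages, with limits at that value plus or minus A2 times the average range. The R chart has its center line at the average range, with limits at D3 and D4 times the average range. This Python sketch uses the commonly tabulated factors for subgroup sizes 2 to 5 only. StatPad follows ASTM STP 15D and supports sizes up to 25, and its own table is not reproduced here. The data and names are hypothetical.

```python
# Commonly tabulated Shewhart factors (A2, D3, D4) for subgroup sizes 2 to 5.
FACTORS = {2: (1.880, 0.0, 3.267), 3: (1.023, 0.0, 2.574),
           4: (0.729, 0.0, 2.282), 5: (0.577, 0.0, 2.114)}


def xbar_r_limits(measurements, size=5):
    """Subgroup statistics and (lower, center, upper) limits, no standard given."""
    a2, d3, d4 = FACTORS[size]
    # Split into consecutive subgroups; an incomplete final subgroup is dropped.
    groups = [measurements[i:i + size]
              for i in range(0, len(measurements) - size + 1, size)]
    means = [sum(g) / size for g in groups]
    ranges = [max(g) - min(g) for g in groups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "means": means,
        "ranges": ranges,
        "xbar_limits": (grand_mean - a2 * mean_range, grand_mean,
                        grand_mean + a2 * mean_range),
        "r_limits": (d3 * mean_range, mean_range, d4 * mean_range),
    }


# Hypothetical data: two subgroups of five.
print(xbar_r_limits([1, 2, 3, 4, 5, 2, 3, 4, 5, 6]))
```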

[Figure: Averages of Intensity (X-bar chart) and Ranges of Intensity (R chart) against Group Number]

StatPad computes control limits according to ASTM-STP 15D, American Society for Testing and Materials.


X-Bar, R Charts (Standard Given)


Sometimes you have external standards for a process and would like the center line and control limits to reflect them. Otherwise the process may seem to be in control, but not actually produce acceptable results.
In StatPad's X-bar and R charts of laser intensity shown below, standards are given as mean μ0 = 25 and standard deviation σ0 = 0.2. With respect to these standards, the process is not in control. The averages go below the lower control limit and always fall on one side of the center line. This process is consistently producing below standard.
To create X-bar and R charts, with standard given, using StatPad's main dialog box, select Quality Control as the situation and X-Bar, R Charts as the analysis. Select your data from the list (or use Add Data if your column of numbers is in the worksheet but is not in the list). You may optionally change the Subgroup Size from the default (5) to any whole number from 2 to 25. When you click on Standard Given, two edit-boxes appear: for Mean (mu0) and for StdDev (sigma0). You may then click on each and type in the value you wish. Then check the Output Range and select Do It.
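With standards μ0 and σ0 given, the usual formulas put the X-bar chart's center line at μ0 with limits μ0 ± (3/√n)σ0, and the R chart's center line at d2σ0 with limits D1σ0 and D2σ0. This Python sketch covers subgroup sizes 2 to 5 with commonly tabulated factors. Applied to the laser example (μ0 = 25, σ0 = 0.2, subgroups of 5), it gives X-bar limits of about 24.73 and 25.27 and an R-chart upper limit of about 0.98. The names are hypothetical.

```python
import math

# Commonly tabulated R-chart factors (d2, D1, D2) for subgroup sizes 2 to 5:
# center line d2*sigma0, lower limit D1*sigma0, upper limit D2*sigma0.
R_FACTORS = {2: (1.128, 0.0, 3.686), 3: (1.693, 0.0, 4.358),
             4: (2.059, 0.0, 4.698), 5: (2.326, 0.0, 4.918)}


def standard_given_limits(mu0, sigma0, size=5):
    """(lower, center, upper) for the X-bar and R charts from standards mu0, sigma0."""
    a = 3 / math.sqrt(size)  # X-bar limits sit 3 standard errors from mu0
    center_f, lower_f, upper_f = R_FACTORS[size]
    return {
        "xbar_limits": (mu0 - a * sigma0, mu0, mu0 + a * sigma0),
        "r_limits": (lower_f * sigma0, center_f * sigma0, upper_f * sigma0),
    }


print(standard_given_limits(25, 0.2))
```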

[Figure: Averages of Intensity (X-bar chart) and Ranges of Intensity (R chart) against Group Number, with standards given]

StatPad computes control limits according to ASTM-STP 15D, American Society for Testing and Materials.


Percentage or Count Chart (No Standard Given)


These charts are used to see whether a process involving counts or percentages is in control. In Excel, a percentage number is actually a proportion (decimal fraction) from 0 to 1, formatted to be displayed as a percentage (perhaps using the % icon in the Number group of the Home ribbon). For example, the number of failures is noted for each group of 200 items produced, then (optionally) converted to a proportion or percent by dividing by 200. StatPad can work from either counts or percentages.
StatPad's percentage chart for the proportion (or percent) of failures (below) shows a process in control. The proportions move seemingly at random within the control limits. The input data consist of a column of 35 individual percentages (each computed from a sample of 200 items). StatPad neither groups nor averages the data column for a percentage chart. The control limits are computed based only on the data values because no standard was given.
To create a percentage or count chart using StatPad's main dialog box, select Quality Control as the situation and Pct, Count Chart as the analysis. Select your data (either counts or percentages) from the list (or use Add Data if your column of numbers is in the worksheet but is not in the list). Your data should either be counts or be proportions (between 0 and 1, representing percentages) based on a constant sample size. Specify this Sample Size. Check the Output Range, and then select Do It. You may, optionally, specify a standard for the process percentage (see next item).

The center line for percentages is the average percentage $\bar{p}$, and the lower and upper control limits are found as follows: $\bar{p} - 3\sqrt{\bar{p}(1-\bar{p})/n}$ and $\bar{p} + 3\sqrt{\bar{p}(1-\bar{p})/n}$. For counts, multiply the center line and control limits by the sample size $n$.

[Figure: Percentage chart of Proportions for Failures against Group Number]


Percentage or Count Chart (Standard Given)


Sometimes you have external standards for a process and would like the center line and control limits to reflect them. Otherwise the process may seem to be in control, but not actually produce acceptable results.
In StatPad's percentage chart for the proportion (or percent) of failures shown below, a standard of 9% was specified. With respect to this standard, the process is in control. The proportions move seemingly at random within the control limits. The input data consist of a column of 35 individual percentages (each based on a sample of 200). StatPad neither groups nor averages the data column for a percentage chart. The control limits are computed based only on the specified standard.
To create a percentage or count chart, with standard given, using StatPad's main dialog box, select Quality Control as the situation and Pct, Count Chart as the analysis. Select your data from the list (or use Add Data if your column of numbers is in the worksheet but is not in the list). Your data should either be counts or be proportions (between 0 and 1, representing percentages) based on a constant sample size. Specify this Sample Size. When you click on Standard Given, an edit-box appears for Percentage (p0). You may then type in the standard value. Then check the Output Range and select Do It.

The center line for percentages is the standard $p_0$, and the lower and upper control limits are found as follows: $p_0 - 3\sqrt{p_0(1-p_0)/n}$ and $p_0 + 3\sqrt{p_0(1-p_0)/n}$. For counts, multiply the center line and control limits by the sample size $n$.

[Figure: Percentage chart of Proportions for Failures against Group Number, with standard given]
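The percentage-chart formulas translate directly into code. This Python sketch accepts either a standard p0 or the observed proportions. For the example with a 9% standard and samples of 200, it gives limits of about 2.9% and 15.1%, or about 5.9 and 30.1 failures per sample on the count scale. Clipping the limits to the 0 to 1 range is an assumption, since the guide does not say what StatPad does when a limit would fall outside it. The names are hypothetical.

```python
import math


def p_chart_limits(sample_size, proportions=None, standard=None):
    """(lower, center, upper) for a percentage chart.

    The center line is the standard p0 if given, else the average proportion.
    Assumption: limits are clipped to the 0..1 range when the formula
    falls outside it."""
    if standard is not None:
        center = standard
    else:
        center = sum(proportions) / len(proportions)
    half_width = 3 * math.sqrt(center * (1 - center) / sample_size)
    return max(0.0, center - half_width), center, min(1.0, center + half_width)


def count_chart_limits(sample_size, **kwargs):
    """Same chart on the count scale: multiply everything by the sample size."""
    return tuple(sample_size * x for x in p_chart_limits(sample_size, **kwargs))


print(p_chart_limits(200, standard=0.09))
print(count_chart_limits(200, standard=0.09))
```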
