Introduction To Statistics For Biomedical Engineers

MC: Ropella FM_Page - 09/27/2007, 11:55PM Achorn International
This text is dedicated to all the students who have completed my BIEN 084
statistics course for biomedical engineers and have taught me how to be
more effective in communicating the subject matter and making statistics
come alive for them. I also thank J. Claypool for his patience and
for encouraging me to finally put this text together.
Finally, I thank my family for tolerating my time at home on the laptop.
ABSTRACT
There are many books written about statistics, some brief, some detailed, some humorous, some colorful, and some quite dry. Each of these texts is designed for a specific audience. Too often, texts about statistics have been rather theoretical and intimidating for those not practicing statistical analysis on a routine basis. Thus, many engineers and scientists, who need to use statistics much more frequently than calculus or differential equations, lack sufficient knowledge of the use of statistics. The audience addressed in this text is the university-level biomedical engineering student who needs a bare-bones coverage of the most basic statistical analysis frequently used in biomedical engineering practice. The text introduces students to the essential vocabulary and basic concepts of probability and statistics that are required to perform the numerical summary and statistical analysis used in the biomedical field. This text is intended as a starting point for the important issues to consider when designing experiments, summarizing data, assuming a probability model for the data, testing hypotheses, and drawing conclusions from sampled data.
A student who has completed this text should have sufficient vocabulary to read more advanced texts on statistics and further their knowledge about additional numerical analyses that are used in the biomedical engineering field but are beyond the scope of this text. This book is designed to supplement an undergraduate-level course in applied statistics, specifically in biomedical engineering. Practicing engineers who have not had formal instruction in statistics may also use this text as a simple, brief introduction to statistics used in biomedical engineering. The emphasis is on the application of statistics, the assumptions made in applying the statistical tests, the limitations of these elementary statistical methods, and the errors often committed in using statistical analysis. A number of examples from biomedical engineering research and industry practice are provided to assist the reader in understanding concepts and application. It is beneficial for the reader to have some background in the life sciences and physiology and to be familiar with basic biomedical instrumentation used in the clinical environment.
KEYWORDS
probability model, hypothesis testing, physiology, ANOVA, normal distribution, confidence interval, power test

Contents
1. Introduction
2. Collecting Data and Experimental Design
3. Data Summary and Descriptive Statistics
3.1 Why Do We Collect Data?
3.2 Why Do We Need Statistics?
3.3 What Questions Do We Hope to Address With Our Statistical Analysis?
3.4 How Do We Graphically Summarize Data?
3.4.1 Scatterplots
3.4.2 Time Series
3.4.3 Box-and-Whisker Plots
3.4.4 Histogram
3.5 General Approach to Statistical Analysis
3.6 Descriptive Statistics
3.6.1 Measures of Central Tendency
3.6.2 Measures of Variability
4. Assuming a Probability Model From the Sample Data
4.1 The Standard Normal Distribution
4.2 The Normal Distribution and Sample Mean
4.3 Confidence Interval for the Sample Mean
4.4 The t Distribution
4.5 Confidence Interval Using t Distribution
5. Statistical Inference
5.1 Comparison of Population Means
5.1.1 The t Test
5.1.1.1 Hypothesis Testing
5.1.1.2 Applying the t Test
5.1.1.3 Unpaired t Test
5.1.1.4 Paired t Test
5.1.1.5 Example of a Biomedical Engineering Challenge
5.2 Comparison of Two Variances
5.3 Comparison of Three or More Population Means
5.3.1 One-Factor Experiments
5.3.1.1 Example of Biomedical Engineering Challenge
5.3.2 Two-Factor Experiments
5.3.3 Tukey's Multiple Comparison Procedure
6. Linear Regression and Correlation Analysis
7. Power Analysis and Sample Size
7.1 Power of a Test
7.2 Power Tests to Determine Sample Size
8. Just the Beginning
Bibliography
Author Biography
CHAPTER 1
Introduction
Biomedical engineers typically collect all sorts of data, from patients, animals, cell counters, microassays, imaging systems, pressure transducers, bedside monitors, manufacturing processes, material testing systems, and other measurement systems that support a broad spectrum of research, design, and manufacturing environments. Ultimately, the reason for collecting data is to make a decision. That decision may concern differentiating biological characteristics among different populations of people, determining whether a pharmacological treatment is effective, determining whether it is cost-effective to invest in multimillion-dollar medical imaging technology, determining whether a manufacturing process is under control, or selecting the best rehabilitative therapy for an individual patient.
The challenge in making such decisions often lies in the fact that all real-world data contain some element of uncertainty because of random processes that underlie most physical phenomena. These random elements prevent us from predicting the exact value of any physical quantity at any moment in time. In other words, when we collect a sample or data point, we usually cannot predict the exact value of that sample or experimental outcome. For example, although the average resting heart rate of normal adults is about 70 beats per minute, we cannot predict the exact arrival time of our next heartbeat. However, we can approximate the likelihood that the arrival time of the next heartbeat will fall in a specific time interval if we have a good probability model to describe the random phenomenon contributing to the time interval between heartbeats. The timing of heartbeats is influenced by a number of physiological variables [1], including the refractory period of the individual cells that make up the heart muscle, the leakiness of the cell membranes in the sinus node (the heart's natural pacemaker), and the activity of the autonomic nervous system, which may speed up or slow down the heart rate in response to the body's need for increased blood flow, oxygen, and nutrients. The sum of these biological processes produces a pattern of heartbeats that we may measure by counting the pulse rate from our wrist or carotid artery or by searching for specific QRS waveforms in the ECG [2]. Although this sum of events makes it difficult for us to predict exactly when the next heartbeat will arrive, we can estimate, with a certain amount of confidence, when the next beat will arrive. In other words, we can assign a probability to the likelihood that the next heartbeat will arrive in a specified time interval. If we were to consider all possible arrival times and assign a probability to each of them, we would have a probability model for the heartbeat intervals. If we can find a probability model to describe the likelihood of occurrence of a certain event or experimental outcome, we can use statistical methods to make decisions. The probability models describe characteristics of the population or phenomenon being studied. Statistical analysis then makes use of these models to help us make decisions about the population(s) or processes.
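To make the idea of a probability model concrete, here is a minimal Python sketch. It assumes a list of hypothetical R-wave times (illustrative values, not recorded ECG data), computes the R-R intervals as the differences between successive R waves, and then assumes a normal model whose mean and standard deviation are estimated from those intervals. The normal assumption is purely illustrative; as the next paragraph stresses, any conclusion is only as good as the model chosen.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical R-wave times in seconds (illustrative values, not real ECG data).
r_wave_times = [0.00, 0.82, 1.66, 2.45, 3.30, 4.12, 4.90, 5.75, 6.58, 7.38]

def rr_intervals(times):
    """Return R-R intervals: the time between successive R waves."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

def prob_next_interval(intervals, low, high):
    """Probability that the next R-R interval lies in [low, high], under an
    ASSUMED normal model fitted to the observed intervals."""
    model = NormalDist(mu=mean(intervals), sigma=stdev(intervals))
    return model.cdf(high) - model.cdf(low)

intervals = rr_intervals(r_wave_times)
p = prob_next_interval(intervals, 0.75, 0.90)
print(f"mean R-R = {mean(intervals):.3f} s, P(0.75 s <= next <= 0.90 s) = {p:.3f}")
```

A different model (for example, one fitted to a histogram of many more intervals) would assign different probabilities to the same time window.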
The conclusions that one may draw from using statistical analysis are only as good as the underlying model that is used to describe the real-world phenomenon, such as the time interval between heartbeats. For example, a normally functioning heart exhibits considerable variability in beat-to-beat intervals (Figure 1.1). This variability reflects the body's continual effort to maintain homeostasis so that the body may continue to perform its most essential functions and supply the body with the oxygen and nutrients required to function normally. It has been demonstrated through biomedical research that there is a loss of heart rate variability associated with some diseases, such as diabetes and ischemic heart disease. Researchers seek to determine if this difference in variability between normal subjects and subjects with heart disease is significant (meaning it is due to some underlying change in biology and not simply a result of chance) and whether it might be used to predict the progression of the disease [1]. Note that the probability model changes as a consequence of changes in the underlying biological function or process. In the case of manufacturing, the probability model used to describe the output of the manufacturing process may change as a function of machine operation or changes in the surrounding manufacturing environment, such as temperature, humidity, or human operator.
[Figure 1.1: ECG amplitude (relative units, 0 to 1000) versus time, with one R-R interval marked]
FIGURE 1.1: Example of an ECG recording, where the R-R interval is defined as the time interval between successive R waves of the QRS complex, the most prominent waveform of the ECG.
Besides helping us to describe the probability model associated with real-world phenomena, statistics help us to make decisions by giving us quantitative tools for testing hypotheses. We call this inferential statistics, whereby the outcome of a statistical test allows us to draw conclusions or make inferences about one or more populations from which samples are drawn. Most often, scientists and engineers are interested in comparing data from two or more different populations or from two or more different processes. Typically, the default hypothesis is that there is no difference in the distributions of two or more populations or processes, and we use statistical analysis to determine whether there are true differences in the distributions of the underlying populations that warrant assigning different probability models to the individual processes.
In summary, biomedical engineers typically collect data or samples from various phenomena, which contain some element of randomness or unpredictable variability, for the purposes of making decisions. To make sound decisions with some level of confidence in the face of this uncertainty, we need to assume some probability model for the populations from which the samples have been collected. Once we have assumed an underlying model, we can select the appropriate statistical tests for comparing two or more populations and then use these tests to draw conclusions about the hypotheses for which we collected the data in the first place. Figure 1.2 outlines the steps for performing statistical analysis of data.
[Figure 1.2 flowchart: Collect samples from a population or process → Assume a probability model for the population → Estimate population statistics from sample statistics → Compare populations and test hypotheses]
FIGURE 1.2: Steps in statistical analysis.
In the following chapters, we will describe methods for graphically and numerically summarizing collected data. We will then talk about fitting a probability model to the collected data by briefly describing a number of well-known probability models that are used to describe biological phenomena. Finally, once we have assumed a model for the populations from which we have collected our sample data, we will discuss the types of statistical tests that may be used to compare data from multiple populations and allow us to test hypotheses about the underlying populations.

CHAPTER 2
Collecting Data and Experimental Design
Before we discuss any type of data summary and statistical analysis, it is important to recognize that the value of any statistical analysis is only as good as the data collected. Because we are using data or samples to draw conclusions about entire populations or processes, it is critical that the data collected (or samples collected) are representative of the larger, underlying population. For example, if we are trying to determine whether men between the ages of 20 and 50 years respond positively to a drug that reduces cholesterol level, we need to carefully select the population of subjects to whom we administer the drug and from whom we take measurements. In other words, we have to have enough samples to represent the variability of the underlying population. There is a great deal of variety in the weight, height, genetic makeup, diet, exercise habits, and drug use in all men ages 20 to 50 years who may also have high cholesterol. If we are to test the effectiveness of a new drug in lowering cholesterol, we must collect enough data or samples to capture the variability of biological makeup and environment of the population that we are interested in treating with the new drug. Capturing this variability is often the greatest challenge that biomedical engineers face in collecting data and using statistics to draw meaningful conclusions. The experimentalist must ask questions such as the following:
What type of person, object, or phenomenon do I sample?
What variables that impact the measure or data can I control?
How many samples do I require to capture the population variability to apply the appropriate statistics and draw meaningful conclusions?
How do I avoid biasing the data with the experimental design?
Experimental design, although not the primary focus of this book, is the most critical step to support the statistical analysis that will lead to meaningful conclusions and hence sound decisions.
One of the most fundamental questions asked by biomedical researchers is "What size sample do I need?" or "How many subjects will I need to make decisions with any level of confidence?" We will address these important questions at the end of this book, when concepts such as variability, probability models, and hypothesis testing have already been covered. For example, power tests will be described as a means for predicting the sample size required to detect significant differences in two population means using a t test.
Two elements of experimental design that are critical to prevent biasing the data or selecting samples that do not fairly represent the underlying population are randomization and blocking.
Randomization refers to the process by which we randomly select samples or experimental units from the larger underlying population such that we maximize our chance of capturing the variability in the underlying population. In other words, we do not limit our samples such that only a fraction of the characteristics or behaviors of the underlying population are captured in the samples. More importantly, we do not bias the results by artificially limiting the variability in the samples such that we alter the probability model of the sample population with respect to the probability model of the underlying population.
In addition to randomizing our selection of experimental units from which to take samples, we might also randomize our assignment of treatments to our experimental units. Or, we may randomize the order in which we take data from the experimental units. For example, if we are testing the effectiveness of two different medical imaging methods in detecting brain tumor, we will randomly assign all subjects suspected of having brain tumor to one of the two imaging methods. Thus, if we have a mix of sex, age, and type of brain tumor participating in the study, we reduce the chance of having all one sex or one age group assigned to one imaging method and a very different type of population assigned to the second imaging method. If a difference is noted in the outcome of the two imaging methods, we will not have artificially introduced sex or age as a factor influencing the imaging results.
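A minimal sketch of random assignment, assuming hypothetical subject IDs and two placeholder method labels. Dealing the shuffled subjects alternately to the methods, so that group sizes differ by at most one, is an assumption of this sketch; the text only requires that the assignment be random. The `seed` parameter is there only to make a run reproducible.

```python
import random

def assign_randomly(subjects, treatments, seed=None):
    """Randomly allocate subjects to treatments.

    Shuffles a copy of the subject list, then deals subjects to the
    treatments in turn, so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subject IDs and imaging-method labels.
subjects = [f"S{n:02d}" for n in range(1, 21)]
groups = assign_randomly(subjects, ["Method A", "Method B"], seed=7)
for method, members in groups.items():
    print(method, members)
```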
As another example, if one is testing the strength of three different materials for use in hip implants using several strength measures from a materials testing machine, one might randomize the order in which samples of the three different test materials are submitted to the machine. Machine performance can vary with time because of wear, temperature, humidity, deformation, stress, and user characteristics. If the biomedical engineer were asked to find the strongest material for an artificial hip using specific strength criteria, he or she might conduct an experiment. Let us assume that the engineer is given three boxes, with each box containing five artificial hip implants made from one of three materials: titanium, steel, and plastic. For any one box, all five implant samples are made from the same material. To test the 15 different implants for material strength, the engineer might randomize the order in which each of the 15 implants is tested in the materials testing machine so that time-dependent changes in machine performance, machine-material interactions, or time-varying environmental conditions do not bias the results for one or more of the materials. Thus, to fully randomize the implant testing, an engineer may literally place the numbers 1–15 in a hat and also assign one of the numbers 1–15 to each of the implants to be tested. The engineer will then blindly draw one of the 15 numbers from the hat and test the implant that corresponds to that number. This way the engineer is not testing all of one material in any particular order, and we avoid introducing order effects into the data.
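The hat-drawing procedure can be mimicked in a few lines of Python. This sketch assumes, purely for illustration, that implants 1–5 are titanium, 6–10 steel, and 11–15 plastic. `random.Random.sample` draws without replacement, which is exactly what drawing numbers from a hat does.

```python
import random

# Hypothetical labeling: implants 1-5 titanium, 6-10 steel, 11-15 plastic.
materials = ["titanium"] * 5 + ["steel"] * 5 + ["plastic"] * 5
implants = {number: material for number, material in enumerate(materials, start=1)}

def random_test_order(implant_numbers, seed=None):
    """Draw every implant number once, in random order
    (like drawing numbers from a hat without replacement)."""
    numbers = list(implant_numbers)
    rng = random.Random(seed)
    return rng.sample(numbers, k=len(numbers))

order = random_test_order(implants.keys(), seed=2007)
for run, number in enumerate(order, start=1):
    print(f"run {run:2d}: implant {number:2d} ({implants[number]})")
```

Printing the material next to each run makes it easy to check that no material is clustered at the start or end of the testing session.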
The second aspect of experimental design is blocking. In many experiments, we are interested in one or two specific factors or variables that may impact our measure or sample. However, there may be other factors that also influence our measure and confound our statistics. In good experimental design, we try to collect samples such that different treatments within the factor of interest are not biased by the differing values of the confounding factors. In other words, we should be certain that every treatment within our factor of interest is tested within each value of the confounding factor. We refer to this design as blocking by the confounding factor. For example, we may want to study weight loss as a function of three different diet pills. One confounding factor may be a person's starting weight. Thus, in testing the effectiveness of the three pills in reducing weight, we may want to block the subjects by starting weight: we first group the subjects by their starting weight and then test each of the diet pills within each group of starting weights.
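A sketch of blocking by starting weight, in the style of a randomized block design. The starting weights, the three weight bands, and the helper names (`weight_band`, `block_and_assign`) are hypothetical. Within each band, subjects are shuffled and dealt across the three pills, so every pill is tested in every band; this requires each band to contain at least as many subjects as there are pills.

```python
import random

# Hypothetical starting weights (kg); not data from the text.
start_weight = {"P1": 68, "P2": 75, "P3": 79, "P4": 84, "P5": 91,
                "P6": 97, "P7": 105, "P8": 112, "P9": 120}

def weight_band(subject):
    """Hypothetical blocking factor: the starting-weight band of a subject."""
    w = start_weight[subject]
    if w < 80:
        return "under 80 kg"
    if w <= 100:
        return "80-100 kg"
    return "over 100 kg"

def block_and_assign(subjects, block_of, treatments, seed=None):
    """Group subjects into blocks, then randomly deal each block's subjects
    across the treatments so that every treatment appears in every block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    plan = {}
    for label, members in blocks.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Shuffled subject i goes to treatment number (i mod number of treatments).
        plan[label] = {t: shuffled[i::len(treatments)]
                       for i, t in enumerate(treatments)}
    return plan

plan = block_and_assign(list(start_weight), weight_band,
                        ["pill A", "pill B", "pill C"], seed=1)
for band, allocation in plan.items():
    print(band, allocation)
```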
In biomedical research, we often block by experimental unit. When this type of blocking is part of the experimental design, the experimentalist collects multiple samples of data, with each sample representing different experimental conditions, from each of the experimental units. Figure 2.1 provides a diagram of an experiment in which data are collected before and after patients receive therapy, and the experimental design uses blocking (left) or no blocking (right) by experimental unit. In the case of blocking, data are collected before and after therapy from the same set of human subjects. Thus, within an individual, the same biological factors that influence the biological response to the therapy are present before and after therapy. Each subject serves as his or her own control for factors that may randomly vary from subject to subject both before and after therapy. In essence, with blocking, we are eliminating biases in the differences between the two populations (before and after) that may result because we are using two different sets of experimental units. For example, if we used one set of subjects before therapy and then an entirely different set of subjects after therapy (Figure 2.1, right), there is a chance that the two sets of subjects may vary enough in sex, age, weight, race, or genetic makeup to produce a difference in response that has little to do with the therapy itself. In other words, there may be confounding factors that contribute to the difference in the experimental outcome before and after therapy that are not a result of the therapy but an artifact of differences in the distributions of the two different groups of subjects from which the two sample sets were chosen. Blocking will help to eliminate the effect of intersubject variability.
Block (repeated measures):
Subject | Measure before treatment | Measure after treatment
1 | M11 | M12
2 | M21 | M22
3 | M31 | M32
... | ... | ...
K | MK1 | MK2
No block (no repeated measures):
Subject | Measure before treatment | Subject | Measure after treatment
1 | M1 | K+1 | M(K+1)
2 | M2 | K+2 | M(K+2)
3 | M3 | K+3 | M(K+3)
... | ... | ... | ...
K | MK | K+K | M(K+K)
FIGURE 2.1: Samples are drawn from two populations (before and after treatment), and the experimental design uses block (left) or no block (right). In this case, the block is the experimental unit (subject) from which the measures are made.
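To see why each subject serving as his or her own control helps, consider hypothetical before/after measurements (arbitrary units, invented for illustration) from K = 5 subjects. With blocking, the analysis can work with the within-subject differences M_k2 - M_k1; any offset specific to one subject that is present in both of that subject's measurements cancels in the difference. The sketch compares the subject-to-subject spread of the raw values with the spread of the differences.

```python
from statistics import mean, stdev

# Hypothetical measurements (arbitrary units) for K = 5 subjects, each measured
# before (M_k1) and after (M_k2) therapy; invented for illustration only.
before = [150, 138, 162, 145, 171]
after = [141, 131, 150, 139, 160]

# Within-subject differences M_k2 - M_k1: a subject-specific offset that is
# present in both measurements cancels out of each difference.
differences = [a - b for b, a in zip(before, after)]

print("differences:", differences)   # [-9, -7, -12, -6, -11]
print("mean difference:", mean(differences))
print("SD of 'before' values:", round(stdev(before), 1))
print("SD of differences:", round(stdev(differences), 1))
```

Here the raw values vary widely from subject to subject, while every subject changes by a similar amount; an unblocked design would have to detect the change against the larger spread.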
However, blocking is not always possible, given the nature of some biomedical research studies. For example, if one wanted to study the effectiveness of two different chemotherapy drugs in reducing tumor size, it is impractical to test both drugs on the same tumor mass. Thus, the two drugs are tested on different groups of individuals. The same type of design would be necessary for testing the effectiveness of weight-loss regimens.
Thus, some important concepts and definitions to keep in mind when designing experiments include the following:
experimental unit: the item, object, or subject to which we apply the treatment and from which we take sample measurements;
randomization: allocating the treatments randomly to the experimental units;
blocking: assigning all treatments within a factor to every level of the blocking factor. Often, the blocking factor is the experimental unit. Note that in using blocking, we still randomize the order in which treatments are applied to each experimental unit to avoid ordering bias.
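Where the treatments can be given in any order (unlike the before/after therapy example, whose order is fixed), the within-unit randomization in the blocking definition can be sketched as below. The unit and treatment labels are hypothetical; each unit receives every treatment, in an independently shuffled order.

```python
import random

def randomized_orders(units, treatments, seed=None):
    """Blocking by experimental unit: every unit receives every treatment,
    but the order is shuffled independently for each unit to avoid
    ordering bias."""
    rng = random.Random(seed)
    schedule = {}
    for unit in units:
        order = list(treatments)
        rng.shuffle(order)
        schedule[unit] = order
    return schedule

schedule = randomized_orders(
    ["subject 1", "subject 2", "subject 3", "subject 4"],
    ["treatment A", "treatment B", "treatment C"],
    seed=42,
)
for unit, order in schedule.items():
    print(unit, "->", order)
```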
Finally, the experimentalist must always think about how representative the sample population is with respect to the greater underlying population. Because it is virtually impossible to test every member of a population or every product rolling down an assembly line, especially when destructive testing methods are used, the biomedical engineer must often collect data from a much smaller sample drawn from the larger population. If the statistics are going to lead to useful conclusions, it is important that the sample population captures the variability of the underlying population. What is even more challenging is that we often do not have a good grasp of the variability of the underlying population, and because of expense and respect for life, we are typically limited in the number of samples we may collect in biomedical research and manufacturing. These limitations are not easy to address and require that the engineer always consider how fair the sample and data analysis are and how well they represent the underlying population(s) from which the samples are drawn.

CHAPTER 3
Data Summary and Descriptive Statistics
We assume now that we have collected our data through the use of good experimental design. We now have a collection of numbers, observations, or descriptions to describe our data, and we would like to summarize the data to make decisions, test a hypothesis, or draw a conclusion.
3.1 WHY DO WE COLLECT DATA?
The world is full of uncertainty, in the sense that there are random or unpredictable factors that influence every experimental measure we make. The unpredictable aspects of the experimental outcomes also arise from the variability in biological systems (due to genetic and environmental factors) and manufacturing processes, human error in making measurements, and other underlying processes that influence the measures being made.
Despite the uncertainty regarding the exact outcome of an experiment or occurrence of a future event, we collect data to try to better understand the processes or populations that influence an experimental outcome so that we can make some predictions. Data provide information to reduce uncertainty and allow for decision making. When properly collected and analyzed, data help us solve problems. It cannot be stressed enough that the data must be properly collected and analyzed if the data analysis and subsequent conclusions are to have any value.
3.2 WHY DO WE NEED STATISTICS?
We have three major reasons for using statistical data summary and analysis:
1. The real world is full of random events that cannot be described by exact mathematical expressions.
2. Variability is a natural and normal characteristic of the natural world.
3. We like to make decisions with some confidence. This means that we need to find trends within the variability.
3.3 WHAT QUESTIONS DO WE HOPE TO ADDRESS WITH OUR STATISTICAL ANALYSIS?
There are several basic questions we hope to address when using numerical and graphical summary of data:
1. Can we differentiate between groups or populations?
2. Are there correlations between variables or populations?
3. Are processes under control?
Finding physiological differences between populations is probably the most frequent aim of biomedical research. For example, researchers may want to know if there is a difference in life expectancy between overweight and underweight people. Or, a pharmaceutical company may want to determine if one type of antibiotic is more effective in combating bacteria than another. Or, a physician wonders if diastolic blood pressure is reduced in a group of hypertensive subjects after the consumption of a pressure-reducing drug. Most often, biomedical researchers are comparing populations of people or animals that have been exposed to two or more different treatments or diagnostic tests, and they want to know if there is a difference between the responses of the populations that have received different treatments or tests. Sometimes, we are drawing multiple samples from the same group of subjects or experimental units. A common example is when the physiological data are taken before and after some treatment, such as drug intake or electronic therapy, from one group of patients. We call this type of data collection blocking in the experimental design. This concept of blocking is discussed more fully in Chapter 2.
Another question that is frequently the target of biomedical research is whether there is a correlation between two physiological variables. For example, is there a correlation between body build and mortality? Or, is there a correlation between fat intake and the occurrence of cancerous tumors? Or, is there a correlation between the size of the ventricular muscle of the heart and the frequency of abnormal heart rhythms? These types of questions involve collecting two sets of data and performing a correlation analysis to determine how well one set of data may be predicted from another. When we speak of correlation analysis, we are referring to the linear relation between two variables and the ability to predict one set of data by modeling the data as a linear function of the second set of data. Because correlation analysis only quantifies the linear relation between two processes or data sets, nonlinear relations between the two processes may not be evident. A more detailed description of correlation analysis may be found in Chapter 6.
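The following Python sketch computes the Pearson correlation coefficient and the least-squares line that predicts one data set from the other. The two toy data sets are hypothetical: one is an exact linear function of `x`, the other an exact quadratic. The second case illustrates the caution above: y is completely determined by x, yet the linear correlation is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient: strength of the linear relation of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def least_squares_line(x, y):
    """Slope and intercept of the straight line that best predicts y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Hypothetical toy data.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * a + 1 for a in x]   # exact linear relation
quadratic = [a * a for a in x]    # exact but nonlinear relation

print(pearson_r(x, linear))            # 1.0
print(least_squares_line(x, linear))   # (2.0, 1.0)
print(pearson_r(x, quadratic))         # 0.0: r misses the nonlinear dependence
```

Plotting the quadratic data in a scatterplot would reveal the relation immediately, which is one reason to look at the raw data before computing statistics.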
Finally, a biomedical engineer, particularly the engineer involved in manufacturing, may be interested in knowing whether a manufacturing process is under control. Such a question may arise if there are tight controls on the manufacturing specifications for a medical device. For example, if the engineer is trying to ensure quality in producing intravascular catheters that must have diameters between 1 and 2 cm, the engineer may randomly collect samples of catheters from the assembly line at random intervals during the day, measure their diameters, determine how many of the catheters meet specifications, and determine whether there is a sudden change in the number of catheters that fail to meet specifications. If there is such a change, the engineers may look for elements of the manufacturing process that change over time, changes in environmental factors, or user errors. The engineer can use control charts to assess whether the processes are under control. These methods of statistical analysis are not covered in this text but may be found in a number of references, including [3].
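The counting step described above (measure sampled catheters, count how many fall outside the 1 to 2 cm specification, and watch for a change from one sample to the next) can be sketched as follows. The sampling times and diameters are hypothetical, and this is only a tally, not a control chart.

```python
def fraction_out_of_spec(diameters_cm, low=1.0, high=2.0):
    """Fraction of measured diameters outside the specification [low, high]."""
    out = sum(1 for d in diameters_cm if not (low <= d <= high))
    return out / len(diameters_cm)

# Hypothetical diameters (cm) measured at three sampling times in one day.
batches = {
    "08:00": [1.42, 1.55, 1.61, 1.38, 1.50],
    "12:00": [1.47, 1.52, 1.58, 1.44, 1.49],
    "16:00": [1.91, 2.04, 1.97, 2.08, 1.88],
}

previous = None
for time, diameters in batches.items():
    frac = fraction_out_of_spec(diameters)
    note = ""
    if previous is not None and frac > previous:
        note = "  <- more failures than the previous sample"
    print(f"{time}: {frac:.0%} out of spec{note}")
    previous = frac
```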
3.4 HOW DO WE GRAPHICALLY SUMMARIZE DATA?
We can summarize data in graphical or numerical form. The numerical form is what we refer to as statistics. Before blindly applying the statistical analysis, it is always good to look at the raw data, usually in a graphical form, and then use graphical methods to summarize the data in an easy-to-interpret format.
The types of graphical displays that are most frequently used by biomedical engineers include the following: scatterplots, time series, box-and-whisker plots, and histograms.
Details for creating these graphical summaries are described in [3–6], but we will briefly describe them here.
3.4.1 Scatterplots
The scatterplot simply plots the values of one variable against the corresponding values of another. In most cases, one of the variables may be considered the independent variable (such as time or subject number), and the second variable is considered the dependent variable. Figure 3.1 illustrates an example of a scatterplot for two sets of data. In general, we are interested in whether there is a predictable relationship that maps our independent variable (such as respiratory rate) into our dependent variable (such as heart rate). If there is a linear relationship between the two variables, the data points should fall close to a straight line.
3.4.2 Time Series
A time series is used to plot the changes in a variable as a function of time. The variable is usually a physiological measure, such as electrical activation in the brain or hormone concentration in the bloodstream, that changes with time. Figure 3.2 illustrates an example of a time series plot. In this figure, we are looking at a simple sinusoidal function as it changes with time.
3.4.3 Box-and-Whisker Plots
These plots illustrate the first, second, and third quartiles as well as the minimum and maximum values of the data collected. The second quartile (Q2) is also known as the median of the data. This quantity, as defined later in this text, is the middle data point or sample value when the samples are listed in descending order. The first quartile (Q1) can be thought of as the median value of the samples that fall below the second quartile. Similarly, the third quartile (Q3) can be thought of as the median value of the samples that fall above the second quartile. Box-and-whisker plots are useful in that they highlight whether there is skew to the data or any unusual outliers in the samples (Figure 3.3).
FIGURE 3.2: Example of a time series plot. The amplitude of the samples is plotted as a function of time (msec).
FIGURE 3.1: Example of a scatterplot (dependent variable plotted against independent variable).
FIGURE 3.3: Illustration of a box-and-whisker plot for the data set listed (dependent variable by category). The first (Q1), second (Q2), and third (Q3) quartiles are shown. In addition, the whiskers extend to the minimum and maximum values of the sample set.
3.4.4 Histogram
The histogram is defined as a frequency distribution. Given n samples or measurements, x_i, which range from X_min to X_max, the samples are grouped into nonoverlapping intervals (bins), usually of equal width (Figure 3.4). Typically, the number of bins is on the order of 7 to 14, depending on the nature of the data. In addition, we typically expect to have at least three samples per bin [7]. Sturges' rule [6] may also be used to estimate the number of bins and is given by
$$k = 1 + 3.3\log_{10}(n),$$
where k is the number of bins and n is the number of samples.
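As a quick check of Sturges' rule, the Python sketch below evaluates k = 1 + 3.3 log10(n) for a few sample sizes. The function name `sturges_bins` is our own, and rounding to the nearest integer (rather than rounding up) is an assumption; references differ on this convention.

```python
import math

def sturges_bins(n: int) -> int:
    """Estimate the number of histogram bins with Sturges' rule, k = 1 + 3.3 log10(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Rounding to the nearest integer is an assumption; some texts round up instead.
    return round(1 + 3.3 * math.log10(n))

for n in (50, 1000, 2000):
    print(f"n = {n:>5}: k = {sturges_bins(n)}")
```

For 1000 samples the rule suggests about 11 bins, which falls inside the 7 to 14 range quoted above.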
Each bin of the histogram has a lower boundary, upper boundary, and midpoint. The histogram is constructed by plotting the number of samples in each bin. Figure 3.5 illustrates a histogram for 1000 samples drawn from a normal distribution with mean (μ) = 0 and standard deviation (σ) = 1.0. On the horizontal axis, we have the sample value, and on the vertical axis, we have the number of occurrences of samples that fall within a bin.
Two measures that we find useful in describing a histogram are the absolute frequency and relative frequency in one or more bins. These quantities are defined as
$$f_i = \text{absolute frequency in the } i\text{th bin};$$
$$f_i/n = \text{relative frequency in the } i\text{th bin},$$
where n is the total number of samples being summarized in the histogram.
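The binning procedure and the two frequency measures can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the helper name `histogram_counts`, the half-open bins, and the choice to put X_max into the last bin are our own assumptions.

```python
def histogram_counts(samples, k):
    """Group samples into k equal-width bins between X_min and X_max.

    Returns the bin edges, the absolute frequencies f_i, and the
    relative frequencies f_i / n.
    """
    x_min, x_max = min(samples), max(samples)
    if x_max == x_min:
        raise ValueError("all samples are equal; bin width would be zero")
    width = (x_max - x_min) / k
    edges = [x_min + j * width for j in range(k + 1)]
    absolute = [0] * k
    for x in samples:
        # Bins are [lower, upper); the maximum value is placed in the last bin.
        j = min(int((x - x_min) / width), k - 1)
        absolute[j] += 1
    n = len(samples)
    relative = [f / n for f in absolute]
    return edges, absolute, relative

edges, absolute, relative = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5], k=4)
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
print(midpoints, absolute, relative)
```

The relative frequencies always sum to 1, which is what lets a histogram be compared with a probability distribution.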
A number of algorithms used by biomedical instruments for diagnosing or detecting abnormalities in biological function make use of the histogram of collected data and the associated relative frequencies of selected bins [8]. Oftentimes, normal and abnormal physiological functions (breath sounds, heart rate variability, frequency content of electrophysiological signals) may be differentiated by comparing the relative frequencies in targeted bins of the histograms of data representing these biological processes.
FIGURE 3.4: One bin of a histogram plot. The bin is defined by a lower bound, a midpoint, and an upper bound.
FIGURE 3.5: Example of a histogram plot (frequency versus normalized value). The value of the measure or sample is plotted on the horizontal axis, whereas the frequency of occurrence of that measure or sample is plotted along the vertical axis.
The histogram can exhibit several shapes. The shapes, illustrated in Figure 3.6, are referred to as symmetric, skewed, or bimodal.
A skewed histogram may be attributed to the following [9]:
1. mechanisms of interest that generate the data (e.g., the physiological mechanisms that determine the beat-to-beat intervals in the heart);
2. an artifact of the measurement process or a shift in the underlying mechanism over time (e.g., there may be time-varying changes in a manufacturing process that lead to a change in the statistics of the manufacturing process over time);
3. a mixing of populations from which samples are drawn (this is typically the source of a bimodal histogram).
The histogram is important because it serves as a rough estimate of the true probability density function or probability distribution of the underlying random process from which the samples are being collected.
The probability density function or probability distribution is a function that quantifies the probability of a random event, x, occurring. When the underlying random event is discrete in nature, we refer to the probability density function as the probability mass function [10]. In either case, the function describes the probabilistic nature of the underlying random variable or event and allows us to predict the probability of observing a specific outcome, x (represented by the random variable), of an experiment. The cumulative distribution function is simply the sum (for a continuous variable, the integral) of the probabilities of all outcomes less than or equal to some value, x.
Let us consider a random variable for which the probability density function is well defined (for most real-world phenomena, such a probability model is not known). The random variable is the outcome of a single toss of a die. Given a single fair die with six sides, the probability of rolling a six on a throw of the die is 1 in 6. In fact, the probability of throwing a one is also 1 in 6. If we consider all possible outcomes of the toss of a die and plot the probability of observing any one of those six outcomes in a single toss, we would have a plot such as that shown in Figure 3.7.
This plot shows the probability density or probability mass function for the toss of a die. This type of probability model is known as a uniform distribution because each outcome has exactly the same probability of occurring (1/6 in this case).
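Because the die's probability model is known exactly, its probability mass function and cumulative distribution function can be written out directly. The sketch below uses Python's `fractions.Fraction` so that no rounding occurs; the variable names are our own.

```python
from fractions import Fraction

faces = range(1, 7)
pmf = {x: Fraction(1, 6) for x in faces}   # uniform: each face has probability 1/6

# Cumulative distribution function: Pr(outcome <= x) is the running sum of the pmf.
cdf = {}
running = Fraction(0)
for x in faces:
    running += pmf[x]
    cdf[x] = running

print(cdf[3])   # Pr(outcome <= 3)
print(cdf[6])   # Pr(outcome <= 6), which must be 1
```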
FIGURE 3.6: Examples of a symmetric (top), skewed (middle), and bimodal (bottom) histogram (frequency versus measure). In each case, 2000 samples were drawn from the underlying populations.
For the toss of a die, we know the true probability distribution. However, for most real-world random processes, especially biological processes, we do not know what the true probability density or mass function looks like. As a consequence, we have to use the histogram, created from a small sample, to try to estimate the best probability distribution or probability model to describe the real-world phenomenon. If we return to the example of the toss of a die, we can actually toss the die a number of times and see how closely the histogram, obtained from experimental data, matches the true probability mass function for the ideal six-sided die. Figure 3.8 illustrates the histograms for the outcomes of 50 and 2000 tosses of a single die. Note that with only 50 tosses or samples, it is difficult to determine what the true probability distribution might look like. However, as we approach 2000 samples, the histogram approaches the true probability mass function (the uniform distribution) for the toss of a die. But there is still some variability from bin to bin that does not look as uniform as the ideal probability distribution illustrated in Figure 3.7. The message to take away from this illustration is that most biomedical research reports the outcomes of a small number of samples. It is clear from the dice example that the statistics of the underlying random process are very difficult to discern from a small sample, yet most biomedical research relies on data from small samples.
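The dice experiment is easy to repeat in simulation. The sketch below tosses a simulated fair die with Python's `random` module and reports how far the worst bin's relative frequency strays from the ideal 1/6. The seed is an arbitrary choice made only so that runs are repeatable, and the counts will not match Figure 3.8.

```python
import random
from collections import Counter

def relative_frequencies(n_tosses, seed=0):
    """Relative frequency of each face after n_tosses simulated tosses of a fair die."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_tosses))
    return {face: counts[face] / n_tosses for face in range(1, 7)}

for n in (50, 2000, 100_000):
    freqs = relative_frequencies(n)
    worst = max(abs(p - 1 / 6) for p in freqs.values())
    print(f"n = {n:>6}: largest deviation from 1/6 = {worst:.3f}")
```

Typically the largest deviation shrinks as n grows, which is the point of Figure 3.8: a small sample gives only a rough picture of the underlying distribution.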
3.5 GENERAL APPROACH TO STATISTICAL ANALYSIS
We have now collected our data and looked at some graphical summaries of the data. Now we will use numerical summaries, also known as statistics, to try to describe the nature of the underlying population or process from which we have taken our samples. From these descriptive statistics, we assume a probability model or probability distribution for the underlying population or process and then select the appropriate statistical tests to test hypotheses or make decisions. It is important to note that the conclusions one may draw from a statistical test depend on how well the assumed probability model fits the underlying population or process.
FIGURE 3.7: The probability density function for a discrete random variable (probability mass function), plotted as relative frequency versus the result of a toss of a single die. In this case, the random variable is the value of a toss of a single die. Note that each of the six possible outcomes has a probability of occurrence of 1 in 6. This probability density function is also known as a uniform probability distribution.
FIGURE 3.8: Histograms (relative frequency versus value of die toss) representing the outcomes of experiments in which a single die is tossed 50 times (top) and 2000 times (bottom). Note that as the sample size increases, the histogram approaches the true probability distribution illustrated in Figure 3.7.
As stated in the Introduction, biomedical engineers are trying to make decisions about populations or processes to which they have limited access. Thus, they design experiments and collect samples that they think will fairly represent the underlying population or process. Regardless of what type of statistical analysis will result from the investigation or study, all statistical analysis should follow the same general approach:
1. Measure a limited number of representative samples from a larger population.
2. Estimate the true statistics of the larger population from the sample statistics.
Some important concepts need to be addressed here. The first concept is somewhat obvious. It is often impossible or impractical to take measurements or observations from an entire population. Thus, the biomedical engineer will typically select a smaller, more practical sample that represents the underlying population and the extent of variability in the larger population. For example, we cannot possibly measure the resting body temperature of every person on earth to get an estimate of normal body temperature and normal range. We are interested in knowing what the normal body temperature is, on average, of a healthy human being and the normal range of resting temperatures, as well as the likelihood or probability of measuring a specific body temperature under healthy, resting conditions. In trying to determine the characteristics or underlying probability model for body temperature for healthy, resting individuals, the researcher will select, at random, a sample of healthy, resting individuals and measure their individual resting body temperatures with a thermometer. The researchers will have to consider the composition and size of the sample population to adequately represent the variability in the overall population. The researcher will have to define what characterizes a normal, healthy individual, such as age, size, race, sex, and other traits. If a researcher were to collect body temperature data from such a sample of 3000 individuals, he or she may plot a histogram of temperatures measured from the 3000 subjects and end up with the following histogram (Figure 3.9). The researcher may also calculate some basic descriptive statistics for the 3000 samples, such as sample average (mean), median, and standard deviation.
FIGURE 3.9: Histogram (density versus temperature, °F) for 2000 internal body temperatures collected from a normally distributed population.
Once the researcher has estimated the sample statistics from the sample population, he or she will try to draw conclusions about the larger (true) population. The most important question to ask when reviewing the statistics and conclusions drawn from the sample population is how well the sample population represents the larger, underlying population.
Once the data have been collected, we use some basic descriptive statistics to summarize the data. These basic descriptive statistics include the following general measures: central tendency, variability, and correlation.
3.6 DESCRIPTIVE STATISTICS
There are a number of descriptive statistics that help us to picture the distribution of the underlying population. In other words, our ultimate goal is to assume an underlying probability model for the population and then select the statistical analyses that are appropriate for that probability model.
When we try to draw conclusions about the larger underlying population or process from our smaller sample of data, we assume that the underlying model for any sample, event, or measure (the outcome of the experiment) is as follows:
$$X = \mu + \text{individual differences} + \text{situational factors} + \text{unknown variables},$$
where X is our measure or sample value and is influenced by μ, which is the true population mean; individual differences, such as genetics, training, motivation, and physical condition; situational factors, such as environmental factors; and unknown variables, such as unidentified or nonquantified factors that behave in an unpredictable fashion from moment to moment.
In other words, when we make a measurement or observation, the measured value represents or is influenced by not only the statistics of the underlying population, such as the population mean, but also factors such as biological variability from individual to individual, environmental factors (time, temperature, humidity, lighting, drugs, etc.), and random factors that cannot be predicted exactly from moment to moment. All of these factors will give rise to a histogram for the sample data, which may or may not reflect the true probability density function of the underlying population. If we have done a good job with our experimental design and collected a sufficient number of samples, the histogram and descriptive statistics for the sample population should closely reflect the true probability density function and descriptive statistics for the true or underlying population. If this is the case, then we can make conclusions about the larger population from the smaller sample population. If the sample population does not reflect the variability of the true population, then the conclusions we draw from statistical analysis of the sample data may be of little value.
There are a number of probability models that are useful for describing biological and manufacturing processes. These include the normal, Poisson, exponential, and gamma distributions [10]. In this book, we will focus on populations that follow a normal distribution because this is the most frequently encountered probability distribution used in describing populations. Moreover, the most frequently used methods of statistical analysis assume that the data are well modeled by a normal (bell-curve) distribution. It is important to note that many biological processes are not well modeled by a normal distribution (such as heart rate variability), and the statistics associated with the normal distribution are not appropriate for such processes. In such cases, nonparametric statistics, which do not assume a specific type of distribution for the data, may serve the researcher better in understanding processes and making decisions. However, using the normal distribution and its associated statistics is often adequate given the central limit theorem, which states that the sum of a large number of independent random variables with arbitrary distributions (each with finite variance) tends toward a normally distributed random variable. It is often reasonable to assume that most biological phenomena result from a sum of random processes.
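A small simulation can illustrate the central limit theorem. Each value from a uniform distribution on (0, 1) is far from normal, yet the sum of 12 of them, shifted by 6, has mean 0 and variance 1 (each uniform has variance 1/12) and behaves approximately like a standard normal variable. The count of 12 terms, the number of trials, and the seed are arbitrary choices for this sketch.

```python
import random
import statistics

rng = random.Random(1)
# Each trial: sum of 12 independent Uniform(0, 1) values, shifted to have mean 0.
sums = [sum(rng.random() for _ in range(12)) - 6 for _ in range(50_000)]

mean = statistics.fmean(sums)
sd = statistics.stdev(sums)
within_1sd = sum(abs(s - mean) <= sd for s in sums) / len(sums)
print(f"mean = {mean:.3f}, sd = {sd:.3f}, fraction within 1 sd = {within_1sd:.3f}")
```

The fraction within one standard deviation should land near the 68% expected for a normal distribution.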
3.6.1 Measures of Central Tendency
There are several measures that reflect the central tendency or concentration of a sample population: sample mean (arithmetic average), sample median, and sample mode.
The sample mean may be estimated from a group of samples, x_i, where i is the sample number, using the formula below. Given n data points, x_1, x_2, …, x_n:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
In practice, we typically do not know the true mean, μ, of the underlying population; instead, we try to estimate the true mean, μ, of the larger population. As the sample size becomes large, the sample mean, x̄, should approach the true mean, μ, assuming that the statistics of the underlying population or process do not change over time or space.
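The formula for x̄ translates directly into code. To show x̄ approaching μ as n grows, the sketch also draws samples from a normal population with an assumed μ = 98.6 and σ = 0.7; these are illustrative values for body temperature in °F, not data from the text.

```python
import random

def sample_mean(xs):
    """x-bar = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

MU, SIGMA = 98.6, 0.7          # assumed population parameters (illustrative only)
rng = random.Random(42)
for n in (10, 100, 10_000):
    xs = [rng.gauss(MU, SIGMA) for _ in range(n)]
    print(f"n = {n:>6}: x-bar = {sample_mean(xs):.3f}")
```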
One of the problems with using the sample mean to represent the central tendency of a population is that the sample mean is susceptible to outliers. This can be problematic and often deceiving when reporting the average of a population that is heavily skewed. For example, when reporting income for a group of new college graduates of whom one is an NBA player who has just signed a multimillion-dollar contract, the estimated mean income will be much greater than what most graduates earn. The same misrepresentation is often evident when reporting the mean value for homes in a specific geographic region, where a few homes valued on the order of a million dollars can hide the fact that several hundred other homes are valued at less than $200,000.
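To make the outlier effect concrete, the sketch below compares the mean and median of a made-up set of starting salaries, with and without one multimillion-dollar contract. All salary figures are hypothetical and chosen only for illustration.

```python
import statistics

salaries = [52_000, 55_000, 58_000, 60_000, 61_000, 64_000, 67_000]   # hypothetical
with_outlier = salaries + [5_000_000]                                 # one large contract

for label, data in (("without outlier", salaries), ("with outlier", with_outlier)):
    print(f"{label:>15}: mean = {statistics.mean(data):>12,.0f}  "
          f"median = {statistics.median(data):>8,.0f}")
```

A single extreme value raises the mean by more than a factor of ten, while the median barely moves.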
Another useful measure for summarizing the central tendency of a population is the sample median. The median value of a group of observations or samples, x_i, is the middle observation when the samples, x_i, are listed in descending order.
For example, suppose we have the following values for tidal volume of the lung:
2, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3.
We can find the median value by first ordering the data in descending order:
2.5, 2.2, 2.0, 1.8, 1.5, 1.4, 1.3, 1.3,
and then crossing off values on each end until we reach the middle. Crossing off 2.5, 2.2, and 2.0 from the top and 1.4, 1.3, and 1.3 from the bottom leaves 1.8 and 1.5. In this case, there are two middle values; thus, the median is the average of those two values, (1.8 + 1.5)/2 = 1.65.
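The crossing-off procedure can be reproduced in Python. Python's standard `statistics.median` applies the same rule (the average of the two middle values when n is even), so it serves as a cross-check on the manual calculation.

```python
import statistics

tidal_volume = [2.0, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]

ordered = sorted(tidal_volume, reverse=True)    # descending order, as in the text
n = len(ordered)
if n % 2:
    manual = ordered[n // 2]                    # odd n: the single middle value
else:
    manual = (ordered[n // 2 - 1] + ordered[n // 2]) / 2   # even n: average the two middle values

print(manual, statistics.median(tidal_volume))
```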
Note that if the number of samples, n, is odd, the median will be the middle observation. If the sample size, n, is even, then the median equals the average of the two middle observations. Compared with the sample mean, the sample median is less susceptible to outliers; it is largely insensitive to skew in a group of samples or in the probability density function of the underlying population. In general, to fairly represent the central tendency of a collection of samples or the underlying population, we use the following rule of thumb:
1. If the sample histogram or probability density function of the underlying population is symmetric, use the mean as a central measure. For such populations, the mean and median are about equal, and the mean estimate makes use of all the data.
2. If the sample histogram or probability density function of the underlying population is skewed, the median is a fairer measure of the center of the distribution.
Another measure of central tendency is the mode, which is simply the most frequent observation in a collection of samples. In the tidal volume example given above, 1.3 is the most frequently occurring sample value. The mode is not used as frequently as the mean or median in representing central tendency.
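Python's `statistics` module can confirm the mode of the tidal volume data. The sketch also calls `statistics.multimode`, which returns every value tied for most frequent; this matters for data sets, unlike this one, that have more than one mode.

```python
import statistics

tidal_volume = [2.0, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]
print(statistics.mode(tidal_volume))        # most frequent value: 1.3
print(statistics.multimode(tidal_volume))   # all values tied for most frequent: [1.3]
```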
3.6.2 Measures of Variability
Measures of central tendency alone are insufficient for representing the statistics of a population or process. In fact, it is usually the variability in the population that makes things interesting and leads to uncertainty in decision making. The variability from subject to subject, especially in physiological function, is what often makes finding foolproof diagnosis and treatment so difficult. What works for one person often fails for another, and it is not the mean or median that picks up on those subject-to-subject differences, but rather the variability, which is reflected in differences in the probability models underlying those different populations.
When summarizing the variability of a population or process, we typically ask, "How far from the center (sample mean) do the samples (data) lie?" To answer this question, we typically use the following estimates that represent the spread of the sample data: interquartile range, sample variance, and sample standard deviation.
The interquartile range is the difference between the first and third quartiles of the sample data. For sampled data, the median is also known as the second quartile, Q2. Given Q2, we can find the first quartile, Q1, by simply taking the median value of those samples that lie below the second quartile. We can find the third quartile, Q3, by taking the median value of those samples that lie above the second quartile. As an illustration, we have the following samples:
1, 3, 3, 2, 5, 1, 1, 4, 3, 2.
If we list these samples in descending order,
5, 4, 3, 3, 3, 2, 2, 1, 1, 1,
the median value (second quartile) for these samples is 2.5. The first quartile, Q1, can be found by taking the median of the following samples (here the median value, 2.5, is included in each half),
2.5, 2, 2, 1, 1, 1,
which is 1.5. In addition, the third quartile, Q3, may be found by taking the median value of the following samples:
5, 4, 3, 3, 3, 2.5,
which is 3. Thus, the interquartile range is Q3 − Q1 = 3 − 1.5 = 1.5.
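The worked example can be reproduced with the sketch below. The function name `quartiles_as_in_text` is hypothetical; it follows the example above by including the median value in each half. The example only covers even n, so the handling of odd n (keeping the middle sample in both halves) is our own assumption. Library routines such as Python's `statistics.quantiles` use other interpolation conventions and can return somewhat different quartiles for the same data.

```python
from statistics import median

def quartiles_as_in_text(samples):
    """Return (Q1, Q2, Q3) following the worked example above.

    Q2 is the median; Q1 and Q3 are the medians of the lower and upper
    halves, each half including the median value as in the example.
    """
    s = sorted(samples)
    n = len(s)
    q2 = median(s)
    half = n // 2
    if n % 2 == 0:
        lower = s[:half] + [q2]
        upper = [q2] + s[half:]
    else:
        lower = s[:half + 1]   # assumption: the middle sample (the median) joins both halves
        upper = s[half:]
    return median(lower), q2, median(upper)

q1, q2, q3 = quartiles_as_in_text([1, 3, 3, 2, 5, 1, 1, 4, 3, 2])
print(q1, q2, q3, "IQR =", q3 - q1)
```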
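Sample variance, s², is essentially the average squared distance of the data from the mean (the sum of squared deviations is divided by n − 1 rather than n), and the formula for estimating s² from a collection of samples, x_i, is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2.$$
Sample standard deviation, s, which is more commonly referred to in describing the variability of the data, is
$$s = \sqrt{s^2} \quad \text{(same units as the original samples).}$$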
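The variance and standard deviation formulas can be applied to the tidal volume data from the median example. The sketch implements s² directly, with the n − 1 denominator, and then compares it with Python's `statistics.variance` and `statistics.stdev`, which use the same sample (n − 1) definitions.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)   # note the n - 1

tidal_volume = [2.0, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]
s2 = sample_variance(tidal_volume)
s = math.sqrt(s2)
print(f"s^2 = {s2:.4f}, s = {s:.4f}")
print(statistics.variance(tidal_volume), statistics.stdev(tidal_volume))  # library cross-check
```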
It is important to note that for normal distributions (symmetrical histograms), the sample mean and sample standard deviation are the only parameters needed to describe the statistics of the underlying phenomenon. Thus, if one were to compare two or more normally distributed populations, one need only test the equivalence of the means and variances of those populations.

CHAPTER 4
Assuming a Probability Model From the Sample Data
Now that we have collected the data, graphed the histogram, and estimated measures of central tendency and variability, such as mean, median, and standard deviation, we are ready to assume a probability model for the underlying population or process from which we have obtained samples. At this point, we will make a rough assumption using simple measures of mean, median, standard deviation, and the histogram. But it is important to note that there are more rigorous tests, such as the χ² test for normality [7], to determine whether a particular probability model is appropriate to assume from a collection of sample data.
Once we have assumed an appropriate probability model, we may select the appropriate statistical tests that will allow us to test hypotheses and draw conclusions with some level of confidence. The probability model will dictate what level of confidence we have when accepting or rejecting a hypothesis.
There are two fundamental questions that we are trying to address when assuming a probability model for our underlying population:
1. How confident are we that the sample statistics are representative of the entire population?
2. Are the differences in the statistics between two populations significant, resulting from factors other than chance alone?
To declare any level of confidence in making statistical inference, we need a mathematical model that describes the probability that any data value might occur. These models are called probability distributions.
There are a number of probability models that are frequently assumed to describe biological processes. For example, when describing heart rate variability, the probability of observing a specific time interval between consecutive heartbeats might be described by an exponential distribution [1, 8]. Figure 3.6 in Chapter 3 illustrates a histogram for samples drawn from an exponential distribution (middle panel). Note that this distribution is highly skewed to the right. For R-R intervals, such a probability function makes sense physiologically because the individual heart cells have a refractory period that prevents them from contracting in less than a minimum time interval. Yet, a very prolonged time interval may occur between beats, giving rise to some long time intervals that occur infrequently.
The most frequently assumed probability model for most scientific and engineering applications is the normal or Gaussian distribution. This distribution is illustrated by the solid black line in Figure 4.1 and is often referred to as the bell curve because it looks like a musical bell.
The equation that gives the probability density, f(x), associated with a specific value of x from the underlying normal population is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty,$$
where μ is the true mean of the underlying population or process and σ is the standard deviation of the same population or process. A graph of this equation is illustrated by the solid, smooth curve in Figure 4.1. The area under the curve equals one.
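The claim that the area under the normal curve equals one can be checked numerically. The sketch below codes f(x) directly from the formula above and integrates it with the trapezoid rule over μ ± 10σ; the integration range and step count are arbitrary choices that make the truncation and discretization error negligible.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density f(x) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(mu, sigma, half_width=10.0, steps=100_000):
    """Trapezoid-rule area of f(x) over mu +/- half_width * sigma."""
    a, b = mu - half_width * sigma, mu + half_width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

print(area_under_pdf(0.0, 1.0), area_under_pdf(5.0, 2.0))   # both very close to 1
```

Changing μ and σ moves and widens the curve, but the area stays equal to one.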
Note the following about the normal distribution:
1. It is a symmetric, bell-shaped curve completely described by its mean, μ, and standard deviation, σ.
2. By changing μ and σ, we slide and stretch the distribution.
FIGURE 4.1: A histogram of 1000 samples drawn from a normal distribution is illustrated (relative frequency versus normalized measure). Superimposed on the histogram is the ideal normal curve representing the normal probability distribution function.
Figure 4.1 also illustrates a histogram that is obtained when we randomly select 1000 samples from a population that is normally distributed and has a mean of 0 and a variance of 1. It is important to recognize that as we increase the sample size n, the histogram approaches the ideal normal distribution shown with the solid, smooth line. But at small sample sizes, the histogram may look very different from the normal curve. Thus, from small sample sizes, it may be difficult to determine whether the assumed model is appropriate for the underlying population or process, and any statistical tests that we perform may not allow us to test hypotheses and draw conclusions with any real level of confidence.
We can perform linear operations on our normally distributed random variable, x, to produce another normally distributed random variable, y. These operations include multiplication of x by a constant and addition of a constant (offset) to x. Figure 4.2 illustrates histograms for samples drawn from each of the populations x and y, where y = 2x + 5. We note that the distribution for y is shifted (the mean is now equal to 5) and the variance has increased with respect to x.
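A quick simulation shows the effect of the linear transformation y = 2x + 5 on a standard normal variable x. In general, y = ax + b has mean aμ + b and standard deviation |a|σ, so here the mean moves to 5 and the standard deviation doubles (the variance quadruples). The sample size and seed are arbitrary, and the last line uses the standard library's `statistics.NormalDist` arithmetic to give the exact result.

```python
import random
import statistics

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]   # samples from a standard normal population
y = [2 * xi + 5 for xi in x]                         # y = 2x + 5, as in Figure 4.2

print(f"x: mean = {statistics.fmean(x):.3f}, sd = {statistics.stdev(x):.3f}")
print(f"y: mean = {statistics.fmean(y):.3f}, sd = {statistics.stdev(y):.3f}")

# The same result, exactly, using NormalDist's support for scaling and shifting.
print(statistics.NormalDist(0.0, 1.0) * 2 + 5)
```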
FIGURE 4.2: Histograms (frequency versus x or y value) are shown for samples drawn from populations x and y, where y = 2x + 5 is simply a linear function of x. Note that the mean and variance of y differ from those of x, yet both are normal distributions.
One test that we may use to determine how well a normal probability model fits our data is to count how many samples fall within 1 and 2 standard deviations of the mean. If the data and underlying population or process are well modeled by a normal distribution, 68% of the samples should lie within 1 standard deviation of the mean and 95% of the samples should lie within 2 standard deviations of the mean. These percentages are illustrated in Figure 4.3. It is important to remember these few numbers, because we will frequently use this 95% interval when drawing conclusions from our statistical analysis.
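The counting test can be sketched as a small function. Here it is applied to simulated standard normal data (the sample size and seed are arbitrary). The helper name `fraction_within` is our own.

```python
import random
import statistics

def fraction_within(samples, k):
    """Fraction of samples lying within k sample standard deviations of the sample mean."""
    m = statistics.fmean(samples)
    s = statistics.stdev(samples)
    return sum(abs(x - m) <= k * s for x in samples) / len(samples)

rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
print(f"within 1 sd: {fraction_within(data, 1):.3f}")
print(f"within 2 sd: {fraction_within(data, 2):.3f}")
```

For a normal population the second fraction comes out near 0.954 rather than exactly 0.95, because exactly 95% of a normal population lies within 1.96 standard deviations of the mean. Real data that fall well short of these fractions suggest that the normal model may not fit.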
Another means of determining how well our sampled data, x, represent a normal distribution is to estimate Pearson's coefficient of skew (PCS) [5]. The coefficient of skew is given by
$$\mathrm{PCS} = \frac{3\left(\bar{x} - x_{\text{median}}\right)}{s}.$$
If |PCS| > 0.5, we assume that our samples were not drawn from a normally distributed population.
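A short sketch of the PCS check, applied to the tidal volume data used earlier in this chapter. The function name `pearson_skew` is our own, and the code applies the 0.5 threshold to the magnitude of PCS so that left skew is flagged as well; that reading of the rule is an assumption. With only eight samples, the result is illustrative at best.

```python
import statistics

def pearson_skew(xs):
    """Pearson's coefficient of skew: 3 * (mean - median) / s."""
    return 3 * (statistics.fmean(xs) - statistics.median(xs)) / statistics.stdev(xs)

tidal_volume = [2.0, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]
pcs = pearson_skew(tidal_volume)
print(f"PCS = {pcs:.2f}")
if abs(pcs) > 0.5:
    print("Skew suggests the samples may not come from a normal population.")
```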
When we collect data, the data are typically collected in many different types of physical units (volts, degrees Celsius, newtons, centimeters, grams, etc.). For us to use tables that have been developed for probability models, we need to normalize the data so that the normalized data will have a mean of 0 and a standard deviation of 1. Such a normal distribution is called a standard normal distribution and is illustrated in Figure 4.1.
FIGURE 4.3: Histogram (frequency versus normalized value, or z score) for samples drawn from a normally distributed population. For a normal distribution, 68% of the samples should lie within 1 standard deviation of the mean (0 in this case) and 95% of the samples should lie within 2 standard deviations (1.96 to be precise) of the mean.
The standard normal distribution has a bell-shaped, symmetric distribution with μ = 0 and σ = 1.
To convert normally distributed data to standard normal values, we use the following formulas,
$$z = \frac{x - \mu}{\sigma} \qquad \text{or} \qquad z = \frac{x - \bar{x}}{s},$$
depending on whether we know the true mean, μ, and standard deviation, σ, or we only have the sample estimates, x̄ and s.
For any individual sample or data point, x_i, from a sample with mean x̄ and standard deviation s, we can determine its z score from the following formula:
$$z_i = \frac{x_i - \bar{x}}{s}.$$
For an individual sample, the z score is a normalized or standardized value. We can use this value with our equations for the probability density function or our standardized probability tables [3] to determine the probability of observing such a sample value from the underlying population.
The z score can also be thought of as a measure of the distance of the individual sample, x_i, from the sample average, x̄, in units of standard deviation. For example, if a sample point, x_i, has a z score of z_i = 2, it means that the data point, x_i, is 2 standard deviations above the sample mean.
We use normalized z scores instead of the original data when performing statistical analysis because the tables for the normalized data are already worked out and available in most statistics texts or statistical software packages. In addition, by using normalized values, we need not worry about the absolute amplitude of the data or the units used to measure the data.
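The z-score formula can be applied to a whole sample at once. The sketch below standardizes the tidal volume data used earlier; the helper name `z_scores` is our own. Whatever the original units, the standardized values have a sample mean of 0 and a sample standard deviation of 1.

```python
import statistics

def z_scores(xs):
    """Standardize each sample: z_i = (x_i - x_bar) / s."""
    x_bar = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [(x - x_bar) / s for x in xs]

tidal_volume = [2.0, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]
z = z_scores(tidal_volume)
print([round(v, 2) for v in z])
print(round(statistics.fmean(z), 10), round(statistics.stdev(z), 10))  # about 0 and 1
```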
4.1 THE STANDARD NORMAL DISTRIBUTION
The standard normal distribution is tabulated in Table 4.1 and illustrated in the accompanying Z distribution figure.
The entries of this z table give the probability that z ≤ a, which equals the area under the normal curve to the left of z = a. If our data come from a normal distribution, the table tells us the probability or chance of our sample value or experimental outcome having a value less than or equal to a.
Thus, we can take any sample and compute its z score as described above and then use the z table to find the probability of observing a z value that is less than or equal to some normalized value, a. For example, the probability of observing a z value that is less than or equal to 1.96 is 97.5%. Thus, the probability of observing a z value greater than 1.96 is 2.5%. In addition, because of symmetry in the distribution, we know that the probability of observing a z value greater than −1.96 is also 97.5%, and the probability of observing a z value less than or equal to −1.96 is 2.5%. Finally, the probability of observing a z value between −1.96 and 1.96 is 95%. The reader should study the z table and the associated graph of the z distribution to verify that the probabilities (or areas under the probability density function) described above are correct.
Z Distribution: The area to the left of z_α equals Pr(z < z_α) = 1 − α; thus, the area in the tail to the right of z_α equals α.
TABLE 4.1: Standard z distribution function: areas under the standardized normal density function
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
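Table lookups like these can be checked against a computed standard normal CDF. The sketch uses Python's `statistics.NormalDist`, whose `cdf` method returns Pr(z ≤ a) and whose `inv_cdf` method goes the other way, from an area to a z value.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal: mu = 0, sigma = 1
phi = Z.cdf                 # Pr(z <= a), the quantity tabulated in Table 4.1

print(f"Pr(z <= 1.96)          = {phi(1.96):.4f}")
print(f"Pr(z > 1.96)           = {1 - phi(1.96):.4f}")
print(f"Pr(z <= -1.96)         = {phi(-1.96):.4f}")
print(f"Pr(-1.96 <= z <= 1.96) = {phi(1.96) - phi(-1.96):.4f}")
print(f"z with 2.5% in the right tail = {Z.inv_cdf(0.975):.3f}")
```

The printed values should agree with the table entries to four decimal places, since the table itself is a rounded listing of this function.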
Often, we need to determine the probability that an experimental outcome falls between two values, or that the outcome is greater than some value a or less than some value b. To find these areas, we can use the following important formulas, where Pr is the probability:
$$\Pr(a \le z \le b) = \Pr(z \le b) - \Pr(z \le a) = \text{area between } z = a \text{ and } z = b;$$
$$\Pr(z \ge a) = 1 - \Pr(z < a) = \text{area to the right of } z = a = \text{area in the right tail}.$$
Thus, for any observation or measurement, x, from any normal distribution:
$$\Pr(a \le x \le b) = \Pr\!\left(\frac{a - \mu}{\sigma} \le z \le \frac{b - \mu}{\sigma}\right),$$
where μ is the mean of the normal distribution and σ is the standard deviation of the normal distribution. In other words, we need to normalize, or find the z values for, each of our parameters, a and b, to find the area under the standard normal curve (z distribution) that represents the expression on the left side of the above equation.
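The two formulas can be wrapped in small helper functions. The names `prob_between` and `prob_greater` are hypothetical; both convert their limits to z values and then use the standard normal CDF from Python's `statistics.NormalDist`, exactly as the equations above prescribe.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """Pr(a <= x <= b) for x ~ Normal(mu, sigma), via z scores and the standard normal CDF."""
    phi = NormalDist().cdf
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return phi(z_b) - phi(z_a)

def prob_greater(a, mu, sigma):
    """Pr(x >= a) = 1 - Pr(z < (a - mu) / sigma): the right-tail area."""
    return 1 - NormalDist().cdf((a - mu) / sigma)

print(prob_between(-1.96, 1.96, 0, 1))   # about 0.95
print(prob_greater(1.96, 0, 1))          # about 0.025
```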
Example 4.1 The mean ntake of fat for males 6 to 9 years old s 28 g, wth a standard devaton
of 13.2 g. Assume that the ntake s normally dstrbuted. Steves ntake s 42 g and Bens ntake s
25 g.
AREA IN RIgHT TAIL, a Z
a
0.10 1.282
0.05 1.645
0.025 1.96
0.010 2.326
0.005 2.576
Commonly used z values:
1. What is the proportion of area between Steve's daily intake and Ben's daily intake?
2. If we were to randomly select a male between the ages of 6 and 9 years, what is the probability that his fat intake would be 50 g or more?
Solution: x = fat intake
1. The problem may be stated as: What is Pr(25 ≤ x ≤ 42)?
Assuming a normal distribution, we convert to z scores:
What is Pr((25 − 28)/13.2 ≤ z ≤ (42 − 28)/13.2)?
= Pr(−0.227 ≤ z ≤ 1.06) = Pr(z ≤ 1.06) − Pr(z ≤ −0.227) (using the formula for Pr(a ≤ z ≤ b))
= Pr(z ≤ 1.06) − [1 − Pr(z ≤ 0.227)] = 0.8554 − [1 − 0.5910] = 0.4464, or 44.6% of the area under the z curve. (By symmetry of the z distribution, Pr(z ≤ −a) = 1 − Pr(z ≤ a); the table entry 0.5910 is read at z = 0.23.)
2. The problem may be stated as, What is Pr(x > 50)?
Normalizing to a z score, what is Pr(z > (50 − 28)/13.2)?
= Pr(z > 1.67)
= 1 − Pr(z ≤ 1.67) = 1 − 0.9525 = 0.0475, or 4.75% of the area under the z curve.
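The two table lookups in Example 4.1 can be checked numerically. This is a minimal sketch, assuming Python 3.8 or later, whose standard-library statistics.NormalDist evaluates the normal CDF directly instead of reading a table rounded to two decimal places in z; the results therefore differ slightly from the hand calculation (about 0.445 and 0.048 versus 0.4464 and 0.0475).

```python
from statistics import NormalDist

# Example 4.1 model: fat intake x ~ Normal(mean 28 g, standard deviation 13.2 g)
intake = NormalDist(mu=28, sigma=13.2)

# Question 1: Pr(25 <= x <= 42) = Pr(x <= 42) - Pr(x <= 25)
p_between = intake.cdf(42) - intake.cdf(25)

# Question 2: Pr(x >= 50) = 1 - Pr(x < 50), the area in the right tail
p_right_tail = 1 - intake.cdf(50)

print(f"Pr(25 <= x <= 42) = {p_between:.4f}")
print(f"Pr(x >= 50)       = {p_right_tail:.4f}")
```

NormalDist standardizes internally, so the explicit z conversion used in the hand solution is not needed here.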
Example 4.2 Suppose that the specifications on the range of displacement for a limb indentor are 0.5 ± 0.001 mm. If these displacements are normally distributed, with mean = 0.47 mm and standard deviation = 0.002 mm, what percentage of indentors are within specifications?
Solution: x = displacement.
The problem may be stated as, What is Pr(0.499 ≤ x ≤ 0.501)?
Using z scores, Pr(0.499 ≤ x ≤ 0.501) = Pr((0.499 − 0.47)/0.002 ≤ z ≤ (0.501 − 0.47)/0.002)
= Pr(14.5 ≤ z ≤ 15.5) = Pr(z ≤ 15.5) − Pr(z ≤ 14.5) ≈ 1 − 1 = 0.
Essentially none of the indentors are within specifications: the mean displacement lies 14.5 standard deviations below the lower specification limit.
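The same calculation can be sketched with the standard library, assuming Python 3.8 or later (statistics.NormalDist). Both CDF values round to 1.0 in double-precision arithmetic, so the computed fraction is 0 to machine precision; the true value is positive but astronomically small.

```python
from statistics import NormalDist

# Example 4.2 model: displacement ~ Normal(mean 0.47 mm, standard deviation 0.002 mm)
disp = NormalDist(mu=0.47, sigma=0.002)
lower, upper = 0.499, 0.501          # specification 0.5 +/- 0.001 mm

# z scores of the two specification limits
z_lower = (lower - disp.mean) / disp.stdev
z_upper = (upper - disp.mean) / disp.stdev

# Fraction of indentors inside the specification window
p_in_spec = disp.cdf(upper) - disp.cdf(lower)

print(f"z limits: {z_lower:.1f} to {z_upper:.1f}")
print(f"fraction within specification: {p_in_spec:.3g}")
```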
It is useful to note that if the distribution of the underlying population and the associated sample data are not normal (i.e., skewed), transformations may often be used to make the data approximately normal, and the statistics covered in this text may then be used to perform statistical analysis on the transformed data. These transformations on the raw data include logs, square root, and reciprocal.
4.2 THE NORMAL DISTRIBUTION AND SAMPLE MEAN
All statistical analysis follows the same general procedure:
1. Assume an underlying distribution for the data and associated parameters (e.g., the sample mean).
2. Scale the data or parameter to a standard distribution.
3. Estimate confidence intervals using a standard table for the assumed distribution. (The question we ask is, What is the probability of observing the experimental outcome by chance alone?)
4. Perform a hypothesis test (e.g., Student's t test).
We are beginning with, and focusing most of this text on, the normal distribution or probability model because of its prevalence in the biomedical field and something called the central limit theorem. One of the most basic statistical tests we perform is a comparison of the means from two or more populations. The sample mean is itself an estimate made from a finite number of samples. Thus, the sample mean, x̄, is itself a random variable that is modeled with a normal distribution [4].
Is this model for x̄ legitimate? The answer is yes, for large samples, because of the central limit theorem, which states [4, 10]:
If the individual data points or samples (each sample is a random variable), xᵢ, come from any arbitrary probability distribution (with finite variance), the distribution of the sum (and hence, average) of those data points approaches a normal distribution as the sample size, n, becomes large.
Thus, even if each sample comes from a nonnormal distribution (e.g., a uniform distribution, such as the toss of a die), the sum of those individual samples (such as the sum we use to estimate the sample mean, x̄) will have an approximately normal distribution. Many of the biological or physiological processes that we measure can plausibly be viewed as the sum of a number of random processes with various probability distributions; thus, the assumption that our samples come from a normal distribution is not unreasonable.
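The die example lends itself to a quick simulation. This is a minimal sketch, assuming Python's standard random and statistics modules; the seed, the sample size n = 30, and the 10,000 repetitions are arbitrary illustrative choices. It averages n fair-die tosses many times and checks that the sample means center on 3.5, have a standard deviation close to σ/√n, and fall within ±1.96 standard errors about 95% of the time, as a normal model predicts.

```python
import random
import statistics

random.seed(1)          # fixed seed so the run is reproducible
n = 30                  # dice per sample (sample size)
trials = 10_000         # number of sample means to draw

# Each sample mean averages n fair-die tosses (a uniform, nonnormal distribution).
means = [statistics.fmean(random.randint(1, 6) for _ in range(n)) for _ in range(trials)]

mu = 3.5                      # mean of one fair die
sigma = (35 / 12) ** 0.5      # standard deviation of one fair die
se = sigma / n ** 0.5         # predicted standard deviation of the sample mean

# Fraction of sample means within 1.96 standard errors of mu (about 0.95 if normal)
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / trials

print(f"mean of sample means: {statistics.fmean(means):.3f} (theory {mu})")
print(f"sd of sample means:   {statistics.stdev(means):.3f} (theory {se:.3f})")
print(f"fraction within 1.96 SE: {inside:.3f}")
```

Increasing n makes the histogram of means look more bell-shaped and narrower, while the toss of a single die stays uniform.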
4.3 CONFIDENCE INTERVAL FOR THE SAMPLE MEAN
Every sample statistic is in itself a random variable with some sort of probability distribution. Thus, when we use samples to estimate the true statistics of a population (which in practice are usually not known and not obtainable), we want to have some level of confidence that our sample estimates are close to the true population statistics or are representative of the underlying population or process.
In estimating a confidence interval for the sample mean, we are asking the question: How close is our sample mean (estimated from a finite number of samples) to the true mean of the population?
To assign a level of confidence to our statistical estimates or statistical conclusions, we need to first assume a probability distribution or model for our samples and underlying population and then estimate a confidence interval using the assumed distribution.
The simplest confidence interval that we can begin with regarding our descriptive statistics is a confidence interval for the sample mean, x̄. Our question is: How close is the sample mean to the true population or process mean, μ?
Before we can answer this, we need to assume an underlying probability model for the sample mean, x̄, and true mean, μ. As stated earlier, it may be shown that for a large sample size, the sample mean, x̄, is well modeled by a normal distribution or probability model. Thus, we will use this model when estimating a confidence interval for our sample mean, x̄.
Thus, x̄ is estimated from the sample. We then ask, how close is x̄ (sample mean) to the true population mean, μ?
It may be shown that if we took many groups of n samples and estimated x̄ for each group,
1. the average or sample mean of x̄ = μ, and
2. the standard deviation of x̄ = σ/√n.
Moreover, by the central limit theorem, as our sample size, n, gets large, the distribution for x̄ approaches a normal distribution.
For large n, x̄ follows a normal distribution, and the z score for x̄ may be used to estimate the following:
$$\Pr(a \le \bar{x} \le b) = \Pr\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right).$$
This expression assumes a large n and that we know σ.
Now we look at the case where we might have a large n, but we do not know σ. In such cases, we replace σ with s to get the following expression:
$$\Pr(a \le \bar{x} \le b) = \Pr\left(\frac{a - \mu}{s/\sqrt{n}} \le z \le \frac{b - \mu}{s/\sqrt{n}}\right),$$
where s/√n is called the sample standard error and represents the standard deviation for x̄.
Let us now assume that, for large n, we want to estimate the 95% confidence interval for x̄. We first scale the sample mean, x̄, to a z value (because the central limit theorem says that x̄ is normally distributed):
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$
We recall that 95% of z values fall within ±1.96 (approximately ±2) of the mean, and for the z distribution,
$$\Pr(-1.96 \le z \le 1.96) = 0.95.$$
Substituting for z,
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$
we get
$$0.95 = \Pr\left(-1.96 \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le 1.96\right).$$
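If we use the following notation in terms of the sample standard error:
$$\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}},$$
and rearrange terms in the expression above, we note that the probability that μ lies within ±1.96 (approximately ±2) standard errors of x̄ (that is, standard deviations of x̄) is 95%:
$$0.95 = \Pr\left(\bar{x} - 1.96\,\mathrm{SE}(\bar{x}) \le \mu \le \bar{x} + 1.96\,\mathrm{SE}(\bar{x})\right).$$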
Note that 1.96 is referred to as z_{α/2}. This z value is the value of z for which the area in the right tail of the normal distribution is α/2. If we were to estimate the 99% confidence interval, we would substitute z_{0.01/2}, which is 2.576, into the 1.96 position above.
Thus, for large n and any confidence level, 1 − α, the 1 − α confidence interval for the true population mean, μ, is given by:
$$\mu = \bar{x} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{x}).$$
This means that there is a (1 − α) × 100% probability that the true mean lies within the above interval centered about x̄.
Example 4.3 Estimate of confidence intervals
Problem: A collection of data has x̄ = 505 and s = 100. If the number of samples was 1000, what is the 95% confidence interval for the population mean, μ?
Solution: If we assume a large sample size, we may use the z distribution to estimate the confidence interval for the sample mean using the following equation:
$$\mu = \bar{x} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{x}).$$
We plug in the following values:
x̄ = 505;
SE(x̄) = s/√n = 100/√1000 ≈ 3.162.
For a 95% confidence interval, α = 0.05.
Using a z table to locate z(0.05/2), we find that the value of z that gives an area of 0.025 in the right tail is 1.96.
Plugging x̄, SE(x̄), and z(α/2) into the estimate for the confidence interval above, we find that the 95% confidence interval for μ = [498.80, 511.20].
Note that if we wanted to estimate the 99% confidence interval, we would simply use a different z value, z(0.01/2), in the same equation. The z value associated with an area of 0.005 in the right tail is 2.576. If we use this z value, we estimate a confidence interval for μ of [496.85, 513.15]. We note that the confidence interval has widened as we increased our confidence level.
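The large-sample interval μ = x̄ ± z_{α/2} SE(x̄) can be written as a short function. This is a minimal sketch, assuming Python 3.8 or later; NormalDist().inv_cdf supplies z_{α/2} in place of a z table, and the function name z_confidence_interval is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, s, n, confidence):
    """Large-sample CI for the mean: xbar +/- z(alpha/2) * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value with alpha/2 in the right tail
    se = s / sqrt(n)                          # sample standard error SE(xbar)
    return xbar - z * se, xbar + z * se

ci95 = z_confidence_interval(505, 100, 1000, 0.95)
ci99 = z_confidence_interval(505, 100, 1000, 0.99)
print("95% CI: [{:.2f}, {:.2f}]".format(*ci95))
print("99% CI: [{:.2f}, {:.2f}]".format(*ci99))
```

Running it prints [498.80, 511.20] and [496.85, 513.15], matching the values in Example 4.3.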
4.4 THE t DISTRIBUTION
For small samples, x̄ standardized with the sample standard deviation, s, is no longer well described by the normal distribution. Therefore, we use Student's t distribution to estimate the true statistics of the population. The t distribution, as illustrated in Table 4.2, looks like a z distribution but with a slower taper at the tails and a flatter central region.
[Figure: t distribution, frequency versus t (−4 to 3); the curve changes with df.]
TABLE 4.2: Percentage points for Student's t distribution
Table entry = t(α; df), where α is the area in the tail to the right of t(α; df) and df is degrees of freedom.
                    α = area to right of t(α; df)
df                  0.10    0.05    0.025    0.01     0.005
1                   3.078   6.314   12.706   31.821   63.657
2                   1.886   2.920   4.303    6.965    9.925
3                   1.638   2.353   3.182    4.541    5.841
4                   1.533   2.132   2.776    3.747    4.604
5                   1.476   2.015   2.571    3.365    4.032
10                  1.372   1.812   2.228    2.764    3.169
11                  1.363   1.796   2.201    2.718    3.106
12                  1.356   1.782   2.179    2.681    3.055
13                  1.350   1.771   2.160    2.650    3.012
14                  1.345   1.761   2.145    2.624    2.977
15                  1.341   1.753   2.131    2.602    2.947
30                  1.310   1.697   2.042    2.457    2.750
40                  1.303   1.684   2.021    2.423    2.704
60                  1.296   1.671   2.000    2.390    2.660
120                 1.289   1.658   1.980    2.358    2.617
∞                   1.282   1.645   1.960    2.326    2.576
z_α (large sample)  1.282   1.645   1.960    2.326    2.576
We use the t distribution, with its slower-tapering tails, because with so few samples we have less certainty about our underlying distribution (probability model).
Now our normalized value for x̄ is given by
$$\frac{\bar{x} - \mu}{s/\sqrt{n}},$$
which is known to have a t distribution with n − 1 degrees of freedom (when the underlying population is normal) rather than the z distribution that we have discussed thus far. The t distribution was first described by W.S. Gosset [4, 11], a chemist who worked for a brewery. Gosset decided to publish his t distribution under the alias of Student. Hence, we often refer to this distribution as Student's t distribution.
The t distribution is symmetric like the z distribution and generally has a bell shape. But the amount of spread of the distribution to the tails, or the width of the bell, depends on the sample size, n. Unlike the z distribution, which assumes an infinite sample size, the t distribution changes shape with sample size. The result is that confidence intervals estimated with t values are wider than those estimated with the z distribution, especially for small sample sizes, because with such sample sizes we are penalized for not having sufficient samples to represent the extent of variability of the underlying population or process. Thus, when we are estimating confidence intervals for the sample mean, x̄, we do not have as much confidence in our estimate, and the interval widens to reflect this decreased certainty with smaller sample sizes. In the next section, we estimate the confidence interval for the same example given previously, but using the t distribution instead of the z distribution.
4.5 CONFIDENCE INTERVAL USING t DISTRIBUTION
Like the z tables, there are t tables where the values of t that are associated with different areas under the probability curve are already calculated and may be used for statistical analysis without the need to recalculate the t values. The difference between the z table and t table is that now the t values are a function of the sample size or degrees of freedom. Table 4.2 gives a few lines of the t table from [3].
To use the table, one simply looks for the intersection of the degrees of freedom, df (related to sample size), and the value of α that one desires in the right-hand tail. The intersection provides the t value for which the area under the t curve in the right tail is α. In other words, the probability that t will be less than or equal to a specific entry in the table is 1 − α. For a specific sample size, n, the degrees of freedom are df = n − 1.
Now we simply substitute t for z to find our confidence intervals. So, the confidence interval for the sample mean, x̄, using the t distribution now becomes
$$\mu = \bar{x} \pm t_{\alpha/2,\,n-1}\,\mathrm{SE}(\bar{x}),$$
where μ is the true mean of the underlying population or process from which we are drawing samples, SE(x̄) = s/√n is the standard error of the sample mean, t is the t value (with n − 1 degrees of freedom) for which there is an area of α/2 in the right tail, and n is the sample size.
Example 4.4 Confidence interval using t distribution
Problem: We consider the same example used previously for estimating confidence intervals using z values. In this case, the sample size is small (n = 20), so we now use a t distribution.
Solution: Now, our estimate for the confidence interval for μ is
$$\mu = \bar{x} \pm t_{\alpha/2,\,n-1}\,\mathrm{SE}(\bar{x}).$$
Again, we plug in the following values:
x̄ = 505;
SE(x̄) = s/√n = 100/√20 ≈ 22.36.
For a 95% confidence interval, α = 0.05.
Using a t table to locate t(0.05/2, 20 − 1), we find that for 19 df, the value of t that gives an area of 0.025 in the right tail is 2.093.
Plugging x̄, SE(x̄), and t(α/2, n − 1) into the estimate for the confidence interval above, we find that the 95% confidence interval for μ = [458.20, 551.80]. We note that this confidence interval is wider than that previously estimated using z values. This is expected for two reasons: the smaller sample gives a larger standard error (22.36 versus 3.16), and the t value (2.093) exceeds the corresponding z value (1.96). The t distribution is wider than the z distribution at small sample sizes, reflecting the fact that we have less confidence in our estimate of x̄ and, hence, of μ when the sample size is small.
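The small-sample version differs only in the critical value. This is a minimal sketch; the function name t_confidence_interval is hypothetical, and because the Python standard library has no t-distribution quantile function, the critical value t(0.025, 19) = 2.093 is passed in from a t table. If SciPy is available, scipy.stats.t.ppf(1 - alpha / 2, n - 1) returns the same value.

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_crit):
    """Small-sample CI for the mean: xbar +/- t(alpha/2, n - 1) * s / sqrt(n).

    t_crit must be looked up in a t table for df = n - 1.
    """
    se = s / sqrt(n)                 # standard error of the sample mean
    return xbar - t_crit * se, xbar + t_crit * se

# Example 4.4: n = 20, so df = 19, and t(0.025, 19) = 2.093 from a t table
ci95 = t_confidence_interval(505, 100, 20, 2.093)
print("95% CI: [{:.2f}, {:.2f}]".format(*ci95))   # 95% CI: [458.20, 551.80]
```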

Confidence intervals may be estimated for most descriptive statistics, such as the sample mean, sample variance, and even the line of best fit determined through linear regression [3]. As noted above, the confidence interval reflects the variability in the parameter estimate and is influenced by the sample size, the variability of the population, and the level of confidence that we desire. The greater the desired confidence, the wider the interval. Likewise, the greater the variability in the samples, the wider the confidence interval, whereas a larger sample size narrows it.
Example 4.5 Review of probability concepts
Problem: Assembly times were measured for a sample of 15 glucose infusion pumps. The mean time to assemble a glucose infusion pump was 15.8 minutes, with a standard deviation of 2.4 minutes. Assuming a relatively symmetric distribution for assembly times,
1. What percentage of infusion pumps require more than 17 minutes to assemble?
2. What is the 99% confidence interval for the true mean assembly time (μ)?
3. What is the 99% confidence interval for mean assembly time if the sample size is 2500?
Solution:
1. x = assembly time
What is Pr(x > 17)?
Pr(x > 17) = Pr(z > (17 − 15.8)/2.4) = Pr(z > 0.5) = 1 − Pr(z ≤ 0.5) = 1 − 0.6915 = 0.3085, or 30.85% of the infusion pumps.
2. μ = x̄ ± t(α/2, n − 1)SE(x̄)
= 15.8 ± t(0.01/2; 15 − 1)(2.4/√15)
= 15.8 ± t(0.005, 14)(0.6197)
= 15.8 ± 2.977(0.6197)
= [13.96, 17.64]
3. Because the sample size of 2500 is now large, we use a z value for estimating the confidence interval: μ = x̄ ± z(α/2)SE(x̄)
= 15.8 ± z(0.01/2)(2.4/√2500)
= 15.8 ± 2.576(0.048)
= [15.68, 15.92]
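All three parts of Example 4.5 can be reproduced in a few lines. This is a minimal sketch, assuming Python 3.8 or later; part 2 uses the tabled value t(0.005, 14) = 2.977 from Table 4.2, and parts 1 and 3 use statistics.NormalDist instead of a z table.

```python
from math import sqrt
from statistics import NormalDist

mean, sd = 15.8, 2.4     # minutes, from the sample of pumps

# Part 1: Pr(x > 17) under a normal model with the sample mean and standard deviation
p_over_17 = 1 - NormalDist(mean, sd).cdf(17)

# Part 2: n = 15, 99% CI using t(0.005, 14) = 2.977 from Table 4.2
se_15 = sd / sqrt(15)
ci_15 = (mean - 2.977 * se_15, mean + 2.977 * se_15)

# Part 3: n = 2500, 99% CI using z(0.005), about 2.576
z_005 = NormalDist().inv_cdf(0.995)
se_2500 = sd / sqrt(2500)
ci_2500 = (mean - z_005 * se_2500, mean + z_005 * se_2500)

print(f"Pr(x > 17) = {p_over_17:.4f}")
print("n = 15:   [{:.2f}, {:.2f}]".format(*ci_15))
print("n = 2500: [{:.2f}, {:.2f}]".format(*ci_2500))
```

The output (0.3085, [13.96, 17.64], [15.68, 15.92]) shows how strongly the interval width depends on n through SE(x̄) = s/√n.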

CHAPTER 5
Statistical Inference
Now that we have collected our data, estimated some basic descriptive statistics, and assumed a probability model for the underlying population or process, we are prepared to perform some statistical analysis that will allow us to compare populations and test hypotheses. For this text, we are only going to discuss statistical analysis for which the normal distribution is a good probability model for the underlying population(s). If the data or samples suggest that the underlying population or process is not well modeled by a normal distribution, then we will need to resort to other types of statistical analysis, such as nonparametric techniques [6], which do not assume an underlying probability model. Some of these tests include the Wilcoxon rank sum test, Mann-Whitney U test, Kruskal-Wallis test, and the runs test [6, 12]. More advanced texts provide details for administering these nonparametric tests. Keep in mind that if the statistical analysis presented in this text is used for populations or processes that are not normally distributed, the results may be of little value, and the investigator may miss important findings from the data.
Assuming our data represent a normally distributed population or process, we are now prepared to perform a variety of statistical tests that allow us to test hypotheses about the equality of means and variances across two or more populations. Remember that for normal distributions, only the mean and standard deviation are required to completely characterize the probabilistic nature of the population or process. Thus, if we are to compare two normally distributed populations, we need only compare the means and variances of the populations. If the populations or processes are not normally distributed, there may be other parameters, such as skew and kurtosis, that differentiate two or more populations or processes.
5.1 COMPARISON OF POPULATION MEANS
One of the most fundamental questions asked by scientists or engineers performing experiments is whether two populations, methods, or treatments are really different in central tendency. More specifically, are the two population means, reflected in the sample means collected under two different experimental conditions, significantly different? Or is the observed difference between two means simply because of chance alone?
Some examples of questions asked by biomedical engineers that require comparing two population means include the following:
1. Is one chemotherapy drug more effective than a second chemotherapy drug in reducing the size of a cancerous tumor?
2. Is there a difference in postural stability between young and elderly populations?
3. Is there a difference in bone density for women who are premenopausal versus those who are postmenopausal?
4. Is titanium a stronger material for bone implants than steel?
5. For MRI, does one type of pulse sequence perform better than another in detecting white matter tracts in the brain?
6. Do drug-eluting intravascular stents prevent restenosis more effectively than noncoated stents?
In answering these types of questions, biomedical engineers often collect samples from two different groups of subjects or from one group of subjects but under two different conditions. For a specific measure that represents the samples, a sample mean is typically estimated for each group or under each of two conditions. Comparing the means from two populations reflected in two sets of data is probably the most reported statistical analysis in the scientific and engineering literature and may be accomplished using something known as a t test.
5.1.1 The t Test
When performing the t test, we are asking the question, Are the means of two populations really different? or Would we see the observed differences simply because of random chance? The two populations are reflected in data that have been collected under two different conditions. These conditions may include two different treatments or processes.
To address this question, we use one of the following two tests, depending on whether the two data sets are independent or dependent on one another:
1. Unpaired t test for two sets of independent data
2. Paired t test for two sets of dependent data
Before we describe each type of t test, we need to discuss the notion of hypothesis testing.
5.1.1.1 Hypothesis Testing
Whenever we perform statistical analysis, we are testing some hypothesis. In fact, before we even collect our data, we formulate a hypothesis and then carefully design our experiment and collect and analyze the data to test the hypothesis. The outcome of the statistical test will allow us, if the assumed probability model is valid, to accept or reject the hypothesis and do so with some level of confidence.
There are basically two forms of hypotheses that we test when performing statistical analysis: the null hypothesis and the alternative hypothesis.
The null hypothesis, denoted as H₀, is expressed as follows for the t test comparing two population means, μ₁ and μ₂:
$$H_0: \mu_1 = \mu_2.$$
The alternative hypothesis, denoted as H₁, is expressed as one of the following for the t test comparing two population means, μ₁ and μ₂:
$$H_1: \mu_1 \neq \mu_2 \quad \text{(two-tailed t test)},$$
$$H_1: \mu_1 < \mu_2 \quad \text{(one-tailed t test)},$$
or
$$H_1: \mu_1 > \mu_2 \quad \text{(one-tailed t test)}.$$
If H₁ is of the first form, where we do not know in advance of the data collection which mean will be greater than the other, we will perform a two-tailed test. This simply means that, for a given value of the test statistic, the level of significance at which we can reject the null hypothesis will be double that of the case in which we predict in advance of the experiment, based on the physiology, the engineering process, or other conditions affecting experimental outcome, that one mean will be lower than the other. Another way to express this is that, for the same data, a one-tailed test lets us reject the null hypothesis with greater confidence than the two-tailed test.
5.1.1.2 Applying the t Test
Now that we have stated our hypotheses, we are prepared to perform the t test. Given two populations with n₁ and n₂ samples, we may compare the two population means using a t test. It is important to remember the underlying assumptions made in using the t test to compare two population means:
1. The underlying distributions for both populations are normal.
2. The variances of the two populations are approximately equal: σ₁² = σ₂².
These are big assumptions we make when n₁ and n₂ are small. If these assumptions are poor for the data being analyzed, we need to find different statistics to compare the two populations.
Given two sets of sampled data, xᵢ and yᵢ, the means for the two populations or processes reflected in the sampled data can be compared using the following t statistic:
5.1.1.3 Unpaired t Test
The unpaired t statistic may be estimated using:
$$T = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{(n_1 - 1)S_x^2 + (n_2 - 1)S_y^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},$$
where n₁ is the number of xᵢ observations, n₂ is the number of yᵢ observations, S_x² is the sample variance of xᵢ, S_y² is the sample variance of yᵢ, x̄ is the sample average for xᵢ, and ȳ is the sample average for yᵢ.
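The pooled-variance formula above translates directly into code. This is a minimal sketch, assuming Python's standard statistics module (whose variance uses the n − 1 denominator, as S_x² and S_y² do here); the function name unpaired_t and the toy data are hypothetical, chosen so the result can be checked by hand.

```python
from math import sqrt
from statistics import fmean, variance

def unpaired_t(x, y):
    """Pooled-variance (equal-variance) two-sample T statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sx2, sy2 = variance(x), variance(y)            # sample variances (n - 1 denominator)
    pooled = ((n1 - 1) * sx2 + (n2 - 1) * sy2) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))          # standard error of xbar - ybar
    return (fmean(x) - fmean(y)) / se, n1 + n2 - 2

# Hypothetical toy data, chosen so the arithmetic is easy to check by hand
x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]
T, df = unpaired_t(x, y)
print(f"T = {T:.3f} with df = {df}")   # T = -2.000 with df = 8
```

The sign of T depends on which data set is passed first; for a one-sided test, check that T falls in the direction predicted by H₁.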
Once the T statistic has been computed, we can compare our estimated T value to t values given in a table for the t distribution. More specifically, if we want to reject the null hypothesis with a 1 − α level of confidence, then we need to determine whether our estimated T value is greater than the t value entry in the t table associated with a significance level of α (one-sided t test) or α/2 (two-sided t test). In other words, we need to know the t value from the t distribution for which the area under the t curve to the right of t is α (or α/2 for a two-sided test). Our estimated T must be greater than this t to reject the null hypothesis with a 1 − α level of confidence. (For a two-sided test, it is the magnitude |T| that is compared with the table entry; for a one-sided test, T must also fall in the direction predicted by H₁.)
Thus, we compare our T value to the t distribution table entry for
t(α, n₁ + n₂ − 2) (one-sided)
or
t(α/2, n₁ + n₂ − 2) (two-sided),
where α is the level of significance (equal to 1 − level of confidence), and n₁ and n₂ are the number of samples from each of the two populations being compared.
Note that α is the level of significance at which we want to accept or reject our hypothesis. For most research, we reject the null hypothesis when α ≤ 0.05. This corresponds to a confidence level of 95%.
For example, if we want to reject our null hypothesis with a confidence level of 95%, then for a one-sided t test, our estimated T must be greater than t(0.05, n₁ + n₂ − 2) to reject H₀ and accept H₁ with 95% confidence or a significance level of 0.05 or less. Remember that our confidence = (1 − 0.05) × 100%. This test was for H₁: μ₁ < μ₂ or μ₁ > μ₂ (one-sided). If our H₁ was H₁: μ₁ ≠ μ₂, then our measured T must be greater than t(0.05/2, n₁ + n₂ − 2) to reject the null hypothesis with the same 95% confidence.
Thus, to test the following alternative hypotheses:
For H₁: μ₁ ≠ μ₂: Use a two-tailed t test. For α < 0.05, the estimated T > t(0.025, n₁ + n₂ − 2) to reject H₀ with 95% confidence.
For H₁: μ₁ < μ₂ or H₁: μ₁ > μ₂: Use a one-tailed t test. For α < 0.05, the estimated T > t(0.05, n₁ + n₂ − 2) to reject H₀ with 95% confidence.
α is also referred to as the type I error (more precisely, the probability of committing a type I error). For the t test, α is the probability of observing a measured T value greater than the table entry, t(α; df), if the true means of the two underlying populations, x and y, were actually equal. In other words, a type I error occurs when there is no true difference in the two population means, but the analysis of the sampled data leads us to conclude that there is a difference and thus to reject the null hypothesis.
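The meaning of α as a type I error rate can be checked by simulation: if both samples come from the same normal population, a one-sided test at α = 0.05 should reject H₀ in about 5% of repetitions. This is a minimal sketch, assuming Python's standard random module; the population (mean 100, standard deviation 15), the sample sizes n₁ = n₂ = 6, the seed, and the 20,000 repetitions are arbitrary illustrative choices, and the critical value t(0.05, 10) = 1.812 is taken from Table 4.2.

```python
import random
from math import sqrt

def pooled_t(x, y):
    """Pooled-variance two-sample T statistic (same formula as the unpaired t test)."""
    n1, n2 = len(x), len(y)
    mx, my = sum(x) / n1, sum(y) / n2
    ssx = sum((v - mx) ** 2 for v in x)   # (n1 - 1) * S_x^2
    ssy = sum((v - my) ** 2 for v in y)   # (n2 - 1) * S_y^2
    pooled = (ssx + ssy) / (n1 + n2 - 2)
    return (mx - my) / sqrt(pooled * (1 / n1 + 1 / n2))

random.seed(2)
n1 = n2 = 6          # df = n1 + n2 - 2 = 10
t_crit = 1.812       # t(0.05, 10) from Table 4.2 (one-sided, alpha = 0.05)
trials = 20_000

# Both samples come from the SAME population, so H0 is true in every trial.
rejections = 0
for _ in range(trials):
    x = [random.gauss(100, 15) for _ in range(n1)]
    y = [random.gauss(100, 15) for _ in range(n2)]
    if pooled_t(x, y) > t_crit:
        rejections += 1

type_i_rate = rejections / trials
print(f"observed type I error rate: {type_i_rate:.3f} (nominal 0.05)")
```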
Example 5.1 An example of a biomedical engineering challenge where we might use the t test to improve the function of medical devices is in the development of implantable defibrillators for the detection and treatment of abnormal heart rhythms [13]. Implantable defibrillators are small electronic devices that are placed in the chest and have thin wires that are placed in the chambers of the heart. These wires have electrodes that detect the small electrical current traversing the heart muscle. Under normal conditions, these currents follow a very orderly pattern of conduction through the heart muscle. An example of an electrogram, or record of electrical activity that is sensed by an electrode, under normal conditions is given in Figure 5.1. When the heart does not function normally and enters a state of fibrillation (Figure 5.1), whereby the heart no longer contracts normally or pumps blood to the rest of the body, the device should shock the heart with a large electrical current from the device in an attempt to convert the heart rhythm back to normal. For a number of reasons, it is important that the device accurately detect the onset of life-threatening arrhythmias, such as fibrillation, and administer an appropriate shock. To administer a shock, the device must use some sort of signal processing algorithms to automatically determine that the electrogram is abnormal and characteristic of fibrillation.
One algorithm that is used in most devices for differentiating normal heart rhythms from fibrillation is a rate algorithm. This algorithm is basically an amplitude threshold crossing algorithm whereby the device determines how often the electrogram exceeds an amplitude threshold in a specified period of time and then estimates a rate from the detected threshold crossings. Figure 5.2 illustrates how this algorithm works.
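The threshold-crossing idea can be sketched in a few lines. This is a minimal illustration, not the algorithm of any actual device: the function name threshold_rate, the blanking window that keeps one wide deflection from being counted twice, and the synthetic pulse trains are all hypothetical.

```python
def threshold_rate(signal, fs, threshold, blanking_s=0.1):
    """Estimate a rate in events per minute by counting upward threshold crossings.

    After each detected event, further crossings are ignored for blanking_s seconds.
    The blanking window is a hypothetical parameter for illustration only.
    """
    blank = int(round(blanking_s * fs))     # blanking window in samples
    events = 0
    last_event = -blank - 1                 # so the first crossing is always accepted
    for i in range(1, len(signal)):
        crossed = signal[i - 1] < threshold <= signal[i]   # upward crossing at sample i
        if crossed and i - last_event > blank:
            events += 1
            last_event = i
    duration_s = len(signal) / fs
    return 60.0 * events / duration_s

# Hypothetical 4-s pulse trains sampled at 250 Hz (not real electrograms)
fs = 250
slow = [1.0 if i % 125 in range(50, 60) else 0.0 for i in range(4 * fs)]  # one pulse every 0.5 s
fast = [1.0 if i % 50 in range(20, 25) else 0.0 for i in range(4 * fs)]   # one pulse every 0.2 s

print(threshold_rate(slow, fs, threshold=0.5))   # 120.0
print(threshold_rate(fast, fs, threshold=0.5))   # 300.0
```

Real electrograms are noisy and variable in amplitude, which is exactly why low-amplitude fibrillation can be undercounted, as discussed below.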
Before such an algorithm was put into implantable defibrillators, researchers and developers had to demonstrate whether the rate estimated by such a device truly differed between normal and fibrillatory rhythms. It was important to have little overlap between the normal and fibrillatory rhythms so that a rate threshold could be established, whereby rates that exceeded the threshold would lead to the device administering a shock. For this algorithm to work, there must be a significant difference in rates between normal and fibrillatory rhythms and little overlap that would lead to shocks being administered inappropriately, causing great pain or risking the induction of fibrillation. Overlap in rates between normal and fibrillatory rhythms could also result in the device missing detection of fibrillation because of low rates.
To determine if rate is a good algorithm for detecting fibrillatory rhythms, investigators might record electrogram signals from patients who have demonstrated normal or fibrillatory rhythms, use the rate algorithm to estimate a rate, and then compare the mean rates for normal rhythms against mean rates for fibrillatory rhythms. For example, let us assume investigators collected 15-second electrogram recordings for examples of fibrillatory (n = 10) and nonfibrillatory (n = 11) rhythms in 21 patients. We note that the fibrillatory data were collected from different subjects than the normal data. Thus, we do not have blocking in this experimental design.
FIGURE 5.1: Electrogram recordings measured from electrodes placed inside the left atrium of the heart. For each rhythm, there are two electrogram recordings taken from two different locations in the atrium: atrial flutter (AFLUT) and atrial fibrillation (AF). [Panels AFLUT and AF; amplitude versus time (s), 0 to 4 s.]
A rate was estimated, using the device algorithm, for each individual 15-second recording. Figure 5.3 shows a box-and-whisker plot for each of the two data sets.
The descriptive statistics for the two data sets are given by the following table:
                          NONFIBRILLATORY ELECTROGRAMS   FIBRILLATORY ELECTROGRAMS
                          (N₁ = 11)                      (N₂ = 10)
Mean (BPM)                96.82                          239.0
Standard deviation (BPM)  22.25                          55.3
There are several things to note from the plots and descriptive statistics. First, the sample size is small; thus, it is likely that the variability of electrogram recordings from normal and fibrillatory rhythms is not adequately represented. Moreover, the data are skewed and do not appear to be well modeled by a normal distribution. Finally, the variability in rates appears to be much greater for fibrillatory rhythms than for normal rhythms. Thus, some of the assumptions we make regarding the normality of the populations and the equality of variances are probably violated in applying the t test. However, for the purpose of illustration, we will perform a t test using the unpaired t statistic.
FIGURE 5.2: A heart rate is estimated from an electrogram using an amplitude threshold algorithm. Whenever the amplitude of the electrogram signal (solid waveform) exceeds an amplitude threshold (solid gray horizontal line) within a specific time interval (shaded vertical bars), an event is detected. A rate is calculated by counting the number of events in a certain time period.
To find an estimated T statistic from our sample data, we plug the means, variances, and sample sizes into the equation for the unpaired T statistic. We turn the crank and find that for this example, the estimated T value is 7.59. If we look at our t tables for 11 + 10 − 2 = 19 degrees of freedom, we can reject H₀ at α < 0.005 because our estimated T value exceeds the t table entries for α = 0.05 (t = 1.729), α = 0.01 (t = 2.539), and α = 0.005 (t = 2.861).
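The comparison can be reproduced from the summary table. This is a minimal sketch that assumes only the rounded means, standard deviations, and sample sizes listed above (the raw rates are not available here). With these inputs the pooled formula gives T ≈ 7.87, while the unpooled standard error √(s₁²/n₁ + s₂²/n₂) used by Welch's unequal-variance test gives T ≈ 7.59, the value quoted in the text; the quoted figure may therefore come from that form or from the unrounded data. Either way, T far exceeds every tabled critical value for 19 df, so the conclusion is unchanged. The sketch does not compute Welch's adjusted degrees of freedom.

```python
from math import sqrt

# Summary statistics from the Example 5.1 table (rates in BPM)
m_fib, s_fib, n_fib = 239.0, 55.3, 10        # fibrillatory
m_non, s_non, n_non = 96.82, 22.25, 11       # nonfibrillatory

df = n_fib + n_non - 2                       # 19

# Pooled (equal-variance) standard error, as in the unpaired t formula
sp2 = ((n_fib - 1) * s_fib**2 + (n_non - 1) * s_non**2) / df
se_pooled = sqrt(sp2 * (1 / n_fib + 1 / n_non))
t_pooled = (m_fib - m_non) / se_pooled

# Unpooled (Welch-form) standard error, which does not assume equal variances
se_welch = sqrt(s_fib**2 / n_fib + s_non**2 / n_non)
t_welch = (m_fib - m_non) / se_welch

print(f"pooled T = {t_pooled:.2f}, Welch-form T = {t_welch:.2f}, df = {df}")
# One-sided critical values t(alpha, 19) quoted in the text
for alpha, t_crit in [(0.05, 1.729), (0.01, 2.539), (0.005, 2.861)]:
    print(f"alpha = {alpha}: reject H0? {t_pooled > t_crit and t_welch > t_crit}")
```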
Thus, we reject the null hypothesis with (1 − 0.005) × 100% confidence and state that mean heart rate for fibrillatory rhythms is greater than mean heart rate for normal rhythms. Thus, the rate algorithm should perform fairly well in differentiating normal from fibrillatory rhythms. However, we have only tested the population means. As one may see in Figure 5.3, there is some overlap in individual samples between normal and fibrillatory rhythms. Thus, we might expect the device to make errors in administering shock inappropriately when the heart is in normal but accelerated rhythms (as in exercise), or the device may fail to shock when the heart is fibrillating but at a slow rate or with low amplitude.
In applying the t test, it is important to note that you can never prove that two means are equal. The statistics can only show that a specific test cannot find a difference in the population means, and not finding a difference is not the same as proving equality. The null or default hypothesis is that there is no difference in the means, regardless of what the true difference is between the two means. Not finding a difference with the collected data and appropriate statistical test does not mean that the means are proved equal. Thus, we do not accept the null hypothesis at a stated level of significance; we simply accept (fail to reject) the null hypothesis without attaching a confidence or significance level to that decision. We only assign a level of confidence when we reject the null hypothesis.
FIGURE 5.3: Box-and-whisker plot for rate data estimated from samples of nonfibrillatory and fibrillatory heart rhythms. [Estimated rate (BPM), 100 to 300, by rhythm type: fibrillatory and nonfibrillatory.]
5.1.1.4 Paired t Test
In the previous example, we used an unpaired t test because the two data sets came from two different, unrelated groups of patients. The problem with such an experimental design is that differences in the two patient populations may lead to differences in mean heart rate that have nothing to do with the actual heart rhythm but rather reflect differences in the size or age of the hearts between the two groups of patients or some other difference between patient groups. A better way to conduct the previous study is to collect our normal and fibrillatory heart data from one group of patients. A single defibrillator will only need to differentiate normal and fibrillatory rhythms within a single patient. By blocking on subjects, we can eliminate the intersubject variability in electrogram characteristics that may prevent the rate algorithm from separating populations. It may be more reasonable to assume that rates for normal and fibrillatory rhythms differ more within a patient than across patients. In other words, for each patient, we collect an electrogram during normal heart function and during fibrillation. In such an experimental design, we would compare the means of the two data sets using a paired t test.
We use the paired t test when there is a natural pairing between each observation in data set X and an observation in data set Y. For example, we might have the following scenarios that warrant a paired t test:
1. Collect blood pressure measurements from 8 patients before and after exercise.
2. Collect both MR and CT data from the same 20 patients to compare quality of blood vessel images.
3. Collect computing time for an image processing algorithm before and after a change is made in the software.
In such cases, the X and Y data sets are no longer independent. In the first example above, because we are collecting the before and after data from the same experimental units, the physiology and biological environment that impact blood pressure before exercise in each patient also affect blood pressure after exercise. Thus, there are a number of variables that impact blood pressure that we cannot directly control but whose effects on blood pressure (besides the exercise effect) can be controlled by using the same experimental units (human subjects in this case) for each data set. In such cases, the experimental units serve as their own control. This is typically the preferred experimental design for comparing means.
For the paired t test, we again have a null hypothesis and an alternative hypothesis as stated above for the unpaired t test. However, in a paired t test, we apply a t test to the differences, $W_i = X_i - Y_i$, between the paired data points from each of the two populations.
We now calculate the paired T statistic (n = number of pairs):
$$T = \frac{\bar{W}}{S_w / \sqrt{n}},$$
where $\bar{W}$ is the average of the differences, $W_i$, and $S_w$ is the standard deviation of the differences, $W_i$.
As for the unpaired t test, we now have an estimated T value that we can compare with the values in the t distribution table to determine whether the estimated T value lies in the extreme tail of the t distribution (for example, beyond the 95th percentile when $\alpha = 0.05$).
To reject the null hypothesis at a significance level of $\alpha$ (confidence level of $1-\alpha$), our estimated T value must be greater than $t(\alpha, n-1)$, where n is the number of pairs of data. If the estimated T value exceeds $t(\alpha, n-1)$, we reject $H_0$ and accept $H_1$ at a significance level of $\alpha$, or a confidence level of $1-\alpha$.
Once again, if $H_1: \mu_1 < \mu_2$ or $H_1: \mu_1 > \mu_2$, we perform a one-sided test, in which the magnitude of our T statistic must be greater than $t(\alpha, n-1)$, with T having the sign predicted by $H_1$, to reject the null hypothesis at a level of $\alpha$. If the alternative hypothesis that we specify before the experiment is $H_1: \mu_1 \neq \mu_2$, then we perform a two-sided test, and $|T|$ must be greater than $t(\alpha/2, n-1)$ to reject the null hypothesis at a significance level of $\alpha$.
5.1.1.5 Example of a Biomedical Engineering Challenge
In relation to the abnormal heart rhythms discussed previously, antiarrhythmic drugs may be used to slow or terminate an abnormal rhythm, such as fibrillation. For example, a drug such as procainamide may be used to terminate atrial fibrillation. The exact mechanism whereby the drug leads to termination of the rhythm is not known, but it is thought that the drug changes the refractory period and conduction velocity of the heart cells [8]. Biomedical engineers often apply signal processing to the electrical signals generated by the heart as an indirect means of studying the underlying physiology. More specifically, engineers may use spectral analysis or Fourier analysis to look at changes in the spectral content of the electrical signal, such as the electrogram, over time. Such changes may tell us something about the underlying electrophysiology.
For example, spectral analysis has been used to look at changes in the frequency spectrum of atrial fibrillation with drug administration. In one such study [8], biomedical engineers were interested in changes in the median frequency of atrial electrograms after drug administration. Figure 5.4 shows an example of the frequency spectrum for atrial fibrillation and the location of the median frequency, which is the frequency that divides the power of the spectrum (the area under the spectral curve between 4 and 9 Hz) in half. One question posed by the investigators is whether median frequency decreases after administration of a drug such as procainamide, which is thought to slow the electrical activity of the heart cells.
Thus, an experiment was conducted to determine whether there was a significant difference in mean median frequency between fibrillation before drug administration and fibrillation after drug administration. Electrograms were collected in the right atrium in 11 patients before and after the drug was administered. Fifteen-second recordings were evaluated for the frequency spectrum, and the median frequency in the 4- to 9-Hz frequency band was estimated before and after drug administration.
Figure 5.5 illustrates the summary statistics for median frequency before and after drug administration. The question is whether there was a significant decrease in median frequency after administration of procainamide. In this case, the null hypothesis is that there is no change in mean median frequency after drug administration. The alternative hypothesis is that mean median frequency decreases after drug administration. Thus, we are comparing two means, and the two data sets have been collected from one set of patients. We will require a paired t test to reject or accept the null hypothesis.
[Figure 5.4 plot: normalized power (0 to 1.2) versus frequency (0 to 30 Hz); the peak power and the median frequency (4 to 9 Hz band) are marked.]
FIGURE 5.4: Frequency spectrum for an example of atrial fibrillation. Median frequency is defined as the frequency that divides the area under the power curve in the 4- to 9-Hz band in half.
[Figure 5.5 plot, titled "Median Frequency": box-and-whisker plots of estimated median frequency (MF) before drug and after drug, vertical axis 3 to 7.]
FIGURE 5.5: Box-and-whisker plot for median frequency estimated from samples of atrial fibrillation recorded before and after a drug was administered.
BEFORE DRUG   AFTER DRUG   W_i (BEFORE − AFTER)
4.30          2.90         1.40
4.15          2.97         1.18
3.80          3.20         0.60
5.10          3.30         1.80
4.30          3.75         0.55
7.20          5.35         1.85
6.40          5.10         1.30
6.20          4.90         1.30
6.10          4.80         1.30
5.00          3.70         1.30
5.80          4.50         1.30
To perform a paired t test, we create another column, $W_i$, as shown in the table above. We find the following for $W_i$: $\bar{W} = 1.262$ and $S_w = 0.402$. If we use these estimates for $\bar{W}$ and $S_w$ in our equation for the paired T statistic, we find that T = 10.42 for n = 11 pairs of data. If we compare our estimated T value to the t distribution, we find that our T value is greater than the table entry for $t(0.005, 11-1) = 3.169$; therefore, we may reject $H_0$ at a significance level of 0.005. In other words, we reject the null hypothesis at the $(1 - 0.005) \times 100\% = 99.5\%$ confidence level.
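The paired calculation above is easy to check numerically. The following is a minimal Python sketch using only the standard library; the function name `paired_t` is illustrative, and the one-sided critical value t(0.005, 10) = 3.169 is an assumption taken from a standard t table rather than from this section.

```python
import math
from statistics import mean, stdev

# Median frequency before and after procainamide (table above).
before = [4.30, 4.15, 3.80, 5.10, 4.30, 7.20, 6.40, 6.20, 6.10, 5.00, 5.80]
after = [2.90, 2.97, 3.20, 3.30, 3.75, 5.35, 5.10, 4.90, 4.80, 3.70, 4.50]


def paired_t(x, y):
    """Return (W_bar, S_w, T) for the paired differences W_i = X_i - Y_i."""
    if len(x) != len(y):
        raise ValueError("paired data sets must have the same length")
    w = [xi - yi for xi, yi in zip(x, y)]
    n = len(w)                     # number of pairs
    w_bar = mean(w)
    s_w = stdev(w)                 # sample standard deviation (n - 1 denominator)
    return w_bar, s_w, w_bar / (s_w / math.sqrt(n))


w_bar, s_w, t_stat = paired_t(before, after)
print(f"W_bar = {w_bar:.3f}, S_w = {s_w:.3f}, T = {t_stat:.2f}")
# prints W_bar = 1.262, S_w = 0.402, T = 10.42

T_CRIT = 3.169  # assumed one-sided t(0.005, 10) from a t table
print("reject H0" if t_stat > T_CRIT else "accept H0")
```

Because the alternative hypothesis is one-sided (median frequency decreases), only the upper tail of the t distribution is used for the comparison.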
Errors in Drawing Conclusions From Statistical Tests
When we perform a statistical analysis, such as the t test, there is a chance that we are mistaken in rejecting or accepting the null hypothesis. When we draw conclusions from a statistical analysis, we typically assign a confidence level to our conclusion. This means that there is always some chance that our conclusions are incorrect.
There are two types of errors that may occur when we draw conclusions from a statistical analysis. These errors are referred to as type I and type II errors.
Type I errors:
1. are also referred to as false-positive errors;
2. occur when we accept $H_1$ when $H_0$ is true;
3. may result in a false diagnosis of disease.
Type II errors:
1. are also referred to as false-negative errors;
2. occur when we accept $H_0$ when $H_1$ is true;
3. may result in a missed diagnosis (often more serious than a type I error).
If we think about the medical environment, a type I error might occur when a person is given a diagnostic test to detect streptococcus bacteria, and the test indicates that the person has the bacteria when, in fact, the person does not. Such an error means that the person spends money on an antibiotic that serves no purpose.
A type II error would occur when the same person actually has the streptococcus bacteria, but the diagnostic test gives a negative result, and we conclude that the person does not have the bacteria. In this example, this type of error is more serious than the type I error because streptococcus bacteria left untreated can lead to a number of serious complications.
The $\alpha$ value that we refer to as the level of significance is also the probability of a type I error. Thus, the smaller the level of significance at which we may reject the null hypothesis, the lower the probability of making a type I error.
We typically use $\beta$ to denote the probability of a type II error. We will discuss this error further at the end of Chapter 7 when we discuss power tests.
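The statement that $\alpha$ is the probability of a type I error can be checked by simulation. The Python sketch below is a minimal illustration under simplifying assumptions: it uses a two-sided z test with a known population standard deviation (rather than a t test), draws every sample from a population for which $H_0$ is true, and counts how often $H_0$ is nevertheless rejected. The function name `type_i_error_rate` and its default parameters are hypothetical.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=10, sigma=1.0, trials=20000, seed=1):
    """Fraction of simulated experiments in which a true H0 is rejected."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: every sample comes from a mean-0 population.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1                        # a type I error
    return rejections / trials


for alpha in (0.05, 0.01):
    rate = type_i_error_rate(alpha)
    print(f"alpha = {alpha}: estimated type I error rate = {rate:.3f}")
```

The estimated rates should land close to the chosen significance levels; increasing `trials` tightens the agreement.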
5.2 COMPARISON OF TWO VARIANCES
We used the t test to compare the means of two populations or processes. Two populations may also be compared for differences in variance. As discussed earlier, populations that are normally distributed are completely characterized by their mean and variance. Thus, if we want to test for differences between two normal populations, we need only compare their two means and their two variances.
Figure 5.6 illustrates the probability density functions for two normal populations (black and red traces). The four diagrams illustrate how two different normally distributed populations may compare with each other. The right two panels differ from the left two panels in the means of the populations. The top two panels differ from the bottom two panels in the variances of the populations.
[Figure 5.6 panels: columns labeled "Means Same" and "Means Different" (compared with a t test); rows labeled "Variances Same" and "Variances Different" (compared with an F test).]
FIGURE 5.6: Two normal populations may differ in their means (top row), their variances (left half), or both (bottom right corner). t and F tests may be used to test for significant differences in the population means and population variances, respectively.
As indicated across the top of the tracings, a t test is used to test for differences in mean between the two populations. As indicated along the vertical direction, an F test is used to test for significant differences in the variances of the populations. Note that two normal populations may differ significantly in both mean and variance.
To compare the variances of two populations, we use what is referred to as an F test. As for the t test, the F test assumes that the data consist of independent random samples from each of two normal populations. If the two populations are not normally distributed, the results of the F test may be meaningless.
[Figure 5.7 plots: two histograms of frequency (count) versus normalized measure, titled "F Distribution (f 10, 8)" and "F Distribution (f 40, 30)".]
FIGURE 5.7: Histograms of samples drawn from two different F distributions. In the top panel, the two degrees of freedom are 10 and 8. In the lower panel, the two degrees of freedom are 40 and 30.
TABLE 5.1: Values from the F distribution for areas of $\alpha$ in the tail to the right of $F(dn, dd, \alpha)$
dd \ dn 1             2             3             4             5             10            11
1       161, 4052     200, 4999     216, 5403     225, 5625     230, 5764     242, 6056     243, 6082
2       18.51, 98.49  19.00, 99.01  19.16, 99.17  19.25, 99.25  19.30, 99.30  19.39, 99.40  19.40, 99.41
3       10.13, 34.12  9.55, 30.81   9.28, 29.46   9.12, 28.71   9.01, 28.24   8.78, 27.23   8.76, 27.13
4       7.71, 21.20   6.94, 18.00   6.59, 16.69   6.39, 15.98   6.26, 15.52   5.96, 14.54   5.93, 14.45
5       6.61, 16.26   5.79, 13.27   5.41, 12.06   5.19, 11.39   5.05, 10.97   4.74, 10.05   4.70, 9.96
10      4.96, 10.04   4.10, 7.56    3.71, 6.55    3.48, 5.99    3.33, 5.64    2.97, 4.85    2.94, 4.78
12      4.75, 9.33    3.88, 6.93    3.49, 5.95    3.26, 5.41    3.11, 5.06    2.76, 4.30    2.72, 4.22
15      4.54, 8.68    3.68, 6.36    3.29, 5.42    3.06, 4.89    2.90, 4.56    2.55, 3.80    2.51, 3.73
20      4.35, 8.10    3.49, 5.85    3.10, 4.94    2.87, 4.43    2.71, 4.10    2.35, 3.37    2.31, 3.30
50      4.03, 7.17    3.18, 5.06    2.79, 4.20    2.56, 3.72    2.40, 3.41    2.02, 2.70    1.98, 2.62
100     3.94, 6.90    3.09, 4.82    2.70, 3.98    2.46, 3.51    2.30, 3.20    1.92, 2.51    1.88, 2.43
200     3.89, 6.76    3.04, 4.71    2.65, 3.88    2.41, 3.41    2.26, 3.11    1.87, 2.41    1.83, 2.34
∞       3.84, 6.64    2.99, 4.60    2.60, 3.78    2.37, 3.32    2.21, 3.02    1.83, 2.32    1.79, 2.24
TABLE 5.1 (continued)
dd \ dn 12            14            20            30            40            50            100           200           ∞
1       244, 6106     245, 6142     248, 6208     250, 6258     251, 6286     252, 6302     253, 6334     254, 6352     254, 6366
2       19.41, 99.42  19.42, 99.43  19.44, 99.45  19.46, 99.47  19.47, 99.48  19.47, 99.48  19.49, 99.49  19.49, 99.49  19.50, 99.50
3       8.74, 27.05   8.71, 26.92   8.66, 26.69   8.62, 26.50   8.60, 26.41   8.58, 26.30   8.56, 26.23   8.54, 26.18   8.53, 26.12
4       5.91, 14.37   5.87, 14.24   5.80, 14.02   5.74, 13.83   5.71, 13.74   5.70, 13.69   5.66, 13.57   5.65, 13.52   5.63, 13.46
5       4.68, 9.89    4.64, 9.77    4.56, 9.55    4.50, 9.38    4.46, 9.29    4.44, 9.24    4.40, 9.13    4.38, 9.07    4.36, 9.02
10      2.91, 4.71    2.86, 4.60    2.77, 4.41    2.70, 4.25    2.67, 4.17    2.64, 4.12    2.59, 4.01    2.56, 3.96    2.54, 3.91
12      2.69, 4.16    2.64, 4.05    2.54, 3.86    2.46, 3.70    2.42, 3.61    2.40, 3.56    2.35, 3.46    2.32, 3.41    2.30, 3.36
15      2.48, 3.67    2.43, 3.56    2.33, 3.36    2.25, 3.20    2.21, 3.12    2.18, 3.07    2.12, 2.97    2.10, 2.92    2.07, 2.87
20      2.28, 3.23    2.23, 3.13    2.12, 2.94    2.04, 2.77    1.99, 2.69    1.96, 2.63    1.90, 2.53    1.87, 2.47    1.84, 2.42
50      1.95, 2.56    1.90, 2.46    1.78, 2.26    1.69, 2.10    1.63, 2.00    1.60, 1.94    1.52, 1.82    1.48, 1.76    1.44, 1.68
100     1.85, 2.36    1.79, 2.26    1.68, 2.06    1.57, 1.89    1.51, 1.79    1.48, 1.73    1.39, 1.59    1.34, 1.51    1.28, 1.43
200     1.80, 2.28    1.74, 2.17    1.62, 1.97    1.52, 1.79    1.45, 1.69    1.42, 1.62    1.32, 1.48    1.26, 1.39    1.19, 1.28
∞       1.75, 2.18    1.69, 2.07    1.57, 1.87    1.46, 1.69    1.40, 1.59    1.35, 1.52    1.24, 1.36    1.17, 1.25    1.00, 1.00
dn = degrees of freedom for numerator; dd = degrees of freedom for denominator; $\alpha$ = the area in the distribution tail to the right of $F(dn, dd, \alpha)$ = 0.05 or 0.01. In each entry, the first value is for $\alpha = 0.05$ and the second is for $\alpha = 0.01$.
As with the t test, the F test is used to test the following hypotheses:
null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$ and
alternative hypothesis: $H_1: \sigma_1^2 > \sigma_2^2$,
where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two populations.
To reject or accept the null hypothesis, we compute the following F statistic:
$$F = \frac{s_1^2}{s_2^2},$$
where $s_1^2$ and $s_2^2$ are the sample variance estimates of the two populations. When $H_0$ is true, the ratio of the sample variances from two normal populations is a random variable that follows an F distribution. The F distribution is illustrated in Figure 5.7. Whereas the t distribution depends on a single parameter, the F distribution depends on two, the numerator and denominator degrees of freedom, which are determined by the sample sizes of the two populations. Table 5.1 shows a portion of an F table, where two degrees of freedom, dn and dd, are required to locate an F value in the table. In this table, entries are given for significance levels ($\alpha$ values) of 0.05 and 0.01: in each pair, the first value is associated with the 0.05 significance level and the second with the 0.01 significance level. Thus, for any two degrees of freedom, two F values are provided, one for the 95% confidence level and one for the 99% confidence level.
To make use of the F table with the F test, we estimate an F statistic using the sample variance estimates from each of the two populations we are trying to compare. Note that for the use of this F table, the larger of the two variances should be put in the numerator of the equation above.
We now compare our estimated F statistic to the entry in the F table associated with dn and dd degrees of freedom and the appropriate confidence level (only the 95% and 99% F values are given in the table provided). The degrees of freedom are dn = $n_1 - 1$ and dd = $n_2 - 1$, where $n_1$ and $n_2$ are the numbers of samples from each population, with $n_1$ being the number of samples from the population whose variance is placed in the numerator.
If we are to reject the null hypothesis outlined above, our calculated F statistic must be greater than the table entry $F(\alpha, dn, dd)$ to reject $H_0$ with $(1-\alpha) \times 100\%$ confidence. The numerator degrees of freedom, dn = $n_1 - 1$, is the value used to locate the table entry in the horizontal direction, and the denominator degrees of freedom, dd = $n_2 - 1$, is used to locate the table entry in the vertical direction.
Example 5.2 F test
           POPULATION A ($N_1 = 9$)   POPULATION B ($N_2 = 9$)
Mean       0.026                      0.027
Variance   2.0E-5                     7.4E-5
Given the data listed in the table above, where we have collected 9 samples from each of two populations, A and B, we can estimate our F statistic, with the larger variance (population B) in the numerator, to be
F = (7.4E-5)/(2.0E-5) = 3.7.
Each variance estimate has n − 1 = 8 degrees of freedom. Using an F table that has entries for both the 95% and 99% confidence levels, we find that the table entry for $\alpha = 0.05$ and 8 df for both populations is
F(0.05, 8, 8) = 3.44.
Our estimated value of 3.7 exceeds the table entry of 3.44; thus, we may reject $H_0$ with 95% confidence and accept that population B has significantly greater variance than population A. However, we note that the table entry for F(0.01, 8, 8) equals 6.03 and is greater than our estimated value. Thus, we cannot reject the null hypothesis with 99% confidence.
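Rather than bracketing the F statistic between two table entries, one can compute its right-tail area (the p value) directly. Python's standard library has no F distribution, so this sketch relies on the standard identity between the F tail and the regularized incomplete beta function, evaluated with a continued fraction in the style of Numerical Recipes. The helper names (`reg_inc_beta`, `f_upper_tail`) are hypothetical, and each sample variance is assumed to carry n − 1 = 8 degrees of freedom.

```python
from math import exp, lgamma, log

TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, dn, dd):
    """P(F > f) for an F distribution with dn and dd degrees of freedom."""
    return reg_inc_beta(dd / 2.0, dn / 2.0, dd / (dd + dn * f))


var_a, var_b = 2.0e-5, 7.4e-5    # sample variances from Example 5.2
n_a, n_b = 9, 9                  # samples per population
f_stat = var_b / var_a           # larger variance in the numerator
p_value = f_upper_tail(f_stat, n_b - 1, n_a - 1)
print(f"F = {f_stat:.2f}, one-sided p = {p_value:.3f}")
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}:", "reject H0" if p_value < alpha else "accept H0")
```

The p value comes out at roughly 0.04, so $H_0$ is rejected at $\alpha = 0.05$ but not at $\alpha = 0.01$. In practice, a library routine such as SciPy's `scipy.stats.f.sf` would normally replace the hand-written helpers.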
5.3 COMPARISON OF THREE OR MORE POPULATION MEANS
Thus far, we have discussed statistical analysis for comparing the means and variances (or standard deviations) of two populations based on collecting two sets of data. These two sets of data may be independent of each other, or they may be dependent on each other, which we refer to as paired or blocked data. Another way to think of this pairing is to call it blocking on the experimental unit, meaning that both sets of data were collected from the same experimental units but under different conditions. The experimental units may be human subjects, animals, medical instruments, manufacturing lines, cell cultures, electronic circuits, and more. If, for example, the experimental units are human beings, the two sets of data may be derived before and after a drug has been administered or before and after the subjects have engaged in exercise. In another example, we may measure two different types of data, such as heart rate and blood pressure, each from the same group of subjects. This is referred to as repeated measures because we are taking multiple sets of measures from the same experimental units. In repeated measures, something in the experimental conditions, the experimental unit, or the measure being collected has changed from one set of measures to the next, but the sampling is blocked by experimental unit.
In many biomedical engineering applications, we need to compare the means of three or more populations, processes, or conditions. In such cases, we use a method of statistical analysis called analysis of variance, or ANOVA. Although the name implies that one is analyzing variances, the conclusions that stem from such an analysis concern significant differences in the means of three or more populations or processes.
5.3.1 One-Factor Experiments
We begin our discussion of ANOVA with the design and analysis of one-factor experiments. In such experiments, we draw samples from three or more populations for which one factor has been varied from one population to the next.
As stated in Chapter 2, experimental design should include randomization and blocking. Randomization ensures that we do not introduce bias into the data because of ordering effects in collecting our data. Moreover, blocking helps us to reduce the effect of intersubject variability on the differences between populations.
Some biomedical engineering challenges that may require the use of ANOVA include the following:
1. Compare wear time between hip implants of various materials: in this example, the treatment = implant material (e.g., titanium, steel, polymer resins).
2. Compare MR pulse sequences in their ability to image tissue damage after stroke: in this example, the treatment = MR pulse sequence.
3. Compare the ability of various drugs to reduce high blood pressure: in this example, the treatment = drug.
5.3.1.1 Example of Biomedical Engineering Challenge
An example of a biomedical engineering problem that might use ANOVA to test a hypothesis is the study of reflex mechanisms in spinal cord injured patients [14]. One of the interests of investigators is how hip flexion (torque) depends on ankle movement in patients with spinal cord injury. In this example, there are actually two experimental factors that may affect hip flexion:
1. Range of ankle motion: in this case, the treatment = range of ankle extension.
2. Speed of ankle flexion: in this case, the treatment = speed of ankle extension.
We might evaluate one factor at a time using a one-factor or one-way ANOVA. Or, we might evaluate the impact of both factors on mean hip flexion using a two-factor or two-way ANOVA. In either case, we have a null hypothesis and an alternative hypothesis for the effect of each factor, such as range of ankle motion, on the population mean, such as hip flexion. Our null hypothesis is that there is no significant difference in mean hip flexion across ankle motions,
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_n.$$
The alternative hypothesis, $H_1$, states that at least two of the population means, such as hip flexion for two different ankle motions, differ significantly from each other.
The basic design of the one-factor experiment is given in Table 5.2. In this case, there are k treatments in one factor: k is the number of treatments, $y_{ij}$ is the jth individual sample or data point for the ith treatment, $\bar{y}_i$ is the sample mean for the ith treatment, and $s_{y_i}^2$ is the sample variance for the ith treatment (adapted from [3]).
For example, we might be interested in comparing weight loss for three different types of diet pills. In this case, k = 3. Under the weight-loss column, one will find the weight loss of individual subjects who used one of the three diet pills. Note that in this example, the number of samples varies with each drug type. In a balanced experiment, there should be an equal number of samples for each type of treatment. In the third and fourth columns, we have the sample mean and standard deviation of weight loss for each of the diet pills. The question we are trying to answer is whether there is a significant difference in mean weight loss as a function of diet pill type.
DIET PILL   WEIGHT LOSS                    MEAN    STANDARD DEVIATION
Placebo     10, 12, 5, 8, 5, 20            10.0    5.62
Drug B      30, 5, 12, 20                  16.75   10.75
Drug C      2, 10, 5, 5, 10, 20, 25, 40    14.63   12.92
TABLE 5.2: One-factor experiment
TREATMENT   OBSERVATIONS                        SAMPLE MEAN   SAMPLE VARIANCE
1           $y_{11}, y_{12}, \ldots, y_{1n}$    $\bar{y}_1$   $s_{y_1}^2$
2           $y_{21}, y_{22}, \ldots, y_{2m}$    $\bar{y}_2$   $s_{y_2}^2$
⋮
i           $y_{i1}, y_{i2}, \ldots, y_{ip}$    $\bar{y}_i$   $s_{y_i}^2$
⋮
k           $y_{k1}, y_{k2}, \ldots, y_{kq}$    $\bar{y}_k$   $s_{y_k}^2$
We make a number of assumptions when using ANOVA to compare differences in means between three or more populations or processes:
1. Subjects are randomly assigned to a specific treatment (in this case, diet pill).
2. The populations or processes (such as weight loss) are approximately normally distributed.
3. Variance is approximately equal across treatment groups.
In this specific weight-loss example, no blocking is used. In other words, in no case did the same subject receive more than one treatment or diet pill. This may be problematic and lead to erroneous conclusions because we are not accounting for intersubject variability in response to the diet drug. For example, we might assume that the amount of weight loss is related to starting weight. It is possible that the subjects given the placebo all had starting weights less than those of the subjects given drug B. Thus, differences in weight loss may be significant but may have little to do with the actual diet pill; they may instead be due to differences in starting weight. Because the above experiment does not use blocking, intersubject variability that is not accounted for may confound our conclusions regarding the effectiveness of the diet pill.
The question we are attempting to address in using ANOVA is: Are the means of the k treatments equal, or have our samples been drawn from treatments (populations) with different means?
Within each treatment, i (or population, i), we assume that the variability across observed samples, $Y_{ij}$, is influenced by the population (treatment) mean, $\mu_i$, and by independent random variables, $e_{ij}$, that are normally distributed with a mean of zero and variance $\sigma^2$. In other words, the samples, $Y_{ij}$, collected for each trial, j, within each treatment, i, can be expressed as
$$Y_{ij} = \mu_i + e_{ij}.$$
With our model, we are trying to determine how much of the variability in Y is due to the factor or treatment (population with mean $\mu_i$) and how much is due to random effects, $e_{ij}$, which we cannot control for or have not captured in the model above.
When we perform an ANOVA for a one-factor experiment, we can organize the analysis and results in the following table (Table 5.3):
TABLE 5.3: One-factor ANOVA with k treatments (no blocking)
SOURCE      df        SS                          MS                          F
Treatment   $k-1$     $SS_{\text{treatment}}$     $MS_{\text{treatment}}$     $MS_{\text{treatment}}/MS_{\text{error}}$
Error       $N-k$     $SS_{\text{error}}$         $MS_{\text{error}}$
N = total number of samples across all treatments; k = number of treatments within the one factor; F = the statistic that we will compare with our F tables (F distribution) to either reject or accept the null hypothesis; $SS_{\text{treatment}}$ = between-treatment sum of squares, a measure of the variability among treatment means; $SS_{\text{error}}$ = within-treatment sum of squares, a measure of the variability within each of the k treatments, summed across treatments; $MS_{\text{treatment}} = SS_{\text{treatment}}/(k-1)$; $MS_{\text{error}} = SS_{\text{error}}/(N-k)$.
For k treatments, the specific equations for the SS elements are the following:
$$SS_{\text{treatment}} = \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}_{\text{grand}}\right)^2$$
and
$$SS_{\text{error}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 = \sum_{i=1}^{k} (n_i - 1)\, s_{y_i}^2,$$
where $\bar{y}_{\text{grand}}$ is the sample mean of all samples across all treatments combined.
Finally, the F statistic that we are most interested in estimating to reject or accept our null hypothesis is
$$F = \frac{MS_{\text{treatment}}}{MS_{\text{error}}}.$$
To test our hypothesis at a significance level of $\alpha$, we now compare our estimated F statistic to the F distribution provided in the F tables for the table entry $F(\alpha; k-1, N-k)$. If our estimated F value is greater than the table entry, we may reject the null hypothesis with $(1-\alpha) \times 100\%$ confidence.
Example 5.3 A heart valve manufacturer has three different processes for producing a leaf valve. Random samples of 50 valves were selected five times from each type of manufacturing process. Each valve was tested for defective opening mechanisms. The number of defective valves in each sample of 50 valves is summarized in the following table:
PROCESS A   PROCESS B   PROCESS C
1           5           3
4           8           1
3           6           1
7           9           4
5           10          0
Using $\alpha = 0.05$, we want to determine whether the mean number of defects differs between processes (treatments). (Our null hypothesis, $H_0$, is that the mean number of defects is the same across processes.)
To answer the question, we need to complete the following ANOVA table.
One-way ANOVA: processes A, B, and C
SOURCE   df   SS   MS   F
Factor   2
Error    12
Total    14
We first find $SS_{\text{factor}}$ (= $SS_{\text{treatment}}$). We know that
$$SS_{\text{treatment}} = \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}_{\text{grand}}\right)^2.$$
From the data given, we know that there are three treatments, A, B, and C; thus, k = 3. We will assign A = 1, B = 2, and C = 3. We also know that $n_1 = n_2 = n_3 = 5$. We can estimate the sample mean for each treatment or process to obtain $\bar{y}_1 = 4$, $\bar{y}_2 = 7.6$, and $\bar{y}_3 = 1.8$. We can also use all 15 samples to find $\bar{y}_{\text{grand}} = 4.47$.
Now, we can use these estimates in the equation for $SS_{\text{treatment}}$:
$$SS_{\text{treatment}} = 5(4 - 4.47)^2 + 5(7.6 - 4.47)^2 + 5(1.8 - 4.47)^2 = 85.73.$$
Now we solve for $SS_{\text{error}}$. Given that
$$SS_{\text{error}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2,$$
we need to estimate the inner summation for each treatment, i, indexed by the outer summation. Thus, for i = 1,
$$SS_1 = (1-4)^2 + (4-4)^2 + (3-4)^2 + (7-4)^2 + (5-4)^2 = 20;$$
for i = 2,
$$SS_2 = (5-7.6)^2 + (8-7.6)^2 + (6-7.6)^2 + (9-7.6)^2 + (10-7.6)^2 = 17.2;$$
for i = 3,
$$SS_3 = (3-1.8)^2 + (1-1.8)^2 + (1-1.8)^2 + (4-1.8)^2 + (0-1.8)^2 = 10.8.$$
Now, summing over all k treatments, where k = 3, we find
$$SS_{\text{error}} = SS_1 + SS_2 + SS_3 = 48.0.$$
Now, we can find the mean squares:
$$MS_{\text{treatment}} = SS_{\text{treatment}}/DF_{\text{treatment}} = 85.73/2 = 42.87,$$
$$MS_{\text{error}} = SS_{\text{error}}/DF_{\text{error}} = 48.0/12 = 4.0.$$
Finally, our F statistic is given by
$$F = MS_{\text{treatment}}/MS_{\text{error}} = 42.87/4.0 = 10.72.$$
We now complete our ANOVA table.
One-way ANOVA: processes A, B, and C
SOURCE   df   SS       MS      F
Factor   2    85.73    42.87   10.72
Error    12   48.00    4.00
Total    14   133.73
Now, we want to determine whether the mean number of defects differs among processes A, B, and C. Our null hypothesis is that the mean number of defects does not differ among processes. To reject this hypothesis at the 95% confidence level ($\alpha = 0.05$), our estimated F statistic must be greater than F(0.05, 2, 12) = 3.88, found in the F distribution tables. Our F statistic = 10.72 > 3.88; thus, we can reject the null hypothesis with 95% confidence and conclude that the number of defects does vary with manufacturing process and that our samples for treatments A, B, and C were drawn from populations with different means. In fact, the F distribution table lists a value of F(0.01, 2, 12) = 6.93. Thus, we can reject the null hypothesis at the 99% confidence level as well.
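The hand calculation in Example 5.3 can be automated. The following minimal Python sketch (standard library only; the function name `one_way_anova` is illustrative) computes the sums of squares, mean squares, and F statistic for the defect counts using the formulas of Table 5.3, then compares F with the critical values F(0.05, 2, 12) = 3.88 and F(0.01, 2, 12) = 6.93 from Table 5.1.

```python
def one_way_anova(groups):
    """One-factor ANOVA without blocking; groups is a list of lists of observations."""
    k = len(groups)                                  # number of treatments
    n_total = sum(len(g) for g in groups)            # N, total number of samples
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatment sum of squares: sum of n_i * (ybar_i - ybar_grand)^2
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-treatment sum of squares: sum of (y_ij - ybar_i)^2
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / (n_total - k)
    return {"SS_treatment": ss_treat, "SS_error": ss_error,
            "MS_treatment": ms_treat, "MS_error": ms_error,
            "F": ms_treat / ms_error}


defects = {"A": [1, 4, 3, 7, 5], "B": [5, 8, 6, 9, 10], "C": [3, 1, 1, 4, 0]}
result = one_way_anova(list(defects.values()))
for name, value in result.items():
    print(f"{name}: {value:.2f}")
# Critical values F(0.05, 2, 12) and F(0.01, 2, 12) from Table 5.1
for alpha, f_crit in ((0.05, 3.88), (0.01, 6.93)):
    print(f"alpha = {alpha}:", "reject H0" if result["F"] > f_crit else "accept H0")
```

The same function accepts groups of unequal size, which is what the unbalanced diet-pill design would require.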
Example 5.4 Four types of MR scanners are being evaluated for speed of image acquisition. The following table summarizes the acquisition times measured (in minutes) for three samples of each scanner type.
SCANNER A   SCANNER B   SCANNER C   SCANNER D
2.0         4.0         3.0         6.0
1.8         4.5         2.5         5.5
2.7         5.5         2.0         3.5
Does the mean speed of image acquisition differ among the four scanner types? (Use $\alpha = 0.01$.)
To answer this question, we may work through the calculations that we performed in Example 5.3. However, as the previous example shows, the number of calculations grows quickly as the sample size grows. Thus, most investigators will use a statistical software package to perform the ANOVA calculations. For this example, we used the Minitab software package (Minitab Statistical Software, Release 13.32, Minitab, 2000) to perform the ANOVA, which produces the following table:
ANOVA for speed
SOURCE   df   SS       MS      F      p
Factor   3    19.083   6.361   9.07   0.006
Error    8    5.613    0.702
Total    11   24.697
The F statistic of 9.07 corresponds to a p value of 0.006, that is, an area of 0.006 in the right tail of the F distribution. Because 0.006 < 0.01, the value of $\alpha$ at which we were testing our hypothesis, we can reject the null hypothesis and accept the alternative hypothesis that at least one of the scanners differs from the remaining scanners in the mean speed of acquisition.
In the examples given previously, we did not use blocking in the experimental design. In other words, the populations from which we collected samples differed from treatment to treatment. In some experiments, such as testing weight loss for diet pills, it is not practical or possible to test more than one type of treatment on the same experimental unit.
However, when blocking can be used, it should be used to compare treatments so as to reduce intersubject variability or differences (that cannot be controlled for) in the experimental outcome. Table 5.4 outlines the experimental design for a one-factor experiment that makes use of blocking. In this experimental design, every experimental unit is subjected to every treatment. The result is that we now have treatment means and block means. Thus, when we perform the ANOVA, we may test two different sets of hypotheses. The first null hypothesis is that there is no significant difference in treatment means. The second null hypothesis is that there is no significant difference in means across subjects or experimental units (intersubject variability is not significant).
In this example, the treatment could be ankle speed, and the block would be the patient, such that each ankle speed is tested on every patient. In such a design, it is important to randomize the order in which different ankle speeds are tested so that there is no bias in hip flexion due to ordering effects of ankle speed or machine wear.
Note that the example above is also a balanced experimental design because there is the same number of data points in every cell of the table.
Within each treatment, i (or population, i), we now assume that the variability across observed samples, $Y_{ij}$, is influenced by the population (treatment) mean, $\mu_i$, the block effects, $\beta_j$, and independent random variables, $e_{ij}$, which are normally distributed with a mean of zero and variance $\sigma^2$. In other words, the samples, $Y_{ij}$, collected for each trial, j, within each treatment, i, can be expressed as
$$Y_{ij} = \mu_i + \beta_j + e_{ij}.$$
With this model, we are trying to determine how much of the variability in Y is due to the factor or treatment (population with mean $\mu_i$), how much is due to block effects, $\beta_j$, and how much is due to random effects, $e_{ij}$, that we cannot control for or have not captured in the model above. In such a model, we assume that treatment effects and block effects are additive. This is not a good assumption when there are interaction effects between treatment and block. Interaction effects mean that the effect of a specific treatment may depend on a specific block. When we address two-factor experiments in the next section, we will discuss these interaction effects further.
TABLE 5.4: One factor with k treatments and blocking
TREATMENT       SUBJECT NUMBER        TREATMENT
(ANKLE SPEED)   1    2    3    4      MEAN
10              5    2    5    6
20              10   2    8    4
30              15   8    15   10
Block mean
We can summarize the ANOVA for one factor with blocking in the following ANOVA table, assuming no interaction effects (Table 5.5; adapted from [3]).
Note that we now have two estimated F values to consider, where b is the number of blocks (here, subjects). $F_{\text{treatment}}$ is compared with the table entry for $F(\alpha; k-1, (b-1)(k-1))$, and the null hypothesis stating that there is no difference in mean hip flexion across treatments is rejected if the estimated F value is greater than the table entry.
$F_{\text{block}}$ is compared with the table entry for $F(\alpha; b-1, (b-1)(k-1))$, and the null hypothesis stating that there is no difference in mean hip flexion across subjects is rejected if the estimated F statistic is greater than the table entry.
Example 5.5 One-factor experiment with block
HeartSync manufactures four types of defibrillators that differ in the strength of the electrical shock given for an episode of fibrillation. A total of 280,000 patients were divided into four groups of 70,000 patients each. Each group was assigned to one of the four defibrillators, and the number of shocks that failed to defibrillate was recorded for four consecutive years. The results were as follows:
Year after implant   Device A   Device B   Device C   Device D
1                    6          1          9          2
2                    8          1          10         2
3                    5          3          8          0
4                    10         2          11         5
TABLE 5.5: One-factor ANOVA with block
Source       df                  SS    MS    F
Treatment    k - 1                           $MS_{\text{treat}}/MS_{\text{error}}$
Block        b - 1                           $MS_{\text{block}}/MS_{\text{error}}$
Error        (b - 1)(k - 1)
There are two questions we wish to address with these data:
1. Using $\alpha = 0.01$, does the mean number of failures differ significantly as a function of device type?
2. Using $\alpha = 0.01$, does the mean number of failures differ significantly as a function of year after implant? In other words, is year after implant a major source of variability between populations?
Again, rather than performing the calculations by hand, we may use statistical software, such as Minitab, to obtain the following results using ANOVA with block:
ANOVA for number of failures
Source    df    SS         MS        F
Device    3     173.188    57.729    35.67
Year      3     20.688     6.896     4.26
Error     9     14.563     1.618
Total     15    208.438
The F statistic for device, F = 35.67, is greater than the critical F value, $F(0.01; 3, 9) = 6.99$, given in the F table. Thus, we can reject the null hypothesis and accept the alternative hypothesis that at least one of the devices differs from the remaining devices in the mean number of failures. The second question tests the hypothesis that the mean number of failures differs significantly as a function of year after implant (the blocking factor). The F statistic for year, F = 4.26, is less than the critical F value, $F(0.01; 3, 9) = 6.99$, given in the F table; thus, we accept our null hypothesis, which means that the failure rate does not differ significantly between years after implant.
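As a check on the Minitab output, the sums of squares for a one-factor design with a blocking factor (no interaction) can be computed directly from the table of failures. The Python sketch below is a minimal illustration written for this example; the function name `block_anova` is hypothetical and not part of any package, and the critical value $F(0.01; 3, 9) = 6.99$ is taken from the F table rather than computed.

```python
# Randomized block ANOVA: one treatment factor plus one blocking factor,
# additive model (no interaction). Rows = blocks (year after implant),
# columns = treatments (devices A-D), values = failed shocks.
failures = [
    [6, 1, 9, 2],    # year 1
    [8, 1, 10, 2],   # year 2
    [5, 3, 8, 0],    # year 3
    [10, 2, 11, 5],  # year 4
]


def block_anova(data):
    b = len(data)        # number of blocks
    k = len(data[0])     # number of treatments
    n = b * k
    grand_mean = sum(sum(row) for row in data) / n
    treat_means = [sum(row[i] for row in data) / b for i in range(k)]
    block_means = [sum(row) / k for row in data]

    ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
    ss_treat = b * sum((m - grand_mean) ** 2 for m in treat_means)
    ss_block = k * sum((m - grand_mean) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block   # what the model leaves unexplained

    df_treat, df_block, df_error = k - 1, b - 1, (k - 1) * (b - 1)
    ms_error = ss_error / df_error
    return {
        "SS": (ss_treat, ss_block, ss_error, ss_total),
        "F_treatment": (ss_treat / df_treat) / ms_error,
        "F_block": (ss_block / df_block) / ms_error,
    }


result = block_anova(failures)
F_CRIT = 6.99  # F(0.01; 3, 9), read from an F table
print("F device:", round(result["F_treatment"], 2), "reject H0:", result["F_treatment"] > F_CRIT)
print("F year:  ", round(result["F_block"], 2), "reject H0:", result["F_block"] > F_CRIT)
```

Run as written, it should reproduce the Minitab sums of squares and give F of about 35.68 for device and 4.26 for year, matching the table up to rounding in the last digit.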
5.3.2 Two-Factor Experiments
In the original biomedical engineering challenge describing hip flexion reflexes, we discussed two factors that influence hip flexion: ankle speed and range of ankle extension.
In a two-factor experiment leading to a two-way ANOVA, there are two factors being varied, A and B, where A has a treatments and B has b treatments, and there are n samples at every combination of A and B.
1. The two-factor experiment is said to be completely crossed if samples are collected for every combination of factors A and B.
2. In addition, the experiment is said to be balanced if we have the same number of samples for every combination of factors A and B.
Table 5.6 illustrates a two-factor experiment that is completely balanced and crossed. Each of the two treatments within factor A is crossed with each of the three treatments within factor B. Also note that for each combination of A and B, we have three samples.
The question that we are trying to address with a two-factor ANOVA is whether there are differences in treatment means for each of the two factors. In addition, we wish to know if there are interaction effects such that there are significant differences in means as a function of the cross-interaction between factors. In other words, we ask whether there are significant differences in sample means, and hence in the means of the underlying populations, when specific combinations of factors A and B occur together.
Once the data have been collected, as illustrated in Table 5.6, we may perform a two-factor ANOVA to test the following three null hypotheses:
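$$H_0: \mu_{A1} = \mu_{A2} = \mu_{A3} = \cdots = \mu_{Aa};$$
$$H_0: \mu_{B1} = \mu_{B2} = \mu_{B3} = \cdots = \mu_{Bb};$$
$$H_0: \mu_{A1B1} = \mu_{A1B2} = \mu_{A1B3} = \mu_{A2B1} = \mu_{A2B2} = \cdots = \mu_{AaBb}.$$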
For each of the three null hypotheses, the associated alternative hypothesis is that there is a significant difference in at least two of the population means for a given factor or combination of factors.
The analysis and results of a two-factor ANOVA may be organized as in Table 5.7 [3].
The equations for the SS and MS for each of the factors and interaction factors are beyond the scope of this text but may be found in [3]. In practice, the investigator will use a popular statistical software package such as Minitab, SPSS, or SAS to estimate these SS and MS values (because of the computational burden) and simply refer to the F statistics to reject or accept the null hypotheses.
TABLE 5.6: Two-factor experiment
                        Factor B
Factor A    1                 2                 3
1           1.2, 1.4, 2.1     2.3, 2.2, 2.6     6.4, 5.8, 3.2
2           3.2, 4.1, 3.6     4.1, 4.3, 4.0     8.2, 7.8, 8.3
We note that there are three F statistics to test each of the three null hypotheses described earlier. In each case, we compare our estimated F statistic with the F values in our F table representing the F distribution. More specifically, we compare our estimated F values to the following table entries:
To test $H_0: \mu_{A1} = \mu_{A2} = \mu_{A3} = \cdots = \mu_{Aa}$, compare $F_A$ with $F(\alpha; a-1, ab(n-1))$;
To test $H_0: \mu_{B1} = \mu_{B2} = \mu_{B3} = \cdots = \mu_{Bb}$, compare $F_B$ with $F(\alpha; b-1, ab(n-1))$;
To test $H_0: \mu_{A1B1} = \mu_{A1B2} = \mu_{A1B3} = \mu_{A2B1} = \mu_{A2B2} = \cdots = \mu_{AaBb}$, compare $F_{AB}$ with $F(\alpha; (a-1)(b-1), ab(n-1))$.
In each of the three tests, if the estimated F statistic is greater than the table entry for the F distribution, one may reject the null hypothesis and accept the alternative hypothesis with $(1-\alpha) \times 100\%$ confidence.
Example 5.6 An example of a two-factor experiment that will be evaluated using two-factor ANOVA occurs when biomedical engineers are looking at the effectiveness of rehabilitative therapy and pharmacological therapy on the recovery of movement in a limb after stroke. Details of the experimental design include the following:
1. Factor T: therapy used (there are three types of therapies, T1, T2, and T3);
2. Factor D: drug used (there are three types of drugs, D1, D2, and D3);
3. 36 patients are randomly assigned, four to each combination of T and D;
4. measure: number of days to meet recovery criteria.
TABLE 5.7: Two-factor ANOVA table (each factor with multiple treatments)
Source    df                 SS    MS    F
A         a - 1                          $MS_A/MS_{\text{error}}$
B         b - 1                          $MS_B/MS_{\text{error}}$
AB        (a - 1)(b - 1)                 $MS_{AB}/MS_{\text{error}}$
Error     ab(n - 1)
Experimental design for two factors (days to meet recovery criteria)
      T1              T2              T3
      D1   D2   D3    D1   D2   D3    D1   D2   D3
      20   25   13    22   8    16    9    15   7
      15   16   12    16   10   19    12   10   10
      18   10   22    17   9    11    8    9    9
      24   20   10    12   11   21    8    10   9
The questions we are trying to address are whether there is a significant difference in mean days of recovery for the three types of rehabilitative therapy, in mean days of recovery for the three types of drug therapy, and in mean days of recovery for the nine combinations of rehabilitative therapy and drug therapy (interaction effects).
An ANOVA performed using the statistical software package Minitab produces the following table summarizing the analysis. Note that there are three estimated F statistics. We can use the three F values to test our hypotheses:
$$H_0: \mu_{T1} = \mu_{T2} = \mu_{T3};$$
$$H_0: \mu_{D1} = \mu_{D2} = \mu_{D3};$$
$$H_0: \mu_{T1D1} = \mu_{T1D2} = \mu_{T1D3} = \mu_{T2D1} = \mu_{T2D2} = \cdots = \mu_{T3D3}.$$
Two-way ANOVA for days of recovery versus T and D
Source         df    SS       MS       F        P
T              2     337.4    168.7    11.44    0.000
D              2     36.2     18.1     1.23     0.309
Interaction    4     167.8    41.9     2.84     0.043
Error          27    398.3    14.8
Note that the estimated F statistic for rehabilitative therapy, F = 11.44, exceeds the critical value for $\alpha = 0.05$, $F(0.05; 2, 27) \approx 3.35$ (Minitab reports P = 0.000, that is, P < 0.0005); thus, we conclude that the samples for the three different rehabilitative therapies represent populations with different means. In other words, the different rehabilitative therapies produce different days of recovery. However, the estimated F statistic for the drug therapy, F = 1.23 (P = 0.309), does not exceed the critical table F value for $\alpha = 0.05$; thus, we accept the null hypothesis that drug therapy does not significantly affect the days of recovery and that the samples are not drawn from populations with different means. Moreover, the third F statistic, F = 2.84 (P = 0.043), is greater than the table F value for $\alpha = 0.05$, $F(0.05; 4, 27) \approx 2.73$, which suggests that there may be a significant difference in days of recovery due to an interaction between rehabilitative therapy and drug therapy.
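The Minitab table for Example 5.6 can be reproduced from the standard sums-of-squares formulas for a balanced design. The Python sketch below is a minimal illustration assuming a balanced, completely crossed design with fixed effects; factor A corresponds to therapy T and factor B to drug D, and the names `two_way_anova` and `f_sf_df1_2` are hypothetical. Because T and D each have 2 numerator degrees of freedom, their P values can be computed exactly from the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$; that shortcut does not apply to the interaction term, which has 4 numerator degrees of freedom.

```python
# Balanced two-factor ANOVA with replication (fixed effects) for Example 5.6.
# cells[t][d] holds the replicate recovery times for therapy T(t+1), drug D(d+1).
cells = [
    [[20, 15, 18, 24], [25, 16, 10, 20], [13, 12, 22, 10]],  # T1: D1, D2, D3
    [[22, 16, 17, 12], [8, 10, 9, 11], [16, 19, 11, 21]],    # T2: D1, D2, D3
    [[9, 12, 8, 8], [15, 10, 9, 10], [7, 10, 9, 9]],         # T3: D1, D2, D3
]


def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([y for row in cells for cell in row for y in cell])
    a_means = [mean([y for cell in cells[i] for y in cell]) for i in range(a)]
    b_means = [mean([y for i in range(a) for y in cells[i][j]]) for j in range(b)]
    cell_means = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]

    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_means),
        "B": a * n * sum((m - grand) ** 2 for m in b_means),
        # Within-cell scatter around each cell mean.
        "Error": sum((y - cell_means[i][j]) ** 2
                     for i in range(a) for j in range(b) for y in cells[i][j]),
    }
    ss_cells = n * sum((cell_means[i][j] - grand) ** 2
                       for i in range(a) for j in range(b))
    ss["AB"] = ss_cells - ss["A"] - ss["B"]   # interaction = cells minus main effects

    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["Error"] for k in ("A", "B", "AB")}
    return df, ss, ms, f


def f_sf_df1_2(f_value, df2):
    """Exact P(F > f) for an F distribution with 2 numerator degrees of
    freedom (valid ONLY when df1 == 2)."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


df, ss, ms, f = two_way_anova(cells)
for source in ("A", "B", "AB", "Error"):
    row = f"{source:5s} df={df[source]:2d} SS={ss[source]:6.1f} MS={ms[source]:6.1f}"
    if source in f:
        row += f" F={f[source]:5.2f}"
    print(row)
print("P for T (A):", f_sf_df1_2(f["A"], df["Error"]))
print("P for D (B):", f_sf_df1_2(f["B"], df["Error"]))
```

The printed SS, MS, and F values should agree with the Minitab table to the displayed precision; the exact P value for therapy comes out near 0.0003, consistent with Minitab's display of 0.000.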
5.3.3 Tukey's Multiple Comparison Procedure
Once we have established that there is a significant difference in means across treatments within a factor, we may use post hoc tests, such as Tukey's HSD (honestly significant difference) multiple-comparison pairwise test [3, 9]. ANOVA simply shows that there is at least one treatment mean that differs from the others. However, ANOVA does not provide information on specifically which treatment mean(s) differ from which other treatment mean(s). Tukey's HSD test allows us to compare the statistical differences in means between all pairs of treatments. For a one-factor experiment with k treatments, there are $k(k-1)/2$ pairwise comparisons to test. The important point to note is that if $\alpha$ is the probability of a type I error for one comparison, the probability of making at least one type I error across multiple comparisons is much greater. So, if we want $(1-\alpha) \times 100\%$ confidence for all possible pairwise comparisons taken together, we must start with a much smaller $\alpha$ for each individual comparison. Tukey's multiple comparison procedure allows for such an adjustment in significance when performing pairwise comparisons.
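To see why the per-comparison $\alpha$ must be adjusted, the Python sketch below counts the $k(k-1)/2$ pairwise comparisons and computes $1 - (1-\alpha)^m$, the probability of at least one type I error among m comparisons. That formula assumes the comparisons are independent, which pairwise tests sharing the same treatment means are not, so it is only a rough illustration of the inflation; it is not Tukey's procedure, which instead uses the studentized range distribution. For the actual test, a library routine such as SciPy's `scipy.stats.tukey_hsd` (available in recent SciPy releases) could be used.

```python
from math import comb


def pairwise_comparisons(k):
    """Number of distinct treatment pairs, k(k - 1)/2."""
    return comb(k, 2)


def familywise_error(alpha, m):
    """P(at least one type I error) in m comparisons, ASSUMING independence."""
    return 1 - (1 - alpha) ** m


for k in (3, 4, 6):
    m = pairwise_comparisons(k)
    print(f"k = {k}: {m} comparisons, familywise error ~ {familywise_error(0.05, m):.3f}")
```

With $\alpha = 0.05$ per comparison, four treatments (six comparisons) already give roughly a 26% chance of at least one false positive under this simplified model.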
CHAPTER 6
Linear Regression and Correlation Analysis
Oftentimes, in biomedical engineering research or design, we are interested in whether there is a correlation between two variables, populations, or processes. These correlations may give us information about the underlying biological processes in normal and pathologic states and ultimately help us to model the processes, allowing us to predict the behavior of one process given the state of another correlated process.
Given two sets of samples, X and Y, we ask the question, "Are two variables or random processes, X and Y, correlated?" In other words, can y be modeled as a linear function of x such that $y = mx + b$?
Look at the following graph of experimental data (Figure 6.1), where we have plotted the data set, $y_i$, against the data set, $x_i$:
[Figure 6.1, regression plot: dependent variable, $y_i$, versus independent variable, $x_i$; fitted line: Dependent = 2.96667 + 0.307576 Independent; r = 0.67; 95% CI curves shown.]
FIGURE 6.1: Results of linear regression applied to the samples illustrated in the scatterplot (black dots). The solid black line illustrates the line of best fit (model parameters listed above the graph) as determined by linear regression. The red dotted curves illustrate the 95% confidence interval for the fitted regression line. Finally, the r value is the correlation coefficient.
We note that the data tend to fall on a straight line. There tends to be a trend such that y increases in proportion to increases in x. Our goal is to determine the line (the linear model) that best fits these data and how close the measured data points lie with respect to the fitted line (generated by the model). In other words, if the modeled line is a good fit to the data, we are demonstrating that y may be accurately modeled as a linear function of x, and thus, we may predict y given x using the linear model.
The key to fitting a line that best predicts process y from process x is to find the parameters m and b that minimize the error between model and actual data in a least-squares sense:
$$\min \sum (y - \hat{y})^2.$$
In other words, as illustrated in Figure 6.2, for each measured value of the independent variable, x, there will be the measured value of the dependent variable, y, as well as the predicted or modeled value of y, denoted as $\hat{y}$, that one would obtain if the equation $\hat{y} = mx + b$ is used to predict y. The equation $e_i = y_i - \hat{y}_i$ denotes the error that occurs at each pair $(x_i, y_i)$ when the modeled value does not exactly align with the measured value because of factors not accounted for in the model (noise, random effects, and nonlinearities).
FIGURE 6.2: Linear regression can be used to estimate a straight line that best fits the measured data points (filled circles). In this illustration, $x_i$ and $y_i$ represent the measured independent and dependent variables, respectively. Linear regression is used to model the dependent variable, y, as a linear function of the independent variable, x. The straight line passing through the measured data points is the result of linear regression whereby the error, $e_i$, between the predicted value (open circles) of the dependent variable, $\hat{y}_i$, and the measured value of the dependent variable, $y_i$, is minimized, in the sum-of-squares sense, over all data points.
In trying to fit a line to the experimental data, our goal is to minimize these errors, $e_i$, between the measured and predicted values of the dependent variable, y. The method used in linear regression and many other biomedical modeling techniques is to find model parameters, such as m and b, that minimize the sum of squared errors, $\sum e_i^2$.
For linear regression, we seek a least-squares estimate of m and b using the following approach:
Suppose we have N samples each of processes x and y. We try to predict y from measured x using the following model:
$$\hat{y} = mx + b.$$
The error in prediction at each data point, $x_i$, is
$$\text{error}_i = y_i - \hat{y}_i.$$
In the least-squares method, we choose m and b to minimize the sum of squared errors:
$$\sum_{i=1}^{N} (y_i - \hat{y}_i)^2.$$
To find a closed-form solution for m and b, we can write an expression for the sum of squared errors:
$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2.$$
We then replace $\hat{y}_i$ with $(mx_i + b)$, our model, and carry out the squaring operations [3, 5].
We can then take derivatives of the above expression with respect to m and then again with respect to b. If we set the derivative expressions to zero to find the minimum, we will have two equations in two unknowns, m and b, and we can simply use algebra to solve for the unknown parameters, m and b. We will get the following expressions for m and b in terms of the measured $x_i$ and $y_i$:
$$m = \frac{\sum_{i=1}^{N} x_i y_i - \left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right)/N}{\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2/N}$$
and
$$b = \bar{y} - m\bar{x},$$
where
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{and} \quad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i.$$
Hence, once we have our measured data, we can simply use our equations for m and b to find the line, or linear model, of best fit.
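The closed-form expressions for m and b translate directly into code. The Python sketch below is a minimal illustration; the data values are hypothetical, chosen only so the arithmetic is easy to check by hand (they give a slope of 1.99 and an intercept of 0.05).

```python
def least_squares_line(x, y):
    """Closed-form least-squares slope m and intercept b for y_hat = m*x + b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    m = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    b = sum_y / n - m * sum_x / n   # b = y_bar - m * x_bar
    return m, b


# Hypothetical measurements for illustration (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = least_squares_line(x, y)
print(f"y_hat = {m:.3f} x + {b:.3f}")
```

If NumPy is available, `numpy.polyfit(x, y, 1)` should return the same slope and intercept, which is a convenient cross-check.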
The Correlation Coefficient
It is important to realize that linear regression will fit a line to any two sets of data regardless of how well the data are modeled by a linear model. Even if the data, when plotted as a scatterplot, look nothing like a line, linear regression will fit a line to the data. As biomedical engineers, we have to ask, "How well do the measured data fit the line estimated through linear regression?"
One measure of how well the experimental data fit the linear model is the correlation coefficient. The correlation coefficient, r, has a value between -1 and 1 and indicates how well the linear model fits the data.
The correlation coefficient, r, may be estimated from the experimental data, $x_i$ and $y_i$, using the following equation:
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2\right]^{1/2}},$$
where
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \quad \text{and} \quad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i.$$
It is important to note that r = 0 does not mean that the two processes, x and y, are independent. It simply indicates that any dependency between x and y is not well described or modeled by a linear relation. There could be a nonlinear relation between x and y. An r = 0 simply means that x and y are uncorrelated in a linear sense. That is, one may not predict y from x using a linear model, $y = mx + b$.
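The point that r = 0 does not imply independence is easy to demonstrate numerically. In the Python sketch below (a minimal illustration using hypothetical data), y is completely determined by x in both cases, yet r = 1 for the linear relation and r = 0 for the symmetric quadratic relation $y = x^2$.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation coefficient r, computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y_linear = [2 * xi + 1 for xi in x]   # exact linear dependence
y_quadratic = [xi ** 2 for xi in x]   # exact, but nonlinear, dependence
print(correlation(x, y_linear))       # 1.0
print(correlation(x, y_quadratic))    # 0.0
```

The quadratic case illustrates the text's caution: the relation is perfectly deterministic, but a straight line through these points is flat, so the linear model explains none of the variability in y.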
A measure related to the correlation coefficient, r, is the coefficient of determination, $R^2$, which is a summary statistic that tells us how well our regression model fits our data. $R^2$ can be used as a measure of goodness of fit for any regression model, not just linear regression. For linear regression, $R^2$ is the square of the correlation coefficient and has a value between 0 and 1. The coefficient of determination tells us how much of the variability in the data may be explained by the model parameters as a fraction of the total variability in the data.
It is important to realize that the estimated slope of best fit and the correlation coefficient are statistics that may or may not be significant. Thus, t tests may be performed to test if the slope estimated through linear regression is significantly different from zero [3]. Likewise, t tests may be performed to test if the correlation coefficient is significantly different from zero. Finally, we may also compute confidence intervals for the estimated slope [3].
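One common form of the t test for the correlation coefficient uses $t = r\sqrt{N-2}/\sqrt{1-r^2}$ with $N - 2$ degrees of freedom; for simple linear regression, this t is identical to the t statistic for testing whether the slope is zero. The Python sketch below computes $R^2$ and this t statistic. The values r = 0.67 (as in Figure 6.1) and N = 12 are assumptions for illustration only, since the number of points in Figure 6.1 is not stated. The resulting t would be compared with a t table entry for $N - 2$ degrees of freedom (about 2.23 for a two-tailed test at $\alpha = 0.05$ with 10 degrees of freedom).

```python
from math import sqrt


def r_squared(r):
    """Coefficient of determination for simple linear regression."""
    return r * r


def t_for_correlation(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r, n = 0.67, 12   # hypothetical: r as in Figure 6.1, n assumed
print(f"R^2 = {r_squared(r):.4f}, t = {t_for_correlation(r, n):.2f} with {n - 2} df")
```

Under these assumed values, t is about 2.85, which exceeds 2.23, so the correlation would be judged significantly different from zero; with fewer points, the same r could fail the test.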
CHAPTER 7
Power Analysis and Sample Size
Up to this point, we have discussed important aspects of experimental design, data summary, and statistical analysis that will allow us to test hypotheses and draw conclusions with some level of confidence.
However, we have not yet addressed a very important question. The question we ask now is, how large should my sample be to capture the variability in my underlying population so that my types I and II error rates will be small? In other words, how large a sample is required such that the probability of making a type I or II error in rejecting or accepting a null hypothesis will be acceptable under the circumstances? Different situations call for different error rates. An error such as diagnosing a streptococcal throat infection when streptococcus bacteria are not present is likely not as serious as missing the diagnosis of cancer. Another way of phrasing the question is, "How powerful is my statistical analysis in accepting or rejecting the null hypothesis?"
If the sample size is too small, the consequence may be that we miss an effect (type II error) because we do not have enough power in the test to demonstrate an effect with confidence.
However, when choosing a sample size, it is too easy to simply say that the sample size should be as large as possible. Even if an investigator had access to as many samples as he or she desired, there are practical considerations and constraints that limit the sample size. If the sample size is too large, there are economic and ethical problems to consider. First, there are expenses associated with running an experiment, such as a clinical trial. There are costs associated with the personnel who run the experiments, the experimental units (animals, cell cultures, compensation for human time), perhaps drugs and other medical procedures that are administered, and others. Thus, the greater the number of samples, the greater the expense. Clinical trials are typically very expensive to run.
The second consideration for limiting sample size is an ethical concern. Many biomedical-related experiments or trials involve human or animal subjects. These subjects may be exposed to experimental drugs or therapies that involve some risk, and in the case of animal studies, the animal may be sacrificed at the end of an experiment. Bottom line, we do not wish to use human or animal subjects for no good reason, especially if we gain nothing in terms of the power of our statistical analysis by increasing the sample size.
We recall again that there are two types of errors associated with performing a statistical analysis:
1. Type I: rejecting $H_0$ when $H_0$ is true (probability of type I error = $\alpha$);
2. Type II: accepting $H_0$ when $H_0$ is false; missing an effect (probability of type II error = $\beta$).
7.1 POWER OF A TEST
When we refer to the power of a given statistical analysis or test, we are quantifying the likelihood, for a given $\alpha$ value (e.g., 0.05), that a statistical test (t test, correlation coefficient, ANOVA, etc.) will detect a real effect. For example, if there truly is a difference between the means of two populations, what is the likelihood that we will detect that difference with our t test? If the chance of a type II error is $\beta$, then the likelihood of detecting a true difference is simply $1 - \beta$, which we refer to as the power of the test. Some practical points about the power of a statistical test are the following:
1. power of test = $1 - \beta$;
2. $0 < \text{power} < 1$;
3. in general, we desire power $\geq 0.8$ for practical applications.
When trying to establish a sample size for an experiment, we have to decide in advance how much power we want for our statistical analysis, which, in turn, is determined by the amount of type I and II error we are willing to risk. For some biomedical engineering applications, we can risk greater error than for others. Missing a diagnosis may not be problematic in one case or may result in a life-threatening situation in another. If an implantable defibrillator fails to detect and electrically shock a heart rhythm such as ventricular fibrillation, it could mean the loss of life. On the other hand, shocking the heart when there is no fibrillation present, which may happen when the device senses electrical activity generated by nearby skeletal muscle, can result in pain and may even initiate a dangerous heart rhythm.
It is often difficult to minimize both $\alpha$ and $\beta$ at the same time. One is usually minimized at the expense of the other. For example, we would rather err on the side of an implantable defibrillator overdetecting fibrillation so as not to miss an occurrence of fibrillation. In the case of detecting breast cancer using mammography, overdetection and underdetection can both be problematic. Underdetection can mean an increased chance of death. However, false-positives can lead to unnecessary removal of healthy tissue. One of the biggest challenges for researchers in the biomedical field is to find real differences between populations that will improve diagnosis and detection of disease or the performance of medical instruments.
7.2 POWER TESTS TO DETERMINE SAMPLE SIZE
Power tests are used to determine sample size and take into account the effect that we wish to detect with our statistical analysis, the type I and II error rates we will tolerate, and the variability of the populations being sampled in our experiments.
To perform a power test, we can use equations that express power in terms of the factors stated above, or we can make use of power curves and power tables that have already estimated the sample sizes for us. Power curves and tables show the relationship between power ($1 - \beta$) and the effect we are trying to detect. These effects can be a difference in two means, a difference in two variances, a correlation coefficient, a difference in treatment means, and other population differences that we are trying to detect through experiment and data analysis.
It is important to note that there are different equations and, thus, different power curves for different effects, which in turn are detected through different statistical tests. A difference in two means (the effect) is detected with a t test, whereas a difference in two variances (the effect) is detected using an F test. The power curves for the t test are different from the power curves for the F test.
Power curves most frequently used in estimating a sample size for a biomedical experiment include the following:
1. unpaired or paired t test (difference in one or two population means);
2. Pearson's correlation coefficient (correlation between two populations);
3. ANOVA (difference in three or more population means).
To perform a power test using the power curves, we need the following:
1. the size of effect we want to detect (e.g., difference in two means);
2. an estimate of population parameters (e.g., standard deviation for the population(s) based on pilot data);
3. $\alpha$ level (probability of type I error);
4. power level = $1 - \beta$ ($\beta$ = probability of type II error).
The $\alpha$ and $\beta$ levels are selected by the investigator before designing the experiment. The size of effect to detect is also chosen by the investigator. The investigator needs to determine in advance how large the effect has to be to be significant for drawing conclusions. In other words, how different do two population means need to be to signal something different in the underlying physiology, drug effects, or a change in the manufacturing process? Or, how strong does the correlation have to be to mean something to the investigator, in terms of underlying biology or processes? In general, the smaller the difference to be detected with a t test or the smaller the correlation value to be detected, the greater the sample size.
One might argue that the most difficult piece of information to obtain is an estimate of the standard deviation, or variance, of the underlying populations. It is the assumed variance of the underlying population that plays a large role in determining sample size. In general, the larger the variance in the population, the greater the sample size needed to capture the statistics of the population or process and, consequently, the greater the sample size needed to give power to a statistical analysis.
Table 7.1 shows an example of a power table for the unpaired t test. This table was estimated for $\alpha = 0.05$, but similar tables exist for other $\alpha$ levels. These curves and tables allow us to estimate a sample size if we first select a normalized difference in the two means that we wish to detect and we select a power at which to perform the t test. The normalized difference is obtained when we take the absolute difference that we wish to detect and divide that difference by the estimated standard deviation of the underlying population. Again, we have a guess for the standard deviation based on pilot data. By normalizing the difference to detect, we need not worry about units of measure and can make use of standardized tables.
TABLE 7.1: Power table for Student's unpaired (one-tailed) t test ($\alpha = 0.05$)
       Difference in means (expressed as z score)
n      0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   1.0    1.2
10     0.08   0.11   0.16   0.22   0.29   0.36   0.45   0.53   0.70   0.83
13     0.08   0.13   0.18   0.26   0.34   0.44   0.54   0.63   0.80   0.91
20     0.09   0.15   0.24   0.34   0.46   0.59   0.70   0.80   0.93   0.98
30     0.10   0.19   0.31   0.46   0.61   0.74   0.85   0.92   0.99
40     0.11   0.22   0.38   0.55   0.72   0.84   0.93   0.97
60     0.13   0.29   0.50   0.70   0.86   0.95   0.98
80     0.15   0.35   0.60   0.81   0.93   0.98
100    0.17   0.41   0.68   0.88   0.97
200    0.26   0.64   0.91   0.99
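Entries like those in Table 7.1 can be approximated with the normal distribution. The Python sketch below uses the large-sample approximation $\text{power} \approx \Phi(d\sqrt{n/2} - z_{1-\alpha})$ for a one-tailed comparison of two means, where d is the normalized difference and n is assumed to be the number of samples per group. This approximation is swapped in for the exact t-based calculation behind the table, so it runs slightly high for small n; it relies only on Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def approx_power_one_tailed(d, n, alpha=0.05):
    """Normal approximation to the power of a one-tailed two-sample test.

    d: normalized difference |mu1 - mu2| / sigma
    n: samples per group (equal group sizes assumed)
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(d * sqrt(n / 2) - z_crit)


for n in (20, 30, 40):
    print(n, round(approx_power_one_tailed(0.6, n), 2))
```

For d = 0.6, this gives roughly 0.60, 0.75, and 0.85, close to the Table 7.1 entries of 0.59, 0.74, and 0.84.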
The best way to illustrate the use of power tests is to work through an example. We will use Table 7.1 to find the sample size for the following example:
Aim: To determine if there is a difference in mean height for college-age men and women living in the city of make-believe.
Experiment: Measure the heights in random samples of college-age men and women in the city of make-believe.
Question: How many samples do we need to reject our null hypothesis that there is no difference in mean height between college-age men and women in the city of make-believe if indeed there is a true difference in the two populations?
We need to perform a power test to determine how many samples we need to collect from the college-age men and women if we are to detect a true difference in the underlying populations. We are assuming that the two underlying populations are normally distributed. The effect that we are trying to detect is whether there is a significant difference in mean heights; thus, we will be using an unpaired t test to analyze the data once they are collected.
Before performing the power test, we need to decide on the magnitude of effect we wish to detect as being significant when we perform our t test. For this example, we are going to choose a 3-in. difference in mean heights as being significant. For argument's sake, we will claim that a 3-in. difference in population means is of biological significance.
In this example, the effect being tested, $\mu_1 - \mu_2$, is 3 in. To use the standardized curves, we need to normalize this difference by an estimate of the standard deviation, $\sigma$, of the underlying populations. This is basically a z score for the difference in the two means. Note that we assume the two populations to have roughly equal variance if we are to use a t test. To obtain an estimate of $\sigma$, let us assume we have collected some pilot data and estimated the sample standard deviation, s = 5 in. Now, the normalized difference we wish to detect when we apply our t test is 3/5 = 0.6.
We next choose $\alpha = 0.05$ and power = 0.8. Again, these error rates are our choice.
Now, using the magnitude of our effect (3/5 = 0.6), power value (0.8), and $\alpha$ value (0.05), we use Table 7.1 to look up the number of required samples. In the 0.60 column, a power of 0.8 falls between n = 30 (power 0.74) and n = 40 (power 0.84). For this example, we find the sample size to be approximately 35 samples.
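The same normal approximation can be inverted to give a sample size directly: $n \approx 2(z_{1-\alpha} + z_{1-\beta})^2/d^2$ per group, rounded up. The Python sketch below is a minimal illustration under that approximation (the function name is hypothetical). For the one-tailed test of Table 7.1, with d = 3/5, $\alpha = 0.05$, and power 0.8, it returns 35, in line with the value read from the table. Because it ignores the extra uncertainty of estimating $\sigma$ from data, exact t-based calculations can give slightly larger values.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_per_group(diff, sd, alpha=0.05, power=0.8, two_tailed=False):
    """Approximate n per group to detect a difference `diff` in means,
    given population standard deviation `sd` (normal approximation)."""
    d = diff / sd                                   # normalized difference
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = STD_NORMAL.inv_cdf(1 - tail_alpha)
    z_beta = STD_NORMAL.inv_cdf(power)              # z_(1 - beta)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


print(sample_size_per_group(3, 5))                   # one-tailed: 35
print(sample_size_per_group(3, 5, two_tailed=True))  # two-tailed: 44
```

Switching to a two-tailed test raises the required sample size, because the critical z value grows from about 1.645 to about 1.960.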
Some general observations may be made about the impact of population variance, type I and II error rates, and the magnitude of the effect on the sample size:
1. For a given power and variance, the smaller the effect to detect, the greater the sample size.
2. The greater the power, the greater the sample size.
3. The greater the population variance, the greater the sample size.
If we use the same example above and the Minitab software to calculate sample size for an unpaired t test as we vary several of the inputs to the power test, we find the following sample sizes (Table 7.2):
TABLE 7.2: Relations among sample size, statistical power, standard deviation, and magnitude of effect to detect ($\alpha = 0.05$)
Size of difference    Standard           Power      Size of
in means (in.)        deviation (in.)    of test    sample
1                     5                  0.8        394
2                     5                  0.8        100
3                     5                  0.8        45
4                     5                  0.8        26
3                     1                  0.8        4
3                     2                  0.8        9
3                     10                 0.8        176
3                     5                  0.9        60
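The sample sizes in Table 7.2 are larger than the estimate obtained from Table 7.1 (for example, 45 rather than about 35 for a 3-in. difference and a 5-in. standard deviation); they are consistent with a two-tailed test, whereas Table 7.1 is for a one-tailed test. Thus, we can estimate, in advance of performing an experiment, a minimal sample size that would allow us to draw conclusions from our data with a certain level of confidence and power.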
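To see how closely the normal approximation tracks the Minitab values, the Python sketch below recomputes each row of Table 7.2 for a two-tailed test, assuming the tabulated sizes are per group (an assumption; the table does not say). It repeats the sample-size helper so that the block runs on its own. The approximate values come out equal to or slightly below the Minitab numbers, by at most 2 in these rows, with the largest relative gap at the smallest sample sizes, where the t distribution differs most from the normal.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_per_group(diff, sd, alpha=0.05, power=0.8):
    """Two-tailed normal-approximation sample size per group."""
    d = diff / sd
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# (difference in means, standard deviation, power, Minitab sample size) from Table 7.2
TABLE_7_2 = [
    (1, 5, 0.8, 394), (2, 5, 0.8, 100), (3, 5, 0.8, 45), (4, 5, 0.8, 26),
    (3, 1, 0.8, 4), (3, 2, 0.8, 9), (3, 10, 0.8, 176), (3, 5, 0.9, 60),
]

approx = []
for diff, sd, power, minitab_n in TABLE_7_2:
    n = sample_size_per_group(diff, sd, power=power)
    approx.append(n)
    print(f"diff={diff} sd={sd} power={power}: approx {n}, Minitab {minitab_n}")
```

The pattern across rows reproduces the three general observations listed earlier: smaller effects, higher power, and larger variance all drive the sample size up.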
CHAPTER 8
Just the Beginning
Compared with most textbooks about statistics, the previous seven chapters have provided a brief "elevator pitch" about the fundamentals of statistics and their use in biomedical applications. The reader should be familiar with the basic notion of probability models, the use of such models to describe real-world data, and the use of statistics to compare populations. The reader should be aware of the impact of randomization and blocking on the outcome of statistical analysis. In addition, the reader should have an appreciation for the importance of the normal distribution in describing populations, the use of standardized tables, and the notion that statistical analysis is aimed at testing a hypothesis with some level of confidence. Finally, the reader should know that we can find confidence intervals for any estimated statistic.
In this text, we only covered those statistical analyses that are valid for populations or processes that are well modeled by a normal distribution. Of course, there are many biological processes that are not well modeled by a normal distribution. For other types of distributions, one should read more advanced texts to learn more about nonparametric statistics and statistics for nonnormal distributions or populations. These nonparametric tests do not assume a specific underlying distribution for the data and are often useful for small samples.
How do we determine whether our data, and hence the underlying population, are well modeled by a normal distribution? We can begin by simply looking at the histogram of the sampled data for symmetry and the proportion of samples that lie within one, two, and three standard deviations of the mean. We can also compare the sample mean to the sample median. A more formal means of quantifying the normality of a sample is to use a $\chi^2$ test [7]. The $\chi^2$ test involves comparing the actual sample distribution to the distribution that would be expected if the sample were drawn from a normal distribution. The differences between the expected frequencies of sample occurrence and the observed frequencies of sample occurrence are used to estimate a $\chi^2$ test statistic. This test statistic is then compared with the critical values of the $\chi^2$ distribution to determine the level of significance or confidence in rejecting the null hypothesis. In this case, the null hypothesis is that the frequency distribution (or probability model) for the underlying distribution is no different from a normal distribution. The $\chi^2$ test is also referred to as the goodness-of-fit test. We note that the $\chi^2$ test can also be used to compare a sample distribution with other probability models besides the normal distribution.
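A minimal Python sketch of the $\chi^2$ goodness-of-fit calculation is shown below. It bins the data, computes the counts expected under a normal model whose mean and standard deviation are estimated from the sample, and sums $(\text{observed} - \text{expected})^2/\text{expected}$. The data values and bin edges are hypothetical. With k bins and two estimated parameters, the statistic is compared with a $\chi^2$ table entry for $k - 3$ degrees of freedom (5.99 for 2 degrees of freedom at $\alpha = 0.05$). A common rule of thumb is that each expected count should be at least about 5, which this small illustrative sample does not satisfy, so a real analysis would need more data.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normality(samples, edges):
    """Chi-square goodness-of-fit statistic against a fitted normal model.

    edges: sorted interior bin boundaries; the first and last bins are open-ended.
    Returns (statistic, observed counts, expected counts, degrees of freedom).
    """
    model = NormalDist(mean(samples), stdev(samples))
    n = len(samples)
    bounds = [None] + list(edges) + [None]   # None marks -infinity / +infinity

    def cdf(x, upper):
        if x is None:
            return 1.0 if upper else 0.0
        return model.cdf(x)

    observed, expected = [], []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        count = sum(1 for s in samples
                    if (lo is None or s > lo) and (hi is None or s <= hi))
        observed.append(count)
        expected.append(n * (cdf(hi, upper=True) - cdf(lo, upper=False)))

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - 2   # minus 2 for the estimated mean and SD
    return statistic, observed, expected, dof


# Hypothetical measurements (not from the text).
data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.5, 5.6, 5.8, 5.9, 6.0,
        6.0, 6.1, 6.3, 6.4, 6.6, 6.7, 7.0, 7.2, 7.5, 8.1]
stat, obs, expected_counts, dof = chi_square_normality(data, edges=[5.0, 5.75, 6.5, 7.25])
print(obs, [round(e, 1) for e in expected_counts], round(stat, 2), dof)
```

If the printed statistic is below the table value for the printed degrees of freedom, the data give no evidence against the normal model at that $\alpha$.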
We have covered one- and two-factor ANOVA. But one will also encounter multivariate ANOVA, in which
there is more than one dependent variable (more than one outcome or measure being sampled). Such
analysis is referred to as MANOVA. It is used in a more generalized multiple regression analysis, in
which there may be more than one independent variable and more than one dependent variable. The
dependent variables are multiple measures (or repeated measures) drawn from the same experimental
units, which are subjected to one or more factors (the independent variables).

In linear regression, we sought to determine how well we could predict the behavior of one variable
given the behavior of a second variable. In other words, we had one independent variable and one
dependent variable. In multiple regression analysis, we use more than one independent variable to
explain the variance in the dependent variable; in the multivariate case, there may also be more than
one dependent variable. For example, we may have one dependent variable, such as body weight, which we
model as a linear function of three independent variables: calorie intake, exercise, and age. In
performing a multiple regression analysis, we are trying to determine how much of the variability in
body weight is attributable to each of the three independent variables. Oftentimes, one independent
variable is not enough to predict the outcome of the dependent variable. However, we should add only
predictors (independent variables) that contribute information about the dependent variable that the
first predictor does not. In other words, the two or more predictors together must predict the
outcome, or dependent variable, better than any one of them can alone. Note that a full general linear
model allows for multiple dependent and independent variables. In essence, multiple regression looks
for linear relationships between several input and output variables.
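The following minimal Python sketch (standard library only) makes the body-weight example concrete. It fits y = b0 + b1·x1 + b2·x2 + b3·x3 by least squares and reports R², the fraction of the variance in the dependent variable explained by the model. The helper names `fit_ols` and `r_squared` are hypothetical, and so are all of the data. The data are synthetic and noise-free, generated from assumed coefficients so that the fit can be checked against known values. Comparing R² for calorie intake alone with R² for all three predictors illustrates the point that added predictors should explain variability the first one does not.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp.

    Builds the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones, and solves them by Gaussian elimination with partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, p):
            factor = M[row][col] / M[col][col]
            for c in range(col, p + 1):
                M[row][c] -= factor * M[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):              # back substitution
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


def r_squared(X, y, b):
    """Fraction of the variance in y explained by the fitted model (1 - SS_res / SS_tot)."""
    predicted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical, noise-free data generated from assumed coefficients:
# weight (kg) = 50 + 10*calories (1000 kcal/day) - 0.5*exercise (h/week) + 0.2*age (yr)
calories = [2.0, 2.5, 1.8, 2.2, 3.0, 2.7, 1.9, 2.4]
exercise = [3, 1, 5, 2, 0, 4, 6, 2]
age = [25, 40, 35, 50, 30, 45, 60, 28]
weight = [50 + 10 * c - 0.5 * e + 0.2 * a for c, e, a in zip(calories, exercise, age)]

X = [list(row) for row in zip(calories, exercise, age)]
b = fit_ols(X, weight)
print("coefficients (should be close to 50, 10, -0.5, 0.2):", [round(v, 3) for v in b])
print("R^2, all three predictors:", round(r_squared(X, weight, b), 4))

X_cal = [[c] for c in calories]
b_cal = fit_ols(X_cal, weight)
print("R^2, calorie intake alone:", round(r_squared(X_cal, weight, b_cal), 4))
```

Solving the normal equations directly, as done here, is adequate for small, well-scaled problems; statistical packages typically rely on QR- or SVD-based solvers, which are more numerically stable. With real (noisy) data, R² never decreases when a predictor is added. The adjusted R² or an F test is therefore the usual way to judge whether an extra predictor earns its place.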
Although we have focused on linear regression, the reader should also be aware that there are models
for nonlinear regression that may be quite powerful in describing biological phenomena. Also, there is
a whole literature on the analysis of the errors that result when a model used to predict the data is
compared with the actual measured data [3]. One can examine the residual errors, that is, the
differences between the modeled (predicted) data and the actual measured data. Significant patterns or
trends in the magnitude and ordering of the residuals typically indicate a poor fit of the model to the
data. Such patterns suggest that the model has not accounted for all of the predictable variability in
the measured data.
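The following minimal Python sketch illustrates a residual pattern of the kind described above. A straight line is fitted to hypothetical data that actually follow y = x², and the residuals are examined in order of x. Residuals scattered randomly about a well-fitting model change sign often. Here, by contrast, they are positive at both ends and negative in the middle, giving only three runs of same-signed residuals. The names `linear_fit` and `sign_runs` are illustrative. Counting sign runs is the idea behind the classical runs test, which compares the observed number of runs with the number expected under randomness.

```python
def linear_fit(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def sign_runs(residuals):
    """Number of runs of consecutive residuals with the same sign (zeros skipped)."""
    signs = [r > 0 for r in residuals if r != 0]
    return 1 + sum(a != b for a, b in zip(signs, signs[1:])) if signs else 0


x = list(range(11))
y = [xi ** 2 for xi in x]                     # hypothetical curved data
b0, b1 = linear_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)                                 # -15.0 10.0
print(residuals)                              # + + - - - - - - - + + pattern
print(sign_runs(residuals))                   # 3
```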
Other multivariate analyses include cluster, discriminant, and factor analyses. These topics are
covered in numerous statistical texts. These analyses allow one to group a population into
subpopulations and to reduce complex data with seemingly many dimensions (or factors) to a smaller set
of significant factors.
In this text, we have also not covered receiver operating characteristic (ROC) curves. These are
statistical analyses used frequently in the design and assessment of medical devices or diagnostic
tests used to detect disease or abnormalities. These curves, often summarized in terms of sensitivity,
specificity, and accuracy, provide a means for determining how accurately a diagnostic tool, test, or
algorithm detects the disease or abnormal physiological function. We have previously discussed type I
and type II errors, which can also be used to estimate the sensitivity, specificity, and accuracy of a
diagnostic test. There is often a trade-off between sensitivity and specificity, which can frustrate
the biomedical engineer trying to develop safe, accurate, practical, and inexpensive diagnostic tests.
The ROC curve is a graph that plots the sensitivity of the test (the probability of a positive result
when the disease is present) against the false-positive rate (the probability of a positive result
when the disease is absent, which equals 1 − specificity). The operator usually chooses to operate at
the point on the ROC curve where sensitivity and specificity are jointly as high as possible. In some
cases, gaining sensitivity at the expense of specificity may be preferred. For example, when
administering a test to detect streptococcal throat infection, we may prefer to maximize sensitivity
in order not to miss a diagnosis of strep throat. The trade-off is that specificity may be reduced,
and a person without streptococcal infection may be misdiagnosed as having it. The result is that the
person ends up paying for and consuming antibiotics that serve no purpose. In other clinical
situations, the trade-off between sensitivity and specificity may be much more complex. For example,
we often wish to use a noninvasive imaging tool to detect cancerous breast lesions. High sensitivity is
desired so that the test does not miss an occurrence of a cancerous lesion. However, the specificity
must also be high so that a patient will not have unnecessary surgery to remove healthy or
noncancerous tissue.
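The minimal Python sketch below traces an ROC curve. Each distinct score is used in turn as the cutoff for calling a result positive, and the code computes sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP) at each cutoff. If the null hypothesis is "no disease," a false positive is a type I error and a false negative is a type II error. Sensitivity is then 1 − P(type II error) and specificity is 1 − P(type I error). The scores, disease labels, and function names are hypothetical. Choosing the operating point by maximizing Youden's J (sensitivity + specificity − 1) is one common convention assumed here; it is not a rule from this text.

```python
def confusion_counts(scores, diseased, threshold):
    """Counts (TP, FN, FP, TN) when 'score >= threshold' is called positive."""
    tp = fn = fp = tn = 0
    for s, d in zip(scores, diseased):
        positive = s >= threshold
        if d and positive:
            tp += 1
        elif d:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def roc_points(scores, diseased):
    """(cutoff, sensitivity, specificity) for every distinct score used as a cutoff."""
    points = []
    for t in sorted(set(scores), reverse=True):
        tp, fn, fp, tn = confusion_counts(scores, diseased, t)
        points.append((t, tp / (tp + fn), tn / (tn + fp)))
    return points


# Hypothetical test scores (higher = more suspicious) and true disease status.
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
diseased = [True, True, False, True, True, False, False, True, False, False]

points = roc_points(scores, diseased)
for t, sens, spec in points:
    print(f"cutoff {t:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}, "
          f"false-positive rate {1 - spec:.2f}")

# Assumed convention: pick the cutoff that maximizes Youden's J = sens + spec - 1.
best = max(points, key=lambda p: p[1] + p[2] - 1)
print("cutoff maximizing Youden's J:", best[0])
```

Lowering the cutoff raises sensitivity and lowers specificity, which is the trade-off described for the strep-throat example. A screening test might deliberately operate at a lower cutoff than the one Youden's J selects.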
In addition, there is a vast area of statistical analysis related to time series, that is, data
collected over time. Time is the independent variable used to predict the dependent variable, which,
in biomedical applications, is often a biological measure. Statistical time series analysis is nicely
introduced in Bendat and Piersol [7] and plays an important role in most biomedical research. Such
analysis is often used to develop automated detection algorithms for patient monitoring systems,
implantable devices, and medical imaging systems.
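As a small illustration of a time series statistic (our choice for illustration; this text does not prescribe it), the Python sketch below computes the sample autocorrelation r_k. This is the correlation of a signal with a copy of itself shifted by k samples. Peaks in r_k at nonzero lags reveal periodicity, a property that detection algorithms can exploit. The function name and the test signal are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (mean removed, normalized by lag-0 sum)."""
    n = len(x)
    m = sum(x) / n
    d = [xi - m for xi in x]
    c0 = sum(di * di for di in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


# Hypothetical periodic signal with a period of 4 samples, repeated 5 times.
signal = [1.0, 0.0, -1.0, 0.0] * 5
r = autocorrelation(signal, 4)
print([round(v, 2) for v in r])               # [1.0, 0.0, -0.9, 0.0, 0.8]
```

For this period-4 signal, r_k is strongly negative at half the period (k = 2) and strongly positive at one full period (k = 4). The magnitudes fall slightly below 1 because fewer sample pairs overlap at larger lags.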
This text alludes to, but does not explicitly cover, the use of statistical software packages for the
analysis of data. A number of statistical software programs are available to the engineer who needs
to perform statistical analysis. Some of these packages are freely available, whereas others can be
quite expensive. Many of the packages offer some tutorial assistance in the use and interpretation of
statistical and graphical analysis. Some have user-friendly interfaces that allow the user to load and
analyze data quickly without much instruction. Other packages require considerable training and
practice. Some of the more popular software packages for statistical analysis used in the biomedical
field include SPSS, Minitab, Excel, StatView, and Matlab.
Finally, an important takeaway message for the reader is that statistics is neither a cookbook
exercise nor cut-and-dried. Even the most experienced biostatisticians may debate the best analyses to
use for biomedical data that are collected under complex clinical conditions. Useful statistical
analysis requires that the user first frame a testable hypothesis, that the data be collected from a
representative sample, and that the experimental design control for confounding factors as much as is
practical. Once the data are collected, the user needs to be as fair as possible in summarizing and
analyzing the data. This requires that the user have a good appreciation of the assumptions made about
the underlying populations when using a statistical test, as well as the limitations of the
statistical test in drawing specific conclusions. When used properly, statistics certainly helps us to
make good decisions and useful predictions, even in the context of uncertainty and random factors over
which we have no control. Biomedical engineers need to embrace statistics and learn to be as
comfortable with the application of statistics as they are with the application of algebra, calculus,
and differential equations.

Bibliography

[1] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, Heart rate variability. Standards of measurement, physiologic interpretation and clinical use, Circulation, vol. 93, pp. 1043–1065, 1996.
[2] Wagner, G.S., Marriott's Practical Electrocardiology, 10th ed., Lippincott Williams & Wilkins, Philadelphia, PA, 2001.
[3] Hogg, R.V., and Ledolter, J., Engineering Statistics, Macmillan Publishing, New York, 1987.
[4] Gonick, L., and Smith, W., The Cartoon Guide to Statistics, HarperPerennial, New York, 1993.
[5] Salkind, N.J., Statistics for People Who (Think They) Hate Statistics, 2nd ed., Sage Publications, Thousand Oaks, CA, 2004.
[6] Runyon, R.P., Fundamentals of Statistics in the Biological, Medical and Health Sciences, PWS Publishers, Boston, MA, 1985.
[7] Bendat, J.S., and Piersol, A.G., Random Data: Analysis and Measurement Procedures, 2nd ed., Wiley-Interscience, John Wiley & Sons, New York, 1986.
[8] Ropella, K.M., Sahakian, A.V., Baerman, J.M., and Swiryn, S., Effects of procainamide on intra-atrial electrograms during atrial fibrillation: Implications for detection algorithms, Circulation, vol. 77, pp. 1047–1054, 1988.
[9] Fung, C.A., Descriptive Statistics. Module 1 of a Three-Module Series on Basic Statistics for Abbott Labs, Abbott Labs, Abbott Park, IL, December 11, 1995.
[10] Olkin, I., Gleser, L.J., and Derman, C., Probability Models and Applications, Macmillan Publishing, New York, 1980.
[11] Fung, C.A., Comparing Two or More Populations. Module 2 of a Three-Module Series on Basic Statistics for Abbott Labs, Abbott Labs, Abbott Park, IL, December 12, 1995.
[12] Minitab StatGuide. Minitab Statistical Software, release 13.32, Minitab, State College, PA, 2000.
[13] Ropella, K.M., Baerman, J.M., Sahakian, A.V., and Swiryn, S., Differentiation of ventricular tachyarrhythmias, Circulation, vol. 82, pp. 2035–2043, 1990.
[14] Schmit, B.D., Benz, E., and Rymer, W.Z., Afferent mechanisms of flexor reflexes in spinal cord injury triggered by imposed ankle movements, Experimental Brain Research, vol. 145, pp. 40–49, 2002.
