
SIMPLE

STATISTICS
'When I first studied statistics I found great difficulty in
understanding what I now regard as very basic concepts. With
hindsight I can see that this was largely due to my own
mathematical ineptitude combined with the disaster of
"Introductory Statistics" taught by highly qualified and competent
mathematicians. Years later, when I became involved in teaching
introductory statistics to psychology students myself, I would
recognise my early struggles in almost every student - and I
resolved to make the experience less arduous for them, and more
enjoyable.'

This is part of a letter from Dr Clegg written in 1981. Her resolve has
produced Simple Statistics - a book that is 'less arduous' and very
certainly 'more enjoyable'. It is suitable primarily for A-level students
and undergraduates following courses on psychology and, to a lesser
degree, sociology, economics and geography.

ISBN 0-521-28802-9

Cambridge
UNIVERSITY PRESS
SIMPLE
STATISTICS
A course book for the social sciences

To my son, David

Frances Clegg


Honorary Research Associate, University of Hull

CAMBRIDGE UNIVERSITY PRESS


Cambridge
New York Port Chester
Melbourne Sydney
Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011-4211, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1982

First published 1982


Eighth printing 1991

Printed in Great Britain at the University Press, Cambridge

Library of Congress catalogue card number: 82-12883

Clegg, Frances
Simple statistics.
1. Social sciences - Statistical methods
I. Title
519.5'024301 HA29
ISBN 0 521 28802 9

Contents

Acknowledgements
1 Why do we need statistics?
2 Measures of central tendency
3 Measures of dispersion
4 The normal distribution
5 Probability
6 What are statistical tests all about?
7 Hypotheses
8 Significance
9 Simple statistical tests
10 What's in a number?
11 Two parametric tests
12 Tests of goodness of fit
13 The design of experiments
14 Sampling
15 Correlation
16 In the last analysis...

Operation schedules
1 The mean
2 The median
3 The mean deviation
4 The standard deviation
5 The standard deviation (alternative method)
6 How to rank sets of scores
7 The Wilcoxon matched-pairs signed ranks test
8 The sign test
9 The Mann-Whitney U test
10 The t test for related samples
11 The t test for unrelated samples
12 Simple chi-square
13 Complex chi-square
14 Spearman's rho
15 The Pearson product-moment correlation

Answers to exercises
Appendix
Index

The report 'Some observations on the diseases of Brunus edwardii (species nova)' by
D. K. Blackmore, D. G. Owen and C. M. Young first appeared in The Veterinary
Record, 1 April 1972. The extracts from it on pages 148-52 are reproduced with
permission from the journal's editor, Edward Roden.
Table S2 is based on table 2 of Some rapid approximate statistical procedures by F.
Wilcoxon and R. A. Wilcox (1964, New York: Lederle Laboratories) and is
reproduced with the permission of the American Cyanamid Company.
Tables S4 and S7 are from tables C.7 and C.6 of The numbers game: statistics for
psychology, by Joan Gay Snodgrass (1978, New York: Oxford University Press).
Table S4 draws on table 11.4 of Handbook of statistical tables by D. B. Owen (1962,
Reading, Mass.: Addison Wesley) and on D. Auble's article 'Extended tables for the
Mann-Whitney statistic', Bulletin of the Institute of Educational Research at Indiana
University, vol. 1, no. 2, 1953.

Like many other teachers of introductory statistics, I felt a need to supplement the
available textbooks with my own handouts. When time permitted I compiled these
into a single manual; at that point, thanks to illustrations drawn by Chris Hinds and
Patrick Sammon, the project took on a life of its own. I still remember with gratitude
the ideas, the time for discussion and the support which Chris and Patrick gave to me.
It was quite a challenge to turn a statistics manual - potentially the dreariest and most
off-putting kind of book - into something which students might positively enjoy
using. However, the response to the manual was very encouraging and the book you
are now looking at is, I hope, an improved version which retains the early spirit. If
you can suggest further improvements, I will be pleased to receive your comments
and ideas for further amendments.
The original illustrations have now been replaced by Elivia Savadier's skilfully
drawn cartoons. The debt to Patrick and Chris remains - but Elivia has also
succeeded in making another valued contribution to the finished product. I must also
thank Dodie Masterman, a leading authority on Tennyson's poem 'Maud', for
providing the lovely little illustration on page 180.
Several members of the Psychology and Mathematical Statistics Department at
Hull University also gave me advice and help. Professor A. D. B. Clarke and Dr Ann
Clarke, Dave and Jean Williams and Lorraine Hudson must be named in particular.
More recently I have valued the enthusiasm and patient comments I have received
from Graham Hart, Sue Glover and Marcus Askwith at Cambridge University Press.
It will be largely due to their efforts that the book has improved over the two-year
interval. Needless to say, any errors which remain are due to my own misjudgements
and oversights, and should not be associated with anyone else who has been involved
in the book's production.
I am constantly amazed by the great tolerance shown to me by the members of my
family; without their understanding and assistance I doubt that I could have written
the book. I must thank my husband, Brent Elliott, in particular, for he has helped me
considerably with many aspects of the final manuscript preparation.
Finally I would like to acknowledge the role of all the A-level students I taught at
Hull College of Further Education - and who were exposed to much of the written
material in 'live' form! It was their needs and responses which inspired my first
attempts to teach statistics, and from them I started not only to learn how to do it, but
also to appreciate the value of humour in the classroom.
As a nonstatistician I feel some trepidation in producing a statistics textbook.
Ironically though, it seems that people who are less expert in statistics are better able
to understand the problems that students (and particularly those who label themselves
'non-numerate') encounter when starting the subject, and are thus in a
position to cover the ground more gently. If you are about to embark on statistics,
then I hope that I manage to explain clearly what the subject is all about. When you
have finished this book you will then be in a position to turn to standard textbooks for
further information - and I will consider that I have truly succeeded in my aim if you
are able to do this with interest and pleasure.

1 Why do we need statistics?

Perhaps when you began your course in one of the life sciences, you felt dismayed to
discover that you would have to start doing statistics. You wouldn't be the first person
to feel like this! Understandably, many students imagine that the new syllabus will
concentrate entirely on aspects of behaviour or mental processes shown by living
organisms, and that knowledge of maths will not be needed. So why is it your bad luck
that you now have to start statistics - just when you thought that at last you would be
able to devote all your attention to a really interesting subject? In the next sections I
shall outline the main uses of statistics in the life sciences, and conclude the chapter
by considering the matter of just why it is that so many students dislike the subject
and find it difficult.

Statistics for description


In the social and biological sciences, although we are very happy to be able to
understand precisely what makes one living organism 'tick', at the same time, our
overall aim is to be able to comprehend the mechanics which underlie the behaviour
of an entire species. Then we can use our knowledge to make predictions about
individuals or groups of individuals which we have not previously encountered or
studied. Thus in our studies of living beings and their activities, we will often be
working with several individuals at any one time. In surveys, the numbers may run
into thousands, but there will normally be smaller numbers in the more carefully
controlled experimental type of investigation. Inevitably our efforts will reward us
with sets of data which usually, although not always, take the form of numbers. It is
in conveying information about, and trying to interpret, these large sets of numbers
in an efficient and convenient manner that we really need descriptive statistics. An
example will make this clear.
Suppose someone was studying road accidents, with a view to making road safety
recommendations. The first thing to discover is when, where, and under what
circumstances accidents occur. We will look at 'when' in more detail. The times of
road accidents can easily be obtained from police records, and the researcher could
find out how many accidents occur each year, month, week, day, and even hour. The
data could be put into the form of daily tables. Well, here it is, looking very
impressive, but taking up an awful lot of space! Constantly wading through sheets of
daily accident tables is not going to be particularly useful, either, until some kind of
overall picture or summary can be gleaned. A good starting point would be an
indication of the 'normal', or 'usual', number of accidents per year, month, week,
etc., these figures being called averages. You all know, even if only vaguely, what an
average is. Our researcher might say:

'On average, there are about 100 accidents per week in Dodge City,'
using as his basis the fact that 10 000 accident reports came in over a two-year period.
Notice the word 'about'. It indicates that you would not expect precisely 100
accidents to occur each week, but that some variation around the figure of 100 is to
be expected. The researcher might then go on to give more specific details . . .
'Usually, most of the accidents involving other cars occur between 10.30 pm and
midnight on Fridays and Saturdays. Of the accidents involving children and
pedestrians, which comprise about 40 each week, roughly an eighth happen between
8 and 9 am, Mondays to Fridays, a quarter on the same days, but between 3.30 and
6.30 pm, and the remainder during the weekend daylight hours.'
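The researcher's 'about 100' can be checked with a few lines of arithmetic. The sketch below is an illustration only: the figures are those quoted above, the variable names are invented for the purpose, and the two-year period is assumed to contain 104 weeks.

```python
# Figures quoted in the text; the names are illustrative, not from the book.
reports = 10_000            # accident reports received over the whole period
weeks = 2 * 52              # assumption: a two-year period taken as 104 weeks

weekly_average = reports / weeks
print(round(weekly_average, 1))   # a little under 100, hence 'about 100'

# Breakdown of the accidents involving children and pedestrians.
pedestrian_per_week = 40
morning = pedestrian_per_week / 8         # roughly an eighth: 8 to 9 am, weekdays
afternoon = pedestrian_per_week / 4       # a quarter: 3.30 to 6.30 pm, weekdays
weekend = pedestrian_per_week - morning - afternoon   # the remainder

print(morning, afternoon, weekend)
```

The weekly figure is not exactly 100, which is precisely why the researcher qualifies it with the word 'about'.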
These sentences describe briefly, yet fairly accurately, the wealth of information
contained in the 10 000 reported incidents. But no one is yawning, or feeling the
minor panic induced in the researcher when confronted with the original data - in
twenty cardboard boxes! The average is one kind of descriptive statistic. It is a
number which indicates a 'typical' or 'central' figure for a group of numbers, and is
officially called a measure of central tendency. From the example just given, averages
could be quoted for any of the groups of numbers comprising yearly, weekly, daily or
hourly accident rates.
Another type of descriptive statistic is used to qualify the word 'about', as in the
sentence 'There are about 100 accidents per week.' Clearly, there is a difference
between a town in which anything from 50 to 150 accidents is usual, and one where
no less than 98 and no more than 103 accidents occur in any single week. Although
both towns might have an average of 100 accidents each week, 'about' signifies that
there may be a very large departure from the average in the first town, but only two
or three more or less than the average in the second. Used on its own the word 'about'
is far too vague, and we need some means of giving more details about the variation
which occurs. The solution is to use the kind of descriptive statistic which is called a
measure of spread, or sometimes, a measure of dispersion; it simply indicates just how
much the word 'about' means for a particular set of figures.
As living creatures show the most tremendous variety in their attributes,
behaviour, and just about every characteristic you care to name, variation is an
inescapable fact of life. On the whole, the simpler the organism, the less variation it
will display; but most readers of this book will be especially interested in studying the
behaviour of mammals - the most complex animals - and man in particular - the most
complex of the lot! If humans were fairly similar in their behaviour and
characteristics, then we would not need to study so many of them to be able to make
statements about mankind as a whole. As it is, humans vary tremendously, not only
on a world-wide scale, and with regard to cultural differences and appearance but
also within cultures, and, as we all know, within nations and families. Even identical
twins, who have the same genetic make-up, are not entirely alike, due to the effect of
the different experiences they have had from conception onwards. In other words,
living organisms are unique entities, and the more complex the organism, the less
likely it is to behave in the same way as its neighbour. So we often need statistics to
describe adequately the large numbers of people, other animals, or events which we
are studying, both in terms of typical patterns and the variation which we might
expect.

Statistics for drawing conclusions

The other main use of statistics is in making decisions about situations where you are
not entirely confident that the 'truth' has been revealed. In an experiment certain
events take place (hopefully, ones which are more or less anticipated by the
experimenter!), changes are recorded, and the findings, which will usually comprise
numbers of some sort or another, are used as a basis for drawing conclusions about
the underlying events. Statistics used in arriving at conclusions in this way are called
inferential statistics. Think about the following example.
Suppose you gave two people of similar age and intelligence a long list of words to
read, and asked them to recall the words later. Despite their similarity as humans,
and in age and intelligence, their recall of the information would undoubtedly differ
- or show variability. No doubt you can think of several reasons why this should be
so. They may have concentrated to different extents whilst reading the words; some
of the words might have conjured up strong associations or visual images for either of
the learners; one of them might have been very anxious about the purpose of the
reading task, whilst the other took it more light-heartedly; one of the learners might
have spent the immediate pre-learning period propping up a nearby bar . . . These,
or a score of other factors, could have influenced the learners' recall.

Suppose that you have just invented a new memorising technique, and you wish to
find out whether it works as well as you hope. Common-sense will tell you that you
must try out your technique on more than one person, and also that you must
compare the technique in actual use with memorisation which is carried out by
another, similar, group of people who have not had the benefit of your wisdom. If you
didn't have such a group (called a control or control group), then you would have no
idea what sort of aid your technique is providing. For all you know, it might turn out
to make recall harder, rather than easier! So you must have this other group of
memorisers acting under identical conditions to those using the new method, except
that they are not actually using the new technique.
If the group using the new method comprised people who had good memories,
whilst the other group was made up of poor memorisers, then the comparison would
hardly be a fair one. But although it is easy to see why the two groups should be
similar to each other, in practice it is often difficult to achieve complete similarity, as
you might have guessed. We shall return to this topic later in the book. Meanwhile,
a set-up like the one just described is called an experiment. When it has been
completed the investigator will be the proud owner of sets of scores (the results,
which in this case represent success in memorising), obtained from the victims, who
are usually referred to as subjects. Another piece of jargon used in experimental work
is the verb used to describe the participation of subjects in an experiment. We say that
they ran in an experiment, and also talk of experimenters running either subjects or
experiments.
Let's return to the memory experiment, in which two groups have participated and
provided us with recall scores. Suppose that all the people who used the new
technique recalled the same words correctly, and that these were 80% of the total
number of words on the list, whilst the unaided group, the control subjects, recalled
the same kinds of words, but only 40% of the list. Doubtless you would hurry off to
patent your new memory technique! This is not a plausible situation though, is it? It
would be much more likely that the aided group got about 80% of the words right,
and the unaided group about 40%. Probably the words recalled would also be
different for each person. A different, but even more realistic, outcome would be the
aided group getting about 60% of the words correct, and the unaided group about
50%. Would you be so certain now that your technique was an improvement? Let's
consider again the word 'about'. It describes a scattering of result scores which will
occur over and over again in experimental work. With the last set of results
mentioned for the memory experiment, it could have been that the lowest score in
the aided group was 45% and the highest 70%; in the unaided group, the lowest 30%
and the highest 80%. In other words, some people in the unaided group did better
than some in the aided group. The overlap of scores is presented visually in figure 1.
It is this problem of overlapping sets of scores which creates the need for statistical
analysis - and inferential techniques in particular. The overlapping is largely due to
the following factors. Notice that the first two are a direct result of the natural
variation which occurs in complex organisms.
1 We can never match our comparison (control) group exactly with the experimental
group on every single relevant attribute (e.g. age, intelligence, motivation,
previous experiences, family background, personality, etc.).
2 There are dimensions of personality or experience which we should match on, but
we may not be able to, because our methods of assessment are not sophisticated
enough. Some of the ways in which we measure personality and intelligence are
still very crude. There may be other aspects of organisms which we should
consider, but our lack of knowledge means that we have not yet learned the
importance or relevance of these features, and so we ignore them.
3 Even when we have matched the groups soundly, our experimental efforts may still
not result in their presenting scores which are clearly different, because our
understanding of the phenomenon under consideration was too limited. Put
another way, the experiment didn't 'work'!
These factors will become very real to you when you actually start to carry out
experiments - situations in which we change something and then try to decide
whether our change brought about other changes. Surveys provide another way of
gathering information about organisms or events. Our role is less active than in
experiments however, for here we draw data from groups already occurring
naturally, and don't actually induce any changes. When it comes to analysing the
results though, just as with experiments, we find that our data may not indicate
clearly distinguishable groups, but ones with a certain degree of overlap. Once more,
inferential statistics come to our aid in helping us decide the extent to which the
groups really differ.

Figure 1. Overlap of scores in a memory experiment (aided group scores and unaided
group scores, plotted as % items recalled correctly)

Exercise
1. The results of four memory experiments are shown in table 1. Study the numbers and decide
for yourself which experiments suggest that the memory technique being tried out actually
does help people to memorise better. Answers are given at the end of the book.

Table 1. Results from four separate memory experiments

Experiment 1      Experiment 2      Experiment 3      Experiment 4
Aided  Unaided    Aided  Unaided    Aided  Unaided    Aided  Unaided
55     30         50     40         50     45         30     50
60     35         55     45         55     50         40     52
65     40         60     50         60     55         50     58
70     45         65     55         65     60         52     60
75     50         70     5s         70     65         54     65
NI 1 55 75 70
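The overlap of scores discussed earlier in this chapter can be found simply by comparing the lowest and highest scores of each group. The sketch below is illustrative only: the function name `overlap` is invented, the first pair of ranges is the one quoted in the text for the memory experiment (aided 45% to 70%, unaided 30% to 80%), and the second pair is made up to show two groups which do not overlap at all.

```python
def overlap(range_a, range_b):
    """Return the (low, high) stretch shared by two score ranges, or None."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return (low, high) if low <= high else None

# Ranges quoted in the text: (lowest %, highest %) recalled correctly.
aided = (45, 70)
unaided = (30, 80)
shared = overlap(aided, unaided)
print(shared)    # every aided score falls inside the unaided range

# Hypothetical ranges, invented here, for groups that are clearly separate.
clear_case = overlap((60, 80), (30, 50))
print(clear_case)
```

When the function returns None a glance at the scores is enough to tell the groups apart; when it returns a stretch of shared scores, as it does for the ranges quoted in the text, some more careful method of deciding is needed.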

You probably found that it was most difficult to decide whether the memory
technique worked in experiments 2 and 3. This is why we need inferential statistics -
or the dreaded statistical tests! When we can merely glance at sets of scores, as in
experiments 1 and 4, and see immediately that they are different, we call this,
jokingly, the 'eye-ball' test. Unfortunately, we are not able to get away with this test
very often. It is far more typical to obtain scores which need careful analysis in order
to find out whether one of the groups is really different from the other; that is to say,
when our experimental conditions have not created a sufficient difference for us to be
able to easily distinguish the two sets of scores.
Another explanation for the failure to show a clear difference in the sets of scores
lies in an element of luck. Our memory technique might be a perfectly good one, but
just through bad luck, the items listed for recall might have given rise to particularly
strong visual images or associations for members of the unaided group, thus making
that set of scores as a whole higher. It might have been the other way round in
experiments 1 and 2. Chance factors may have made it seem as though our memory
aid groups were better, though we would find, if we used different subjects, that
really the technique isn't as good as the two sets of results led us to believe initially.
Unfortunately, this element of luck can never be completely ruled out; even after we
have carried out a statistical analysis, we usually feel that we cannot state our
conclusions with complete confidence, but must qualify them according to the role
we think that chance may have had. The qualifications we make - our cautiousness in
concluding whether an experiment 'worked' or not - are built into the statistical
analysis techniques, and so at the end of our calculations we are able to estimate
precisely the part we consider chance factors (or luck!) to have played. Note that
although I have attributed results to luck, or been forced to consider that an element
of luck is involved, what has happened is that we haven't really known enough about
our subjects' memories, personalities, etc., to be able to control these variabilities
precisely. If we knew all that there was to know, then of course we could choose our
subjects with exact precision, and would not be left with quite such a hard task of
deciding whether or not our new technique had altered events.
So, one of the main uses of statistics in biological and social sciences is to decide
whether a particular treatment (e.g. using a new memory aid; seeing whether a
certain mineral affects plant growth; trying out a new drug; looking for links between
housing conditions and delinquency) causes one group under study to obtain scores
which are really different from a comparable group or groups. The statistical
techniques used for this are called inferential, because on the basis of the scores
which we obtain and analyse, we make inferences (or inspired guesses!) about what
has been happening to the groups of subjects or materials which we are studying.

Statistics in practice
Using statistics is rather like using a tool box. Certain jobs have to be done, and in
order to do them, you must select from the tool box implements which are
appropriate. If your dentist kept a handyman's drill amongst his or her instruments,
no doubt you would hope that you never needed a filling! Equally, you would be a
little surprised to see a joiner attempting to cut a plank with a scalpel, or a decorator
putting plaster on with a ruler. Instead, decorators, joiners, dentists, and everyone
else who needs to use tools for particular tasks, will select instruments appropriate
for the job. Appropriateness will be decided on the basis of the particular materials
involved and the degree of precision sought. Think of statistics in the same way. The
'job' we undertake is to describe events and attempt to draw conclusions from them;
the 'tools' are the various statistical techniques which are available. In order to pass
statistics exams you will need to know something about certain techniques (the
tools), and, of course, how to use them.
If you asked a driver how the engine of his car worked, he would probably be able
to describe the basic principles and name and locate the main parts. However, it is
unlikely that he would be able to identify the causes of, or rectify, an engine failure
other than a simple one. Most people who use engines and tools are similar in this
respect. They know how to use the instrument, when to use it, and when not to use it,
but have only a rough idea of how it actually works. The same is true of statistics. You
are only required to have a rough idea of how the techniques work - more detailed
knowledge and understanding is the province of the mathematical statistician. Like
engineers, statisticians are constantly devising new techniques and modifying
existing ones, and it is their expertise which filters down to the many people who use
statistical techniques in their daily work. The workmen themselves are not expected
to understand in great detail how the tools work, or to carry out modifications or
improvements.
Learning about statistics is also like being a workman in another respect. Although
you may learn about the theoretical aspects of statistical techniques - uses of tests,
their strengths and weaknesses, etc. - your knowledge would not be entirely
complete if it did not include a certain amount of practice at using the various
procedures. So it is necessary to practise using the tools. There are several things to
be gained: better learning and retention through the active use of information; a good
understanding of the contexts in which certain techniques can be appropriately used;
first-hand knowledge of the various problems arising with statistical techniques and
data analysis; and through the computational steps carried out, an appreciation of the
principles which underlie the techniques. As a final bonus, you begin to see that even
you can do statistics! For these reasons, many exercises are included in this text.

But I'm hopeless at maths!

If you are a typical social science student, you probably dislike maths and feel that it
is one of your weakest subjects. You may also feel anxious or inferior because of this
very fact. Let's look at these sources of worry - and I hope that I can offer some
reassurance to those of you who are approaching statistics with fear and foreboding!

First, statistics is not maths. True, it is a branch of mathematics, but it only involves
the simplest of arithmetical operations. I expect that you will have the use of a
calculator, and so these days you can study statistics and escape doing even basic
arithmetic. Both maths and statistics rely heavily upon the use of symbols, and this is
possibly responsible for the confusion - and also, for the dislike and dread which the
subjects seem to generate.
We all use symbols a great deal. You are now reading symbols - i.e. the letters of
the alphabet which have been strung together to make the words written on this page.
However, are you experiencing difficulty and distaste over the act of reading them?
Of course not. You have been reading for long enough, and frequently enough, to do
it 'automatically'. No doubt a seven year old child would find these pages a little
heavy going. You could imagine the youngster struggling to read and pronounce the
words 'arithmetic' or 'enough', and perhaps wondering what 'symbol' means. These
difficulties are quite reasonable, for seven year old children do not normally have an
adult vocabulary at their command, and there are many abstract concepts which are
completely beyond their experience. Well, you are the equivalent of a seven year old
child as far as mathematical symbols go!
You may be quite happy about the symbols
+ and - and =,
perhaps scratch your head over
< and > and ≠,
and definitely start to stammer when you see
X and X.
Yet you are already familiar with the operations which all these symbols refer to, and
may in fact use the concepts involved quite frequently, if only you knew it!
Other concepts, such as those expressed by the symbols
Σ and π
are a little more specialist, and it is unlikely that you will have needed to use them in
your everyday (non-statistical) life.
Remember though:
SYMBOLS ARE NOT IMPOSSIBLE TO UNDERSTAND!
What you must understand is that it takes thought and patience and time and practice
. . . and yet more practice, to become familiar and at ease with symbols. It is quite
possible to acquire a working knowledge of statistics without knowing much about
the symbols which could be used to describe the various arithmetical operations
involved. In the operational schedules included in this book I will tell you how to
carry out the various statistical procedures in words, and through worked examples
show you what to do. Ideally, you will get the background knowledge from the text.
In the schedules I have also included the symbols for the various techniques or
formulae, and for two reasons. First, so that you gain familiarity - even if only a vague
familiarity - with them, and secondly because there may come a day when you
actually find it easier to work from statistical steps as summarised in a single formula,
rather than from a verbal description, which may involve many small steps. At the
moment you may well feel that you will never achieve such dizzy heights of
competence, but all I can say is that it has been known for so-called 'innumerate'
people to come to prefer symbols to words!

The arithmetic you need for statistics is hardly mind-boggling. Basically you will
need to add, subtract, multiply, use brackets, understand what squaring means, and
know what a square root is. These operations are usually covered in the first four
pages of introductory arithmetic books. Of course calculators can carry out all these
operations for you, but there are two things which they can't cope with. Calculators
can't think on your behalf, neither can they count. Statistics involves both thinking
and counting now and again, I'm afraid!

The maths language

No doubt you have at some time picked up a book printed in a foreign language,
noted fairly rapidly that it was not written in a language with which you were familiar,
and replaced it on the shelf before looking for a book which you could understand.
When you looked at the first book, did it make you entertain serious doubts about
your intellectual ability?
What if you saw

[Illustration: a line of Arabic script]

These symbols don't make you feel inadequate, or worried about your intelligence
either, do they? You recognise immediately that you don't understand what the
hieroglyphics stand for (unless you have recently been doing Arabic at night
classes!); you don't feel threatened by them.
Now let's take another language:

$$t = \frac{\bar{d}}{\sqrt{\dfrac{\sum d^2 - (\sum d)^2 / n}{n(n - 1)}}}$$

I expect that feelings of anxiety and inadequacy will promptly be aroused in most of
you! Is this reasonable, do you think? Why are you somehow expecting that you
should be able to make sense of these symbols - and maybe also thinking that you
'never will' manage them, when you face the Arabian squiggles quite unruffled?
What has happened to make you feel like this? Possibly a sequence of events, not
uncommon, which took place in the murky past of your school days. Let's look at the
maths language in a little more detail.

Maths symbols are just like those used in any language. They stand for something else - in this case, operations with numbers - and unless you are familiar with what they stand for, of course you can't translate them. The snag is that fluency in another language takes time and effort and continuous practice - yet this is all that is needed to master mathematical notation. There are no mystical skills or insights available only to a few lucky geniuses - and denied to you. Maths is like any other language; given work and practice, anyone can become reasonably fluent. Unfortunately, many maths teachers fail to appreciate that they are using a foreign language. They rattle on at a fair to moderate speed, leaving the average pupil floundering, simply because the learner needs more time to interpret the symbols than the teacher (who has been speaking the language for years - if not decades) realises. The more time the pupil needs for translation, the further he or she gets behind; the further behind, the more new information is not coped with, and the more extra time is needed for translation and thought. This extra time is not forthcoming, usually. I am sure that you get the picture. The poor pupils slowly sink into a mire of incomprehension, frustration, fear, and finally, hatred of maths. The circle is complete when a person actively avoids contact with the subject, and we have another self-confessed failure at maths on our hands. It is very sad that such 'failures' tend to blame themselves, rather than perceive that they are reasonably competent intelligent people, who have simply been exposed to a disastrous teaching programme. What lessons can be learned from this analysis of an unfortunately common situation then?

The first one is: don't blame yourself for the nasty experiences you have had with maths in the past, but try to forget them and make this a fresh start. In other words, stop thinking and worrying over the fact that you are 'hopeless' at maths. With careful effort, you too can pass statistics exams!

Secondly, for success, you must regard maths as a language, and be prepared for continuous practice. Would you go to a French class once a week, fail to do the set homework, fail to speak in the language or listen to it between sessions, and expect to make progress? I doubt it. You know as well as I do that halfway through the week you would probably have forgotten nearly all the things you learned in the lesson, and during the first part of the next class you would be struggling to get back into the swing of things. The same is true of maths. If you don't practise pretty regularly, you'll rapidly forget what it's all about, and so will always need extra time for thought and translation. So do try to work at statistics, the branch of maths central to this book, frequently. Plenty of exercises are included to help you in this respect. Don't ignore them, or just glance at them without making any effort to work through them. They not only give you a chance to think about and apply new techniques, but they also make you more fluent in your new language.

You would also do well to follow the advice given over a century ago by the old school master Bartle Massey, in George Eliot's novel Adam Bede. He urges his pupils of accounts to make up and work through examples of their own devising whilst their hands are occupied but their minds are free:

'There's nothing you can't turn into a sum, for there's nothing but what's got number in it - even a fool. You may say to yourselves, "I'm one fool, and Jack's another; if my fool's head weighed four pound, and Jack's three pound three ounces and three-quarters, how many penny-weights heavier would my head be than Jack's?" A man that had got his heart in learning figures would make sums for himself, and work 'em in his head: when he sat at his shoemaking, he'd count his stitches by fives, and then put a price on his stitches, say half a farthing, and then see how much money he could get in an hour . . .'

Nowadays, we are so accustomed to writing things down, unlike Bartle Massey's pupils, that we forget we can use our brains when our hands are occupied. Washing up, going to work on the bus, waiting in queues, are the modern equivalent of shoemaking. His pupils were learning accounts - a more appropriate example for statistics students might be . . .

A statistics class comprises 30 male and 21 female fools. Twenty males and 10 females pass the end of term exam; does this constitute evidence that male fools are better at statistics than female fools?

Another good way of learning something - and also of finding out that you do not understand quite as much as you imagined - is to explain the material to someone else. As you might imagine though, not all members of your family or acquaintances are simply longing to learn about statistics, but never had the courage to ask! It might need considerable diplomacy to persuade someone you know to make your studies a more worthwhile experience. Whilst it is more realistic to select a partner from your study group, and perhaps take it in turns to explain things to each other, even this course of action is not completely free of hazards. Neglect of other subjects, choice of partner, location of study sessions and the raising of rivalry feelings or anxiety all raise interesting possibilities. Consequently, much as I believe you should follow my advice, I disclaim all responsibility for the domestic and social consequences of your activities if you do follow it!

The logic of maths
But learning statistics is not exactly similar to learning a language. There is one important difference which needs attention right from the start. This is that statistics and maths must be learned in a logical sequence. If you were ill for a language lesson, and missed twenty new vocabulary words, no doubt you might experience a little difficulty if you subsequently encountered one of the missed words, or needed to use one. However, that small gap could easily be rectified. With mathematical subjects things are a little different though. Because they build up in a logical manner, it is the case that later learning is usually highly, or totally, dependent upon a good understanding of earlier work. You can't miss out little bits, and expect to go sailing on without them - or even expect to pop them in easily at some future time. This is another reason why many schoolchildren soon come unstuck at maths. After a period of absence from the classroom - and it may not necessarily be prolonged, as in illness, but perhaps only be the fleeting type, known as 'day-dreaming' - the pupil should receive tuition which will make good the missing knowledge. We all know that in the typical school class today such attention to an individual's needs is quite out of the question - and the penalty is the enormous number of people who end up failing, and hating, maths.

Do take care then, whilst you are working from this book or receiving instruction in statistics, to make sure that you grasp all relevant information at a particular level before you move on to more advanced material. If you don't understand something, don't just think that you might begin to, if you just listen more carefully than usual to
the next bit. Invariably, the next bit will seem even worse, and so on. If you find that you have got problems of understanding, then try to discover at which particular point you have failed to grasp a point, return to that stage, and proceed from there, working in small steps, and being absolutely sure that you do understand everything. And remember the saying 'If in doubt, ask!'

Having said that maths and statistics are subjects which build up in an orderly sequence, I would now like to make a slight qualification, and add that in statistics certain basic principles underlie almost all techniques. Once these are established, the order in which individual topics are covered is not too important. At the start of each chapter I shall indicate whether any of the earlier material is necessary for a good understanding of the new topic, or whether the chapter can be read as an independent unit.

As far as symbols go, get used to looking at them, and try to use them whenever you can. This way you will gradually lose your fear of mathematical-looking notation. Indeed, I hope that you may even come to enjoy the mathematical aspects of material included in this book. Be careful to whom you admit this though, because enjoying statistics is rather like eating nettles - it gets you the reputation of being rather odd!

Exercise
2 Decide whether descriptive or inferential statistics are most appropriate for the following situations:
(a) A man decides to buy a car from a friend, but discovers that it is over-priced. In order to persuade his wife that the purchase isn't complete folly, he investigates the prices of similar models in expensive local garages. He obtains the prices of ten cars which he feels will help his case. How does he present this 'evidence' to his wife?
(b) Some children are undecided about which of two routes reaches the beach more quickly. Sometimes it seems quicker to go one way, at others by the second route. Statistics needed for a decision?
(c) On weekdays I diet, and at the weekends over-eat. My total weight change each week is zero, but this is due to five days' loss versus two days' gain. How can the weight changes be summarised so that this pattern can be discerned?
(d) I treat half my tomato plants with Wonder Whizzo and half with Gro-kwik. When they mature I count the number of tomatoes I obtain from each. What kind of statistics will be needed to see whether there is any difference between the two treatments?
(e) Roland Butter owns a rather large book collection, and he decides to insure it. For this he needs to know the total value of all the books. It would be far too tedious to find the cost of each one, and build up the total that way, so instead he works out the value of a 'sample'

2 Measures of central tendency

I have already said that one of the main purposes of statistics is to describe sets of numbers briefly, yet accurately. The human brain is limited in its capacity to deal rapidly with incoming information, and when faced with large groups of numbers, most people cannot normally hold them all in mind at once. Also, it takes an appreciable length of time to read through big sets of numbers. So we find it convenient to describe such groups of numbers by means of other, but fewer, numbers. You will all be familiar with the idea of an 'average' - a number which is used to summarise information from a larger set of numbers - but you may not be aware that there are several types of average. We shall now look at three of them.

The first kind of average - the mean
It is quite likely that if I gave you some numbers and asked you to find the average, you would add together all the numbers in the set and then divide the total by the number of numbers you added up. This operation gives the kind of average which is called the mean. Although it is very easy to understand how we obtain a mean, the procedure has been broken down into the basic steps on the first operation schedule, page 153, so that you can see the way in which operation schedules work. Just to introduce an easy symbol here, the number of scores in a set is normally referred to as N. Now that wasn't hard, was it?

Exercises
1 Follow the steps given in operation schedule 1 to obtain the means of the following sets of numbers:
(a) 25, 28, 31, 24, 30
(b) 305, 269, 427, 499, 385, 377, 280, 316, 392, 454
(c) 5.6, 4.5, 7.2, 0.6, 3.0, 2.1, 1.9, 3.8
(d) 12, 15, 32, 21, 29, 19, 40, 0
(e) 120, 5, 56, 131, 99, 67, 14, 107
(f) 40, 40, 40, 80, 80, 80
2 Go through all the measures you have just obtained, and consider how well the means represent the numbers from which they were obtained.

Trouble looming up!
Often in life, something which at first sight looks straightforward and easy turns out to have some hidden complications. The mean is no exception. Consider again the
answer to exercise 1(f) for which you have just (I hope!) obtained a mean of 60. The particular numbers in the set were 40, 40, 40, 80, 80 and 80. Suppose that these numbers referred to the height in inches of a long-lost tribe of Scots mountain dwellers. An enquirer, perhaps a kilt manufacturer, might receive quite a nasty surprise when confronted with the tribe, having previously been given the information that their average height was 60 inches. He would have to sell his kilts elsewhere! The lesson to be learned is that when a mean is given for a set of figures, it does not follow that any of the figures in the set is of the same value - or even remotely resembles it.

Sometimes a mean may be worked out to several decimal places. Whilst this is fine in many cases, occasionally it is done on numbers which can't easily be divided up in 'real life'. When this happens, quoting a mean may result in some odd interpretations. For example, we are informed by sociologists that the mean number of children per family in Britain these days is 2.45. I have yet to see 0.45 of a child looking alive and well!

Another little tendency which often creeps in is for means to have a long string of decimal places. Whilst this is fine if the original numbers are very precisely measured, and the decimal places thus represent real accuracy, more often than not the original numbers were whole ones. Particularly in the social sciences even these whole numbers must often be regarded as, at best, approximations, and so a highly accurate-looking mean, with several decimal places, could be somewhat misleading. This kind of false accuracy is called spurious accuracy - and is best avoided. However, there are rare occasions when it is wrong to treat decimal places casually in statistical calculations, and I shall indicate them when they arise.

When can the mean be used safely?
On the whole, when numbers in a particular group cluster closely around a central value, the mean is a good way of indicating the 'typical' score, i.e. it is truly representative of the numbers. If, however, the numbers are very widely spread, are very unevenly distributed, or cluster round extreme values (as in the giant and pygmy example), then the mean can be positively misleading, and other measures of central tendency should be used instead. Two more of these - and they too are called 'averages' - are described next.

Exercises
3 Would the mean be a suitable descriptive statistic to use with the sets of numbers given in exercise 1 (a)-(c)?
4 Would the mean be suitable for use with these sets of numbers?
(a) 10, 10, 10, 10, 10, 10, 20, 20
(b) 10, 10, 10, 10, 10, 10, 11
(c) A hundred values of 10 and one value of 80
(d) 10, 11, 12, 13, 13, 14, 15, 15, 16, 17, 18, 21

Another average - the median
If you have a set of values, and wish to obtain a figure which represents the central point, then a sensible way of doing this might be to arrange the numbers in order of size and pick the number which falls in the middle as being of typical value. This can be illustrated with an example using apples of different sizes, the weight of each apple being inscribed on its side in grams. The apples would look like those shown in figure 1, and when the set has been arranged in order of magnitude, like figure 2. The apple in the middle, i.e. the fourth from either end, weighs 130 g. This value then is the median for the set of apples illustrated.

That was a nice easy example, where it was simple to find the mid-point of a set comprising seven items. What would happen if there had been eight apples in the set? Keeping them in order, but adding an extra one, the row now looks like figure 3.

Figure 1
Figure 2
Figure 3

Counting in from the ends, to find our centre, we find that as eight divides evenly into two sets of four, we don't have a single number at the centre, but two numbers - or apples of weights 130 g and 140 g. We don't abandon finding the median in this sort of case, but now take the two central numbers, and find the point which is halfway between them. In this example then, the median will be 135 g.

If the two central numbers had been 140 g and 180 g, the median would be 160 g, again the mid-point between the two figures. You may have noticed that you can find this mid-point by adding the two numbers in question and then dividing by two; that is, by finding their mean!

Exercises
5 Using the steps given in operation schedule 2, find the median for the following sets of numbers:
(a) 25, 28, 31, 24, 30
(b) 305, 269, 427, 499, 385, 377, 280, 316, 392, 454
(c) 305, 269, 427, 499, 385, 377, 280, 316, 392, 454, 499
(d) 5.6, 4.5, 7.2, 0.6, 3.0, 2.1, 1.9, 3.8
(e) 12, 15, 32, 21, 29, 19, 40, 0
(f) 40, 40, 40, 80, 80, 80
6 Compare the medians from groups (a), (b), (d), (e) and (f) above with the means which you obtained from the same sets in exercise 1. Which measure of central tendency do you consider to be the most representative of each?

The median - pros and cons
It's all very well working out what the median of a set of numbers is when there are fewer than a dozen or so numbers in the group. Imagine how time-consuming it would be, though, to place a hundred or more numbers in ascending order of magnitude. Another disadvantage of the median as a descriptive statistic is that if one of the numbers near the middle of the distribution moves even slightly, then the median would alter, unlike the mean, which is relatively unaffected by a change in one of the central numbers.

In the median's favour though, it must be said that if one of the extreme values changes (and often in experiments and measuring processes it is the extreme values which are least reliable), then the median remains unaltered. When extreme values alter, and particularly if they disappear, the mean can change tremendously. Sometimes (although one is reluctant to admit this) simple writing or typing errors can occur, such that a small value becomes very large, or vice versa. When the numbers are analysed later, it might be quite difficult to find out whether the extreme value is a genuine score or an error. Single scores which are quite clearly 'deviant', when compared with the others, are known as outliers. When they occur, and for whatever reasons, the median becomes an appropriate choice of descriptive statistic.

Sometimes, in collecting information or scores, it happens that the procedure was such that you end up with scores all having rather high or rather low values. For instance a task in a psychology study might have been too hard for most people, a simple test too easy, or a questionnaire may contain items which didn't give enough weighting to one end of the continuum. You know that if you readjusted the task or scale, many scores would subsequently go beyond the limits of the first scale at the over-represented end, but the values at the other end would remain unchanged. In cases like this, and working from the original scores, the median would be a suitable descriptive statistic to use as a measure of central tendency, for the details of the top and bottom figures themselves will not influence its value. It will only be affected by the actual numbers of relatively high or low scores.

When scores have been obtained from a number of individuals or specimens, then the median will relate to either one source, or two if the set is even-numbered. This means that, if you wish, you can examine the source of the median score further and look at other characteristics shown by this specimen or individual. As we have already seen, when the mean is obtained its value may bear little resemblance to the actual scores and so it cannot always be used to select a 'representative' of the group.

Finally, if a set of numbers has a lop-sided pattern - if, for example, most of the scores are small, several medium sized, but only one or two high - then the median may again be more appropriate than the mean, as its value will be close to the majority of numbers in the set.

The mode - a fashionable figure
A third type of average is called the mode. 'Mode' means, among other things, 'fashionable'; this word describes very well just what the statistical mode is. It is simply the value in any set of scores which occurs most often - or is the most 'popular'. Take the following set of numbers:

5, 6, 7, 8, 8, 8, 9, 10, 10, 12

As the number 8 occurs most often (three times), 8 is the value of the mode. It is as simple as that - and certainly doesn't justify an operation schedule!

What would the mode have been if one of the 8s vanished? Then we would be left with two 8s and two 10s, and only one of each remaining score value. In this case there would be two modes, of values 8 and 10. Or on the other hand, take the list 5, 6, 7, 8, 9, 10, in which there is no single number which occurs more than once. What now? In this group there is no mode, as all the numbers appear with the same frequency. If a set of numbers has two modes, then it is called bimodal - the prefix bi meaning two, as in 'bicycle'. An example of a bimodal distribution was given in the heights of the pygmy and giant Scots, where the numbers involved were 40, 40, 40, 80, 80, 80. Remember that although the mean was 60, this figure was not a good indication of the height of any single member of the tribe. An enquirer would be much better off, knowing that there are two modes, of values 40 and 80.

What's the matter with the mode?
As you might have anticipated, such a pleasant and simple measure as the mode must have some drawbacks. Perhaps the saddest thing about the mode is that it is hardly

ever used! One of the reasons for this is that it is a very unstable figure, and can swing wildly through the whole breadth of a set of numbers at the drop of a hat. Take these numbers:

1, 1, 6, 7, 8, 10.

The mode here is 1, which is not a very representative figure of the group as a whole. However, change one of the scores of 1 into another value of 10, and the mode shifts right to the other end of the scale. Thus it can be seen that merely a single number change can alter the mode dramatically. This is in great contrast to the mean and median, where number changes can take place and leave them virtually unaffected.

If a distribution of numbers has more than two modes - and with large sets of numbers it might be possible to have many modes - then the modal values themselves could need summarising, and so the usefulness of the mode as a descriptive statistic starts to dwindle.

What use is the mode?
Despite the criticisms just made, the mode can be a useful statistic. One of its main assets is that it can be used to indicate a 'normal' or 'usual' figure. It is exactly opposite to the mean in this respect, as the modal value must be a commonly occurring figure. Often the value of the mean is a number with decimal points (and in cases where none of the original scores had decimal points), and sometimes it may not even remotely resemble any of the values in the set - as with the giants and pygmies. Often it is the mode which is used as an average in expressions such as 'the average person', or 'the average holiday length'. The figure quoted is the usual, or typical, value - and quite often will not be the mean. We might also speak of the normal value of such-and-such, meaning the mode. It is important to realise that 'normal', in a statistical context, means 'usual', and not the opposite of 'abnormal', in its familiar sense of 'peculiar' or 'odd'.

The mode is also a useful descriptive statistic when the numbers in a distribution are not spread evenly around a central value. Such a lop-sided pattern is called a skewed distribution, and you may recall that it was described in connection with the median, which is often also quoted in summaries of asymmetrically spread scores. The numbers

6, 7, 7, 8, 8, 8, 10, 10, 13, 15, 18

make a skewed distribution, and they are shown diagrammatically in figure 4.

Figure 4. A skewed distribution

The vertical line (or axis) labelled 'frequency' refers to the number of instances of each individual number which occurs in the set. You can easily see that there are three number 8s, and that this is the most commonly occurring number, for with score 8, the vertical line is the highest. The mode for this set of numbers, then, is 8. If you obtained the mean of the group, you would find that it was 10, a figure which is rather on the high side to be regarded as typical. Remember that the mean is very sensitive to extreme scores; it is the three high scores of 13, 15 and 18 which are responsible for giving it a relatively high value. The median for these numbers is the same as the mode, 8 (the sixth number along the list from either end), and for these particular numbers, we would feel happier about summarising them by means of the median and mode, rather than the mean.

Averages as measures of central tendency
The three averages which have just been described are all descriptive statistics, and are used in an attempt to supply a quick 'picture' of a set of numbers by indicating roughly where the middle of the set lies. They differ in that each is obtained by using a slightly different definition of the word 'middle' - although with the mode, there is merely an assumption that the most typical value will be in the middle, which is not always the case. However, these three averages are all known as measures of central tendency.

The various advantages and disadvantages associated with each kind of average have been noted, and in deciding which figure is most appropriate, you must now use this information. Ask yourself: 'Does the figure selected give a fair indication of what the scores are like?', not forgetting that the descriptive statistic needn't be an extremely precise number (for instance a figure with five decimal places), but rather one which is not misleading in some respect. Finally, remember that other people's motives when describing sets of numbers may not be quite as pure as one would wish, and so occasionally an 'average' figure may be quoted which, whilst it undoubtedly is an average, is very misleading. Consideration of figure 5 (page 20) should make this clear. The firm for which the figures are shown could quite legitimately claim that its average length of holiday is 2 months. It is this kind of deliberate attempt to mislead, by using statistical terms in such a way that those unfamiliar with the subject are likely to come to the wrong conclusion, that gives statistics a bad reputation. We have all heard the saying 'lies, damned lies, and statistics'; quoting the mean when the mode would be a more appropriate 'average' is a perfect illustration of the type of thing which gives statistics a bad name!

Exercises
7 State the mean, median and mode for the following sets of scores:
(a) 9, 10, 11, 12, 13, 13, 14, 15, 16, 18, 20
(b) 8, 10, 11, 12, 13, 14, 15, 16, 18, 20
(c) 7, 8, 10, 10, 11, 13, 17, 18, 19, 19, 21, 2~
(d) 10, 12, 15, 20, 22, 23, 23, 24, 25, 25, 25, 28, 30
8 Go through the measures of central tendency you have just obtained, and decide which average or averages convey the best picture of each. Give reasons.
9 Make a table summarising the relative merits and problems associated with the three averages described in this chapter.
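For readers who like to check their arithmetic by machine, the three averages described in this chapter can be sketched in a few lines of Python. This is only an illustration, not one of the book's operation schedules: the function names are invented for the purpose, and the rule used for 'no mode' (no value occurs more than once) is an assumption based on the 5, 6, 7, 8, 9, 10 example.

```python
from collections import Counter


def mean(scores):
    """Add up all the scores and divide the total by N, the number of scores."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle value of the ordered scores, or the point halfway between
    the two central values when N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(scores):
    """Every value that occurs most often, in ascending order.
    Assumption: if no value occurs more than once there is no mode,
    so an empty list is returned."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


# The skewed set shown in figure 4.
skewed = [6, 7, 7, 8, 8, 8, 10, 10, 13, 15, 18]
print(mean(skewed), median(skewed), modes(skewed))  # 10.0 8 [8]

# The giant and pygmy heights: the mean matches nobody, the modes do.
tribe = [40, 40, 40, 80, 80, 80]
print(mean(tribe), median(tribe), modes(tribe))  # 60.0 60.0 [40, 80]
```

Returning a list from `modes` is a design choice made here so that bimodal sets, and sets with no mode at all, can be reported without special cases.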
A note on distributions
Up to now, groups of numbers used in examples have usually been referred to as 'groups' or 'sets' of numbers. Occasionally the word 'distribution' has been used for a set of numbers. It means virtually the same as 'group', except that the word carries the vague implication that the numbers fall into a particular kind of pattern.

Distributions, or groups of numbers, are often shown by means of diagrams, as these afford the clearest way of conveying information about large sets of numerical data. Diagrams like figure 4 in this chapter are called 'frequency diagrams' or 'frequency distributions', and you will see many more throughout this book. The vertical axis is the one which tells you how many numbers there are having a particular value, or how frequently particular scores occur; it is usually labelled 'frequency', or just f.

Figure 5. Annual leave taken by employees of Cheatems Ltd
Boss (1): 10 months' holiday
Cleaning lady (1): 7 months' holiday
Accountant (1): 3 months' holiday
Boilerman (1): 2 months' holiday (mean)
Secretaries (5): 1 month's holiday (median)
Serfs (6): 0.5 month's holiday (mode)

Distributions can comprise numbers covering a very wide range, or at the opposite extreme, can be made up of numbers which are closely huddled together. In a fairly symmetrical distribution, like the one shown in figure 6, the mean, median and mode all have the same value - or are extremely similar. Whilst the majority of scores are clustered around these measures, there are still quite a few which occur both below and above the central area. This type of distribution is called the normal distribution, and it will be examined in more detail in chapter 4.

Figure 6. The normal distribution

Figure 7. Skewed distributions (a positive skew and a negative skew)

In some distributions the numbers may be spread fairly evenly above and below the mid-point, but in others the spread may be uneven. Figure 4 showed this kind of distribution. There are two sorts; one in which the scores fall mainly below the mean (as in figure 5, showing the firm's holiday allowance), and those in which the scores fall mainly above the mean. These are called positively and negatively skewed, respectively, and they are shown, together with the relative positions of the averages, in figure 7. You can see from the diagram why the mean alone is not a good descriptive statistic to use: with skewed distributions it is usual to give the values of all three averages. Note that the actual value for each is obtained from the scores
r i v e n along t h e 1torizont;rl axes. It is a l s o interesting to nor2 that in s k e w e d I I The 1;it)lc hclo\v \hon,s tllc exam marks (in broad bands) 0bt;linc.d hy 60 students. I'ul t l l c
distributions t h e m o d e is ; i p p r o x i n ~ a t c l yt\\?icc as far from t h e m e d i a n :I.; t h e m e a n . t1;11;1into thc forin of ;I diagr:anl, and cornlncnt on thc use of t l ~ cI ~ I ~ ;al~cdi;in
I I ~ , and ~ n o d c
in conncctioal \vith the figures.

Hitlt
i1 Table 2
You can easily remember which kind o l s k e w is positive and which negative by the following
strategy. Looking at tlic distribution from Icft to right (the nornl:~l reading direction), you
come first to eirltrr :I line which goes up kiirly sh;trply, o r to n line tr;ivclling rcl;itively Ilori- Marks 0-19 20-30 10-59 60-79 80-99
ztlntally. The distribution which swings upw;~rdsooner is the posifii~rskcw - and the (ne;arly) Numbcrofstudcnts 1 X 22 28 1
vertical line can he associated with the vertical stroke on thc positiw plus (+) sign. Thc earlier
relatively horizontal lilie scen on the other ske\vcd tlislrihution, thc t ~ c ~ t r r skcw,
i ~ ~ e can he
;issociatcd with the negative minus (-) sign.

Y o u have already b e e n given a n e x a m p l e of a b i m o d a l distribution, in t h e giant a n d


pygmy heights. S h o w n diagrammatically, it is characterised by t\vo 'humps', a s in
figure 8.

Scores

Figure 8. A bimodal distribution

T h e r e a r e m a n y o t h e r kinds of distributions, i.e. s c o r e s falling into recognisable


s h a p e s - b u t most of t h e m h a v e r a t h e r long names! F o r t u n a t e l y you will not o f t e n
e n c o u n t e r a n y of t h e m , a n d t h e o n e s which I have just described will b e t h e o n e s
which you will c o m e across most'frequently.

Exercises
I0 For each of [he following sets of descriptivc >tatistics, dra\v ;I rtlugh shape of the
distrihution thcy h;a\*ccome from.

Table 1
p---------
--

I)is~rihutioal hlcan hlcdian hlotlc
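As a rough check on how the three averages separate in a skewed distribution, the holiday figures from figure 5 can be worked through in a few lines of Python. This is only a sketch: the list `leave` is a hypothetical name, and the head-count of six for the lowest-paid group is an assumption, chosen because it reproduces the mean of 2 months marked on the figure.

```python
from statistics import mean, median, mode

# Months of leave per employee, read from figure 5.
# Assumption: six employees in the lowest group (this makes the mean 2 months).
leave = [10, 7, 3, 2] + [1] * 5 + [0.5] * 6

averages = {
    "mean": mean(leave),      # pulled upwards by the few long holidays
    "median": median(leave),  # the middle (8th) of the 15 ordered values
    "mode": mode(leave),      # the most frequently occurring value
}
print(averages)
```

The mean lies furthest along the tail of the distribution and the mode sits under the hump, with the median between them, which is why all three are usually quoted for skewed data.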


3 Measures of dispersion

In the last chapter you learned about some of the ways of describing sets of numbers, or distributions, by giving a rough indication of the middle of the set. Make sure that you are quite happy about using the three measures of central tendency before you move on to the material presented in this chapter.

You will have probably already realised that simply indicating the central point of a distribution certainly does not give a complete picture of its shape, and on occasions may be quite misleading. In order to give a better impression of a distribution's shape, a second kind of summarising statistic is normally used, a measure of dispersion, or spread. As the names suggest, these measures indicate how widely scattered the numbers are. If one of these measures is used together with one of the averages then the two summary numbers together will give an extremely concise and useful description of the particular distribution. As was the case with the measures of central tendency, there are three commonly used measures of dispersion, and we shall consider them all in turn, starting with the simplest.

The range

The range tells you over how many numbers altogether a distribution is spread. It is easily obtained by subtracting the smallest score from the largest in the particular distribution of numbers under consideration. For example, if I found that various kinds of potatoes for sale in several greengrocers' shops were priced at
10p, 25p, 12p, 8p, 14p, 14p, 24p and 15p per lb
then the range for these prices would be 25p - 8p = 17p.

Consider next these two sets of numbers:
10, 11, 11, 12, 12, 13, 13, 13, 13
and
10, 11, 11, 12, 12, 13, 13, 13, 20.
Work out the values of the range for each set, and see whether you can spot the disadvantage which is associated with the use of the range.

The two ranges are 3 (= 13 - 10) and 10 (= 20 - 10) respectively. The problem with the range is the one previously encountered with the mean, namely that extreme values have a very big effect on the descriptive statistic. In the second distribution, only one number has changed: 13 has become 20. However, this single change has increased the range by 7 points. On the other hand, outliers (atypical extreme values) may cause distributions which overall look very different to have similar ranges.

Clearly then, the range can only be used sensibly as a descriptive statistic when all the scores are fairly well bunched together. Otherwise it gives the impression of an evenly widely spread group of numbers, when in fact the breadth of range may be due to only one or two extreme scores.

The mean deviation

The mean deviation is a number which indicates how much, on average, the scores in a distribution differ from a central point, the mean. I did hear you ask 'which average?', when I said 'on average', didn't I? You'll soon find out. Suppose you take the numbers:
8, 9, 10, 11, 12.
The mean is 10 and the range is 4. The number 8 is 2 points away from the mean, and so is the number 12. Numbers 9 and 11 are both 1 point away from the mean, and 10, the remaining number in the set, is the mean, so does not differ at all. Listing these differences, you get
2 + 1 + 0 + 1 + 2 = 6.
There are five numbers in the group, and so you can say that the average (mean) amount they all vary from the mean is 6 divided by 5, i.e. 1.2 points. The differences, 2, 1, 0, etc., which were obtained are called deviations; it might be better if the mean deviation of 1.2, just calculated, was called the 'mean of the deviations from the mean' as this is what it is.

Now let's look at another set of numbers, again with a mean of 10:
4, 8, 10, 12, 16.
First, find the range.

You should have obtained 16 - 4 = 12, and this, a much larger number than the 4 which was obtained with the previous set, is a fair indication of the wider spread of scores. Again, find out how far each number is away from the mean of the set, and then add up the five numbers. You should get a total of 16, which, when divided by 5 (the number of scores comprising the group, or N), gives 3.2. This figure is the mean deviation, and you can see that there has been an increase of only 2 from the first set (mean deviation 1.2) - not such a big increase as that shown by the range, which increased by 8 points. However, there was an increase, reflecting the wider spread of scores, but as the mean deviation is based on all the numbers in a distribution it is a much more stable statistic than the range, which is only based on two of them.

The method for working out the mean deviation is given in operation schedule 3. You will see that if all the numbers are subtracted from the mean, some of them will be negative. When you worked out the mean deviations from the examples just given, you did not have to deal with negative numbers, because we asked how many points different each score was from the mean. We didn't make any distinction between scores which were above and ones which were below it. When we calculate the mean deviation formally, we have to be a little more precise, and state how far each score is above or below the mean, by using plus (+) and minus (-) signs. Provided that you worked out the mean correctly, and did your additions and subtractions accurately, and remembered to put the signs in the right way round, you will find that the total of the deviations is zero. This always happens. If it doesn't, then
there is something wrong with your arithmetic! It happens because the very nature of the mean makes it the central score arithmetically, unlike the mode and median which are obtained by counting. Because the mean is an arithmetic centre, so the deviations of all the scores on either side will always be equal, and will cancel each other out when the signs are taken into consideration. This isn't very helpful. The problem is overcome simply by ignoring the signs (or direction of the differences) in adding up, just as we did in the examples given above. The mathematical symbol which indicates that we find the difference between two scores, disregarding their signs, is two vertical lines. For example, the difference between 5 and 9 (4) would be written |5 - 9|.

Exercises
1 Work out the mean deviations for the following distributions:
(a) 12, 10, 8, 4, 18, 8
(b) 0, 0, 4, 5, 20, 20, 22, 19
(c) 0, 19, 21, 18, 22
(d) 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6
2 State whether the range would be a good measure of dispersion to use in the distributions given in exercise 1.
3 In the next example the mean is 10. Find the mode, median, range and mean deviation; comment upon the aptness of these descriptive statistics for the particular distribution.
0, 1, 2, 20, 1, 3, 51, 20, 1, 1

Final comments on the mean deviation

It is easy to remember what the mean deviation actually is, and how to work it out, if you privately call it 'the mean of the deviations from the mean'.

Sadly, the way in which we add the differences together, ignoring the signs so that the total does not equal zero, is not pleasing to mathematicians. Partly for this reason, but mainly because the mean deviation is a very simple figure without any powerful mathematical properties, it is rarely used as a measure of dispersion. The preferred measure, the standard deviation, is used far more often, usually in conjunction with the mean and range. You will learn why the standard deviation is a 'better' statistic when we come to examine the normal distribution and its properties.

Don't forget, for it's less obvious with the mean deviation than with the range, that the larger the mean deviation is, the more spread out the scores in the distribution are.

The standard deviation

In principle, the standard deviation (often shortened to SD) is very similar to the mean deviation. It summarises an average distance of all the scores from the mean of a particular set, but it is calculated in a slightly different manner.

You may recall that the problem of the signs of the deviations from the mean was dwelt on at some length. If they are taken into consideration, the mean deviation will always be zero, but this was overcome by ignoring them. There is another solution to the problem. You may remember from your basic arithmetic that if you multiply two negative numbers together you get a positive result. The same applies to squaring a negative number, for it is a negative number being multiplied by another negative number of the same value.

Now, suppose you have the set of deviations:
-2, -1, 0, +1, +2.
These equal 0 when added together 'correctly', or 6 when the signs are ignored. What happens if instead of adding the deviations you square each one? You will now get
+4, +1, 0, +1, +4
for the squared deviations - and all are positive numbers. Magic! Now add these numbers and find their mean, just as you did with the mean deviation calculations. You should get a total of 10, divided by 5, to obtain the value 2. The figure 10 is called the sum of squares, or sometimes, the sum of squared differences, for obvious reasons, and 2, the mean of the sum of squares, is called the variance. Notice that 2 is not the standard deviation; the operation for obtaining it is not quite complete yet, although we are nearly there. As we squared all the differences, the figure which we obtained must now be 'unsquared', so to speak, to bring it back into the right kind of perspective. No doubt you know how to 'unsquare' a number - you find its square root. In this case, the square root of 2 is 1.4142. This then is the standard deviation. You can see that it is slightly higher than the figure of 1.2 which was the value of the mean deviation for the same set of numbers.

The process of squaring and unsquaring numbers during this operation can be likened to looking at something under a microscope. The lens makes the object in view appear very much larger. An object seems just as real when enlarged, and it is easy to forget that something is being magnified - until the lens is removed, and it rapidly shrinks back to the right proportions. Squaring the differences makes the total much larger, and in taking the square root, we are putting the number back into its correct 'perspective'. So be careful to remember to do this, and don't make the simple mistake of giving the variance instead of the standard deviation. Another common error is to take the square root too early in the proceedings, obtaining it from the sum of squares, rather than the variance. This mistake gives you a very convincing-looking answer - unfortunately, not the correct one!

Exercises
4 Obtain the sums of squares, variances and standard deviations for the following sets of numbers:
(a) 12, 10, 8, 4, 18, 8
(b) 0, 0, 4, 5, 20, 20, 22, 19
(c) 0, 19, 21, 18, 22
(d) 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6
5 How do the standard deviations compare with the mean deviations which you obtained from the same figures in exercise 1?

Slight - but not serious - complications with the standard deviation

The first complication concerns the use of N in obtaining the variance from the sum of squares. To refresh your memory, N refers to the number of scores in the particular set you are working with. If you look at a selection of statistics books, you will find that many of them tell you to divide the sum of squares by N - 1 (i.e. one item less than the number of scores making up the set), instead of N. Thus in the previous example you would divide 10 by 4, rather than 5. Before reading any further, work out for yourself what effect this change is going to have on the final figure.

The effect, which I am sure you did work out, is that dividing by a smaller number will give a larger variance (of 2.5 in the example), and hence a larger standard deviation. It will be 1.58 instead of 1.41. The procedure of using N - 1 is not always used, but it is very common in calculations involving data from experiments. This is because when we obtain a set of numbers which we wish to describe or analyse, we have not usually collected every single score possible, but only a sample of scores. If we are measuring the height of all adults aged 20 to 40 years in Britain, we could not hope to measure everyone who falls into the category, but instead, and after careful planning, we would take our measurements from a representative sample. A sample is only a sample though, and inevitably some error will occur. The sensible way to compensate for this is to make an allowance for it which is included in the calculation of the standard deviation. If the standard deviation has been made a little larger, by subtracting 1 from N, then the figure obtained allows a margin of error. It could perhaps be called a 'guesstimate'! Thus a formula involving N - 1 is preferred whenever we are working with samples rather than a set of scores which is absolutely complete. Such a complete set of scores is called a population, and you will hear much more about samples and populations in later chapters.

Exercise
6 Check for yourself that the standard deviation increases when N - 1 is used, by using it instead of N with the numbers given in exercise 4.

The second complication I want to discuss concerns another alternative procedure for obtaining the standard deviation. The steps given in operation schedule 4 barely differ from those given for obtaining the mean deviation. Suppose though that in the figures given for exercise 4(a) there had been an extra score of 8. This would have meant a total of 68, and a new mean of 9.7. If you bother to work through the figures again, with the extra number and new mean, you will be rewarded for your efforts by finding that the calculations are now much more fiddly. Subtracting 9.7 from every score takes longer than subtracting a mean which is a whole number. Suppose that your sample comprised a hundred scores, and their mean was the very inconvenient figure of 7.62. It would be extremely tedious and time-consuming to obtain all the deviations, even with a calculator, and it is not appropriate to round off decimal figures in the middle of calculations. So, a simpler method of obtaining the standard deviation has been devised, which produces exactly the same figure, but going by a different mathematical route. It is given on operation schedule 5. Briefly, you square the individual scores, sum the results, and then subtract from this (usually immense) total, the average of the squared sum of all the observations. This figure is called the correction factor, and when it has been subtracted, the remaining amount is referred to as the corrected sum of squares. If you have done your sums carefully, this figure will be identical to the sum of squares which you would have obtained using the first method. All this may sound technical and confusing. Don't worry! Just remember that there are two methods for obtaining the standard deviation, and that the second one is better with large sets of numbers and/or when the mean is not a convenient round figure - as is usually the case in 'real-life' work! Also, keep an eye on the symbols used in the second method:
ΣX² means the sum of all the individual squared scores
whilst (ΣX)² means the sum of the original scores, which is then squared
It is ever so easy to get the totalling and squaring operations in the wrong order, so do take care over them.

Exercises
7 Summarise the following distributions as aptly as you can. They are all complete populations.
(a) 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6
(b) 1, 2, 3, 3, 4, 4, 4, 5, 5, 9
(c) 0, 0, 0, 10, 10, 10
8 Recalculate the standard deviations for the figures just given, but on the basis that they are now only samples, not complete distributions.
9 To gain familiarity with the alternative method of obtaining the standard deviation, rework all the numbers provided above, but this time using the general method which you didn't use the first time round. You should obtain identical values.

Variance

The term variance was mentioned earlier in this chapter, when I said that it is the number obtained immediately prior to the standard deviation. Variance is the standard deviation squared, and as it changes exactly in step with the standard deviation, we can often use it as an alternative measure of spread. Later in this book we will encounter a statistical test called the variance ratio (or F) test, in which we compare the spreads of different distributions by looking at their variances. In fact a whole bunch of statistical tests, coming under the umbrella term 'Analysis of Variance', and shortened to ANOVA, exists, and as the name implies, they focus on variance. They are not particularly difficult to understand, but there are several kinds, and the beginner is not likely to encounter them early in a statistics course, as
they are usually used for complex experimental designs and data analysis arising during fairly advanced practical work. However, the basic principles underlying analysis of variance do not differ from those behind the statistical tests covered in this book.

As with the mean deviation and standard deviation, the larger the variance, the greater the variation, or spread, seen in the numbers involved.

4 The normal distribution

Before you start this chapter, make sure that you are comfortable with the idea and details of describing sets of numbers by means of other, fewer, numbers, as covered in chapters 2 and 3.

The characteristics of the normal distribution


The normal distribution is a bell-shaped curve which is shown in figure 1 below. Its main feature is that the three measures of central tendency, the mean, median and mode, all lie at the same place on the curve. That is to say, they all have the same, or nearly the same, value. If the scores making up a distribution are either very squashed up, or very spread out, then we have the shapes shown in figures 2 and 3, overleaf. These are not normal distributions, despite the fact that their means, modes and medians all fall at the same point (and this is what gives them their symmetry); the normal distribution is always bell-shaped. As it was 'discovered' by the mathematician Gauss, it is sometimes called the Gaussian distribution.

Figure 1. A normal distribution

It so happens that much of the data gathered during studies of living organisms fall into this pattern. From the shape of the curve we can see that there are very few extremely low and extremely high scores (the curve drops at the left- and right-hand ends, this drop being due to the very low frequencies found), whilst the majority of scores lie at the values around the mean. We shall look at the pattern of scores much more closely soon, but at this point another feature of the normal distribution must be mentioned. This is that theoretically the curve never actually descends to touch the horizontal axis, but continues to approach it over an infinite distance. This is a mathematical property of the distribution, and one which is not reflected in 'real-life'
data gathering. We don't come across humans of absolutely gargantuan or microscopic dimensions - whatever creative yarn-spinners would like to have us believe!

Figure 2. A leptokurtic distribution
Figure 3. A platykurtic distribution

Properties of the normal distribution are then:
(i) It is symmetrical.
(ii) It is bell-shaped.
(iii) Its mean, median and mode fall in the same place on the curve.
(iv) The two tails never actually touch the horizontal axis.

You may wonder how strictly defined, in terms of scores, the normal distribution is. In other words, how much can a shape deviate from the ideal bell-shape, before it has to be regarded as 'non-normal'? There are two approaches commonly used in reaching a decision about this; the problem is a fairly important one for you to consider actually, for some of the statistical tests described later in this book can only be carried out on sets of data which are normally distributed.

One approach is to decide on the basis of simply looking at the scores - 'by inspection', to give it a more impressive-sounding label. If there is a large number of scores in a set, then drawing a frequency distribution will make the task of inspecting and deciding much easier. The other way is to follow one of the mathematical procedures available for determining whether a set of scores is normally distributed. A version of the chi-square test which is included in this book is one of these. In fact it is fairly unlikely that at this stage of your statistical career you would need to know with great precision whether or not a distribution can be regarded as normal, and you will usually be able to get away with the 'eye-ball' test! However, you will be expected to show some awareness that the problem exists fairly early in the

Exercises
1 For revision, and without referring back to the previous figures, draw a skewed distribution and mark on it the positions of the mean, median and mode. You should be able to work out where they lie, even if you can't remember.
2 State, after looking at the following numbers, or drawing frequency distributions, whether each set is normally distributed:
(a) 0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0
(b) 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 6, 6, 8, 10
(c) 0, 2, 3, 3, 5, 5, 5, 6, 7, 9, 12
(d) ~, 2, 3, 3, 3, 4, 5, 8, 9, 9, 10, 10, 11, 11, 11, 11, 12, 12, 13
(e) 0, 3, 5, 8, 9, 9, 11, 11, 12, 12, 12, 13, 15, 18

The area under the curve

We now move on to a property of the normal distribution which is of tremendous importance to life scientists, geographers, economists, market researchers and statisticians. This is that when you have a normal distribution, you always have the same relative proportions of scores falling between particular values of the numbers involved. I said earlier that there will only be a few extreme scores occurring, and that the majority of scores will lie in the middle region; we will now look at this in somewhat more detail. It is usual for explanations of the distribution pattern of scores to mention 'areas under the curve'. By this is meant the proportions of scores lying in the various parts of the complete distribution.

To begin - and using an easy example which does not involve mental exertion! - we know from property (i) that the normal distribution is symmetrical. Thus if we draw a line down the middle, through the central point which is the value of the mean, mode and median, we know from our work on the median that we have 50% of all the scores above the line and 50% below. Did you have any problems with that? Think it over, and don't proceed if you can't see why, but return to the material on measures of central tendency, paying particular attention to the median. It follows that if there is a point on the normal distribution at which 50% of the scores can be obtained, that there must also be points along the curve where division into 25% and 75%; 30% and 70%; 80% and 20%; or indeed, any proportions totalling 100% can be made. The essential point is that division into parts - say 85% and 15% - will always lie at the same relative positions on any normal distributions. This is shown in figures 4 and 5, where you have curves for the height of leprechauns and the speed of reaction to a drug. The point below which 85% of the leprechaun population lies in terms of height is 5 feet, whilst 85% of the subjects who take the drug Dynow will show a response within 15 minutes. The shaded areas contain 85% of all scores.

Now the problem is, how can we describe exactly where any given portion of the population, as shown on the curve, is going to fall? In figures 4 and 5, it can be more or less guessed that the cut-off point for 85% of the scores is going to fall in the places shown. How can it be known precisely though, and how can it be described to others without the aid of a diagram? The answer to the first question is that the mathematical properties of the normal distribution enable us to specify the precise location of any proportional division of the curve; to the second, that we are able to specify locations by means of the standard deviation - the measure of dispersion described in the previous chapter.

Figure 4. Height of leprechauns (horizontal axis: height in feet; shaded area: 85% of scores)
Figure 5. Reaction time to Dynow (horizontal axis: time in minutes; shaded area: 85% of scores)
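The idea that a given proportion, such as 85%, is always cut off at the same relative position can be illustrated with Python's standard library. This is a sketch rather than anything taken from the book: it uses `statistics.NormalDist` (available from Python 3.8), and the mean of 50 and standard deviation of 5 passed to the hypothetical helper `cut_off` are made-up values for illustration only.

```python
from statistics import NormalDist

# The standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Number of standard deviations above the mean below which 85% of scores fall.
z_85 = standard_normal.inv_cdf(0.85)

def cut_off(mean, sd, proportion):
    """Score below which `proportion` of a normal distribution lies."""
    return mean + standard_normal.inv_cdf(proportion) * sd

# Hypothetical distribution with mean 50 and SD 5 (illustrative values only).
example = cut_off(50, 5, 0.85)
print(round(z_85, 3), round(example, 2))
```

Whatever the units (feet, minutes or marks), the 85% point lies a little over one standard deviation above the mean. Only the conversion back into the original units changes from one distribution to another.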
The normal curve and standard deviations

Suppose you take a set of numbers, the mean of which is 50, and you calculate that the standard deviation is 5. We call this value (inches, seconds, points on a rating scale or whatever) one standard deviation. Ten of the inches, seconds, etc., would comprise two standard deviations, and 15, three standard deviations - always with reference to the particular set of numbers from which we obtained the value of 5. It is as though we take the standard deviation and make its value in the original units into one unit of a new measurement scale; rather like saying that one inch is the same as 2.54 centimetres. You wouldn't ever mix inches and centimetres in calculations, but convert from one to the other. Likewise, you don't mix actual scores with standard deviations, but convert from one type of scale to the other.

Let's return to the proportions of numbers in different parts of the distribution. If I take one part of the curve between the mean, marked on the horizontal axis in figure 6 as 50, and one standard deviation, marked on the horizontal axis at a score of 55 units, then I know that I shall have roughly one-third of all the scores in the group lying between them. I know this because it is always the case with the normal distribution. Strictly speaking, the exact proportion of the total set of scores falling between the mean and one standard deviation above the mean (50 and 55 in this case) is 34.13%. Because the normal distribution is symmetrical, exactly the same thing must occur below the mean, i.e. we shall have another 34.13% of the scores falling between the values of 50 and 45 - 45 being the mean score minus one standard deviation of 5 points.

Figure 6. Proportions of the distribution cut off by one SD below and one SD above the mean (horizontal axis: scores in units, such as inches, seconds, etc.; the mean, median and mode are marked at the centre)

Now look at the two proportions which are shaded. Elementary arithmetic tells you that 68.26% of the total number of scores is now accounted for, between the values of 45 and 55, leaving a remainder of 31.74% for the more extreme values on both sides. Again, the symmetry of the curve means that for this proportion, half of 31.74%, i.e. 15.87%, must lie in each tail part. In other words, about 16% of all the numbers in that particular set will be less than 45, and the same amount will be greater than the value of 55.

The normal distribution and scores from 'real-life'

It is now time to use this information in an example taken from the classroom. Let's suppose that a teacher obtains marks from a reading test given to 200 schoolchildren. The scores are normally distributed with a mean of 60 and a standard deviation (SD) of 8. From the properties of the normal distribution, we will find that roughly two-thirds of all the marks, i.e. those obtained from about 136 of the children tested, will be between 52 and 68. About 32 children (16%) will have marks below 52, and roughly another 32 will have marks above 68. Thus everyone is accounted for. Now - and here we come into one of the main uses of the standard deviation and the normal distribution in psychological measurement - suppose the parents of a child who had obtained a mark of 68 enquired about their little Johnnie's progress. Being told that their child's mark was 'above average' might at first please them, but soon they would probe again, and want to know just how much above average it was, compared with the other 50% of the children who also obtained 'above average' marks. In other words, they want to know the relative standing of their child's performance. If the marks had all been squashed up around the mean, with a top mark of 68, then the parents would have continued to feel delighted. Less pleasing to them would be the news that the top mark had been 90, with a very thick spread of marks going upwards, even above 70. However, the teacher knows that the SD of the marks was 8, and thus that a third of all the marks were between 60 and 68. Knowing that 50% of all the marks obtained were 'below average', it can be seen that this particular child's position is roughly 84% of the way up the complete set of marks. And so the parents can be pleased, after all!

If the child had obtained a mark of 76, then the parents would have had even more cause for pride, knowing that he was almost 98% of the way up (a mark of 76 is two SDs above the mean), and a mark of 84 would have put Johnnie in the enviable position of being 99.87% of the way up - in other words, from a group of 200 children, quite possibly top. Standard deviations cut off fixed proportions of the normal

distribution from the mean to (theoretically) infinity, in both directions. The normal distribution, with all the standard deviation cut-off points, is shown in detail on page 197.

Make sure that you understand how the relative standing of a mark of 76 is obtained (i.e. 50% + 33% + 15%) and how it can be calculated that this group of children would produce about four others who obtained marks above 76. See whether you can calculate what mark would have been likely to put a child in the less enviable position of being only four places away from the bottom. The answer is 44. To obtain it we need to know what the mark represented by two SDs below the mean is, and which gives a cut-off point of 2%. Thus if we take 60, the mean, and subtract the value of two SDs - 16, twice the value of 8, which comprises a single SD - we arrive at 44. Always be careful to avoid mixing SD values with the actual original scores. In this example we didn't subtract the value of 2 from the mean of 60, although we wanted to find the score which was two SDs below it. We subtracted 16 marks, these being the number making up two SDs for this particular set of scores.

Exercises
8 Work out the scores marking three SDs above and below the mean for the following distributions:
(a) mean 15 inches, SD 2.9 inches
(b) mean £500, SD £50
(c) mean 8.4 seconds, SD 1.2 seconds
(d) mean 80 elephants, SD 5 elephants
9 Ms Scrooge has two secretaries, Ms Sweetie and Ms Wink. Hard times hit the firm, and one of the girls has to be made redundant. In an effort to assess their worth as typists, Ms Scrooge counts the number of typing errors each makes. She discovers that Ms Sweetie has a daily mean of 11 errors, SD 2, and Ms Wink a mean of 8 errors, SD 6. How will this information help her to reach a decision?

z scores

In the examples we have looked at so far, we have considered scores which have been on the mean, or exactly one, two or three SDs above or below it. The time has come to examine scores which are not quite as readily converted into standard deviations. Suppose for instance that the child with anxious parents had obtained a reading mark of 64. The child's position on the curve would have been in the middle of the distance along the horizontal axis between the mean score of 60 and one SD above, of 68. Look at this in figure 7.
3 On the Bloggs Personality Test, which is meant for a population of one-cycd dwarfs, the
mean score for kindness was 16. The standard deviation was4. Arc the dwarfs pretty similar
with respect to kindness, or rather variable? Justify your answer.
4 Fifty dwarfsof the variety just described live in a community. On the assumption that they
are representative of the species as a whole, how many of them would you expect to have
scores of (a) less than 8, (b) less than 12, (c) more than 20, (d) more than 28?
5 How big would the sample have to be before we might reasonably expect to come across a
very very kind dwarf, with a score of 280r more (i.e. three SDs abovc the mean)?
6 In fact Bloggs finds, when he examines the 50 scores, that there arc at least 3 dwarfs wit11
scores of 30 (the maximum), 10 with scorcs from 26 to 29, 15 with scores from 21 to 25, 10
with scores from 16 to 21, and the remainder with less than 16. Using a frequency diagram Exam marks
to illustrate your answer, tell us what Bloggs might conclude from these scores. Figure 7. Position of a mark of 64 in relation to a mean of 60
7 In a technical college, pork pies prove a popular snack. Sales records show that the SD for
daily sales is 200 pies, and that up to 1200 pies are eaten on 84% of the days included in the The child's position is exactly halfway between the two points of00 and 68. Does this
records. What is the mean daily pieconsumption, and might thestaff expect to sell l600 pies mean that his relative standing in the group as a whole is halfway between the mean
in a single day? position of 50% and the 84% position of a mark of 68? That is to say. his position is
about 67% of the \tray up the complete set of scorcs? Look carefully at the two
portions of the curve divided by the line drawn at the 03 mark. Are they even? NO -
and here we have a problem which makes the comput;~tionof relative standing into a
much more tiresome affair than \VC would like. As \VC get further ancl further away
from the mcan, the numbcr of scores falling into the v;~riousproportions specified
dimi~iishesrapidly. So. i f you take two portions. say between 00;1nd 64. and b e t w e c ~ ~
61 and 68, there will he fe\vcr scores in the latter. There are c \ ~ less n in the next half
SD, between the marks o f 6 8 and 72. and so on. Tlie same holds true of scores which
arc below the average. except that in thiscase i t is the higher rather than the lower
scores which are closer to the mean. There will he far fe\ver scorcs Palling between 44
and 48 than between 48 and 52. although in both cases the range of marks covers 4
points, or half an SD. When you look at the shape of ;I normal distribution the
changing size c ~ tlic
f proportions enclosed by the \~ariousS D cut-off points seems
obvious. ficlwc\jer. the prclhlcni of deciding on tllc rclati\rc position o f a mark of64.
co111p;lrcdwith the r c n i a i n i n ~scores. hasn't disappeared. I Io\v tlo \vc detcrminc i t ?
The answer is hy nicans of :.score's.
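
The unevenness described above can be checked numerically. What follows is only a minimal sketch, assuming the reading marks are normally distributed with the mean of 60 and SD of 8 used in the example. It relies on Python's standard-library `statistics.NormalDist`, which is not part of the original text, and the helper name `share_between` is purely illustrative.

```python
from statistics import NormalDist

# Assumption: marks follow a normal distribution with the example's mean and SD.
marks = NormalDist(mu=60, sigma=8)

def share_between(low, high, dist=marks):
    """Percentage of scores expected to fall between two marks."""
    return 100 * (dist.cdf(high) - dist.cdf(low))

# Successive half-SD bands above the mean hold fewer and fewer scores.
for low, high in [(60, 64), (64, 68), (68, 72)]:
    print(f"{low}-{high}: {share_between(low, high):.2f}%")

# Relative standing of marks that sit exactly on an SD cut-off point.
for mark in (68, 76, 84):
    print(f"mark {mark}: {100 * marks.cdf(mark):.2f}% of the way up")
```

The three bands come out at roughly 19%, 15% and 9%, so a mark of 64 stands at about 69% of the way up rather than the 67% that halving the distance would suggest.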

z scores correspond very closely to standard deviations, and in fact are virtually the same
thing, except that a z score always refers to a point's position with regard to the mean. This
will become clearer shortly. For the moment think that a z score of 1 is the same thing as an
SD of 1, a z score of 2 like an SD of 2, and so on. Because there is virtually nothing left in
the normal distribution after the third SD or z score away from the mean has been passed - in
either direction - it is rare for SDs or z scores which exceed 4 to be mentioned. It is usual
to refer to z scores as plus or minus; but in using SDs we tend to describe them as being
either above or below the mean, rather than plus or minus.
An SD will have an unvarying actual value, whilst a z score refers to a relative position on
the curve, and always in relation to the mean. In so far as a z score of 1 means 1 SD above the
mean, then a z score and SD are identical. However, an SD of 1 might refer to a group of scores
making up that SD at any place on the curve, i.e. throughout the whole set of numbers, whilst
z scores have fixed positions on the curve. A z score of +1 means the point exactly 1 SD above
the mean, and not any group of scores making up an SD.
We shall now return to our problem score of 64 and its relative standing. We know that its
position is half a standard deviation above the mean, and so we give it a z score of +0.5. Very
conveniently for us, tables have been devised which enable us to ascertain quickly whereabouts
on the normal curve all the z scores between -3.99 and +3.99 lie. We shall use one version of
the several types of table available to find out where a z score of 0.5 lies on the curve; it
appears as table S1 in this book (page 159). The figures in the body of the table tell us the
percentage of scores falling into the area between the mean and any particular z score. The
small diagram at the top of the page indicates by the shaded portion that we are considering
the area to the immediate right of the mean. Of course, as the curve is symmetrical, all the
values will apply equally to the corresponding area to the left-hand side - or to z scores
which have negative values. When we are considering positive z scores though, and wish to know
how far up a total set of scores a particular z score (and its corresponding original score)
is, we have to add 50% to the value we obtain from the table, as the left-hand half is not
included in the shaded area. Let's see what happens with our score of 64 then, which has a
z score of +0.5. We can read that value off by moving down the first column on the left, headed
z, until we arrive at 0.5. Then we move one step to the right, and see the number 19.15. We
have to add 50%, and so obtain the figure 69.15. Now we know that there were 69.15% of the
scores lying below 64, and 30.85% above. It would be appropriate to round the percentages off,
to 69% and 31% respectively.
Let's take another example, this time the mark of 65. This is 5 points above the mean, and the
SD for the group was 8. A mark 5 points above the mean is thus 5/8 of an SD above the mean.
Turned into a decimal, we can calculate that its z score will be +0.63. As it is above the mean
its value is positive. Again turn to table S1. As the z score is now expressed to two decimal
places the procedure is slightly different. Move down the first column once more, but this time
proceed until you reach 0.6. The value to the immediate right (22.57) would be the correct
percentage for a z score of 0.6. However, our z score is 0.63, and so we need to move along the
body of the table three more columns, until we reach the one headed 0.03. That value, added to
our 0.6, gives us a z score of 0.63 - and so we read from the body of the table the value
23.57. As our z score was positive, we must add 50% for our final pronouncement, which will be
73.57. Hence the mark of 65 is almost 74% of the way up the scale. You can see from the table
that 49% of all the marks on each side of the curve are included by the time the z score is as
high as 2.33 (or just very slightly below, to be completely accurate). Note also though, that
the complete 50% is never given - don't forget that mathematically, the tails of the curve
never do actually touch the horizontal axis, nor enclose all possible scores.
Now let's look at the relative standing of a person who obtains a score which is below the
mean, say a mark of 41 from the original example. This mark is 19 points below the mean, and so
just a little more than 2 SDs away. To be precise, it is 19/8, or 2.375 SDs below. Its z score
will be -2.375. From table S1, we see that a z score of +2.3 encloses 48.93% of all the scores,
but our z score was a slightly higher figure of 2.375. As the table is for use with two decimal
places only, we will round the score off to 2.38. Moving along the rows, we now stop at the
column headed 0.08, and read off the value of 49.13. So a z score of +2.38 would include 50%
plus 49.13% = 99.13% of all the scores. So far so good, but our value was negative. We simply
turn our curve round, as it were, and work from the mirror image. Thus with our value of -2.38,
we know that 99.13% of all the marks in the distribution will be above it, and so only 0.87%
below. If we call this tiny proportion 1%, then from our original sample of 200, we might
expect 1%, or two of the marks, to be as low as or lower than 41. At the other end of the range
of marks, we would only expect to find two pupils with marks 19 or more points above the mean,
i.e. with marks of 79 or more.
The way we obtain the value of a z score is given formally in the expression:

z score = (the mark's deviation from the mean) / (the standard deviation)

If you express your deviation from the mean with a plus or minus sign, according to whether it
is above or below respectively, then your z score will emerge with the correct sign.

A word of warning

The relative proportion of numbers in a set of scores falling between standard deviations is
unchanging. You always get about 68% of all the scores falling between one SD below and one SD
above the mean (i.e., between z scores of -1 and +1), regardless of the actual values of the
scores from which the SD was derived. It is similar in principle to the way in which an area of
a circle is always expressed by the formula πr². It doesn't matter how big or small the circle
is, or what it represents - the
formula is the same. However, you would not say that the area of an oval shape is πr², although
there is nothing to prevent you from taking a rough average radius of the oval, and then
putting that number into the formula. You would obtain a figure which looked valid enough as an
area, but it would be incorrect, as the formula holds good for circles only. Take care not to
do a similar thing when you are working with standard deviations, z scores, or areas under the
curve, by working out values using numbers obtained from distributions which are not normal.
These standardised scores apply only to numbers obtained from normal distributions. You may
obtain a figure which looks authentic, but which in fact is nonsense! For instance, take a
skewed distribution ... We will work with one which has a mean of 30 points. This in itself is
a dubious descriptive statistic to use in communicating information about this set of numbers,
but it can be obtained quite legitimately. The standard deviation of our numbers might also be
calculated, and we could emerge with the value 4. Fine - or is it? Take a look at figure 8. To
use SDs and z scores meaningfully, we must have a symmetrical distribution, at the very least.
A z score of -1 must include the same proportion of the total number of scores in the
distribution away from the mean as a z score of +1. In figure 8 you can see quite clearly that
this is not the case. There are far more scores to the left of the mean than to the right. So
any statements in which z scores and similar measures are used are not only incorrect, but may
be very misleading.

Figure 8. SDs and z scores marked on a skewed distribution

The message is then: Take care when working with z scores and SDs to use them only when the
data from which they were derived are normally, or approximately normally distributed.
Otherwise you will end up in trouble!

Exercises

10 Locate z score values of +1, +2, +3, -1, -2 and -3 for these sets of numbers, assuming that
they are samples, not complete populations:
(a) 48, 55, 58, 59, 60, 60, 61, 61, 62, 63, 65, 65
(b) 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 10, 10, 11
11 A distribution has a mean of 48 and an SD of 6. What are the scores which correspond to:
(a) z = -2.5 (b) z = +2.33 (c) z = +1.66 (d) z = 0 (e) z = -0.33?
12 From a normally distributed sample with 100 scores, mean 30 and SD 2, how many would you
expect to find:
(a) over 20 (b) under 29 (c) over 32 (d) over 33?
And what scores would the following z scores represent?
(e) -2 (f) -0.5 (g) +1.5 (h) +4
13 Using table S1, translate the following z scores into percentage areas under the curve:
(a) -2.43 (b) -1.88 (c) -0.36 (d) 0 (e) +0.9 (f) +1.47 (g) +2.61
14 Now turn these percentages into z scores:
(a) 50% (b) 67% (c) 99.2% (d) 75.8% (e) 33% (f) 1%
15 You will now need to think, as I haven't specifically told you how to do this. Although you
can get the information needed for the first few problems from the diagram on page 34, make
sure that you can also get it from table S1. For the remainder, you will need table S1, and
also have to do some addition and subtraction here and there.
In a normally distributed sample of 100 marks, the mean was 60 and the SD 10. How many of the
marks would be expected to fall:
(a) between 50 and 70 (b) between 40 and 80 (c) between 60 and 100 (d) over 90
(e) between 70 and 90 (f) between 58 and 62 (g) between 49 and 72
(h) between 0 and 40?
16 An intelligence test developed in Never-Never Land has a mean of 109 and SD of 12. What
proportion of the population will be expected to have IQs of:
(a) less than 91 (b) between 94 and 124 (c) above 137?
Brillia, the society for highly intelligent people, requires a minimum IQ of 115 for
membership. If the adult population in this country is 30 million people, how many members
might Brillia expect to get if everyone suddenly applied for membership?
17 The weights of 10000 onions are normally distributed. Their mean is 11.5 g, and SD 0.3 g.
How many would weigh:
(a) between 11.5 and 11.8 g (b) 11.2 and 11.5 g (c) 10.9 and 12.1 g (d) 10.6 and 12.4 g?

Standard scores and standardisation

z scores are also called standard scores. This is because their values, ranging from -3 or -4
through 0 to +3 or +4, are unchanging, and actual scores, marks, etc. can be converted into
z scores, and given relative positions on the normal curve.
In some widely used personality and intelligence tests, such as Cattell's 16 Personality Factor
(16 PF) test and the Wechsler Adult Intelligence Scale, actual scores (called 'raw' scores) can
be converted into z scores and expressed this way. Thus the standing of a person can very
rapidly be compared with others from the same population. z scores are consistent, and are thus
used to make comparisons between scores obtained on different tests, in a way which is not
possible with the raw scores. For instance an IQ of 150 points might be extremely high, but a
score of 150 on another test - perhaps a personality test - might be close to the mean.
Expressing
the same scores by using the z scores of +3 or +0.01 would give a rapid and clear comparison,
however.
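
The z score expression given earlier in this chapter, and the way table S1 is read, can be sketched in a few lines. This is an illustration under stated assumptions rather than the book's own method: the function names are hypothetical, the figures come from Python's standard-library `statistics.NormalDist` rather than from the printed table, and the marks are taken to be normally distributed with the mean of 60 and SD of 8 from the reading-mark example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def z_score(mark, mean, sd):
    """Deviation from the mean divided by the SD; the sign emerges automatically."""
    return (mark - mean) / sd

def table_area(z):
    """Percentage of scores between the mean and z (the kind of figure table S1 lists)."""
    return 100 * (STANDARD_NORMAL.cdf(abs(z)) - 0.5)

def percent_below(z):
    """Relative standing: add 50% for a positive z, mirror the curve for a negative one."""
    return 50 + table_area(z) if z >= 0 else 50 - table_area(z)

print(z_score(64, 60, 8))              # 0.5
print(z_score(41, 60, 8))              # -2.375
print(round(table_area(0.5), 2))       # 19.15
print(round(percent_below(0.63), 2))   # 73.57
print(round(percent_below(-2.38), 2))  # 0.87
```

Because a z score depends only on the deviation and the SD, the same two functions serve for raw scores from any test, provided the scores are at least approximately normally distributed.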
We speak of some psychological tests as being 'standardised'. This means that they have been
tried out on very many people comprising a representative sample of a particular society or
population. The scores obtained may often show a normal distribution, but often tests are
constructed so that they give rise to scores which are normally distributed, and in the case of
many intelligence tests, around a mean of 100.
Finally, we talk of tests being administered under 'standard' conditions. This means that the
tester goes through exactly the same routine in giving the test to all his other customers, and
that other testers have also been trained to follow an identical procedure. The result is that
test scores obtained from different sources can be meaningfully compared. It would not be very
helpful for any scientific investigation if, for instance, one research laboratory became known
for 'helping' subjects to get good scores on tests - and even worse if this was happening, but
undiscovered. Unfortunately though, a test can never be administered under completely identical
conditions from one occasion to the next, or when more than one tester is involved;
standardisation of procedure is an attempt to make the best of a difficult job.

This chapter is about the way we use the concept of chance, or probability, in our statistical
work, and why we need to do so. You should be able to understand all the ideas and information
presented without having to refer to any of the previous chapters.
The main reason we use statements of probability in conjunction with statistical tests in our
work is that over and over again we are having to answer the question: Could that event,
situation or pattern of numbers have arisen by chance?
We all know that if something can go wrong it will! In scientific measurement, surveys and
experiments, we are not immune from this law of nature and so must consider the likelihood of
freak occurrences and accidental events when we consider and communicate our data. However,
when referring to a particular set of data, rather than vaguely speculate by means of words how
lucky or unlucky we think we have been, in true mathematical tradition we specify numerically
the element of luck which we consider might be involved.
Everyone has heard the phrases 'two-to-one', 'fifty-fifty', 'a million-to-one' and
'one-in-twenty'; they will all have been heard in connection with events involving a certain
amount of uncertainty, and are numerical estimates of the likelihood of something occurring or
not occurring. Each expression can be translated: 'two-to-one' means that there are twice as
many chances that one specified thing will happen rather than the other; with 'fifty-fifty',
there is an equal chance of either of two events occurring; 'a million-to-one' states that it
is a million times more likely that one thing rather than another will occur - i.e., one of the
events in question is extremely unlikely. Consider this problem. An airline company, Fly by
Night, is known to have accidents at a rate of about one in twenty flights. Its main rival, the
High Flier company, advertises quite truthfully that its accident rate is one in a million. You
win the pools, and decide to take your dream trip to the Bahamas. Which airline company would
you prefer to travel with? Unless you get your kicks from fear of death, no doubt you would opt
for the second firm, the High Flier! This decision shows that you already have a good
understanding of the way we express probability in numbers rather than words. We shall now
proceed to build a little more knowledge on this foundation.
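
The everyday expressions above can all be put on one numerical footing. The sketch below is an illustration only, with hypothetical helper names: it reads 'one-in-n' as one chance out of n, reads 'a-to-b' as a chances that the event happens for every b that it does not, and uses the two accident rates quoted for the airlines.

```python
from fractions import Fraction

def one_in(n):
    """'One-in-n': one chance out of n equally likely chances."""
    return Fraction(1, n)

def to_one(a, b=1):
    """'a-to-b': a chances that the event happens for every b that it does not."""
    return Fraction(a, a + b)

fly_by_night = one_in(20)         # accident rate quoted for Fly by Night
high_flier = one_in(1_000_000)    # accident rate quoted for High Flier

print(float(fly_by_night))        # 0.05
print(float(high_flier))          # 1e-06
print(fly_by_night / high_flier)  # 50000: how many times riskier Fly by Night is
print(to_one(2))                  # 2/3, i.e. 'two-to-one'
print(to_one(1))                  # 1/2, i.e. 'fifty-fifty'
```

Exact fractions are used here so that the comparison between the two airlines is not blurred by rounding.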

The chances are ...

The expressions of probability you read earlier are based on two ways of describing chance
events numerically. One is to use a ratio description - 'one-in-ten', 'one-in-twenty',
'a thousand-to-one', 'three-in-four', etc. - whilst the other uses percentages, as in
'fifty-fifty'. This means exactly the same as 'one-in-two', but has been converted to 50% and
50%. It could have been expressed as 'There is a 50% probability that ...' In fact all
probabilities can be expressed in percentages, although this is not very common in everyday
speech. The 99 whole numbers lying between the two extremes of 0% and 100% describe the
gradually altering likelihood of something happening or not happening, as the case may be.
However, we wouldn't speak of a probability of 1000%. Although numerically there could be such
a thing, the system of expressing likelihoods in percentages dictates that 100% is the highest
percentage which can be used. This figure indicates absolute inevitability, and increasing it
cannot be meaningful, as an event cannot become more inevitable than that! Table 1 below shows
a scale which gives the probability, or likelihood of particular events, using both ratio and
percentage methods. The very top of the scale represents absolute certainty that some event
will occur, whilst at the bottom, there is equal certainty that something will not take place.
Most speculations can be expressed either as an event or a non-event. For instance 'I am 95%
sure that I won't score a bull's-eye' can become 'I have a 5% chance of scoring a bull's-eye'.
Also, if it is possible to specify the probability of one outcome out of several possible
events, then the remaining probability will apply to the other events. For instance if I throw
a dice, I know that I have a one-in-six (or about 17%) chance of obtaining any specified
number - a 'one' for instance - and that I can then say that there is about an 83% chance of
any one of the remaining five faces coming up. Can you work out the probability of obtaining
any even number on a single throw? The answer is 50%. Half the six faces are even-numbered and
half odd. Therefore there is an equal chance of obtaining either sort of number.

Table 1. A probability scale in ratios and percentages

Ratio description    Event                                         Percentage description
Certain              That you were born                            100%
Two-to-one           Racing fans can supply their own examples!    66⅔% (rare)
Fifty-fifty          A tossed coin coming up heads                 50%
One-in-three         That Eve is displaying one of her faces       33⅓% (rare)
One-in-six           That a dice will come up with a 'six'         16⅔% (rare)
One-in-twenty        Both used in connection with                  5%
One-in-a-hundred     experimental work                             1%
Impossible           That you will fly to the moon without         0%
                     mechanical assistance

Just to complicate matters

There is one more way of expressing probability, and it is a simple extension of the percentage
method. It is to express the likelihood as a decimal fraction. To do this, the decimal place on
the percentage which has been stated needs to be moved two places to the left, so that 100%
becomes 1, 50% becomes 0.5, 10% becomes 0.1 and 1% becomes 0.01. This method is in fact the
most commonly used way of expressing probability in connection with statistical tests.
Basically, it is very simple, the biggest danger being that it is easy to get the decimal place
in the wrong position. The 0.5 (50%) value is often written by students when they mean 0.05
(5%). This value is used frequently in statistical work, as you will soon find out. The
abbreviation for probability is p.

Exercises

1 It is important that you gain fluency in expressing probabilities in terms of ratios,
percentages and decimals. By drawing three columns and giving them the headings shown in
table 2, construct your own conversion chart and fill in the following values:
100%, 99%, 95%, 75%, 67%, 50%, 33%, 25%, 10%, 5%, 2.5%, 1%, 0.1%.

Table 2. An incomplete probability conversion table

Percentage    Decimal probability (p)    Ratio or verbal description
100%          1.0                        Completely certain. Dead cert!

2 Convert the following figures to percentage statements of probability:
(a) one-in-thirty (b) one-in-a-hundred (c) 40 : 60
(d) 1 : 20 (e) one-in-a-thousand (f) one-in-five-hundred
(g) 1 : 1 (h) one-in-ten
3 Convert the same figures to decimal statements of probability.

Know your onions

To illustrate our need for expressions of probability in statistical work, I am going to
describe a situation which at first glance seems far removed from the world of the
statistician. It concerns the proprietor of a transport cafe, and the problems he experiences
when buying onions. Strange though it might seem, his problems, and the decisions he has to
make, are rather similar to the ones we encounter in science laboratories, or when we conduct
surveys which provide data for analysis.
Joe Bloggs and his wife need to buy about 100 onions every week for the cheese and onion
sandwiches which they sell in the café. Normally they obtain their onions
from the nearest greengrocer, Johnnie Appleseed, and find that in a typical weekly purchase,
about 10 onions will be bad. One week, Ms Bloggs takes command, and decides to patronise a
different greengrocer, Bill Bashem. Anxious to demonstrate to her husband how much better her
choice is, she systematically checks through the weekly purchase, counting the bad onions. To
her annoyance, she finds that the sack contains 20 bad onions, i.e. the proportion of rejects
has doubled, and is one-in-five. Now the question is: Does that doubling of proportion really
indicate that Bill Bashem's goods are inferior to Johnnie Appleseed's, or is it just that she
has bought them during a 'bad' week, or when some accident has made them worse than usual?
Let's assume for the moment that both the greengrocers buy onions of similar quality from one
wholesaler. Any differences in the onions are due to different treatments during the transport,
storage and selling processes. When we buy vegetables, whatever the quantity, greengrocers do
not normally supply us with details of what they paid for them, or how they have handled and
stored them, etc. What happens is that on the basis of what we see when we unpack the purchases
at home we decide for ourselves whether goods from a particular source are better than from
another - always providing that we are comparing goods of roughly similar quality of course. We
would not usually base our decision about the relative merits of greengrocers' goods on the
strength of only one purchase, but prefer to make several. Over a period of time we would get a
good idea of the average quality of the goods, and how much variability we might anticipate
from week to week. Our weekly purchases are called samples and the knowledge we gain from them,
of what comprises 'typical' proportions of sound and unsound vegetables, is called a baseline.
Our baseline with Johnnie Appleseed was established as being one-in-ten for unsound onions.
There is nothing to prevent some accident occurring which could on one week turn half the
onions into a mushy pulp - although that week's sample would be in sharp contrast to what we
know the baseline to be. If another supplier normally sold onions which were of very poor
quality, then in this case, half the onions being bad would not be much different from the
baseline performance.
To return to the Bloggs couple. You can just imagine that while Joe uses the doubling of bad
onions to prove how right he was in patronising Johnnie Appleseed, Ms Bloggs could equally well
assert that it isn't fair to judge Bill Bashem on the basis of one week's purchase (sample)
only. Which one of them is correct? In a sense, both of them. A rise from 10 to 20 bad onions
is a large enough difference to make one suspect that it signifies a real difference in
quality - but on the other hand, it could have been due to the fact that some unlikely event
has taken place. It might have been that one of Bill Bashem's assistants had been asked to
leave, and on his last day in the shop was getting his own back by deliberately including in
the deliveries onions which Bill Bashem would never dream of selling. Obviously, the ideal way
to solve the dispute is to continue to buy 'samples' from both sources, and compare them over a
period of time. Often though, this solution is not practicable.
Now I want to show you how this problem reflects the problems we encounter in scientific work.
When we conduct an experiment we usually obtain two sets of scores for comparison, and these
comprise our samples. One set measures some kind of event which occurred under the experimental
conditions, and the other, the same kind of event, but without the experimental treatment. The
two sets of scores are said to come from experimental and control groups respectively. Just as
in the memory experiment results described in chapter 1, it often happens that our two sets of
scores are not radically different, but only differ by a small amount. Like Joe Bloggs, we are
left wondering whether the difference is due to a real effect, or just to some chance
variation. The ideal solution is the same. We would repeat our experiment several times, and
thus quickly discover exactly what 'typical' results comprise. Just a small difference in the
scores, but repeated over and over again, would indicate a real, even if small, difference in
treatments. However, if we established that the baselines were indistinguishable, then we would
conclude that our treatment did not have any effect. Fine - except that considerations of time
and cost usually mean that we do not repeat experiments in order to reach a decision about what
the results signify.
In other words, we make a preliminary judgement on the basis of one sample only. Of course, the
experiment will probably be repeated, sooner or later, and particularly if it concerns some
effect which is deemed to be of importance. However, each single experiment is still judged
initially on its own merits - and this is where we need statistical techniques. The tests we
carry out on sets of scores will indicate to us just how safe it is to attribute any difference
we obtain to a real difference in baseline, as opposed to being due to an unusual accidental
occurrence. In fact had Joe Bloggs taken a statistics course, he would have been able to use a
statistical test to decide on a rational basis where the next purchase of onions should be
made! As it is, we might note in passing that although statistical techniques can easily tell
us what the probability of some event is, or whether a sample can be regarded as typical, they
can never tell us directly what to do. There are many factors which have to be taken into
consideration. If an experiment or survey is very costly, in terms of time or skill, then we
would not wish to have to repeat it unnecessarily. However, if the experiment concerns a matter
which will prove to be of immense benefit to humanity - or if there is some chance that a
certain treatment carries with it some danger (as in drug development, or brain surgery) - then
most of us would prefer that replications (repeat experiments) are carried out. This repeated
'sampling' gives us a baseline to work from, and is safer than 'one-off' judgements. Even when
we have established that some treatment works, statistical tests cannot tell us whether the
benefits outweigh the costs. So Joe and his wife could still argue about whether the
superiority of one batch of onions justifies travelling to a more distant greengrocer, or
whether
Probability

the pleasantness of Johnnie Appleseed compensated for the faet that his produce is
slightly below the standard of his less friendly rivals! 6 What ore statistital tests all about?
In chapter 6 there will be much more information about the rclationsliip between
statistical tests and statements of probability, and another example will show the
connection between them and experimental work. Meanwhile, make sure that you
are quite happy about theseveral waysin which we express probability numerically.


All the statistical procedures described so far have been concerned with summarising
information as briefly and accurately as possible - hence the name descriptive
statistics. Now we move on to the second main type of statistical techniques,
inferential statistics - or, the dreaded tests! We use them to draw inferences (informed
guesses) about situations from which we have only been able to gather a part of the
total information which exists. As the aim of experimental work is to make
predictions on the basis of limited information, we have frequent occasion to use
statistical tests in analysing the results of experiments. The information contained in
the next few chapters will be directly applicable to most kinds of scientific work,
although the examples I give will mainly be drawn from the social sciences. The only
material needed for a good understanding of this chapter is that given on probability,
in chapter 5.
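
As a quick refresher on that material, the short Python sketch below writes the one-in-ten baseline for unsound onions in several equivalent forms. The choice of forms (fraction, decimal, percentage, 'x in y') is an assumption about what chapter 5 covers, and the variable names are made up for the illustration.

```python
from fractions import Fraction

# Baseline from the onion example: one unsound onion in every ten.
p = Fraction(1, 10)

as_fraction = f"{p.numerator}/{p.denominator}"   # written as a fraction
as_decimal = float(p)                            # written as a decimal
as_percentage = float(p * 100)                   # written as a percentage
as_ratio = f"{p.numerator} in {p.denominator}"   # written as 'x in y'

print(as_fraction, as_decimal, f"{as_percentage}%", as_ratio)
```

All four forms describe the same probability; only the notation changes.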

The dreaded statistical tests

I would like to begin by considering briefly why statistics tests are dreaded and
disliked by many students. First, few people understand the rationale behind
inferential tests and techniques - in other words, they don't have much of a clue about
what is going on! Then, having ploughed through a number of tests, most of which
are somewhat different from each other, yet all of which are used in fairly similar
situations, students find it hard to fathom out which test is appropriate for a given set
of data. Finally, statistical formulae and tests look really difficult, and are very
off-putting - especially for people who were nervous of maths in the first place!
I hope to describe and explain the use of statistical tests in such a way that you
understand quite clearly just why you are using them, and what is going on when you
carry one out. Page 198 (just before the index) tells you how to decide on which test
to use for the various experimental designs, and with different types of data. As for
the fact that tests look difficult, I couldn't agree more! The t test formula in particular
looks quite horrific. Bold readers look at it now - on page 172 - nervous readers,
perhaps not just yet! Who knows what that pile of square root signs, brackets, bars
and lines is trying to convey?
Take heart. First of all, those of you who have read the chapter on measures of
dispersion will already be in a better position to cope with nasty-looking formulae.
Look at the one for obtaining the standard deviation, given on page 156. That too
looks off-putting, although admittedly not quite as bad as the t test formula.
However, if you do have some insight into what it means, and have actually worked
through it to obtain a measure of spread, it doesn't seem quite as bad, does it? You
now know that it is possible to use the formula by going through the series of steps
which the symbols stand for, in small stages. The t test formula is just the same. It can
be tackled in a series of steps, each of which is quite simple, and instead of looking
like a pile of mysterious mumbo-jumbo, it can be seen to represent a longish
sequence of arithmetical operations. If you doubt this, take your courage in one
hand, and with the other, turn to the formula on page 172. Even nervous readers
should now do this. Examine every symbol in turn, and decide what each one
signifies. You will already know that √ means 'take the square root of', a horizontal
line means 'divide by', and that + means 'add together'. Σ was encountered in obtaining
the standard deviation, and just means everything given in a particular list added
together. The only really new symbol is t itself - but you obtain that by ploughing
steadily through the instructions contained in the formula.
Perhaps it is worth mentioning here a common misconception. When you did
maths, and algebra in particular, you learnt that formulae and equations have to be
regarded as puzzles, in which the various symbols must be taken apart, shuffled and
reassembled in such a way that the correct 'answer' emerges. Formulae in statistics
aren't like this. All the puzzling and shuffling have already been done by statisticians,
and a formula represents a series of operations which must be carried out in the order
laid down by the rules of arithmetic - rather like the operations conveyed in a recipe,
or a knitting pattern. The t test formula does not have to be solved in any way, but
just worked through, and you only need very elementary skills in arithmetic to be
able to cope. Perhaps the crucial thing is to know in which order the various
operations should be undertaken (given in the detailed operation schedule steps) -
and also, to prevent yourself from dying of boredom whilst slowly ploughing through
the longer sequences of calculations!

Populations and samples

I shall now spend some time talking about the rationale which underlies statistical
testing. In gathering scores for analysis, we are normally working with only a fraction
of the information which is theoretically available. This partial availability of data,
coupled with the variability commonly found in complex organisms, is basically what
gives rise to our need for statistics tests. We call the large pool of information from
which we draw a portion for study a population, whilst the smaller portion itself is
known as a sample.
The first thing to realise about the term population is that it does not necessarily
refer to people. In everyday language it means people, but its official definition in
statistics is: any group of numbers, finite or infinite, which refer to real or hypothetical
objects or events. This definition has quite a broad scope! It doesn't only cover people
(or rather, some numerical aspect derived from those people), but any numbers
taken from a group which shares some common characteristic. Examples of
populations are the number of: fishes in the sea, fishes in fresh water, left-handed
people living in Scotland, people living in England, giraffes in the world, giraffes kept
in captivity, sufferers from schizophrenia, plankton in a pond, measured heights of a
group of people, road accidents occurring in Great Britain, stars in the universe, cows
in Yorkshire, memory test scores from adults. So a population can refer not only to
humans, but also to other animals, events, scores, or simply anything which can give
rise to numbers.
Populations can also vary in size. They may be quite small, as in the case of
left-handed blind people; very large, as in the number of inhabitants of a country or
continent; large and virtually infinite, as with stars in the universe or fishes in the sea.
Some numbers are derived from populations of flexible and unknown size. Examples
are the scores obtained from people participating in a personality test, or the number
of bull's-eyes scored by a rifle marksman. In theory he could spend his entire life
shooting at a target - just as theoretically, people could spend their lives completing
personality questionnaires - but of course this doesn't happen in reality, and so the
scores obtained from people participating in experiments or trials of some sort are
regarded as samples from populations of virtually infinite size.
In fact it is very unusual to conduct experimental work on a complete population.
The examples which spring most readily to mind are those of sufferers from a rare
disease, or a species of animal which is approaching extinction. Even then, any scores
or measurements taken from these populations would still only be a sample of a
larger set of numbers available if the researcher spent longer measuring his or her
subjects' behaviour or attributes. It is extremely unlikely that you will ever work with
a complete population, although your aim, if you do experimental work, will be to
make statements about them, and not restrict your conclusions to the sample which
you have actually studied.
The skill with which you select your sample, and its size, will determine just how
accurately you can make statements about the populations from which it was drawn.
This population is called the parent population, and inferences you make about it, on
the basis of samples, are known as generalisations. In statistical formulae it is
traditional to distinguish populations from samples by using letters of the Greek
alphabet for the more precise figure obtained from an entire population, and
ordinary (Roman) letters for an estimate based on a sample. Apart from the difference
this creates when calculating standard deviations - and we virtually always use sample
estimates - this need be of no further concern to you.

Exercise

1 Identify the parent population of the following things:
(a) your pet cat (b) your best friend's IQ score (c) a raindrop (d) a sardine from a
tin (e) yourself (f) the third person you saw on your way to work this morning (g) the
people you pass on your way home in the late afternoon (h) slowing of responses after
alcohol consumption (i) the Victorian cemetery at Highgate (j) Queen Victoria
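
The idea that a formula is a recipe to be worked through in order, and the remark that populations and sample estimates differ when calculating standard deviations, can both be seen in a short Python sketch. It assumes the common textbook form of the standard deviation, in which a sample estimate divides by n - 1 and a whole-population figure divides by n; the formula printed on page 156 is not reproduced here and may be laid out differently. The scores and the function name are made up for the illustration.

```python
import math

def standard_deviation(scores, sample=True):
    """Work through the standard deviation one small step at a time."""
    n = len(scores)
    total = sum(scores)                               # add everything in the list together
    mean = total / n                                  # divide by the number of scores
    squared_devs = [(x - mean) ** 2 for x in scores]  # square each distance from the mean
    divisor = n - 1 if sample else n                  # sample estimate, or whole population
    variance = sum(squared_devs) / divisor
    return math.sqrt(variance)                        # finally, take the square root

scores = [4, 7, 5, 9, 6, 5]   # hypothetical scores, not data from the book

sample_sd = standard_deviation(scores, sample=True)
population_sd = standard_deviation(scores, sample=False)
print(round(sample_sd, 3), round(population_sd, 3))
```

Each line of the function is one simple arithmetical step, and nothing has to be 'solved'. The sample estimate comes out slightly larger than the population figure because it divides by a smaller number.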
Populations, samples and statistical tests

So far then, you should have grasped the idea that a population is a large, often
infinitely large, set of numbers derived from virtually anything at all. Also, that we
usually only deal with samples, which are smaller portions of the populations
currently under study. How do populations and samples fit in with statistical tests? In
evaluating the results of an experiment, or some sets of observations, we are
normally asking the question: Is one set of results really different from the other?
What happens next is that if we decide, after using the appropriate statistical tests,
that the two sets of results really are different, then we conclude that they have been
drawn from two different populations. If the numbers are not 'really' different, as
decided by the tests, then what we state is that they appear to have come from just
one population. This is where populations come into statistical testing, and I shall
soon give more details. Samples are involved in so far as we normally work with them,
rather than populations, and as a sample is not usually a perfect representative of its
parent population, we can never be quite certain that the conclusions based upon
them are absolutely correct.
Before I go on to describe how statistics tests might be used to evaluate the results
of a particular experiment, I would like to present an analogy to testing which
involves strong visual associations, and which should help you to achieve insight into
the rationale of statistical testing.

Fieldwork in biology

Imagine a country scene, and two ponds which are fairly close together. One of the
ponds is in a wood, and the other in an open field. A biologist wishes to study the
micro-organisms which inhabit the ponds, and so sets out with test-tubes, nets,
wellies and all the other essential paraphernalia. At great personal risk, he manages
to obtain two test-tubes full of pond water. I shall pause here, to point out how the
ponds relate to statistics. All micro-organisms in the pond in the woods comprise
population A. Similarly, the micro-organisms in the field pond make up population
B. What has the biologist done in taking test-tubes full of water from each pond?
You've got it! He has drawn two samples.
As is usual in fieldwork, a downpour sets in, and the rain makes the labels which
the biologist had stuck onto the test-tubes, fall off. On his arrival at the lab., then, the
biologist does not know which test-tube contains water from which pond. However,
just by looking at the two samples he can see that one is much greener than the other
- due to the different kinds of micro-organisms which inhabit wood and field ponds.
Let's pause again to take stock. The test-tubes contain samples of micro-organisms.
Our statistical samples comprise sets of numbers. The biologist's two
samples might look different. So can numbers. A 'test-tube' containing the digits:
1, 4, 5, 2, 7, 9, 4, 3, 0, 5
looks very different from one containing:
22, 31, 6, 44, 25, 18, 39, 27
This is the old eye-ball test again!
Just as the biologist could see that the test-tubes contained different micro-organisms,
so we can sometimes see that samples of numbers differ radically from
each other. No need for any stats tests at all! Unfortunately, experiments rarely
oblige by giving rise to sets of scores which are as obviously different as the ones
above, but more often to ones like those I list in the next comparison. If the samples
of pond water hadn't looked very different, then the equivalent in numbers might be:
Test-tube 1: 1 4 5 3 0 2 3
Test-tube 2: 3 6 4 8 9 2 7
Now it has become quite hard to distinguish the two sets simply by looking at the
numbers. This is another version of the problem concerning the results of the new
memory technique experiment, which was described in chapter 1. We want to know,
by some means, whether the two sets of numbers really do differ from each other, or
not.
Returning to the field, we must make some rather odd events overtake our
biologist in order to illustrate the rationale behind statistical testing. The biologist
sets off, obtains his two samples of pond water, loses the labels from the test-tubes,
and on his arrival at the lab. leaves them on a table while he goes to change his wet
clothes. His colleague arrives on the scene. She knew that the samples of pond water
were to have been collected, but can't remember whether the samples were to have
been taken from two ponds that day, or just one of them. Can you translate these
events into the statistical equivalent? You have two sets of numbers which look
different. You wonder whether these samples have come from one source (population),
or whether they were drawn from two different sources (populations). If the
numbers look completely different, then with no more ado you can conclude that
they have probably come from two different underlying populations.
But how would the biologist solve the problem if the samples looked fairly similar?
She would put the pond water under a microscope, and with the aid of this tool, and
the details it revealed, make a decision. How do we solve problems with
similar-looking sets of numbers? We take our samples and subject them to statistical analysis
- the appropriate tool - to reach a conclusion about their source or sources. Because
we have worked with a sample (water taken from a pond, or scores obtained from a
subject), and not a complete population, we cannot be 100% certain that the portion
we have inspected was truly representative of the parent population, and so we have
to allow for this potential source of error in our final decision. This will be covered in
more detail later. Meanwhile, that's it. That's what all the fuss is about! We use
statistical tests to tell us (when we can't just see) whether samples of numbers appear
to have been drawn from one or two populations. The statistical test will also inform
us what our likely margin of error is.

A summary of experimental work

In an experiment, we set up some alteration of conditions, and then compare the
scores derived from the altered situation with those obtained from the control, i.e.
from the one in which there was no alteration. We thus obtain two sets of scores. Our
question is: Did the altered condition make any difference? The difference will be
reflected in the scores, and so the question can be turned into one relating to a
statistical evaluation: Are the scores from the two groups different in any way? This
in turn is converted to: Do the scores seem to have come from one or two
populations? We then carry out a statistical test - if the question cannot be
determined simply by the eye-ball test. If the scores do appear to have been drawn
from one population only, then we have to conclude that our experimental
manipulations failed to bring about any change in the scores, and the experimental
group scores will closely resemble the control group scores.
A note about the use of the word 'population' with respect to statistical tests. It is
traditional to state the statistical aspects of experimental work in the terms I have just
given. There is a potential source of confusion over the word 'population' though,
and I will try to explain it, so that you can avoid the confusion. You may remember
from exercise 1 that some groups of things can form an entire population in their own
right, or form a part of another population - and both at the same time. In
experimental work, whenever we take, say, a sample of subjects, then these subjects
are drawn from a particular parent population. We then (usually) proceed to divide
our sample into two, and apply one treatment to one of the groups whilst the other
serves as a control. We get our two sets of results, and apply statistical techniques to
them. Now it was said above, that we look to see (statistically) whether they have
come from one or two populations. But we have been working with what was
originally one sample, which was obtained from one population, before it was
divided into two. For statistical purposes, the sample we took becomes the
population, and in fact if our experimental treatment did have an effect, the sample
becomes two populations. The point is that here we are talking about populations of
scores, and these must not be confused with populations of subjects. If it helps, try to
separate entirely the idea of the scores which we obtain from subjects, or the scores
we are going to analyse, from the subjects, or populations of living things which
actually give the scores. In statistical work, it is just as if we take the sets of scores
(maybe you can imagine them contained in test-tubes) and look at the numbers
themselves, when asking the question about their source. It does not matter to
statisticians whether the source itself was a population, sub-population, or
sub-sub-population. They treat the numbers as if they have been directly derived from one or
two populations - and so the word in this case means just a 'pool' of numbers.

A psychology experiment

I shall now describe a psychology experiment on learning, and examine the role of
samples, populations and statistical tests in this set-up. Twelve sets of identical twins
aged 18 were selected. These subjects represent a very small sample from the fairly
large population of adult identical twins aged 18 - or the larger population of all adult
identical twins - or the even larger population of humans. The twins were asked to
learn some poetry. From each pair of twins, one was chosen to join the group which
was to do the learning under quiet conditions, while the other was put into the group
which was to learn poetry under noisy conditions. By dividing our sample into two
groups, and by giving them different treatments, we will (we hope) create two
populations of scores. These scores are obtained after 10 minutes of learning, and
they reflect the degree of material mastered. The stages of the experiment are shown
in Figure 1.

Stage 1   Identify experimental population                 Original virtually infinite population
Stage 2   Identify type of sample for use in experiment    Sub-population almost infinite
Stage 3   Take a sample                                    Source of a population of scores
Stage 4   Divide sample into two groups, apply treatment   Question: Do the scores seem to be from one or two populations?

Figure 1. The stages of an experiment on learning

From an experimental point of view, the scores obtained in A and B can now be
regarded as two populations. From a statistical point of view, we can't tell whether
the numbers from A and B have come from two populations until we have carried out
tests. If they seem to have come from one population, then we have to conclude that
in applying our experimental treatment we did not create enough effect for the scores
to be distinguishable. Our experiment did not 'work'. If, on the other hand, our
scores are distinguishable, we can conclude that the experimental manipulations did
have an effect.
Although we know what we did when we conducted the experiment, in terms of
applying this or that treatment, we can't tell whether what we did had any effect until
after we have looked at the results, and perhaps subjected them to statistical analysis.
It is as if, when we conduct our analysis, we have to pretend that we don't really know
what happened before - like the biologist's colleague, mentioned earlier in the
chapter. If the learning ability scores appear to have come from two populations, A
with noise and B without noise, then it can be concluded that noise does have a
particular effect. However, taking the experiment as part of the broader context of
psychological understanding, we would hope to generalise our findings from the
twins actually used in the study to all humans. If we talk of 'learning' generally, as
opposed to poetry learning, then we are making another generalisation, but this time
in terms of the material being learnt. You might wonder why the experiment was
carried out on twins, if the aim was to make statements about 'untwinned' humans.
Twins are particularly favoured in experimental psychology. This is because they
have the same genetic make-up, thus reducing somewhat the amount of variability in
their behaviour, and so helping to make the effect of experimental manipulations
more discernible.

Summary

In experimental work we go through the following sequence of activities:
1 We try to create two separate populations by giving different treatments to a
sample drawn from the parent population.
2 We obtain scores from the two groups we have created.
3 We examine the sets of scores, using statistical techniques if necessary, to see
whether they really do differ from each other.
4 If they differ, then our statistical test will enable us to put a figure on the element
of chance likely to have been involved. If it is low, we conclude that the sets of
scores seemed to have come from two sources (populations), and that our
experiment was a success. If it is high, we conclude that the scores are not
distinguishable, and that our treatment did not bring about the effect which we had
originally anticipated.

7 Hypotheses

In the previous chapter you were given, in somewhat laborious detail, the sequence
of operations we follow in carrying out an experiment. Make sure that you have
thoroughly grasped that material, for this chapter is simply an extension: some more
steps will be added, and then the whole routine re-expressed, using the formal terms
in common usage. By the end of the chapter you will know about all the steps
undertaken in experimental work, and most of the remainder of this book will be
spent on giving further details of some of the steps themselves - for instance statistical
tests, aspects of experimental design, and sampling.
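
As a refresher on that sequence, here is a minimal Python sketch (an illustration added for this purpose, not a procedure from the book). It takes the two 'test-tube' sets of numbers from the previous chapter as the two sets of scores and summarises each one, which amounts to doing the eye-ball test with a little arithmetic. The function and variable names are hypothetical.

```python
def summarise(scores):
    """Return the smallest score, the largest score and the mean of a set."""
    return min(scores), max(scores), sum(scores) / len(scores)

# The two similar-looking sets of numbers from the pond water analogy.
test_tube_1 = [1, 4, 5, 3, 0, 2, 3]
test_tube_2 = [3, 6, 4, 8, 9, 2, 7]

low_1, high_1, mean_1 = summarise(test_tube_1)
low_2, high_2, mean_2 = summarise(test_tube_2)

# The ranges overlap when neither set lies wholly above the other.
ranges_overlap = low_2 <= high_1 and low_1 <= high_2

print(f"Test-tube 1: {low_1} to {high_1}, mean {mean_1:.2f}")
print(f"Test-tube 2: {low_2} to {high_2}, mean {mean_2:.2f}")
print("Ranges overlap:", ranges_overlap)
```

The means differ, yet the two ranges overlap, which is why inspection alone cannot settle whether the scores come from one population or two and a statistical test is called for.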

Hypotheses and variables


In putting together an experiment, we start off by naming the phenomenon which is
at the centre of our investigation, stating which conditions we intend to manipulate -
and how - and considering what kind of data we hope to collect from experimental
and control groups for comparison purposes. The major step to be added to this
sequence of events occurs right at the start of an experiment, and it concerns our
belief about the effect which the experimental manipulation of conditions will
have. Any idea or theory which makes certain provisional predictions is called a
hypothesis, and the preliminary idea we have in our experimental work is termed -
not surprisingly - the experimental hypothesis. Examples of experimental hypotheses
are:
That a particular drug changes specified organic tissue
That diet influences intelligence
That facial appearance matters in inter-personal perception
That advertising policy affects beer sales
That environmental pollution is responsible for altering plant life
That study habits influence exam marks
We have named particular things (organic tissue, intelligence, inter-personal
perception, etc.) and stated our belief that each is influenced in some way by
something else - the 'something elses' being a drug, diet and facial appearance, etc.
All these 'things' and 'something elses' are called variables. The term 'variable', used
in an experimental context, means anything which is free to vary, and in order to
describe them in a quantitative way, they have to be expressed in appropriate units.
Sometimes the units will be quite obvious, like inches, IQ scores, success rate on a
task, pints of beer sold, etc., but at other times ingenuity will be called for. For
instance how do we measure and express taste, attitudes, beliefs and motivation?
Often we have to devise a rating scale specially for expressing a particular variable in
an appropriate kind of unit. Such 'home-made' scales are often regarded as being less
trustworthy than well-established units in common usage. As you will discover in
chapter 10, the kinds of units we use to quantify variables have an important bearing
on the statistical test we will choose for data analysis.
The pairs of variables which occur in each experiment have separate names. The
variable we manipulate is called the independent variable, and abbreviated to IV. The
variable which we hypothesise will alter as a consequence of our manipulations is
called the dependent variable, or DV. It is easy to remember which way round the IV
and DV are. The dependent variable alters as a consequence of the value of the
independent variable - its value is dependent upon this. The value of the independent
variable is free to vary according to the whims of the experimenters. The IV's and
DV's named in the hypotheses listed above are given in table 1.

Table 1

Independent variable       Dependent variable
A particular drug          Specific organic tissue
Diet                       Intelligence
Facial appearance          Inter-personal perception
Advertising policy         Beer sales
Environmental pollution    Plant life
Study habits               Exam marks

Most variables can be either dependent or independent, within the context of a
particular experiment. For instance, in the final example given above, it was
hypothesised that students' study habits will affect their exam performance.
However, it is also reasonable to wonder whether students' exam marks might
actually lead them to seek to improve their study habits! Beer sales, given as a DV
above, would become an IV in the context of studies concerning alcoholism, road
accidents or sales of soft drinks.

Directional hypotheses

You might feel that the hypotheses listed earlier were all rather vague, for the words
'influence', 'affect', 'change', etc., are not precise. It often happens that we feel we
can make more specific predictions about the effect our manipulations of the IV will
have. For example we might predict that a specific diet will improve intelligence;
certain experiences might cause a deterioration in a skill, perceptual or mental
process; environmental pollution may reduce plant growth or altered study habits
might improve exam marks. When a hypothesis states a predicted direction of
outcome - as seen by the use of such words as 'reduce', 'increase', 'lower' or 'raise' -
then it is called a directional, or one-tailed hypothesis. The vaguer types of hypothesis,
like the ones given earlier, are known as nondirectional, or two-tailed hypotheses.
The terms one- and two-tailed are more commonly used than directional or
nondirectional. It is extremely important to know whether an experimental
hypothesis is one- or two-tailed, for this information will be used in conjunction with
the statistical analysis to state the level of chance expectancy associated with a
particular experimental outcome.

Exercises

1 Identify the IV and DV in each of the following proverbs:
(a) Spare the rod and spoil the child
(b) Two heads are better than one
(c) Absence makes the heart grow fonder
(d) Out of sight, out of mind
(e) A rolling stone gathers no moss
(f) He laughs best who laughs last
Are the statements likely to be framed as one- or two-tailed hypotheses?
2 Decide whether each of the following research hypotheses is one- or two-tailed:
(a) Older people make slower learners
(b) Alcohol affects reaction time
(c) Anxiety influences performance
(d) Sunlight makes plants grow faster
(e) Quality of bar staff influences the sale of drinks
(f) Vandalism rises with over-crowding
3 Compose three one-tailed and three two-tailed hypotheses. Check your creations with a
fellow student or your teacher.

The null hypothesis

Regardless of whether a hypothesis is one- or two-tailed, it is known as an
experimental or research hypothesis. Experimental hypotheses can go by yet another
name, just to confuse matters, this being the alternative hypothesis. Alternative to
what, I hear you asking? We'll discover in a few moments, when I have reviewed the
state of the game so far.
The stages undertaken in experimental work are:
1 From an idea, formulate a one- or two-tailed hypothesis.
2 Decide what the units of measurement for the IV and DV are.
3 Decide what values the IV will have in your experimental manipulations.
4 Identify the parent population and select a sample of subjects from it. Divide this sample
into two.
5 Apply the experimental treatment to one group of subjects, and treat the other in an
identical manner, but with different values of the IV. The latter forms the control group. In
some experiments it is possible for subjects to take part twice, under comparable conditions,
and in this case we speak of subjects acting as their own controls.
6 Collect the results, i.e. two sets of scores reflecting different values of the IV. The scores will
be values of the DV.
7 Analyse the results. In doing so, you will be able to decide how likely it is that any difference
found between the sets of scores is a real one, as opposed to one which may have been due
to chance factors.
8 Draw a conclusion about whether the original experimental hypothesis has been confirmed
or not.
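
To see why stage 1 (one- or two-tailed) matters at stage 7, here is a hedged Python sketch. It does not use the t test or any other test described in this book; instead it uses a simple re-shuffling (permutation) idea chosen purely for illustration: pool the two sets of scores, re-divide them into two groups in every possible way, and count how often chance alone gives a difference at least as large as the one observed. The scores are the two 'test-tube' sets from chapter 6, relabelled as control and treated groups, and the directional hypothesis is assumed to be that treatment raises scores.

```python
from itertools import combinations

# The two 'test-tube' sets of numbers from chapter 6, reused as stand-in scores.
control = [1, 4, 5, 3, 0, 2, 3]
treated = [3, 6, 4, 8, 9, 2, 7]

pooled = control + treated
total = sum(pooled)
n = len(treated)

# The groups are the same size, so comparing sums is the same as comparing means.
observed = sum(treated) - sum(control)

splits = 0
one_tailed = 0   # re-splits where 'treated' exceeds 'control' by at least as much
two_tailed = 0   # re-splits at least as far apart in either direction

for chosen in combinations(range(len(pooled)), n):
    treated_sum = sum(pooled[i] for i in chosen)
    diff = treated_sum - (total - treated_sum)
    splits += 1
    if diff >= observed:
        one_tailed += 1
    if abs(diff) >= abs(observed):
        two_tailed += 1

print("re-splits examined:", splits)
print("one-tailed chance level:", one_tailed / splits)
print("two-tailed chance level:", two_tailed / splits)
```

With these equal-sized groups the two-tailed figure is exactly double the one-tailed figure, so the same set of results carries a different level of chance expectancy depending on how the hypothesis was framed before the data were collected.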

This all seems reasonably straightforward. You might guess that it can't possibly remain as simple as this, and that something will become more complicated. Right again! The 'something' which becomes more complex concerns the way we express our prediction of whether or not the experiment worked, when we start to carry out the statistical analysis. For reasons best known to mathematicians and philosophers, when we start to grind our scores through the statistical treatment mill, we begin by making the statement that we will assume the experiment has not worked - this being reflected in the expectation that the two sets of scores which we obtain will not differ. The mathematical rationale behind this strange turn of events is beyond the scope of this book. However, what you do need to know is that when you start the mathematical treatment of the data, you make the tentative statement: The independent variable does not affect the dependent variable in the way we anticipated. This is known as the null hypothesis. It is because of this gloomy creature that the experimental hypothesis - its exact opposite - is sometimes called the alternative hypothesis.

I think that it is helpful, when trying to fit experimental work and statistical analysis together, to think along the following lines...

The scientist has an idea which is put into the form of an experimental hypothesis. An experiment, designed to test the hypothesis, is subsequently undertaken. The results are collected, and a statistical technique or test is needed in order to decide whether the experimental hypothesis was correct or not, i.e., whether the IV did influence the DV in the manner predicted. At this stage, envisage the scientist handing the results over to a statistician, saying 'I collected these results in such and such a way; can you tell me whether they really differ from each other or not?' The statistician then takes the numbers, and in fact need know nothing about the purpose of the experiment or the scientist's prediction, to be able to analyse the data. All that must be known is what kinds of scores the numbers represent, for instance IQ, minutes, item recall, people, inches, attitude ratings, acidity, etc., and how well the experimental and control groups were matched. On receipt of the data, the statistician makes the initial statement which comprises the null hypothesis: The sets of numbers do not differ. Whatever the IV was, it is assumed not to have had the predicted effect on the DV.

The statistician then carries out the appropriate analysis on the data. If it is found that the numbers do appear to differ, and it is very unlikely that this is due to chance, then the null hypothesis can be rejected. When this happens, the conclusion is that the experimental hypothesis can be taken as correct. If the numbers do not differ, then the statistician's gloomy statement is confirmed, and the null hypothesis stands. When this information is conveyed back to the scientist, then it is concluded that the experimental hypothesis was either wrong (and so must be discarded), or incorrect because of some other variable whose influence was overlooked. In this case, the experimental hypothesis may be rejected as it stands, but modified at some future date, in the light of the experimental findings.

This may seem unnecessarily complicated, because of the statistician's negative approach to the experimental hypothesis. Also, an understanding of the processes involved is not helped by the fact that many of the concepts have more than one name. Anyhow, it seems easier to think of the statistical operations, which occur in the middle of the experimental procedure, as purely mathematical events which are rather separate from the practical activities. When the statistical fun starts, it is as though you have to forget everything you already knew about the treatments you have given the samples, and start afresh, with the pessimistic prediction that the sets of numbers you are now inspecting do not differ from each other.

The null hypothesis can be defined as: The statistical hypothesis of no difference. The word 'statistical' should serve as a reminder that it is the concoction of statisticians, and not an integral part of experimental procedure.

Exercise
4 Decide what kind of hypothesis each of the following is:
(a) That drinking blood makes the teeth grow
(b) That there is no difference between two sets of scores obtained during an experiment on perseverance
(c) That sunshine affects mood
(d) That the drug Gromor makes you shrink
(e) That there is no difference in social status between the residents of a council house estate and the owners of stately homes
(f) That the provision of maternity hospitals changes the perinatal mortality rate
(g) That children who eat breakfast show better concentration at school
(h) That the perceived redness of sunsets is altered by the quantity of volcanic dust circling the Earth
(i) That vegetarians have a lower incidence of athlete's foot
(j) That traffic vibrations accelerate the decay of old buildings

Like a dog with two tails!

Returning to the fact that experimental hypotheses are of two kinds, one- and two-tailed, I would like to say a little more about why the choice of hypothesis has consequences for the evaluation of results.

Suppose you say that you expect a certain drug to aid recovery. This is a one-tailed or directional hypothesis, and you are claiming that you expect recovery rates to improve when subjects take the drug. So far so good. The statistical null hypothesis applied to the data will be an expectation of no difference. If the sets of results do appear to differ, as shown by a statistical test, and the recovery rate is better with the drug than without it, then all is well. No difference between the sets of scores, and

you would conclude that the experimental hypothesis was incorrect, and reject it in favour of the null hypothesis. But what happens if your sets of results are quite clearly different, but that the drug has made recovery slower? From the statistical point of view, the sets differ. However, as you set up a directional hypothesis, and the results are in the wrong direction, then you have no option but to reject your experimental hypothesis. If you had established a nondirectional (two-tailed) hypothesis, then it would not have mattered in which direction the experimental group scores had altered. In either case you would be able to reject the null hypothesis and accept the experimental hypothesis, which would have been that the drug would have altered the rate of recovery.

Why should experimenters ever make directional hypotheses then? In brief, because it is easier, from a statistical point of view, to show a difference between sets of scores if a direction is predicted at the outset. As most researchers prefer to report experiments which have worked - and indeed, it is harder to get the results of unsuccessful experiments published - there is some motivation here. This is quite a pity though, for it is often the case that an experiment which hasn't 'worked' is just as valuable in leading the way to further research.

Space-saving abbreviations

Rather than writing out the word 'hypothesis', together with its type, it is usual to refer to a hypothesis simply as H, and to add another letter or digit to signify the kind of hypothesis being referred to. For reference, the various hypotheses and their abbreviations are given in table 2. Notice that there is no distinction between one- and two-tailed hypotheses in terms of the abbreviations used. It is also usual to make the following abbreviations:

E for experimenter - to be used in writing reports instead of the first person singular, which is avoided.
S in work involving animals; the long-suffering subjects, ranging from insects and worms, through goldfish and rats, to humans.

Table 2. Hypotheses commonly encountered in scientific work

Hypothesis       Abbreviation   Definition
Alternative      H1 or HA       The experimental hypothesis; a statement of the
Experimental     H1 or HE       predicted influence of the IV on the DV
Research         H1 or HR
Null             H0             The statistical hypothesis of no difference
Directional      None           An experimental hypothesis in which the specific
One-tailed       None           direction of score differences is predicted
Nondirectional   None           An experimental hypothesis in which the specific
Two-tailed       None           direction of score differences is not predicted

A summary of experimental procedure

To end this chapter, I shall now give a summary of the steps undertaken in a scientific investigation, from start to finish. The fourth and fifth steps have not yet been covered, but are included at this stage so that you now have a complete list of the procedure.
1 Have an idea about the effect of one variable upon another.
2 Define the IV and DV, and decide how they will be quantified.
3 Express the idea formally, and by means of a one- or two-tailed experimental hypothesis.
4 Decide what kind of statistical analysis will be appropriate.
5 Specify a significance level and sample size.
6 Select the sample to be used from the parent population which is under scrutiny.
7 Apply the experimental treatment to one part of the sample, and treat the other as a control group.
8 Collect the data.
9 Analyse the data.
   (i) Establish the null hypothesis.
   (ii) Apply appropriate statistical test or technique.
   (iii) Accept or reject the null hypothesis in the light of Step 9 (ii).
10 According to the outcome of Step 9 (iii), decide whether the experimental hypothesis can be accepted (if the null hypothesis is rejected) or rejected (if the null hypothesis has been accepted).
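The decision rule that this chapter describes for one- and two-tailed hypotheses can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `conclude` and its arguments are hypothetical, and the question of whether a result is 'significant' is assumed to have been settled already by whatever statistical test is appropriate.

```python
def conclude(tails, significant, direction_as_predicted=True):
    """Return a verdict on the null and experimental hypotheses.

    tails: 1 for a directional (one-tailed) hypothesis,
           2 for a nondirectional (two-tailed) hypothesis.
    significant: True if the statistical test showed that the
                 two sets of scores differ.
    direction_as_predicted: only consulted for one-tailed hypotheses.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    if not significant:
        # The sets of scores do not differ: the null hypothesis stands.
        return "null hypothesis stands; experimental hypothesis rejected"
    if tails == 1 and not direction_as_predicted:
        # The sets differ, but the wrong way round for a directional prediction.
        return "difference in the wrong direction; experimental hypothesis rejected"
    return "null hypothesis rejected; experimental hypothesis accepted"


# The drug example: recovery was significantly slower, not faster.
print(conclude(tails=1, significant=True, direction_as_predicted=False))
print(conclude(tails=2, significant=True, direction_as_predicted=False))
```

The two calls at the end show why the choice of hypothesis matters: the same significant difference leads to rejection of a one-tailed experimental hypothesis but acceptance of a two-tailed one.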
8 Significance

In this chapter I am going to enlarge upon what happens when we analyse our scores, and are forced to consider the element that chance factors might have contributed. Material introduced here builds upon that presented in the three previous chapters.

A very significant experiment

When we evaluate the results of an experiment, we normally describe the findings in terms of their significance. The term has a rather special meaning in this context. It doesn't just refer to the importance of the experiment, and whether it represents a significant advance in the field of scientific endeavour, but rather the outcome of the statistical analysis of the data. The significance, expressed as a precise numerical probability value, tells us how sure we can be that our scores really are different, and our experimental treatment did work, as opposed to being the result of some fluke - like an atypical sample, or the effects of some unconsidered variable. In other words, even if you have decided that your scores appear to come from two populations rather than one - i.e. your manipulation of the IV did alter the DV - you must state how certain you are of this.

You may recall from the chapter on probability that we can indicate our degree of confidence in something by putting a numerical value on it. We could say, for instance, that we are not very convinced of our success in an experiment, and state 20% as our level of confidence. We would mean that on one out of every five occasions we carried out the study (20% of the time) we would get these particular results accidentally - regardless of our manipulations of the IV. If we wanted, we could say the same thing a different way and give the figure of 80%, meaning that on only 80% of the occasions would the difference in scores represent a real difference due to the action of the IV on the DV. The lowest level of confidence which is acceptable to scientists is the 5%, or one-in-twenty level. In the 'Results' section of a report it would normally be expressed as the 0.05 level of probability - or significance - following a brief note saying which particular statistical treatment has been employed. If our treatment of the data had come up with a certainty of 0.01 (1%), then we would feel much more confidence in accepting our experimental hypothesis, as the chance of fluke occurrences would now only be one-in-a-hundred. Finally, and most pleasing, is when our results can only be attributed to chance factors at an estimated one-in-a-thousand rate, which is expressed as the 0.001 or 0.1% level. The three levels of 0.05, 0.01 and 0.001 (5%, 1% and 0.1%) are the ones traditionally used in reporting the results of statistical treatment. A probability level of more than 5% is not acceptable, and if this outcome occurs, then the null hypothesis cannot be rejected. Yet another tradition is followed in the way we word our final statement. We say that the null hypothesis 'cannot be rejected' - not that it has been accepted.

We refer to the level of probability, or significance, by using the abbreviation p. Also, in stating the value formally, another symbol, ≤, is used. This means 'less than or equal to'. So if our results turned out to have a significance level of 0.05, we would write either p = 0.05, or p ≤ 0.05. If the significance level fell between the values of 0.05 and the next one of 0.01, we would state the most conservative value, and give p ≤ 0.05 in our results. Similarly, between 0.01 and 0.001 would be stated as p ≤ 0.01. Note that as the probability levels fall, from a numerical point of view, so our confidence in the results rises, and we speak of higher significance levels - the highest which is normally quoted being the much-desired 0.001 level.

In principle, as many writers on statistics point out, we should state the level of significance we are prepared to accept before we carry out our experimental or observational work. The level we choose rather depends upon the importance of being correct in our final conclusion. One writer, Siegel, illustrates this with the example of an experiment in brain surgery. If we were to investigate some new treatment for a brain disease, then we would not be too happy about deciding that it worked if our level of significance for trial experiments had only been 0.05. The cost of 5% of the brain surgery operations failing, both in terms of human suffering and expense, would be too great for us to wish to recommend that particular treatment. We would probably adopt the most stringent level before we felt that we could recommend it for general use. Notice though that our choice of a significance level doesn't in itself tell us what to do about our findings. It merely tells us what kind of confidence we can have in them. Naturally, we demand a high level of confidence for matters involving life or death outcomes (literally), or any other work in which the consequences of accepting the research hypothesis are not trivial.

Interpreting the outcome of statistical tests

Many statistical tests finish off with a particular 'statistic', for instance:

t from the t test
U from the Mann-Whitney U test
T from the Wilcoxon test
and F from the variance-ratio test

The statistic is then evaluated to see how significant it is, using the special tables drawn up for each. It is then possible to state whether a particular difference in sets of scores was likely to be due to a real effect rather than chance factors.

A common mistake made by students is to think of the final statistic - e.g. U = 31, or t = 2.58 - as some sort of answer, which may be correct or incorrect, like the answer to a sum in arithmetic. The statistics you obtain aren't really answers, although it is true that they do represent the final stages of working through an analysis, and so can be seen as the finish of it all. The particular value obtained won't be subsequently evaluated as right or wrong, but according to how likely it is that it might be the product of scores reflecting chance factors.
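The reporting convention used in this chapter (quote the most conservative of the three traditional levels that the obtained probability satisfies) can be written as a small Python sketch. The function name `reported_level` and the wording of the returned strings are illustrative assumptions, not part of any standard library; `<=` stands in for the 'less than or equal to' symbol.

```python
# The three traditional significance levels, most stringent first.
LEVELS = (0.001, 0.01, 0.05)


def reported_level(p):
    """Return the conventional way of reporting a probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    for level in LEVELS:
        if p <= level:
            # The first level reached is the most stringent one satisfied.
            return f"p <= {level}"
    # More than 5%: the null hypothesis cannot be rejected.
    return "not significant: the null hypothesis cannot be rejected"


for p in (0.2, 0.05, 0.03, 0.004, 0.001):
    print(p, "->", reported_level(p))
```

A probability of 0.03 falls between 0.05 and 0.01 and so is reported at the 0.05 level, while 0.004 is reported at the 0.01 level.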

But what if I was wrong?


As you will have gathered, in stating a certain degree of confidence in your results, you are also admitting to the possibility of being wrong. (We will discount the possibility of being wrong because you made arithmetical errors during the calculations!) From a logical point of view, you can be wrong in two ways:
1 You decide that your sets of results differed, and accepted the experimental hypothesis, when in fact the IV hadn't really affected the DV in the manner predicted, but the one-in-twenty, or one-in-a-hundred event had occurred.
2 You decide that your sets of results didn't really differ, and so didn't reject (i.e. you accepted) the null hypothesis, and concluded that the experiment hadn't worked - when really, it had. It might be that great variability in the scores prevented the statistical technique from showing a significant difference, for instance. Or it might have been that your method for quantifying the DV was not precise and fine enough to show up rather subtle changes which took place.
These mistakes are called, respectively, Type I and Type II errors. Again, in 'real life' (whatever that is!), your willingness to make either of these mistakes will be dependent upon the kind of research you are doing, and the consequences of drawing incorrect conclusions.

It is now time to tackle the much-dreaded statistical tests! Actually, I am sure that you will be surprised by just how easy they are to carry out. The difficulties associated with statistical tests lie in the rationale behind testing and interpretation, rather than in arithmetical aspects. Before we embark on the first test, make sure that you understand the topics covered in the last four chapters.

A fictitious experiment
First I will outline an investigation, and then I will use the results in a statistical analysis.

A keen market research student wished to discover whether his nine fellow residents in a student hostel consider the washing up liquid Gresego to be better than its cheaper rival, Kwikclene. Privately, he thought that Gresego was better, but he was careful not to let any of the others know this. He asked them to participate in a small study, and requested that each person wash up an identical pile of dirty dishes on two occasions on consecutive days. Half the students were to use Gresego first, and the other half Kwikclene first. As soon as they had finished the washing up, the students were required to give a score out of ten to the liquid they had just used. The higher the score, the better they judged the washing up liquid to have done its job. The investigator was careful to regulate such variables as the temperature of the water, the amount of washing up liquid used, and the amount of time and effort put into washing up. In addition, he did not let the students know which liquid was being used on each occasion, just in case that knowledge and certain prior expectations caused the student to give a biased judgement.

Exercise
1 In the experiment just outlined, identify the IV and DV. State the experimental hypothesis and decide whether it is one- or two-tailed. State the null hypothesis.

So, the market researcher obtained eighteen observations in all, collected in pairs from the nine participants in the study. He is now in a position to evaluate their judgements, which are given in table 1. If you hadn't worked it out (and shame on you!), the experimental hypothesis established here was a one-tailed one, as the prediction had been that Gresego would give better results (i.e. higher scores) than Kwikclene.
Simple statistical tests

Table 1. Ratings obtained for the efficiency of the washing up liquids Gresego and Kwikclene

Judgements of performance
Subject   Gresego   Kwikclene

The eye-ball test indicates that Gresego has indeed produced better results - it obtained a median score of 7, as compared with Kwikclene's median of 5. Also, there are only two students who gave the higher rating to Kwikclene in their paired comparisons. Not content with the eye-ball test however, and maybe in an effort to convince the others that this was a 'proper' scientific investigation, not just a scheme devised to get the washing up done by others, the investigator decided to carry out a statistical test on the sets of scores. A test which is appropriate for this type of data is the Wilcoxon test. It can be used whenever sets of scores are 'paired off' in some way. In this experiment, each subject has contributed scores towards both sets of ratings, and in presenting and analysing the data, the pairs are kept in line with each other. The workings for the Wilcoxon test are given in operation schedule 7. Study the steps involved, and note that they finish with the final triumphant statement:

T = 6

Now that's not the answer, but a value for the T statistic which we must proceed to interpret, using the appropriate table. Our evaluation is the final step of the analysis, for it is this which gives us a probability level, which in turn indicates whether our results might have arisen by chance or not. If the final value of T could have arisen on more than one in twenty occasions, then we cannot reject the null hypothesis, and we must conclude that the original experimental hypothesis was wrong, i.e., that Gresego is not judged to be better than Kwikclene. If, however, the table indicates that on only one in twenty, or even fewer occasions, might the results have arisen 'by accident', then we are in business! The experimental hypothesis can be accepted, and the investigator would conclude with some confidence that Gresego is the superior brand of washing up liquid.

Let's see what happened then. The results of the analysis, T = 6, are evaluated using table S2. The steps involved in evaluation are given in operation schedule 7a. The outcome is a significant value of 0.025 for our particular T - in other words, we might have expected to get this value due to chance factors on only two and a half out of a hundred occasions. Perhaps five out of two hundred is an easier way of thinking about this particular value. Our confidence in the results is stronger than it would have been if the probability level of 0.05 (five out of one hundred) had been the outcome. Thus we can reject the null hypothesis quite happily, and accept the experimental hypothesis. The two sets of numbers which our samples comprise appear to have been drawn from two different parent populations, not one, and so the market research student can conclude that Gresego does in fact appear to produce better results than Kwikclene. Of course, whether it is better to the extent that its higher price can be justified is quite another question, and not one which is answered by the investigation just described. In a formal report, at the end of the 'Results' section, it would be said of this analysis:

The results of the statistical analysis were significant at the p ≤ 0.025 level (one-tailed Wilcoxon test; T = 6; N = 9), and so the experimental hypothesis was accepted. It was concluded that the performance of Gresego was judged to be superior to that of Kwikclene.

Exercises
2 Carry out the Wilcoxon test on the following pairs of scores, and interpret the test results with a one-tailed hypothesis established.
   (a)            (b)
   210, 200       250, 244
   240, 220       235, 238
   190, 260       202, 221
   170, 220       196, 215
   270, 440       227, 234
   230, 260       218, 232
   220, 580       232, 237
   220, 920       224, 245
                  215, 224
                  203, 192
3 Interpret the following values of T, assuming that they were derived from experiments or investigations involving one-tailed hypotheses.
   (a) N = 10, T = 7   (b) N = 12, T = 5 7   (c) N = 20, T = 7   (d) N = 10, T = 4
   (e) N = 16, T = 2Y   (f) N = 18, T = 49   (g) N = 13, T = 17   (h) N = 15, T = 29

Interpretation with a two-tailed hypothesis

Suppose that in the original investigation the student had not had an opinion about the relative merits of Gresego and Kwikclene, but merely wondered whether either of them might turn out to be a better brand. Here we have a two-tailed hypothesis rather than a one-tailed, or directional, hypothesis. In the results we would look for a consistent difference between the two sets of scores which would reflect differences in washing up power (the DV), but we are not anticipating that one particular brand will produce either higher or lower scores than the other. The difference could be in either direction. Of course, if there isn't any difference, then the null hypothesis will stand, and we would conclude that the washing up liquids cannot be distinguished in terms of their performance.

We will use the data obtained earlier to see what would happen at the analysis stage if the student had established a two-tailed, rather than a one-tailed hypothesis. The calculations carried out in the Wilcoxon test are unchanged, and once more we have a value of T which is 6. The procedure for interpretation of T then alters as follows.
Instead of taking the table's value for T = 6 when N = 9, using the top line of table S2, we refer to the line immediately below, which is headed 'Level of significance for two-tailed test'. A value of T = 6 will still be in the same position in the body of the table, but instead of reading off the significance level of 0.025, we now give it a probability value of 0.05. This is precisely twice the first value, and equivalent to the one in twenty accidental occurrence rate, rather than the one in forty (or two and a half out of a hundred), which was obtained previously. Thus the result has diminished in terms of level of significance. A swift glance at the top of table S2 will tell you that whatever significance level you decided upon for the interpretation of a one-tailed test, it will be doubled for a two-tailed one. Thus it is easier to obtain significant results with a one-tailed hypothesis. You can check this out for yourself by considering a T value of 7, obtained from nine pairs of scores which have been subjected to the Wilcoxon test. This value is just a little too large to be significant at the 0.05 level for a two-tailed test, but you can see that it would be significant for a one-tailed test, as it does not exceed the figure of 8, given in the far left column. In presenting the significance level in a formal report, we would say that the value of T = 7, for a one-tailed test, would be significant at the 0.05 level. This would be written something along these lines:

The results of the Wilcoxon test were significant (p ≤ 0.05, when N = 9, T = 7), and so the null hypothesis can be rejected.

You may suspect that temptation can arise over this business of relative significance levels of one- and two-tailed hypotheses. You would be correct! Suppose that it had been predicted that the two types of washing up liquid might differ, and a relatively vague, two-tailed hypothesis had been established. The investigator finds out that there is no significant difference between the two sets of scores when they are interpreted as two-tailed results, but that there is a significant difference if they are treated as one-tailed test data. The temptation is to say at that stage 'Well, I really suspected all along that such a set of results (naming the list with the higher set of scores) would be better.' If the investigation formed the basis for an experimental report, the experimental hypothesis could then be written up as a one-tailed hypothesis, and no-one would know that things had ever been imagined to be otherwise. As I mentioned earlier, publication traditions make life much easier for researchers who obtain significant results, and so rather strong will-power is needed to avoid succumbing to this temptation if it arises.

The same temptation to go back on a statement made at the outset of an experiment can occur in another situation. This is the one which was described in chapter 7, in which you start off with a one-tailed hypothesis, but find that although your results are statistically significant, the direction of difference is the wrong one. Actually, it is harder than it sounds to alter the original hypothesis in this case, because it is more than likely that the experimental work (and hypothesis) fits into a general background of known phenomena or some theoretical model. It is this background or model (or maybe only a simple observation), which governs the direction of the prediction, and so it is not usually possible to change the experimental hypothesis without altering the underlying assumptions or model quite radically. In the event of an unexpected direction cropping up in the results, most scientists would decide to carry out the experiment again - a process known as replication. They may or may not predict a new direction of results in the second attempt, depending upon whether they viewed the outcome of the first experiment as being due to the effect of some variable they had overlooked, or due to chance events. In carrying out the experiment once more, they have a second chance to look carefully at what is going on, and perhaps rule out the freak event explanation.

Exercise
4 Interpret the values of T which were given in exercise 3, but this time assume that the results have been obtained from investigations in which two-tailed predictions had been made.

An even simpler test - the sign test

If you have carried out the exercises and had a go at the Wilcoxon test, you were probably surprised by just how simple the arithmetic was, and found the evaluation of T to be the trickiest part of the procedure. The next test we will look at, the sign test, is even easier, and could probably be successfully carried out by a clever chimpanzee! I have chosen to describe it after the Wilcoxon test though, because I want to use its results to illustrate how tests can vary in their ability to detect significant differences in sets of scores. We call this ability power-efficiency, and it can be likened to degree of precision in tools. The finer an instrument, the more suitable it is for tasks requiring attention to detail; the more sophisticated a test is, the better it can detect subtle and perhaps small differences between sets of scores. You will soon be able to decide for yourself whether the sign test can be regarded as more sophisticated than the Wilcoxon.

The step-by-step procedure for the sign test is given in operation schedule 8, but I shall outline here what goes on. As with the Wilcoxon test, you obtain differences between the paired scores, and give each difference a plus (+) or minus (-) sign. In the sign test though, instead of ranking the differences between scores and then summing all the ranks which have the same sign, you merely count the number of differences with each sign. This is equivalent to saying 'There are so many (number stated) scores in one set which are smaller than their counterparts in the second set, and so many (state number) which are larger.' Having found out how many scores are in one particular direction (maybe having been moved there by your experimental
Simple stalislical lesls Simple statistical tests

manipulations), you then evaluate this number, using table S3. It tells you the
probability of your particular directional change.

If I predict that one condition will give rise to higher scores than another, I am
predicting that all (or most) of one group will be greater than their paired
counterparts. If they are all higher, that is fine. The sign test will give a significant
result. If a few of the scores are not higher though, then this is not quite so good, and
the likelihood of obtaining a significant difference is diminished. It is common-sense
really, but by using the mathematical wizardry incorporated into the sign test, we can
give probabilities associated with all the different proportions of changes which
might occur. Note that if we had predicted lower scores rather than higher ones, the
procedure would be just the same, but with the signs reversed. Whether higher or
lower scores are predicted from one group of scores, the hypothesis would be a
directional, or one-tailed one. It is quite possible to use the sign test with two-tailed
predictions, just as with the Wilcoxon test, and the chosen significance level is
selected from the two indicated at the top of the columns of the evaluation table.

Let's suppose that in an experiment using twelve subjects, we obtained paired sets
of scores. Without going into great detail, the data have been collected from an
experiment on dieting. Each lady who participated contributed two scores: unaided
weight loss over a certain time interval, and weight loss over an identical period, but
during which they were also scoffing Waist-Away - the Wonder Weight Loss Powder.
As we aren't cynical about the worth of commercial diet products, we predict a
greater weight loss when the powder is being used than when it isn't - and so we have
established a one-tailed hypothesis. The results of the experiment were that ten out
of the twelve dieters did have greater losses with the treatment than without, whilst
two subjects didn't. If we carry out the sign test on these data, then we shall end up
with the value of S, the number of less frequent signs, being 2, and a value of N, the
number of paired scores which are being analysed, being 12. If you inspect table S3,
then you will see that the value in the body of the table, when N = 12, for both the
0.05 and 0.025 levels, is 2. If our obtained value of S is equal to or smaller than the
tabulated value, which it is, then we conclude our results are significant, and the
scores really do come from two populations. So we would decide that the Wonder
Weight Loss Powder does what it claims to do. Although our obtained value of 2 did
not exceed the values given in the columns for both the 0.05 and 0.025 levels of
probability, we would quote in our report whichever level we had decided upon
before we undertook the data analysis, i.e. if we had said that we would accept the
0.05 level, we would claim significance at that level, but if we had said beforehand
that we would not take anything higher than 0.025 as being significant, then we would
give that level in our formal statement. In practice, what tends to happen is that we
don't specify in advance that we will accept results of a particular significance level,
and then quote that, but wait to see how things work out, and then report the best
level of significance which was obtained. So, for instance, if all our dieters had done
better with the treatment than without it, then the number of less frequent signs (S)
would be 0, and we could claim significance at the 0.0005 level. If we had not
predicted a direction of difference, but established a two-tailed hypothesis, then our
obtained value of S = 0 would be significant at the 0.001 level.

Now cast your eye down table S3 to the line below the one we have just been
referring to, for N = 13. Imagine that we found three out of thirteen ladies didn't lose
weight with the treatment. What would you conclude? If you had set up a one-tailed
prediction, then you would be safe, and could claim significance at the 0.05 level. Too
bad if your prediction had been nondirectional though, for a probability of 0.1 is not
judged significant; the maximum number of 'failures' you can have for significance,
when N = 13, is two. What you can't do, is change your mind after the analysis!

Here's another example. A batch of 60 young tortoises is given the tortoise IQ test.
On the basis of the score they obtain, and also their leg length, they are then paired
off. Next, they are put individually into a nice maze on the lawn, and told that if they
hurry along through it, then they will be able to give a hare a good punch on the snout.
One group of tortoises is given a drug (cleverly disguised in a lettuce leaf) which is
supposed to increase running speed: the other group is given a dummy pill, called a
placebo, and which is also wrapped in a lettuce leaf. Placebos are always given in
studies involving drugs, to ensure that all subjects appear to be receiving identical
treatments. One by one, the tortoises all have their turn in the maze. Sure enough,
the tortoises who have taken the drug race along like greased lightning - all except
one that is, who lets the side down. Apart from his running time, all the others have
shorter times than those in the group which had the placebo. Analysis: N = 30, and
we have an S of 1. This is way below the value of 5 given for the 0.0005 probability
(one-tailed test), and so the conclusion was that the results were highly significant.
The experimenter is so pleased that she lets all the tortoises punch the hare!

[Cartoon; caption partly illegible: 'He's been ... there for hours']

What would be the conclusion if six of the tortoises had been members of the Hare
Protection League, and only ambled along, despite having received the drug? An S
value of 6 falls between the two tabulated values of 7 and 5, for the 0.005 and 0.0005
one-tailed significance levels. So these results would be stated as being significant at
the 0.005 probability level, for they didn't quite make the 0.0005 mark. We always
extrapolate to the most conservative value when a test statistic falls between
tabulated values. To fail to get significance with a one-tailed prediction, we would
need to have at least eleven tortoises whose speeds are slower than their matched
counterparts, but it would be ten if we had set up a two-tailed hypothesis, for the
maximum permitted number of 'deviants' at the 0.05 level is given as nine.

Earlier I said that scores might have been 'moved' in a particular direction by
experimental manipulations. Don't take this to mean that in the sign test you must
always use the results from the experimental group as if they were the ones which
have 'moved'. In working through the calculations it is quite arbitrary which column
of scores is subtracted from the other, and it is always the number of scores with the
less frequent sign which gives us the value of S. The sign test tells us the probability
of obtaining a low number of scores which are in a different direction to the
remainder - and like all statistical tests, it has nothing to say about cause and effect,
predictions and interpretations. What it does say, is that if we get a certain number
of scores which are out of step (in either direction), we can conclude with a specified
degree of confidence that the two samples have been derived from different
populations. The sign test can also be used on data which are not derived from
experiments, e.g. observations of many types, opinions or information derived from
surveys and questionnaires, etc. The chief requirement is that the scores are paired
off for comparison purposes.

Exercises

5 Carry out the sign test on the data given in exercise 2. Evaluate for both one- and two-tailed
tests.
6 Find the significance of the following values of S for both one- and two-tailed tests:
(a) S = 1, N = 8 (b) S = 5, N = 12 (c) S = 4, N = 18 (d) S = 3, N = 20
(e) S = 0, N = 6 (f) S = 0, N = 25 (g) S = 4, N = 16 (h) S = 0, N = 5

Comparison of the sign test and the Wilcoxon test

So far, I have described how the sign test works, and made some comparisons
between it and the Wilcoxon test. I would now like to look at the two together a little
more closely, so that you get better insight into what the power-efficiency of a test
means, and also, so that you can see why it is that the Wilcoxon test is the better of
the two, under most circumstances. We will work with the data obtained in the
washing up experiment, and which are analysed in the steps comprising operation
schedule 8. The values of 2 and 9 were obtained for S and N respectively, and after
referring to table S3, it was concluded that the null hypothesis could not be rejected.
However, when the Wilcoxon test was carried out on the same data (operation
schedule 7) the conclusion had been reached that the null hypothesis could be
rejected, as the sets of scores were significantly different at the 0.025 level. The
conclusion differed despite the fact that both the Wilcoxon and sign tests were
evaluated for one-tailed predictions. How is it that we get this difference between the
two tests?

The Wilcoxon is the more powerful test of the two, because given the same sample
size, it correctly rejects the null hypothesis, when the sign test doesn't. It can do this
because it uses more information from the sets of scores than the sign test. It works
on the actual size of the ranked 'wrong' direction differences, not just the number of
scores which are in the wrong direction. In using more information it is thus a more
precise instrument for detecting differences. You may be able to see, from the
workings of the Wilcoxon test, that it is sometimes possible to get a significant
difference between sets of data when there are many scores in the wrong direction,
but only slightly so, and also, that a single score in the opposite direction could count
heavily in the test by giving rise to a large value of T, and so forcing the researcher to
conclude that the data were drawn from one population. This is the old problem of
the outlier again, by the way. Does the extremely large, unique score mean that it is
atypical in some way, and if so, can it be dropped from the analysis? If you are entirely
confident that something very distinctive brought about the odd score, and so can
justify dropping it, then do so. If you can't justify the decision, then don't, or you will
be accused of fiddling your data!

It is best for you to gain insight into the relative merits of the Wilcoxon and sign
tests by working through them, using the same scores. Even more illuminating is to
create fictitious data for your calculations, and to attempt at the outset to obtain
specific significance levels. Exercise like this is really good for you!

I hear a few voices asking how we know which test to do, when we are faced with
the analysis of sets of scores which have been matched off for comparison purposes.
Whenever we collect data which comprise definite scores, then the Wilcoxon would
be preferable to the sign test. Sometimes however, the data can only really be
expressed verbally. For example subjects' ratings of some food item as 'better', 'the
same', or 'worse', after it has been subject to some change; whether people 'agree',
'don't know' or 'disagree' about something or other; whether animals seem to be
'more' or 'less' active under two conditions, etc. Whenever we can state that there is
a difference in a particular direction, but are not able to quantify the amount
precisely, then the sign test is the one to use. Items falling into one category are
labelled as if they were pluses and into the other as if they were minuses. Items which
don't change - or in the example above, the 'don't know' category - are counted as
tied, and that particular pair of scores extracted from the analysis, with a consequent
reduction of N. The sign test is thus a very useful instrument for many types of
investigation in the social sciences when non-numerical data have been collected. I
shall say more about the way we quantify variables, and the relevance this has for
choice of statistical test, in the next chapter.

Another easy test - the Mann-Whitney U test

The investigation into washing up liquids described earlier involved nine subjects
who kindly contributed pairs of scores for the advancement of human knowledge. It
was the pairing which was the major factor in deciding that the Wilcoxon test was
appropriate for the analysis of the results. In the experiment, pairing occurred
because subjects acted as their own controls, but it would also have been possible to
use it on any data which are paired off in some way - maybe scores collected from
identical twins, or from unrelated subjects who had been matched up on the basis of
some attributes considered relevant. The racing tortoises are an example; towns and
cities in different countries which are 'twinned' provide another. Pairing can also be
seen in 'before' and 'after' situations, although things get tricky here, for there is
always the danger that exposure to the first condition is going to alter the reaction to
the second in addition to change brought about by the deliberate experimental
manipulations. Whenever possible scores are paired, because the matching reduces
the amount of random variation which can occur, and which often acts to mask real
differences. Stated more formally, this variation is more likely to increase the
likelihood of Type II error.

Suppose that the experiment had not produced pairs of scores though. If we had
just obtained data from eighteen subjects, with nine in each group, and the ordering
of scores within each set was quite arbitrary, then it would not be correct to carry out
either the Wilcoxon or sign test. If there were no matching, then you could not
sensibly decide which score in one set would provide a 'partner' for a particular score
in the other. Indeed, a test which worked on unmatched, or independent, sets of
scores must be used. The equivalent test to the Wilcoxon, except for the matching
requirement, is the Mann-Whitney U test. Because it does not involve working with
pairs of scores, it is quite possible to set up an experiment and analyse data from
groups of scores which are of unequal size. Obviously, this can't be the case when
scores are matched!

Just as the Wilcoxon test gave us the statistic T, and the sign test S, so the
Mann-Whitney test specifies certain operations which will enable us to find the value
of a statistic termed U. In fact two statistics are derived, U and U' (read 'U prime').
In a sense the two are interchangeable: at the end of our calculations, the smaller
becomes U, and the larger U'. It is the value of U which we then go on to look up in
the tables to determine whether there is a significant difference between the two sets
of scores.

The Mann-Whitney test in action

During investigations into the different eating habits of fat and thin people,
volunteers (either overweight or underweight) were invited to a psychology lab. to
take part in an experiment on visual perception. They were required to spend a few
hours isolated in individual cubicles. Unknown to the subjects, the clocks in some
cubicles had been tampered with, so that they ran a little fast, and gained one hour in
every four. Towards the end of the session, subjects were told to help themselves to
sandwiches which were provided. The experimenter, cleverly disguised as a lab. rat,
carefully recorded what everyone ate. Lo and behold! The overweight people who
believed it to be later in the day than it was, ate more than their underweight
counterparts. The number of sandwiches consumed by both groups of subjects is
shown in table 2. For reasons unrelated to the availability of fat and thin subjects,
more fat subjects participated in the experiment than thin ones. The individual steps
for working out the value of U are given in operation schedule 9.

Table 2. Sandwiches consumed by overweight and underweight subjects in the
time-tampering experiment

You will notice that in Step 4, and after that, we take the smallest set of scores, and
carry out our calculations with that group only. The choice of scores for working with
is quite arbitrary, but the smaller is normally taken, for ease of calculation. Working
with the larger set gives the same values for U and U', and in fact it is quite a useful
check for arithmetical errors to rework the test, using the numbers from the larger set
on the second occasion. When you have scores from groups of equal size, then it
doesn't matter which ones you take to use in the calculations. Again, reworking the
data, but using the other group's results, provides a useful check. When you have
completed the calculations, the value of U obtained is looked up in table S4. This
table differs somewhat from the tables for the Wilcoxon and sign tests, as it has to
cater for samples which may differ in size, and so which can't be given a single value
of N. Instead, the numbers across the top indicate the size of sample B (which will be
the smaller, if they aren't of equal size), and those down the left-hand side the size of
sample A. At the points where particular values of NA and NB intersect, two values
can be seen. The top number gives the maximum value of U which can be
permitted for significance at the 0.05 level (two-tailed test), and the lower number the
maximum value of U for the 0.01 level (two-tailed test). The same tabled values can
be used with one-tailed tests, but now the significance levels would be 0.025 and 0.005
respectively.

On the whole, the worst aspect of the Mann-Whitney U test is trying to remember
its name!

7 Carry out the Mann-Whitney test on the data provided in table 2. Give significance levels
for both one- and two-tailed predictions.
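The relationship between U and U' described above can be sketched in a few lines of Python. Operation schedule 9 is not reproduced in this section, so what follows is the standard pairwise-counting definition of the statistic rather than the book's own step-by-step method: each score in one set is compared with every score in the other, scoring 1 when it is larger and one half for a tie. The function name and the scores are invented purely for illustration.

```python
def mann_whitney_u(set_a, set_b):
    """Return (U, U') for two independent sets of scores.

    Illustrative sketch using pairwise counting; ties count as one half.
    The smaller of the two counts is reported as U, the larger as U'.
    """
    def count_wins(xs, ys):
        # Number of (x, y) pairs in which x beats y, with ties worth 0.5.
        return sum(1.0 if x > y else 0.5 if x == y else 0.0
                   for x in xs for y in ys)

    u_from_a = count_wins(set_a, set_b)
    u_from_b = count_wins(set_b, set_a)
    return min(u_from_a, u_from_b), max(u_from_a, u_from_b)


# Hypothetical scores, groups of unequal size (5 and 4).
group_a = [7, 9, 12, 15, 18]
group_b = [3, 5, 8, 10]

u, u_prime = mann_whitney_u(group_a, group_b)
print("U =", u, " U' =", u_prime)

# Reworking from the other group gives the same pair of values,
# and the two statistics always add up to NA x NB.
assert mann_whitney_u(group_b, group_a) == (u, u_prime)
assert u + u_prime == len(group_a) * len(group_b)
```

Swapping the two groups leaves the result unchanged, which is the arithmetic check recommended in the text. The value of U would still need to be evaluated against table S4.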
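Looking back at the sign test from earlier in this chapter, the probabilities quoted for the dieting examples can be checked with a short calculation. Table S3 itself is not shown here, so this sketch assumes its entries come from the binomial distribution with plus and minus signs equally likely under the null hypothesis. The function name is hypothetical, and the two-tailed figure is taken as double the one-tailed one.

```python
from math import comb


def sign_test_probability(s, n, tails=1):
    """Probability of s or fewer 'less frequent' signs among n paired scores,
    assuming each sign is equally likely (illustrative sketch only)."""
    one_tailed = sum(comb(n, k) for k in range(s + 1)) / 2 ** n
    return min(1.0, tails * one_tailed)


# The three dieting situations discussed in the text: (S, N).
for s, n in [(2, 12), (3, 13), (0, 12)]:
    p1 = sign_test_probability(s, n, tails=1)
    p2 = sign_test_probability(s, n, tails=2)
    print(f"S = {s}, N = {n}: one-tailed {p1:.4f}, two-tailed {p2:.4f}")
```

Under these assumptions S = 2 with N = 12 gives a one-tailed probability of about 0.019, inside both the 0.05 and 0.025 levels. S = 3 with N = 13 gives about 0.046 one-tailed but about 0.09 two-tailed, which matches the remark that the nondirectional prediction fails.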

8 Carry out the Mann-Whitney test on the following sets of data:

    (a) Set A  Set B    (b) Set A  Set B    (c) Set A  Set B
          10     10           25     11          150    100
           9     12           30     15          100    250
          15      4           45     22          125    225
          18      3           33      9          225    100
          16      0           42     13          250    125
           1                  29      8
           7                  32      7
                              22     16
                              27     15
                              37     18
                              55     15

9 From the information given below, find the significance of the U statistics obtained for both
one- and two-tailed hypotheses.
(a) NA = 4, NB = 6, U = 3 (b) NA = 3, NB = 5, U = 6
(c) NA = 6, NB = 8, U = 8 (d) NA = 10, NB = 10, U = 19
(e) NA = 12, NB = 18, U = 46 (f) NA = 20, NB = 9, U = 50
10 Decide which test is appropriate for data derived from the following experiments and
investigations:
(a) Thirty people are given a questionnaire in which their attitudes towards corporal
punishment are measured before and after they have been shown a film on the subject.
(b) Two groups of subjects participate in an experiment on sleep deprivation. They are not
twins, and are not matched in any way.
(c) Identical twins take part in an experiment on alcohol consumption. One twin from
each pair is in the experimental group, and the other in the control group. Each twin
contributed one score to the results.
(d) Ten villages are selected from British rural districts, and on the basis of their scores in
the Best Kept Village competition, they are paired off. One village from each pair is asked
to display prominent anti-litter notices. At weekly intervals, an inspector goes around the
villages and gives a rating out of 10 for each, dependent upon how much litter is visible.
(e) In a practical experiment in visual perception, subjects are asked to read letters of the
alphabet which are placed to the right or to the left of their visual field. The aim is to
discover the distance at which the letters can first be correctly identified. Each subject's
right and left visual ability is measured. The data from several people are collected for the
statistical analysis.

10 What's in a number?

The material included in this chapter is independent of earlier topics. It has probably
not occurred to you before, but (and get ready for the great initiation!) we use
numbers in different ways. Numbers are symbols, and sometimes they are used in a
very rough and ready way. On the other hand though, when they meet with certain
requirements, they can also be used in an extremely precise manner. It is important
to know what kind of number a score is, for this partly determines the kind of
statistical analysis or treatment which can be carried out on the data.

Levels of measurement

First, let's take a look at the different kinds of numbers which exist. As all numbers
describe or measure something or other (a variable), we talk of the various kinds of
numbers as achieving a certain level of measurement. Our measurements might be at
a very crude and inaccurate level, they might be a little more precise, or extremely
fine in the kind of discrimination which they make. There are four levels of
measurement, and they are listed below in ascending order of precision.
1 Nominal
2 Ordinal
3 Interval
4 Ratio
The different levels have their own characteristics, and for each one, certain
arithmetical operations are either permissible or not - this being determined by the
characteristics themselves. If you violate the rules, then you may well emerge from
your calculations with a number which might sound as if it means something, but
which on closer examination will be seen to be nonsense. You will soon discover what
I mean.

The nominal scale


At this level, numbers are simply used to classify things. They say something about
the underlying phenomena - but not very much! In fact other symbols, such as letters
of the alphabet, geometric forms or colours, would be equally appropriate.
Sometimes a verbal label could easily be used instead of a number, but numbers are
often preferred because a verbal description uses up more space and takes longer to
say or write out. The 'classic' example of the nominal type of number is the digits seen
on the backs of football players' jerseys. These numbers give us some information
about the individual players - namely their position on the field - at a glance. Clearly,
it would be impracticable to replace such numbers with a verbal description. Imagine
the words 'halfway down the field on the right-hand side', replacing a single digit. It
would be quite possible and meaningful to replace jersey numbers by letters of the
alphabet though, or geometric symbols, or even different coloured jerseys.
(Although how would the teams recognise each other then? Another system would
be required.) As it is, each digit stands for a place, and to protect the sanity of football
players and spectators, the symbols comply with an internationally accepted system.
Other examples of nominal measurement, and which don't necessarily involve
numbers, are: blood groups, types of cheese, psychiatric classification schemes, car
registration, bus route and convict numbers. Letters of the alphabet: A, AB, O;
Stilton, Cheshire, Lancashire; schizophrenia, depression, etc., could easily be
replaced by digits without any loss of meaning or information. For instance in
psychiatric classification, doctors would agree on the symptoms which defined a
category, and from then on merely refer to that category by its number. Notice that
the use of a single symbol to indicate a category does not necessarily mean that it is a
simple variable, or that it can easily be categorised. That is quite a different matter.

Properties of the nominal scale

The only thing which the nominal scale represents is equivalence. If several objects
or phenomena are given a particular number, then they must, as just described, be
similar to each other. That is why I stated that if footballers adopted a new system of
coding their positions on the field, the system would have to be taken over by
everyone - otherwise the information would only be of use to an individual team, and
there would be no way of generalising it to other teams. It would be possible, if
everyone used the same categories, but different symbols, for translation to take
place. For example, the position a star represented in one team would be known to
be a circle in another, or a yellow jersey in a third, etc. Thus the only arithmetical
term which is relevant for nominal data is 'equals' (=). The nominal scale illustrates
well why arithmetical operations (and statistical tests) are dependent upon what the
numbers actually represent. Although it would be possible to add up all the numbers
on a team of footballers' jerseys, to get 66, what would the total signify? We might go
even further, and take an average for the total, obtaining a mean of 6. What on earth
does that figure represent? Certainly not the 'average' position of all the players on
the field! The results of adding, subtracting, multiplying or dividing nominal numbers
are quite meaningless - although it is easy to fall into the trap of believing that
because they are numbers, they do represent something. The position is rather
similar to that of spurious accuracy which I mentioned in connection with decimal
places. If you are wondering why numbers are ever used in a nominal capacity, apart
from considerations of time and space, there is another reason. It is that if categories
are labelled from 1 upwards, consecutively, then the highest number used will tell
you how many categories there are in the scheme. This can sometimes be a useful
piece of information.

The ordinal scale

Slightly more sophisticated than the nominal, the ordinal scale, as its name would
suggest, involves an ordering or ranking of the variable under consideration. Thus
the numbers of an ordinal scale show a relationship between numbered items, and do
not merely represent a class or category. When we speak of 'higher', 'lower', 'easier',
'faster', 'most often', etc., we are using verbal labels to imply the type of order found
in an ordinal scale. The numbers in this kind of scale are used systematically so that
on any one scale, 0 or 1 will represent either the lowest or the highest of all possible
values. Examples of schemes are: social class gradings I, II, III, etc.; house numbers;
fruit and vegetable grades I, II, III; examination grades A-F; army ranks and race
positions first, second, third; Mohs' scale of rock hardness, 1-10.

Properties of ordinal numbers

Like the nominal scale, the ordinal scale involves equivalence (=), but also relative
size, indicated by the symbols > (greater than) and < (less than). Numbers on an
ordinal scale represent a fairly rough and ready rank ordering, and there is no
expectation that the difference between any two grades at different points along the
scale is the same. For instance there is no guarantee that the difference between
apples Grade I and II is the same as the difference between Grades II and III. What's
more, the relative gradings of vegetables don't even remain constant, but alter
according to season. What could be passed as a Grade II tomato in winter may be put
into the inferior Grade III category in summer. For these reasons, when ordinal data
are expressed as numbers, adding and subtracting, multiplying and dividing, and all
calculations involving these operations are not permissible. As with nominal data, a
number would emerge at the end of such calculations, but it could not be interpreted
very well, and particularly if cross-scale comparisons were involved. If one were
trying to describe a set of ordinal numbers using a measure of central tendency, then
the median and the mode, which only involve counting, would be appropriate. There
is no measure of spread which could be used, as they all involve addition or
subtraction.

There is one kind of scale which is particularly controversial. It is the well-known
IQ scale - the numbers being derived from the outcome of an IQ test. While many
psychologists maintain that it achieves a higher degree of precision than that of
ordinal data, others consider it to be ordinal and treat data which comprise IQ scores
with great care.
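To make the points about nominal and ordinal numbers concrete, here is a minimal Python sketch. The jersey numbers follow the football example above. The apple grades are invented for illustration, and the use of `median_low` (which always returns one of the grades actually observed) is a choice made for this sketch, not something the text prescribes.

```python
from statistics import median_low, mode

# Nominal: jersey numbers 1 to 11 are only labels for positions.
jersey_numbers = list(range(1, 12))
total = sum(jersey_numbers)                 # arithmetic "works" ...
mean_jersey = total / len(jersey_numbers)   # ... but describes no real position
print("Sum of jersey numbers:", total)
print("Mean jersey number:", mean_jersey)

# Ordinal: hypothetical apple grades (1 = Grade I, 2 = Grade II, 3 = Grade III).
# Only counting-based summaries are used: the median and the mode.
apple_grades = [1, 2, 2, 3, 1, 2, 3, 2, 1]
print("Median grade:", median_low(apple_grades))
print("Most common grade:", mode(apple_grades))
```

The first two results come out as 66 and 6, as in the text, and neither tells us anything about the players. The median and the mode of the grades are found by ordering and counting alone, so they remain interpretable even though the gaps between grades are unknown.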

The interval scale

We now arrive at the first level of measurement in which arithmetical operations are
allowed. This is because at the interval level, the numbers are not only ordered, but
the intervals between each step, at all points along the scale, are of equal size. Thus
two numbers which are adjacent at a low point on the scale (e.g. 3 and 4) are
separated by exactly the same distance as two which occur at a higher point (e.g. 343
and 344). This is where the problem arises over IQ scores, for it is difficult to believe
that a difference of 5 points on the scale occurring between the values of 30 and 70 is
the same as the difference of 5 points occurring between 90 and 110; the middle range
of the IQ scale is much more precisely measured than the extremes at either end. As
social scientists frequently wish to undertake analysis of their data using arithmetically
precise methods, they will often aim to use methods of measuring the variable
under consideration which achieve at least the precision of the interval scale.

A main feature of interval scales which distinguishes them from the more
sophisticated ratio scales, is that they have no absolute zero point, but only arbitrary
zero points. This can be seen through consideration of two examples of the scale,
temperature (Centigrade) and calendar years. On the Centigrade temperature scale
the zero was fixed as the point at which water freezes, and much lower temperatures
which occur are indicated by a negative sign. Calendar years were given the 'starting
point' of 1 at the time of Christ's birth, and the many years which occurred before this
date are denoted by the label BC.

Properties of the interval scale

Because the size of intervals is equal within any one scale, it is admissible to carry out
all arithmetical operations meaningfully. However, cross-scale comparisons (e.g.
Centigrade to Fahrenheit) must be handled with some care. The comparisons are
difficult because the zero point is not the same for both scales, and because the size
of the interval used in each scale differs. Conversion from one scale to another is quite
possible, but usually involves a certain amount of arithmetical juggling.

The ratio scale

This is exactly like the interval scale, in having ranked numbers with equal intervals
between the numbers throughout the scale, but it differs in that it has, in addition, an
absolute zero. This means that 0 means, literally, 'nothing there', not, as in the case
of temperature Centigrade and Fahrenheit, just 'very low'. You can remember what
'absolute' zero means by memorising the phrase 'absolutely nothing there'. It is
impossible to have minus numbers in a true ratio scale. If negatives are to be seen,
they signify that the operation of subtraction should be carried out, not that the
number is less than zero. Length (metric and imperial), weight, elapsed time, speed,
temperature on the Kelvin scale and frequencies are all examples of ratio scale
measurement. On the Kelvin scale the reading 0 signifies the lowest possible
temperature attainable; the point at which even the movement of molecules ceases.
It is way below zero on the other scales, its equivalent on the Centigrade scale being
-273°C. It is easy to spot when a scale is ratio, as a minus sign makes the items
nonsensical. You can't have -2 humans, -3 oz sugar, or -10 seconds!

Properties of ratio scales

All arithmetical operations are permissible on numbers in a ratio scale, and also,
because of the fact that they all have absolute zeros, it is possible to convert fairly
easily from one to another. This is because the ratios of stated intervals, even across
different scales, are equal.

What about cardinals; and are they discrete?

The interval and ratio levels of measurement are sometimes classified together and
called the cardinal scale. Cardinal numbers give amounts of something, whilst
ordinals give order only, and nominal numbers are simply names for categories.
Another division - this time of cardinal numbers only - is into discrete and
continuous. Discrete units are ones in which each value is clearly separated from its
neighbours. For instance in counting the number of people attending an event, our
money, or the number of shops in a street, a unit is a complete entity which can't be
meaningfully divided. Half a person can't go very far (at least, not in a vertical
position!), an establishment either is or isn't a shop, and we don't divide money into
units smaller than ½p, in British currency. All kinds of currency have their smallest
unit, and no further subdivision takes place. On the other hand, continuous units are
ones which can be divided up over and over again. There is no theoretical limit to the
number of times an inch, second or ounce can be subdivided; the constraints are
purely practical, and with our advanced technology we are capable of obtaining some
extremely fine units of measurement. You will be pleased to hear that for the
majority of tests you will encounter, it doesn't matter whether the variables involved
are of a discrete or continuous nature.

Exercise
State the level of measurement achieved in the following types of numbers:
(a) steam engine numbers (b) a child's age (c) hospital staff, grades 1-9 (d) football
goals (e) the Beaufort scale of wind strengths, from 0 (calm) to 12 (hurricane) (f) opinion
ratings, from 1 (agree) to 5 (disagree) (g) TIQ - Tortoise IQ scores (h) National
Insurance numbers (i) centimetres (j) Types I and II errors.
83
Why do levels of measurement matter?

Scientists of every shape and form are always manipulating variables, and expressing
different values of the variables by means of various numerical units. The units will
then be used in the subsequent data analysis. Chemists, physicists and biologists do
not often run into difficulties here, for the units they deal with most frequently are of
the cardinal type, and quite suitable for sophisticated arithmetical treatment. In fact,
ask a 'real' scientist (as they like to think of themselves) about levels of measurement,
and the chances are that he or she won't know the first thing about the topic, for it
does not create problems in their world. Unfortunately, level of measurement is a
constant headache for social scientists, as many of the variables they must consider
are measured in units which do not attain interval or ratio sophistication, and so are
unsuitable for precise methods of analysis. The three simple tests which I have
covered in chapter 9 are appropriate for any kind of numbers, but as mathematical
tools they lack precision. They belong to the group of tests called nonparametric, and
are relatively free of restrictions concerning their use. In contrast to them, we have
the more-sophisticated, more-powerful, parametric tests. These tests work on the
basis that the scores fed into them come from a normal distribution, and they are able
to pull on the mathematical properties of this distribution in distinguishing sets of
data. Just as the Wilcoxon test correctly rejects the null hypothesis when the sign test
doesn't (and this failure constituted a Type II error), so the sophisticated parametric
tests are also better able to reject correctly the null hypothesis than the nonparametric
ones, and hence have greater power. Because social scientists are concerned with
variables that sometimes achieve interval status, but more frequently don't, it is
important for them to know which type of statistical treatment is the appropriate one.
Using any kind of instrument on unsuitable raw material is likely to have unfortunate
consequences; in statistical analysis, if you use the wrong technique, you either miss
finding a subtle difference which is there (a Type II error), or on the other hand,
produce a statistic which looks good, but in fact may be quite meaningless. Data
evaluation - or 'crunching', as they say - must be very finely tuned to the nature of
the data.

The assumptions underlying parametric techniques

Now we take a closer look at the assumptions which underlie the parametric
techniques. There are three main restrictions, and they concern the following
features of sets of scores:
1 The type of measure which the scores represent (i.e. the level of measurement).
2 The distribution of scores - usually, our main concern being over whether or not they came
from a normal distribution.
3 The spread of the scores.

1 Level of measurement
As parametric tests involve several sophisticated (but not necessarily difficult)
arithmetical operations, they are only suitable for use with data which are of interval
or ratio level. It is always possible to use the simpler nonparametric tests on high level
data, but this is rather like using an untrained person to make fine distinctions in
wine-tasting. He would arrive at judgements of differences between wines fairly
successfully, but they would only be crude, and would not compare very favourably
with the precise descriptions and statements made by trained wine-tasters. If a
trained taster is always available at no extra cost, and the wine to be tasted will not
insult the taster, why use a novice?

2 The distribution of the scores
As I mentioned earlier, parametric tests can only be used on data which are derived
from normal (or nearly normal) distributions. It may surprise you to learn that there
is a certain amount of controversy among statisticians over how far you can break this
'rule' and get away with it. Personally, I think it's better for the uninitiated, such as
ourselves, to err on the conservative side, and observe this rule.
Assuming that you decide to play safe in this respect, how would you decide
whether your data do fall into a normal distribution? When the number of scores you
are dealing with lies between five and fifty, the answer is that you use the old eye-ball
test again - i.e. you decide by looking at them! Putting the scores into the form of a
diagram in which the vertical axis indicates frequency, and the horizontal axis various
groups of units (in ascending order), is an operation which makes the decision easier.
But what if they show a shape which is not quite normal? After all, it would be fairly
surprising if any smallish sample drawn from a population with a normal distribution
mirrored the parent population exactly. There is always a certain amount of error in
sampling. So to discover whether the extent of deviance from the normal shape is
sufficient to suggest that the scores are derived from a non-normal distribution, you
could apply one of the statistical tests designed to answer this question, such as a test
of goodness-of-fit (chapter 11). At this stage in your statistical career it is usually
quite sufficient to know that the normality of the distribution must be considered,
without having to go into all the details.
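
A frequency diagram of the kind used for the eye-ball test can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the scores are made up, the function name `frequency_diagram` and the group width of 5 units are invented for the example, and the picture is turned on its side, with the groups of units running down the page and the frequencies running across it.

```python
from collections import Counter


def frequency_diagram(scores, width=5):
    """Tally whole-number scores into groups `width` units wide, in ascending order.

    Returns one text row per group, with one asterisk per score in that group.
    """
    counts = Counter((score // width) * width for score in scores)
    rows = []
    for low in range(min(counts), max(counts) + width, width):
        label = f"{low}-{low + width - 1}"
        rows.append(f"{label:>7} {'*' * counts[low]}")
    return rows


# Made-up scores, purely for illustration.
scores = [12, 15, 17, 18, 21, 22, 22, 23, 24, 25, 26, 26, 27, 28, 31, 33, 34, 38]
for row in frequency_diagram(scores):
    print(row)
```

With these invented scores the rows of asterisks rise to a peak in the middle groups and fall away evenly on either side, which is the rough shape to look for when deciding by eye.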
Note that in deciding whether or not samples are normally distributed, you must
consider the shapes of both samples. They must both meet the requirement of
normality before a parametric test can be used. However, they needn't have come
from the same normal distribution - that is, have identical means. Indeed, this is
precisely what you have hoped to avoid in undertaking the experiment: you
anticipated that the manipulation of the IV would create two groups of scores coming
from distributions with very different means.

3 The spread of the scores
Similarity of spread between samples is called officially homogeneity of variance, and
it means that the two sets of scores have similar variances. Variance is a measure of
how spread out, or scattered, a set of scores is, and was covered in detail in chapter 5,
where I explained that it is directly related to the standard deviation, another measure
of spread. What we are saying here is that two sets of scores must be scattered by a
roughly equal amount if a parametric technique is to be used on them. If the scores
comprising one sample are widely spread out round the mean (i.e. have a large
variance), while the other set is squashed up in close proximity to it (with a small
variance), then one of the nonparametric techniques must be used in analysing these
data. Think of 'homogenised' milk to remember what the official term for this rule
means. Homogenised milk is milk which has been treated so that it becomes and stays
the same throughout the whole bottle.
As with the question of normality, you may wonder just how similar the variances
have to be before a parametric test can safely be used - or, how dissimilar they can be
before parametric techniques have to be abandoned. And, as before, for cases which
are not readily determined by means of the eye-ball test, so it is possible to use a
particular statistical technique to answer the question. The appropriate technique is
known as the variance-ratio, or F test. Basically, it involves obtaining the variance of
each set of scores, placing the larger value over the smaller, and looking up the
resulting ratio in a table. Actually, the F test can be regarded as a statistical test in its
own right, for it can answer the old question of whether samples appear to have been
drawn from one or two populations. Suppose you have two sets of scores representing
reaction times after the administration of a drug in one case, or a placebo in another.
(A placebo is a substance which appears to be identical to the drug externally, and
which is believed by the 'victim' to be the drug, but which does not in fact contain any
active substance at all.) If you obtained widely scattered scores from the experimental
group, but results which showed hardly any variation at all from the controls, who
received the placebo, then you would be able to conclude, with statistical justifica-
tion, that the two samples differed, and that the experimental treatment had brought
about the large variation of response which was evident.
If any of the three requirements listed above are not met by your data, then you
should really use a nonparametric technique in your analysis, rather than one of the
more powerful parametric tests.

11 Two parametric tests

You will need to have read and understood the five previous chapters to be able to
deal with the contents of this one. In it I will cover the two parametric tests which
replace the Wilcoxon and Mann-Whitney tests when the data for analysis conform to
the following requirements:
1 Both samples are normally distributed.
2 The variances of the samples are similar to each other.
3 The samples comprise scores of at least interval measurement.
You may recall that the main difference between the Wilcoxon and the Mann-
Whitney tests is that in the Wilcoxon test the scores for each condition are paired off,
while in the Mann-Whitney test, there is no matching of individual numbers between
the samples at all. The sign test is meant for paired scores, and it is less powerful than
the Wilcoxon test, which it can replace. The importance of whether scores are paired
off, or matched, as the pairing is called, is paramount in the choice of a statistical test.
How we plan our experiments to obtain paired scores, and the advantages and
limitations of such experimental designs is fully covered in chapter 13.

The two t tests
So far then, you know about the Wilcoxon and sign tests which can be used on
matched scores, and the Mann-Whitney test which is used on unmatched, or
independent, samples. The more powerful tests which replace the Wilcoxon and
Mann-Whitney tests are the two t tests. One of them is for matched samples, and the
other for independent samples. There are no prizes for working out that the matched
t test replaces the Wilcoxon, and the independent t test the Mann-Whitney.
Sometimes, just to complicate things, the t tests appear with different names.
Originally, the t distribution was worked out by a certain William Gosset, who was
employed as a research chemist at the Guinness brewery in Ireland at the start of this
century. At that time, Guinness employees were never allowed to publish any
discoveries they made. However, because of the exceptional importance of his
statistical discovery (and perhaps also because it concerned a mathematical, rather
than a beer recipe!), the strict rule was relaxed for Gosset, providing that he
published under a pen name and remained anonymous. He chose the name 'Student',
and so the t distribution and its application in statistical tests became known as
Student's t test. This name has died out a little in recent years, although the distinction
between the version of the t formula used for matched and unmatched samples

to find out. If there were three urns, containing tea, coffee and hot chocolate, what
would be the df? How many samples would you need to take before you knew what
each contained? If two are known, then the content of the final urn is fixed - always
provided that you know what the overall separate items in the containers are - and
you would be able to name it. So because two items have to be known before the third
can be named, the df = 2. Got it?

Exercises
1 Decide whether a related t test or an independent t test would be appropriate for the
following:
(a) A comparison of heights in a twin study, where one twin is in the experimental group,
and the other in the control group.
(b) An analysis of differences in exam marks obtained (assume interval data) by boys and
girls in a particular year, with a view to showing that one of the sexes obtained better marks.
(c) Medical social workers in a hospital start to notice that people who live in the vicinity of
the steel foundry have large ears. They measure ear length precisely in these people, and
compare the lengths with measurements obtained from others who live in the vicinity of the
colliery.
(d) 'Before' and 'after' reaction scores obtained from one person who took part in an
experiment on several occasions.
(e) 'Before' and 'after' scores obtained from a group of people who only participated in an
experiment once.
2 Carry out t tests on the following sets of data:
(a) The independent samples:
Set A: 3, 5, 2, 4, 6, 2, 7
Set B: 9, 25, 4, 16, 36, 4, 49
(b) The independent samples:
Set A: 7, 6, 8, 3, 9, 4, 9, 5
Set B: 11, 14, 17, 16, 15, 21
(c) The related samples:
Set A: 3, 8, 4, 6, 9, 2, 12
Set B: 6, 14, 8, 4, 16, 7, 19
(d) Use the data in 2(c), but treat as if unrelated. Compare the two values of t you obtain.
3 Evaluate the statistics you obtained in exercise 2 for significance, (i) assuming a one-tailed
and (ii) a two-tailed hypothesis.
4 Decide whether the following data would be best suited to parametric or nonparametric
analysis:
(a) IQ scores from two samples with means of 95 and 110, and identical variances.
(b) The two sets of scores:
Set A: 1, 1, 1, 1, 1, 4, 5, 10, 10, 10, 10
Set B: 1, 3, 5, 6, 6, 7, 7, 7, 9, 10
(c) 'Before' and 'after' attitude assessments, carried out on a group of students exposed
to an exhibition of modern art.
(d) Weight in grams of baby rats born to females who have been kept on an enriched or
standard diet.
(e) The reactions to certain stimuli, measured in seconds, after a drug treatment, compared
with scores from controls. Samples are normally distributed, have variances of 10 seconds
and 100 seconds, with means of 50 and 55 seconds.
5 For samples with df equal to 10, and for a one-tailed test, interpret the following values of t:
(a) 0.500 (b) 1.812 (c) 2.200 (d) 3.0

12 Tests of goodness of fit

A vague idea of the contents of chapters 9-11 will be helpful for an understanding of
why tests of goodness of fit differ from other statistical tests. The t tests, sign,
Wilcoxon and Mann-Whitney tests are all designed to tell you whether two samples
appear to have been derived from one or two sources - the underlying population or
populations. This is true, regardless of whether the test is parametric or
nonparametric, matched or unmatched. If the two samples differ in any important way
from each other, then our test statistic will result in our decision to reject the null
hypothesis.

Tests of goodness of fit; quite a different cup of tea!
The tests of goodness of fit work on slightly different principles to the ones already
described; their name gives a clue as to how they work. Suppose you have a set of
scores which clearly showed a normal distribution (figure 1(a)), and they had been
derived from a control group in an experiment. Now take the experimental group
scores. They show the strongly skewed distribution illustrated in figure 1(b). Imagine
cutting out the pattern shown in figure 1(b), and placing the shape over that of figure
1(a). The result would resemble figure 1(c).

Figure 1. Tests of goodness of fit work on the extent to which distribution shapes
overlap: (a) normal distribution; (b) skewed distribution; (c) normal and skewed
distributions together

The eye-ball test tells you immediately that the two distributions are very different
- because they don't fit neatly on top of each other. So, with no more ado, we might
conclude that scores showing such markedly different shapes must have come from
two different underlying populations, one with a normal distribution and the other
with a skewed distribution. Tests of goodness of fit work on the principle of
distribution shape, rather than on actual scores. They compare overall patterns of
results which are obtained by putting the actual scores into a frequency form.
Although our conclusion, about whether samples have been derived from the same
or different underlying populations, will be the same with tests of goodness of fit as
with the tests covered earlier, in these, we normally compare one sample with
another of a particular distribution. An analogy might help.
Our samples are represented by two vanilla slices. The question we ask is: 'Were
they bought in the same shop?' - the shop being equivalent to population. If they are
similar, then we can conclude that they did come from a single source, and do not
reject the null hypothesis. If they differ, then we would conclude that they have come
from two shops, and so be able to reject the null hypothesis. In using any of the
statistical tests described up to now, we wouldn't point to one of the slices and ask
whether the other differs from it, but would ask whether they differ from each other.
In using tests of goodness of fit, the question becomes 'If this slice (pointing to one of
them) came from Ye Olde Tea Shoppe, did the other come from the same place?'
There is a very subtle difference, and it is in specifying the characteristics of one of
the samples, and asking whether the other resembles it. Apart from this, and the fact
that we use frequencies of scores, rather than actual scores, tests of goodness of fit are
similar to the other tests. Although the eye-ball test may be sufficient to decide that
some distributions differ, as in figure 1(c), on occasions the difference may not be big
enough to decide just by looking, and the usual question of whether a small
difference might be the result of chance factors has to be answered by means of a
formal statistical procedure. If you read about the requirements for parametric tests,
in chapter 11, you will have already encountered a situation where a test of goodness
of fit could be used. This is in the requirement that both samples for parametric tests
are drawn from normal distributions. If either of the samples is skewed, then we
know we cannot use a parametric test (and we might conclude immediately that they
have been drawn from different sources). However, in many cases it is not too easy
to decide whether the sample in question is sufficiently close to a normal distribution
to justify the use of a parametric test, and in this case, we might well use a test of
goodness of fit to help us to decide.
There is a whole group of tests based on the distribution of a statistic called χ²,
pronounced 'ky', rhyming with 'fly', and sometimes written out as 'chi-square'. They
all come under the heading of the chi-square measure of association. Although there
are several varieties of the chi-square tests which, as usual, go by many names, and
can also be calculated in various fashions, don't despair. You only really need to
know that the chi-square test works on a slightly different principle from the other
tests you have encountered, and to cope with the workings of just one of the
variations, which is called, encouragingly, simple chi-square (see operation schedule
12). For the sake of completeness I include a little on the rationale of the complex
chi-square, and include the steps involved in its computation in operation schedule
13.

Frequencies and rules in chi-square tests
In all tests of goodness of fit, the categories into which frequencies of scores fall are
called cells. The cells contain counted items, such as the number of people who fall
ill, have accidents, obtain various IQ scores, say 'No' to a give an opinion
on a questionnaire, fall into particular social classes, show certain specified
behaviours; the number of times a coin came up heads, or a thrown die showed a 'six'
(no statistics book would be complete without at least a mention of coins or dice!);
the number of rats making false moves in mazes, of crimes committed by various
kinds of people, of red blood cells found in a specified volume of blood, etc. In a beer
sampling 'study', described in chapter 15, beer is rated according to five categories.
Each category would form a cell, and a table could be drawn up indicating the number
of times a sample pint was judged 'undrinkable', 'fair', or whatever. When you are
using, or contemplating using, the chi-square test, the level of measurement involved
in the categorisation scheme makes no difference to the kind of test of goodness of fit
to be used. They all operate on frequency counts derived from any kind of data.
However, there are some rules for chi-square tests, and they are listed below.

1 Items in cells must be independent. They cannot appear more than once, and they cannot
be included if they are 'somehow' influenced by other items. Each item must be an isolated
event.
2 The number of items appearing in the 'expected' category, obtained during the stages of
computation, must be at least five.
3 The tests must be carried out on the actual numbers of items which appear in the cells, not
on derived proportions or percentages. Even though the proportions of numbers are
unaltered, the test is invalidated if the original numbers are not used.

As usual, there are some squabbles. The one concerning chi-square is over rule 2;
that there must be at least five items in the 'expected' category. Some statisticians say
that it doesn't matter all that much - but it is probably better to play safe and observe
the rule. It is possible to overcome the problem of small expected frequencies by one
of two methods. One way is to pool the data, which means putting some of the
categories together. An example will make this clearer. Suppose I have carried out
some observations on the age at which wisdom teeth appear in an obscure African
tribe, and wish to compare the information with that already available for Europeans.
I might have grouped my subjects according to the following broad divisions:
1 Children from birth to 9 years
2 Adolescents from 10-14 years
3 Adolescents from 15-18 years
4 Young people from 19-21 years
5 Adults from 22-25 years
It might turn out, during calculations, that fewer than five scores would be expected
in the first category. Consequently, I could pool the cells of categories 1 and 2, giving
one cell, labelled 'Children and adolescents from birth to 14 years'. Now the expected
frequencies would be of sufficient size for me to proceed, but as a result of the pooling
process, the new category would have become much broader, and so less useful. For
this reason, pooling is avoided if possible.
The second way round the problem is to increase the size of the samples under
consideration. Unfortunately, this is often impracticable, and almost always costly in
terms of money and effort.
Sometimes, chi-square tests are called tests of association, for in survey studies,
when we have counted the number of items falling into the categories, we conclude
that the values of one variable are associated with particular values of another.
However, because a survey is not an experiment, and we have not actually
demonstrated a causal link, we cannot go on to say that 'X causes Y' - only that 'X is
associated with Y, in such-and-such conditions'. We will encounter this kind of
guarded statement again, when we come to consider another measure of association,
correlation.

Simple chi-square
The simplest chi-square test involves only four cells, and their arrangement into what
is known as a 2 X 2 contingency table (read 'two by two'). Such a table contains two
samples, and both are divided into two values of a second variable. An example is
given in table 1.

Table 1. A 2 X 2 contingency table, showing the relationship between
weight and susceptibility to heart attack

Men            Have experienced    Have not had a    Totals
               heart attack(s)     heart attack

Overweight     400                 100               500
Underweight    100                 400               500

Totals         500                 500               1000

This example illustrates the relationship between weight and coronary heart disease
- using fictitious data! The population comprises male members of this society, and
two samples are drawn, one of very slim men and the other of overweight men. These
give us two categories on the weight variable. The number of people in these samples
who have had one or more heart attacks before their fiftieth year is ascertained, and
so they are divided into two categories; 'have experienced at least one heart attack'
and 'have not had a heart attack'. These give us two values for the second variable
heart attack history. Note that I have been careful to call the two variables involved
just 'variables', without identifying a dependent variable (DV) and an independent
variable (IV). Although it is possible that excess weight does cause heart attacks, you
must remember that the chi-square test is only a measure of association, and says
nothing about causality. If we had manipulated people's weights, and then observed
the heart attacks which followed, we could have called our weight variable the IV and
the heart attack rate the DV. Even then though, we would still need some caution in
interpreting our results. Just because we make people fat, and they then have heart
attacks does not mean to say that it is the excess body weight which causes the attacks.
It may be that we make them fat through asking them to eat plenty of fatty foods, or
sweets, or by drinking lots of alcohol; it may be the type of food they eat which causes
the heart condition. However, on the whole, we would be in a much stronger position
for advocating weight reduction than if we just went by the results of the survey type
of study, shown in table 1. All the survey tells us is that there is an association between
the variables. It might even be that having a heart attack at an early age encourages
people to 'eat, drink and be merry', and so become overweight. Our survey did not
tell us anything about the order in which the events occurred, and it is this which is
rather crucial for making predictions. Experimental work normally has something to
say about the relative timing of events.
Returning to table 1, you can see just by inspection of the data that overweight
people have more heart attacks than the thinnies. This is a nice clear-cut example -
not at all like the sort of data which we tend to obtain in 'real-life'! However, it is
useful to have a clear-cut example like this, in order to explain the way the chi-square
test works. Conceptually it is fairly easy to grasp. In the example, 1000 men were
studied; 500 fat and 500 thin. If there is no association between body weight and heart
attacks, then we would expect by chance roughly equal numbers of heart attacks to
occur in both groups. By a mind-blowing feat of mental arithmetic, you might
conclude that the expected figure for each group would be about 250, i.e. half the fat
men and half the thin ones having attack reports. Actually, you may have reasoned
this way when you carried out your first eye-ball test on the data. Alternatively, you
may have decided that 400 heart attacks just looks a lot more than 100, and left it at
that, without calculating the rough estimate of what would be anticipated if the null
hypothesis were true. The null hypothesis would state that there is no association
between weight and heart attacks, and in mathematical terms, would suggest that the
value of 500 attacks would be divided evenly into two, between the overweight and
underweight men. The total number of men who didn't have heart attacks would also
be evenly divided, so that overall, a contingency table drawn up on the basis of the
null hypothesis would look like table 2.

Table 2. The null hypothesis for the heart attack data

Men            Have experienced    Have not had a    Totals
               heart attack(s)     heart attack

Overweight     250                 250               500
Underweight    250                 250               500

Totals         500                 500               1000

All right so far? Notice that the row and column totals have not changed, only the
distribution of the numbers in the cells, now rearranged according to a theoretical
expectation. The chi-square test works by taking each of the obtained values in the
cells and comparing it with what would have been there if the null hypothesis were
true. So we compare each of our obtained values of 400:100 and 100:400 with the
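
The comparison of obtained with expected values for table 1 can be sketched in Python. Treat this as a rough illustration rather than the book's own procedure: the function names are invented, and the statistic is computed with the standard textbook form (the squared difference between obtained and expected, divided by expected, summed over the cells), which is assumed here and not quoted from the operation schedules. No correction of 0.5 is applied in this sketch, so a schedule that includes one would give a slightly smaller value.

```python
def expected_counts(table):
    """Expected cell counts if the two variables were not associated.

    Each expected count is (row total x column total) / grand total,
    so the row and column totals stay exactly as they were.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (obtained - expected)^2 / expected over every cell (no correction)."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obtained_row, expected_row in zip(table, expected)
        for o, e in zip(obtained_row, expected_row)
    )


# Table 1: rows are overweight / underweight men,
# columns are 'heart attack(s)' / 'no heart attack'.
table_1 = [[400, 100], [100, 400]]

print(expected_counts(table_1))  # [[250.0, 250.0], [250.0, 250.0]]
print(chi_square(table_1))       # 360.0
```

The expected counts reproduce table 2, and the very large value of the statistic reflects how far the obtained 400:100 and 100:400 splits lie from the 250:250 split expected under the null hypothesis. Because the functions work from row and column totals, the same sketch would also accept tables with more than four cells.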
l'ests of goodness of fit Tests of goodness of f i t
'rhe point about this example is that only one sample of students was taken. I t differs have arisen through several moderate diffcrenccs, rather than one or two outstand-
from the previous one in which two samples (ovcr- and undc~~weight men) were inply large ones. There is a method for finding out where the differences lie, and i t is
categorised and compared with each other. The computational steps for the described by R. Rosenthal and R. L. Rosnow in their Primer of n~efllotisfor [hr.
one-sample case, with more than twocategories, are almost thesame;in schedule 13, helra~~ioral scieizces (New York, London: Wiley, 1975). Ironically, i t comprises
instead of going through Steps 1-3 to obtain the e.rpec~edcategories, you can virtually carrying out another chi-squ:~retest!
start at Step 4, because the expec~ednumbers can he calculated very simply. The only Note, by the way, that complex does not mean more difficult, but ratlier more
other thing which differs is the calculation of the degrees of freedom. With tedious to calculate. The principle behind computation isexactly the same as that for
onr-sample c11i-square it is always the nunlher of categories minus 1. simple chi-square. I am including here a table of data obtaincd yet again from the
University of Wetwang Degree Day proceedings. It shows how the numbers would
be presented in a formal manner, and provides the material for the steps of schedule
12.

Table 4. Degree classes obtained by students at Wetwang University

Degree classification

Subjects I Ili Ilii 111 Pass Totals

Languages 5 10 20 35 30 100
Maths 35 40 80 25 20 200
Economics 0 10 10 20 20 60

Totals 40 60 110 80 70 360

If these data were presented in a formal report, the conclusion might be worded: 'The value of χ² obtained (74.696) when df = 8, is significant at the 0.001 level of probability. Therefore the null hypothesis can be rejected. It is concluded that there is a relationship between the different subjects studied and final degree classification.'
In a discussion of the data, it might tentatively be advanced that language and economics students showed a pattern of results which was negatively skewed, i.e. fewer obtained good degree classes, whilst the classes obtained by maths students approached normal distribution, although with a suggestion of a positive skew. This is a good example of an occasion when in writing up a report, good use could be made of illustrations to help the reader quickly and clearly understand the data.

Accuracy and chi-square
In schedule 12, you are asked to subtract the value of 0.5 from the observed frequencies. The incorporation of this step into the chi-square calculations is known as Yates' correction. It is a correction for continuity, which is sometimes omitted. However, it improves the accuracy of the test and has the effect of giving a slightly more conservative estimate of χ². It should always be included in calculations from a 2 × 2 contingency table, and is absolutely essential when samples are small, i.e. the total is less than 25.
Obtaining a value for χ² provides one of the rare occasions when you can include as many decimal places as you wish - during the computation. If you round off the

Complex chi-square
Complex chi-square is an extension of the simple 2 × 2 chi-square, and it refers to any sets of samples and categories which exceed four cells in size. For instance, if we had obtained three categories of people in the heart attack study - over, under and correct weight - we would have created a 3 × 2 contingency table. We might also have categorized heart attacks differently, and given that variable three values also - none, one and more than one. Now we would have a 3 × 3 contingency table, containing nine cells in all. We can make our tables as big as we like, but there are one or two snags which can arise. If the table gets very big, then we are more likely to obtain expected frequencies which are less than 5 in size. Sometimes this can be overcome by increasing the sample size. Another problem arises over the interpretation of significant results. Unfortunately, even if we know, from our calculated value of χ², that our distributions differ from each other, the statistic tells us nothing about precisely where in the table the crucial differences are to be found. Up to a point the eye-ball test helps - and particularly in smaller tables. If, for instance, in our degree results, we had obtained a significant difference, we could have guessed that it lay in the numbers of students at the extreme ends of the scale, as they show the biggest discrepancies from the value of 20. But it is much more difficult when there are many categories of different variables, because the value of chi-square might

13 The design of experiments

In this chapter I am going to consider more closely the different ways in which we obtain from experiments and surveys the sets of data which we will subsequently compare. You will get the most benefit from this material if you understand the steps we go through in carrying out an experiment (chapters 6 and 7). The particular statistical analysis carried out after the data have been collected is determined by the level of measurement attained by the scores, and whether they are paired off in any way. The different methods of matching either groups or individual scores come under the general heading 'experimental design'. However, many of the points made in this chapter will have some relevance to the gathering of material by non-experimental methods, e.g. through surveys or observational studies.
In their investigations, psychologists and other social scientists are hampered by two things (besides shortage of funds!). One is ethical considerations, and the other is the sheer complexity of human and animal behaviour. In an attempt to make the best of things, they have to make use of experimental designs which will overcome, to a very limited extent, these problems. These designs are described in this chapter, and I give illustrative examples from the social sciences, because social scientists constantly face problems which require more design ingenuity than those tackled by 'real' scientists. However, the designs I cover here are of use in most kinds of experimental work.

Examination nerves!
Before going any further, I am going to outline an experiment, so that you can use the framework to check that you have got the hang of all the terms used in connection with experimental work. The experiment concerns the effect of taking tranquillisers upon the examination performance of students, and the hypothesis is that the students will benefit from them, as evidenced by their higher exam marks. Before the exam, students who had requested tranquillisers were identified and located, and formed the experimental group. Students from the same year and discipline were also selected, and matched from records with students in the experimental group on such variables as previous exam results, IQ, gender, medical record and exam subject. These formed the control group. Thus there were two groups of subjects which were, as groups, indistinguishable on all the criteria used in matching. All subjects in the experimental group were given a tranquilliser two hours before their exam. The control group subjects were asked not to take any drugs before the exam, although both groups were permitted to take tea, coffee, mints etc., as they wished, and smokers were allowed to inhale their usual dose of nicotine.
Stop now - and answer the following questions:
1 What are the IV and DV in this experiment?
2 Which variables have been controlled?
3 Which relevant variables have not been controlled?
4 What criticisms of the experiment can you make?
5 Is the experimental hypothesis one- or two-tailed?
6 What is the null hypothesis?
Now for the results. Exam marks were obtained, and it was found that the experimental group - the students who had received tranquillisers - had marks which were significantly higher than those obtained by the control group. The difference was significant at the 0.05 level.
Stop again! Think!
7 Would the null hypothesis be accepted or rejected?
8 Which statistical test would be the most appropriate for these data?
9 What does the 0.05 level of significance mean? Now proceed . . .
We are now going to pull the experiment to pieces. Before I start the demolition job though, criticise it yourself. Main clue - do you think that the subjects in the experimental group were really well matched with those in the control group?

Confound that variable!
One of the major flaws in the experiment just described is apparent in the first sentence - 'students who had requested tranquillisers'. Although it is stated later that students comprising both groups are indistinguishable, in fact they are not. They are different in that the first group asked for drugs, whilst the second didn't. Worse is to come. Although the groups were matched on what the experimenter apparently considered to be the relevant variables, they did not appear to be matched on any personality characteristics. It might be that the students who asked for tranquillisers were completely different on an important personality dimension from those who didn't. When a variable - any variable, but the request for a drug in this example - alters consistently with the two groups, it is called a confounding variable. Usually, confounding variables go undetected at the stage of experimental design, and are particularly nasty because they do not affect the experimental and control groups equally, as 'nuisance' variables do. The variables which were uncontrolled in the tranquilliser/exam study, such as anxiety, fatigue, meals, are examples of nuisance variables, and considered to be fairly harmless if affecting both groups equally, which is the assumption made. Confounding variables alter systematically with the IV, and
so may actually bring about a change in the DV which is wrongly attributed to the IV. In the experiment just described, it might be that the students who asked for tranquillisers believed that their work would subsequently be better - and as we all know, such expectations can materially alter outcomes. The design of this experiment would be improved considerably if tranquillisers had been given to members of the experimental group without their knowledge - and if the group comprised students who had not requested the drug. Giving people drugs without their awareness is unethical, however, and as a consequence, much drug research is conducted 'the other way round'. Both sets of subjects (who have already agreed to take a particular drug) are given what appears to be the drug, and what they are all led to believe is the drug. In fact only half the subjects, the experimental group, actually receive the drug, and the remainder are given an identical-looking inert substance, a placebo. In clinical drug trials the nurses who actually give out the drugs do not know who is getting what, either, although of course they will be aware that some patients will only be receiving placebos. This procedure is called the double-blind, and it is meant to ensure that the administering staff don't differentiate, even if unconsciously, between patients receiving different treatments. The point is that in a good experiment, the experimental and control groups should be indistinguishable, so that any difference in outcome can be truly attributed to the IV. If a confounding variable, rather than the IV, has created a significant result, this would have to be classed as a Type 1 error - if it was discovered!

Spot the confounding variables - examples
1 A gardener, wishing to compare two varieties of begonia, counts the flowers in ten plants of each type. He has placed all the pots of one variety on one side of the greenhouse, and the remainder on the other side. One side gets considerably more sun than the other . . .
Confounding variable: position in the greenhouse, and the subsequent effect of different amounts of sunlight. The two varieties should have been mixed together.
2 Professor Grimrap starts to run the first group of subjects in an experiment on anxiety and performance on a visual task. The last subject in the group attacks him, and as the work has to go on, his pretty assistant Miss Smile, runs the subjects who make up the second group.
Confounding variable: experimenter. Even if Grimrap and Smile were not as different as their names would suggest, it is not good for one person to run one set of subjects and another the other group. Even when extreme care is taken to standardise the procedure and instructions to the subjects, the effect of an imposing-looking male, as opposed to a less-threatening female, cannot be entirely ignored. This is particularly important in view of the nature of the experiment - Professor Grimrap may arouse anxiety levels, whilst Miss Smile may prove to be a more attractive visual target than the experimental one! If the professor and his assistant had run equal numbers of subjects in each group, then the effect of their personalities (if any) could be regarded as a nuisance variable, affecting both groups equally, rather than a confounding variable having a different effect on each group.
3 An experiment is undertaken to judge the efficacy of two types of literature devoted to post-natal baby care. One batch of literature is distributed at Poshgrove clinic, and the other at the nearby Backstreet clinic. Changes in maternal attitude are compared for the two kinds.
Confounding variable: clinics - together with their staff and customers. Where the literature is distributed, and how, matters. Not only might the type of client at these places differ substantially and systematically, possibly as a result of clinic location, but the staff in the two places might also differ in their attitudes towards their clients, and towards the distribution of educational material. If all the Poshgrove clients, after receiving rather dry and technical information on baby care, had apparently heeded the literature, then the results might suggest that this format was superior to the comic strips distributed at Backstreet. However, it might really be that the effect of the literature was much a function of the type of clients reading it. The Poshgrove ladies may well read, mark, and inwardly digest all the written information they receive, whilst the Backstreet clients might regard paper handouts as useful things for lighting the fire, or wiping the baby's bottom! The manner in which the clinical staff gave out the information might have had an effect, with the Poshgrove staff carefully making sure that each client received the information, and the overworked Backstreet staff first of all forgetting that they had to distribute the leaflets, and then, in their hurry to dispose of them, shoving handfuls out to all comers - including illiterate mothers!

What is to be done? Counterbalance!
When any variable is spotted as likely to influence the results of an experiment, we try to control its values if possible, and hold them constant for all values of the IV. However, it is usually impossible to control all such variables at once, and so what we must avoid is letting any particular variable exert its influence in a systematic manner, and thus give rise to a confounding error. The solution is a common-sense one. We try to arrange our experiment so that suspected confounding variables are spread equally across all experimental conditions. One particular way of doing this is known as counterbalancing.
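
As a rough illustration of what a fixed counterbalancing scheme looks like, the sketch below builds two alternation patterns for a pair of conditions labelled A and B (the ABAB and ABBA patterns this chapter goes on to describe) and counts how often each condition is immediately followed by each condition. The function names and the choice of 16 trials are assumptions made purely for the example.

```python
from collections import Counter
from itertools import cycle, islice


def make_sequence(pattern, n_trials):
    """Repeat a short pattern of condition labels until n_trials are filled."""
    return list(islice(cycle(pattern), n_trials))


def transition_counts(sequence):
    """Count how often each condition is immediately followed by each condition."""
    return Counter(zip(sequence, sequence[1:]))


# Hypothetical run of 16 trials under each scheme.
abab = make_sequence("AB", 16)    # A B A B ...
abba = make_sequence("ABBA", 16)  # A B B A A B B A ...

for name, seq in (("ABAB", abab), ("ABBA", abba)):
    print(name, "".join(seq), dict(transition_counts(seq)))
```

In the ABAB run a condition is never followed by itself, whereas in the ABBA run each condition is followed about equally often by A and by B. Both runs use each condition the same number of times.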
Suppose we design an experiment in which subjects have to come to the lab, and learn something under one of two possible conditions. We suspect that time of day might influence their performance, but it is simply not practicable for us to run one single subject every day at 10 am - which would make 'time of day' into a strictly controlled variable. Instead, we will counterbalance 'time of day' so that it influences subjects in both groups equally. We will work with half our subjects undergoing both conditions (one subject in each condition) in the morning, and the remaining halves of the two groups in the afternoon. If we think that dividing the day into just two blocks is still not enough, because 'early' morning differs from 'late' morning for most of us, then we could alternate the two conditions throughout the entire day.
If the experimental condition is called A, and the control condition B, we emerge, when we alternate the two, with a particular type of design known as (and get ready for it!) the ABAB design. There is one other design commonly used in counterbalancing, and it is especially useful when learning is involved and subjects are acting as their own controls. If the experiment consists of subjects learning two roughly similar sets of information, one under condition A and the other under condition B, we might feel that the first learning experience will affect the second for better or worse, and systematically, if we let all the subjects learn the material in the same order, AB AB AB. So we alternate the order and have half the subjects learn under condition B first. The design is thus known as the ABBA design. If there are two conditions, and subjects have to make hundreds of responses under both, the ABBA design is ideal, for it ensures that A comes before both A and B, and that B also occurs equally often before A and B. In the ABAB design, you may observe, A never occurs before A, and B never before B. In these ABAB and ABBA designs the pattern is repeated ad infinitum, I should add. When we have finished running the experiment, the results from each of the A and B conditions are put together into two complete samples of A and B for statistical analysis. There are methods of analysis available which tell you whether in fact any 'order' effect has occurred - but they are rather beyond the scope of this book.

In the previous section, possible confounding errors were dealt with by systematic counterbalancing. This technique involves deliberate alternation of conditions, according to a previously chosen scheme. Another strategy is to distribute the values of the variable in question (in the last example, the two conditions) across the trials in a random manner - a technique known as randomisation.
For instance, if time of day was a possible confounding variable to be dealt with, as subjects turned up to take part in an experiment, they would be allocated to either the experimental or the control group condition by the experimenter tossing a coin or using a table of random numbers. In using the latter, the experimenter would take the sheet of numbers, and without looking in detail, place the point of a pin or pencil somewhere on the page. He or she might have decided beforehand that if an even number comes up the subject goes into the experimental group, and if an odd number, into the control group. Either way, forces of nature, and not undreamt-of biases of the experimenter, determine the victim's fate. (Note: it is wise not to alarm subjects unnecessarily by allowing them to witness random allocation procedures!)
Over the long run, it will be found that approximately half the subjects experience the experimental and half the control conditions. It is usual, towards the end, for experimenters to abandon coins, pins, pencils etc., and allocate the remaining subjects in such a way that the numbers participating in each condition are perfectly equal. All subjects were allocated according to a randomisation technique however, including the last ones. This is because it was chance effects determining the allocation of earlier subjects which settled the fates of the later ones, and not just the fact that they turned up later in the day (another potential source of bias).
The word 'random', used in connection with selection procedures, does not mean haphazard, or even uncontrolled, but rather 'by chance'. As the subject draws near, the decision about his or her fate is not determined by biasing factors, but strictly according to chance, so that the probability of entering either group is the same.
When you think about it, it is really common-sense to take certain effects into account, and to attempt to prevent them from influencing one of your groups more than the other. Counterbalancing and randomisation are systematic attempts to deal with this potential source of error.

Are you related in any way?
One of the major considerations in planning an experiment and its subsequent analysis is whether or not the two samples you intend to compare are related, or matched, in some way. The statistical tests described earlier were of two sorts, besides being parametric and nonparametric. One kind was meant for samples derived from two separate (although at least generally similar) sources, and the other for samples which were matched. By this we mean that every score in one sample has a 'partner' in the other. It would make nonsense of the results to shuffle the order of one of the sets of scores if they had been individually matched with the other set. With the unrelated samples tests however (the Mann-Whitney and independent t tests), this is not the case, and the order of scores within the sample is quite arbitrary. This is underlined by the fact that these tests can be carried out on samples of unequal sizes.
Although statistical tests are mainly concerned with whether scores are paired off or not, it would not make sense, experimentally, to compare groups of scores which had come from very disparate sources. For instance, if you wanted to see whether a new teaching method helped children to acquire a larger vocabulary, you must make sure that the children forming the experimental and control groups were fairly
similar. It would not be a very fair test if group A comprised children from a class for gifted children, whilst group B was made up of youngsters with reading problems! This seems perfectly self-evident, but we always have to be on the lookout for less-dramatic manifestations of this problem. Even comparisons between classes of children within the 'normal' range of academic ability, but with different home backgrounds, or between classes with different teachers, could be unfair. What kind of error does this constitute? The old confounding error - so as far as possible, the children in both groups must be equated. However, making sure that groups are generally similar before the experiment starts is not the same as matching.
Matching always involves a careful pairing off of individuals or individual scores on the basis of one or more variables. You can easily tell whether matching has taken place in a design, or is up to the standard required for a paired statistical comparison, by taking the first score of one group, and asking yourself whether it has an obvious partner. If the answer is 'No', then the samples are not matched.
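
Why the pairing matters for the analysis can be seen in a small numerical sketch. The scores below are invented for illustration, and the two functions are hand-written versions of the independent and related t statistics rather than code from any particular library. Each position in the two lists is assumed to be one matched pair, with large differences between pairs but a small, consistent difference within each pair.

```python
from math import sqrt
from statistics import mean, stdev


def independent_t(x, y):
    """t for two unrelated samples, using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def related_t(x, y):
    """t for matched samples, computed from the pairwise differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical scores: the i-th entries of the two lists form a matched pair.
condition_a = [12, 25, 31, 44, 58, 63]
condition_b = [10, 22, 29, 41, 55, 61]

print("unrelated t:", round(independent_t(condition_a, condition_b), 2))
print("related t:  ", round(related_t(condition_a, condition_b), 2))
```

With these made-up figures the unrelated statistic is small, because the spread between individuals swamps the difference in means, while the related statistic is large, because every pair shifts in the same direction by a similar amount. Shuffling one of the lists would leave the unrelated statistic unchanged but destroy the related one.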
From a statistical point of view, whether or not groups are equated doesn't matter. Statistical analysis is completely divorced from the topic under consideration, and will simply tell you the probability of samples being drawn from the same parent population. However, tests for matched samples differ from those for unrelated scores in that they take into account the extra degree of matching. As you saw in exercise 2(d) (chapter 11), sets of scores can differ significantly when they are matched, but be regarded as indistinguishable when they are treated as unrelated data. So the statistics test just does its job according to the types of data fed in, and whether the samples are matched. To obtain matched scores, we normally follow one of two possible procedures. Either we can ask subjects to act as their own control, so that they each provide two scores, or else we pair individuals off on some basis, so that the one score each gives can be matched with its partner. These designs are called the repeated measures and matched subjects designs respectively.

The repeated measures design
As the name implies, subjects repeat their performance, but under slightly different conditions. If we suspect that order may act as a biasing variable, then we can combine the repeated measures design with an ABBA design, as shown in table 1.

Table 1

Subject   Condition A   Condition B

1         First         Second
2         Second        First
3         First         Second
4         Second        First
etc.

If we did not consider the order of conditions to be an influential variable at all, then we could skip the counterbalancing entirely, and have each subject participate under identical conditions.
This kind of design, because it provides scores which will be compared from 'within' each subject, is sometimes called the within subjects design. The most serious problem associated with its use is that when subjects take part in experiments over a longish period of time (i) they start to understand the task requirements better and improve with practice, and (ii) they start to get bored and tired. It is traditional to assume that these practice and fatigue effects cancel each other out, and so do not create a confounding error which would cloud the effect of the IV. It is also possible for subjects to provide more than two scores, and their performance can even be analysed in such a way that any consistent trend they show will be discerned. Like the 'order' effects I mentioned earlier though, these advanced designs will not be covered in this text.
The repeated measures design, culminating in a 'matched' statistical analysis, is one of the neatest and most powerful of the simpler designs. Unfortunately, it cannot be used as often as we would wish, because in many experiments, when a subject has taken part once, he, she or it cannot participate again. Learning experiments are particularly vulnerable in this respect. It is not possible, for instance, to ask someone to learn certain items under noisy conditions, and then - perhaps the next day - learn the same material under quiet conditions. The test material can only be learned once, and even if the gap between sessions is quite long, a certain amount will have been retained, making learning quicker on the second occasion, whatever the experimental conditions. One way round this problem is to obtain two very similar sets of material for learning. However, this is not a perfect solution, for you never know what personal associations the material can trigger for the subjects, making the material of unequal difficulty for them.
Often, it is necessary to debrief subjects after an experiment - i.e. tell them what the whole thing was all about - and this provides another reason why subjects may not be able to participate in an experiment on a second occasion. It is particularly common in experiments undertaken by social psychologists, in which it is essential that subjects are not aware of the aspects of their behaviour which are under scrutiny. Once they have this awareness, then of course their behaviour alters. Not to tell subjects about the true purpose of an experiment (and often to deliberately mislead them) is a form of dishonesty, and many psychologists refuse to do this. However,
some psychologists think that it is justifiable, and especially if subjects are informed of the true nature of the experiment immediately after they have participated. The work of Stanley Milgram (Obedience to authority, London: Tavistock, 1974), in which he persuaded subjects to administer what they thought were strong electric shocks to other subjects, is a source of much controversy. The problem of what constitutes unethical conduct can only be resolved by the individual at present, in the absence of any general guidelines for psychologists - in Britain, at least.
Finally, subjects may not survive to take part in a re-run of an experiment. In medical research rats and mice are commonly used to evaluate new drugs, or in studies of brain tissue and body organs. After the experimental treatment has taken place it is often necessary to kill the animal painlessly, in order to inspect the tissue to determine changes which have been brought about.

The matched subjects design
In the repeated measures design, every subject contributes two scores for analysis, each score going into a different sample. There is another kind of set-up which gives matched data suitable for related statistical tests, and it occurs when individual subjects are paired off precisely with a member of the other group. In order to get the pairing precise enough, it is common to get one group of subjects together, and then look round for partners for everyone. If you just took two groups of roughly equated subjects, and then tried to pair them off, you would be lucky if you managed to match up all the individuals involved, and so it is easier in the long run to look for individuals with specified characteristics from the outset.
The obvious choice for matched partners is to procure identical twins. We know that their genetic make-up is the same; most of their social background, and to a lesser extent their intelligence, personality and other psychological characteristics are also highly similar. Having procured specimens, we persuade the twins to take part in an experiment, each under different conditions. Twins are the experimental psychologist's delight, and this is increased no end if twins who have been raised in different households are found! Any differences between the twins can be attributed solely to upbringing and environmental influences, and so such twins figure prominently in the 'nature-nurture' arguments over relative contributions of heredity and environment which abound in psychology.
Unfortunately, for the psychologist not involved in nature-nurture work, or for the research in medical conditions for which purpose-built twin 'banks' have been established so that a ready supply of identical twins exists, twins do not appear all that frequently. It may be that they are wisely avoiding the vicinity of psychology departments! Therefore, although the use of identical twins for matched subjects designs is ideal, it is not normally practicable, and the less-satisfactory alternative of matching unrelated individuals on the basis of relevant variables must be employed. Often, members of the experimental group will have come to our attention because of some unusual condition they show, and so we would then attempt to find others who are similar in as many respects as possible, but who do not display the condition itself.
Work carried out on the effects of premature delivery, or low birth weight, upon later intellectual, physical and motor development contains some good examples of matched subject work. Infants born prematurely in the same hospital, and to mothers from the same population, can be matched with babies which are within the normal weight range. There is one aspect of this work though which makes it very difficult to draw any safe conclusions about the effects of low birth weight. This is that the factor or factors which helped to bring about the premature birth may often continue to exist after birth, usually to the detriment of the developing child. Social class and malnutrition are two such factors which can play an indirect and direct role respectively, on the developing foetus or child.
Of course, matching isn't restricted to experiments in social sciences. In any kind of work involving animals and inert materials, individual animals or specimens can be paired off with each other - once again, on the basis of important characteristics. For economists and geographers, whole communities can be matched for certain studies. In examining the regional incidence of diseases, epidemiologists (and there's a word which will impress your friends!) might compare communities, one which shows a high incidence of a particular malady, the other a low incidence. Differences between the two, likely to be causal factors, would then be sought. For ethical reasons the research must be done this way. It would not be possible to take two carefully matched towns, and then see if a high incidence of a particular disease could be induced in one of them!
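
The procedure of starting with one group and then looking round for a similar partner for each member can be sketched as a simple nearest-partner search. Everything in the example is hypothetical: the records, the variable names, the function name match_partners, and the particular way of measuring similarity (summed absolute differences, each scaled by that variable's spread) are assumptions chosen for illustration, not a method prescribed by the text.

```python
from statistics import pstdev


def match_partners(experimental, candidates, variables):
    """Greedily pair each experimental subject with the nearest unused candidate.

    Distance is the sum of absolute differences on the chosen variables,
    each divided by that variable's standard deviation across the whole pool.
    Assumes there are at least as many candidates as experimental subjects.
    """
    pool = experimental + candidates
    # Fall back to 1.0 if a variable does not vary at all in the pool.
    scale = {v: pstdev([p[v] for p in pool]) or 1.0 for v in variables}

    def distance(a, b):
        return sum(abs(a[v] - b[v]) / scale[v] for v in variables)

    unused = list(candidates)
    pairs = []
    for subject in experimental:
        partner = min(unused, key=lambda c: distance(subject, c))
        unused.remove(partner)  # each candidate can be used only once
        pairs.append((subject["id"], partner["id"]))
    return pairs


# Hypothetical records; the variable names are illustrative only.
experimental = [
    {"id": "E1", "age": 20, "iq": 110},
    {"id": "E2", "age": 35, "iq": 95},
]
candidates = [
    {"id": "C1", "age": 36, "iq": 97},
    {"id": "C2", "age": 21, "iq": 108},
    {"id": "C3", "age": 50, "iq": 120},
]

print(match_partners(experimental, candidates, ["age", "iq"]))
# [('E1', 'C2'), ('E2', 'C1')]
```

A greedy search of this kind is only one possible choice; it does not guarantee the best overall set of pairs, and deciding which variables are 'relevant' remains a judgement for the experimenter.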
T l ~ design
e of experiments

The independent subjects design

The last kind of experimental design to be covered has already been used in many of
the examples given earlier in the book. It is a very common one, in which results are
derived from groups of subjects who have been roughly matched, as a whole. For
instance, two groups of children at a school may be given two types of dental
treatment - but it would previously have been ascertained that the children in each
group had approximately the same number of fillings, used the same brands of
toothpaste, and ate about the same quantity of sweets. Individuals would not be
assessed with a view to pairing off though. If this were done, then the design would
be a matched subjects one, rather than an independent subjects design. Sometimes
this design is called a between subjects design; as I mentioned earlier, it provides data
which are less likely to show a significant difference than matched design data - and
particularly if the change in the DV is only a small or subtle one.

Summary

Statistical tests, besides being nonparametric or parametric, are also designed to
analyse independent or matched sets of scores. Fitting into these two categories of
tests are three types of design:
1 Repeated measures (or within subjects)
2 Matched subjects
3 Independent subjects (or between subjects)
Groups 1 and 2, which generate paired scores, are suitable for matched tests, whilst
data from the third group would be analysed using an independent (or unrelated)
test. This information is also given on page 198.


14 Sampling

This is another chapter aimed primarily at life scientists, although the principles of
sampling are universally applicable to scientific work of any description. Some
knowledge of the content of chapters 6 and 7 will help you to appreciate the relevance
of sampling, but it is not essential for an understanding of the information included
in this chapter.

Generalisation
As life scientists we wish to understand and predict the behaviour of living organisms.
Therefore we conduct experiments and surveys, and occasionally even find that yes,
indeed, a certain variable does have a consistent effect upon something else. So
what? This is the big question which confronts us at the conclusion of an experiment
or study. Although we might have shown that something is the case for particular
organisms in a particular environment, clearly the aim of our labours was not just to
make detailed statements about single members of a species, or special groups of
animals or plants. We hope to be able to apply our findings to all the animals
belonging to the species under consideration (often Homo sapiens), and which live
under roughly comparable conditions. This application of findings from the particular
to the general is called generalisation, and it is an aim which underlies almost all
scientific investigations, and whatever mode of enquiry is used.
The problem with generalisation lies in the fact that we are generalising. We can
never be quite certain that our conditions apply to all other members of the species
involved, for the simple reason that our study did not include all its members. We
took a sample, and it is purely from this that we draw our conclusions. However, the
validity and generalisability of our conclusions rely entirely upon how good our
sample was. If it was poor, then the findings will be of very limited value, and will
merely tell us about what happened when organism X, under conditions Y, was
subject to procedure Z. We can't say very much about species X, or the likely effect
of Z on that species under conditions other than Y. So it is important that we are
careful to obtain good samples.
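
The point about good and poor samples can be tried out numerically. What follows is only a minimal sketch in Python, and everything in it is an assumption made for illustration: the parent population is a made-up set of 10,000 scores with a mean of about 50, the 'good' sample gives every member the same chance of selection, and the 'poor' sample is drawn only from the highest-scoring tenth. The function name compare_samples is hypothetical.

```python
import random
import statistics


def compare_samples(seed=0, population_size=10_000, sample_size=100):
    """Return (population mean, mean of a good sample, mean of a poor sample)."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable

    # Hypothetical parent population: scores with mean about 50 and SD about 10.
    population = [rng.gauss(50, 10) for _ in range(population_size)]

    # Good sample: every member of the population has the same chance of selection.
    good_sample = rng.sample(population, sample_size)

    # Poor sample: drawn only from the highest-scoring tenth of the population.
    top_tenth = sorted(population)[-(population_size // 10):]
    poor_sample = rng.sample(top_tenth, sample_size)

    return (
        statistics.mean(population),
        statistics.mean(good_sample),
        statistics.mean(poor_sample),
    )


if __name__ == "__main__":
    pop_mean, good_mean, poor_mean = compare_samples()
    print(f"population mean:  {pop_mean:.1f}")
    print(f"good sample mean: {good_mean:.1f}")
    print(f"poor sample mean: {poor_mean:.1f}")
```

The good sample should give a mean close to that of the whole population, whilst the poor one lands well above it - and taking more scores from the same top tenth would not bring it any closer.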

There's more to it than you think!


In principle, it sounds as though nothing could be simpler than taking a sample. After
all, we 'sample' offerings of food and drink easily enough, and pronounce judgements
on such offerings without hesitating over possible methodological difficulties. But in
practice, good sampling is far from easy. You can see this from your own experience
of sampling, if you give the matter some thought.

It is just possible that you visit your local pub fairly regularly (strictly for
observer-participation studies, of course!), and so you will probably have noticed
that during the evening at least one person will comment upon the state of the beer
that night. Beer varies - the causes of this variation being perhaps best known and
understood by publicans, but giving rise to seemingly endless speculation by the
customers - and each visit to the pub provides an opportunity to obtain and 'test' a
sample. You must realise intuitively that any one occasion does not necessarily
provide a good indication of the 'usual' quality of the pub's beer, and inevitably the
beer closely resembles water on the day your rival darts team visits to play you 'at
home'! Individual samples, i.e. pints of beer, which are obtained daily over a period
of weeks, will be combined to give a proper sample - a reasonably sized portion of
the parent population, the pub's beer. In a sense a single pint of beer is a sample, but
it is such a tiny one that it could not safely be used as a basis for sound judgements.
Up to a point, the larger the sample, the better it is, for it is more likely to truly reflect
the characteristics of the parent population. Exactly where the 'point' lies will be
discussed later in this chapter.

Suppose we take all the draught bitter sold at a particular pub over the course of a
year as our parent population, and on the basis of sampling decide that we wish to
make a pronouncement on its quality. (This could be the preliminary stage in the
compilation of The Statistician's Guide to Public Houses!) First, we would decide
upon an objective scheme for judging the quality of the samples. 'Objective' means
something which is publicly examinable, as opposed to more private subjective
feelings, which are unique to an individual. We could construct a five-point scale,
ranging from 'undrinkable', 'fair', 'good' and 'very good' to 'superb', and spend some
time sampling beer to decide which quality of taste will be put into which category.
In important studies, when several people are involved in making judgements like
this one, and on a rating scale specially constructed for the task, we like to be sure
that they are all making comparable judgements. In other words, they have to be in
agreement over how they will categorise the samples. When their agreement is good,
we speak of high inter-judge reliability. When the judges are in agreement over the
matter of categorisation, then the sampling procedure proper can begin.

Let's say that unknown to the judges, the parent population of beer overall would
be judged 'good', and has a normal distribution. Samples collected over a reasonable
time period will be combined to form a large overall sample, and in which we should
find that judgements of 'good' will predominate. There will be several 'fairs' and
'very goods', but relatively few 'undrinkable' and 'superb' decisions. This proportion
of comments reflects the underlying normal distribution, and will remain the same,
whatever the size of the sample, once the first few judgements have been made. As
the sample gets larger, the more it can be trusted to be representative, and after it
displays a pretty solid looking normal distribution, further efforts are unnecessary.

The faith we can have in a sample isn't just a matter of knowing how large the
sample is however. It also depends upon how well the items constituting the sample
were chosen. If they were picked in such a way that every part of the variation which
exists in the parent population has an equal chance of selection, then all is well. But
if some parts of the population have a better chance than others, then the sample is
not a good one, and becomes what we call biased. Just because items taken from a
population keep having roughly the same value is not necessarily an indication of
bias - provided the selection procedures were adequate. In the beer example just
given, selection procedures were fine, and the many judgements of 'good' were a true
reflection of the mean value. However, if all the daily beer samples had been taken
right at the start of every evening, on the first few pints drawn, this would have
constituted a poor sampling strategy which would probably have resulted in the
researchers reaching the erroneous conclusion that the beer quality was mainly
'poor'. Sometimes, we take samples from populations about which a great deal is
known. At other times, we may feel we know nothing about the characteristics of the
parent population, and the whole point of taking a sample is to discover what it
comprises. In this case we can't check the accuracy of our sampling by comparing the
results with any 'expected' values, and the snag is that if the sampling strategy is a
poor one, further samples will only add more poor data to the pile. Unfortunately
there is no handy way of recognising a poor sampling procedure; basically, you need
to think intelligently about the features of data collection which are likely to give rise
to biased results - and then try to avoid them!

Poor samples can be disastrous. First of all, they can provide misleading
information about the characteristics of a population and result in errors which are
costly in terms of money or well-being. If I think, from the basis of talks to teachers,
that this book is God's gift to students, and ask for a million copies to be printed, I
would probably end up in the modern equivalent of the debtor's prison! The teachers
who gave me their comments are friends - do you think that they form an unbiased
panel of judges? And what about the bits of the book upon which they were basing
their judgements? Did I wave the book open at random, or make sure that they saw
all the best drawings and funniest cartoons?

In obtaining judgements about a book, or gauging the quality of a pub's beer, our
main concern has been to draw one good sample of data. In both cases we were
dealing with opinions, but we could draw single samples of virtually anything which
provides some kind of numerical score.

Now we will look at sampling error in connection with statistical tests, when we
normally draw two samples and compare them with each other. Let's return to the
analogy of the biologist busy obtaining samples of water from ponds, who was
described at length in chapter 6. As you may recall, he lost the labels off the
test-tubes, and left the samples on the table. His colleague, who was unaware of the
sources of samples, had to decide, on the basis of their constituents, whether they had

come from one or two ponds. If the samples looked different, and contained different
micro-organisms, she would be able to conclude that they had been drawn from two
separate ponds (populations). However, it might be that the field worker had in fact
taken two water samples from one pond, but that one had been collected from near
the mud at the bottom, whilst the other came from a place where a clear stream
entered the pond. Would you expect the contents even to look the same in this case?
Of course not. Neither sample is truly representative of the bulk of pond water, which
is somewhere between the two extremes of muddy and clear. So the colleague, in
deciding that the samples were drawn from two ponds, would be mistaken. By
incorrectly rejecting the null hypothesis (which would have stated that the samples
were derived from one source) she has committed a Type I or Type II error. Can you
remember which one?

Similarly, poor sampling might lead to an erroneous conclusion of the other Type.
If the biologist had obtained muddy water from the bottom of an otherwise clear
pond, and compared it with water taken from a completely muddy one, it might be
concluded that the samples came from the same source, when in fact they had been
drawn from two ponds. This would constitute a Type II error (the null hypothesis has
not been rejected), and the first example a Type I error. I hope you were able to work
that out yourself!

These examples should help you to see why, if we are to generate data which is
trustworthy and useful for generalisation and analysis, we should go about sampling
very carefully.

What's the problem then?
I have already hinted that it is harder than it sounds to obtain 'good' - meaning
'representative' - samples. By taking samples in social sciences we mean obtaining
scores or observations from people (samples of events) under certain conditions
(samples of environmental variables) over a specified time period (samples of
time). All these things vary - with tremendous consequences; and this is one of
the reasons why work in social sciences is so much more complex than that of
the physical sciences, in which inert substances do not vary appreciably in the
'normal' environment over a short period of time.

The people who turn up at the psychology lab door as potential subjects are a
sample of human beings - but who knows what kind of sample they comprise?
Almost certain to be biased, at any rate! It is often quite difficult to procure subjects
for experimental work, and usually they have to be bribed in some way - by direct
payment, reassurances that despite appearances they are contributing to a
worthwhile experimental undertaking, or by repayment with baby-sitting offers, etc.
Sometimes coercion even comes into it. There is an element of this in the
requirements laid down by some psychology departments, that in order to satisfy
university regulations students must participate in the staff's experiments for at least
x hours per year. A considerable proportion of psychological research has been
carried out in which students have been used as subjects, and psychologists face
criticism on that score. Students are by no means typical members of the human race
- or even of the society to which they belong, for that matter!

Sometimes people volunteer for experiments because they need something. It
might be money, it might be advice and help with some personal difficulty they are
experiencing, or it might be because they need to alleviate their boredom or
loneliness. Whatever the need, such people - and especially the ones who turn up
without much persuasion - are probably particularly unlikely to form a good
representative sample. It is not that having problems or being short of cash makes
people 'abnormal' - it is just that these subjects are at the extreme end of the problem
continuum, as evidenced by the fact that the severity of the factors is forcing the
would-be subjects to take some action. Even people who volunteer for studies
from the purest of intentions, i.e. who would like to make a contribution to
scientific knowledge, have to be regarded as somewhat different from their fellow
humans!

So, in recruiting people to take part in experiments, it is difficult to find
representative members of the society under consideration. Even if we didn't have
all the 'volunteer' and other sorts of bias creeping in, it would still be difficult, for we
don't really have a very clear idea of what characteristics a 'typical' member of our
society would show, or how to measure all the attributes which we ought to consider.
And even when we know roughly what we are looking for, there still might remain
the problem of actually finding the intended victims.

Suppose you want to conduct a market research study on the response of 'normal'
schoolchildren from middle-class homes to some new-fangled toy you intend to
develop. Having decided that 'normal' covers certain values of IQ, number of
parents in the home, siblings, dogs and hamsters, scholastic achievements, pocket
money, parental aspirations, etc., and that all the children you are interested in will
be between ten and fourteen years old, how do you actually locate the ones you wish
to interview? Just walk into the nearest school? It might be quite an unusual school
though - perhaps one for subnormal children, or with a catchment area in a very
well-to-do district. Clearly this is not satisfactory. If your marketing plan was aimed
at all the children in the British Isles who fitted your requirements, then to obtain a
representative sample you would have to obtain children from several locations
throughout the country. And the first and most persistent agent operating against the
selection of good samples rears its ugly head - the cost! Limitations of time,
equipment and personnel usually boil down to financial restrictions, and exist in
addition to all the other budgeting considerations which beset every research project.

If we did have vast resources, and came up with a list of all the suitable schools to
include in the parent population from which the sample will be drawn, we still have
the problem of selecting individual schools from the list. Human nature being what it
is, the research personnel, if left to their own devices, would no doubt opt for holidays
in Cornwall, Wales or Scotland, and the matter would be decided. Not good enough!
Some poor souls must be despatched to Salford, Battersea, Scunthorpe and other
unattractive sounding places; in fact most of them must be, because these industrial
areas, being fairly densely populated, will contribute more schools to the list than the
rural areas. And after having arrived at the schools, do the researchers proceed to
look for the quietest, most innocuous-looking creatures to interview, and shy away
from those specimens which look as if they have stepped straight out of the pages of
a comic? Again, if they had their own way, no doubt they would! Once more, this
would be biasing the sample, and to avoid this, all children possessing the attributes
listed must stand an equal chance of selection. The use of a school register and
random number table would do the trick at this level.

Besides considerations of cost, there are other reasons why we are forced to use
samples. Sometimes, our population may be of infinite size, and it would be
impossible to scrutinize every member or item. If we are studying personality, or wish
to measure it in some way, we cannot wait until people have lived their entire lives,
and then assess all the characteristics they displayed during every moment. We have
to give a person a 'test', i.e. take a sample of their reactions or behaviour within a
limited time period and treat this as a sample from which we can generalise to much
of the remainder of his or her life. But when you think that only a few hours go into
the test - if that - and that the results are meant to be generalised to a time period
embracing months or years, your faith in such psychological tests and their results
goes a little limp. It wouldn't even be so bad if test scores presented a true picture,
but who, under close scrutiny from another person, behaves in an exactly typical
fashion?

Another practical consideration, and one which influences final sample size, is that
if we attempt to measure all members of a population - or even a substantial portion
- limited resources mean that this may be done less well than if we had taken a smaller
sample, perhaps using fewer, but more highly trained, personnel, or a more precise
method of assessment. The more highly trained (and hence paid) the 'samplers' are,
the better they will carry out the task. I am sure that we can all imagine student leaflet
distributors at work, stuffing odd handfuls of literature into the letter boxes of empty
houses, or behind bushes here and there! The more subjects a person has to interview
or use in an experiment, as a general rule, the less likely he or she is to carry out the
job carefully. The less well-trained people are, the less seriously they will take a
study, and the less likely they are to understand the subtle nuances of the task in
hand. So this all comes back to cost again, although there must be a point at which,
even with all the financial resources one could wish for, it is still impossible or
impracticable to tackle a complete population of subjects.

Finally, as I mentioned in the previous chapter, in some kinds of work - medical
research, for instance - the experimental animals must eventually be killed so that
their tissue can be studied. Obviously, it would be impossible and undesirable to
work on a complete population of rats or mice; after the show was over, there would
be no members of the parent population left to whom the results could be
generalised! The whole point of sampling though, is that if a careful selection is made,
it isn't necessary to go on adding to it, ad infinitum. From a good sample,
generalisations can be made with some confidence.

Random sampling
In this section, I am going to outline the three main types of sample-drawing
technique which are commonly used. Because all three methods are based on the
principle of random selection, meaning that at the outset, all possible items included
in the population have an equal chance of selection, they are known as random
sampling. 'Random', in this context, does not mean 'haphazard', and in fact it means
anything but haphazard! It doesn't mean selected according to the whims and foibles
of the researcher, either. It implies a very careful pre-selection plan, which has been
drawn up to ensure that all items in the parent population have the same chance of
appearing in the sample.

1 Systematic sampling
The essence of this method is that each member of the population under study (for
instance the inhabitants of a town) is given a number, and then a subgroup is taken
for study. The term 'systematic' refers to the fact that the population has been
numbered off according to some convenient system, such as an alphabetical list of
names, class registers or consecutive house numbers. Any kind of complete listing
which is available may be used. Random number tables are made up from long
sequences of jumbled digits, in which all the numbers from 0 to 9 will, in the long run,

appear equally often. Although single digits are used in table construction, we can
use the tables for numbers greater than 10 by treating the digits as if they were
grouped into units larger than our maximum required number. If our sample of
people, objects or events was numbered 1 to 40, then we could obtain an order in
which to use them in a study by reading the digits from the table off in pairs, giving a
minimum value of 00 and a maximum of 99. From table S9 at the end of this book, in
which the digits are already grouped in pairs, you would obtain the numbers:
19 90 70 99 00 20 21 14 68 86 14 etc.
The first pair of digits, 19, is below 40, and so the person with this number would start
the ball rolling. The second person would be number 20, then 21, then 14, and so on,
until everyone has been included. If the number of items in the list exceeds 100, then
in this case, the digits would be put into groups of three, to obtain an order. We would
find that our first number was now 199, the second 70, followed by 990 and 20. We
can use random number tables to obtain a sample from a group of numbered items
by taking the numbers from the table until we have obtained as many items as we wish
our sample to comprise - e.g. fifty people from a population of a thousand. Quite
often, researchers do not start at the top left-hand corner of the table to obtain the
first number, but close their eyes and point to a number somewhere in the middle of
the page. This then serves as the starting point.

2 Stratified sampling
This method can only be used when there is detailed knowledge of the population
under study. Variables considered relevant to the sample are considered, and when
the sample is taken it is made up of subgroups, each of which shows the vital factors,
and in the same proportions as in the parent population. Different values of the
variables involved are known as strata.

For example, if in a university there are twice as many undergraduates as
postgraduates, and a sample of the entire student body is desired, then the eventual
sample, if it is stratified, will include twice as many undergraduates as postgraduates
also. Lists of students and a table of random numbers might be used, and when the
correct number of items for one of the strata has been reached, then only items for
the other would be drawn. In fact systematic sampling, as outlined above, would
provide approximately the correct proportions in any case, if it were carried out
properly.

The advantage of stratifying a population before taking a sample is that the chances
of picking a deviant sample are smaller, and therefore estimates of population values
are much more precise than is the case with a simple random sample of the whole
population. The major limitation of stratified sampling is that it requires advance
knowledge of the important factors within the population, and their relative
proportions. Examples of factors which are often considered to be relevant are age,
sex, social class, income and race.

3 Cluster sampling
This relies on the existence of natural groups, such as houses on a block, people in a
family, or children in a class. These kinds of blocks or clusters are numbered, and
from them a random sample is selected, the number used being mainly determined
by the size of the sample required. Next, from within each cluster, subgroups are
identified, and from one or more of these, the item for inclusion in the sample is
taken. This method has the advantage of cheapness, in that by using only certain
clusters, usually falling within a particular locality, subjects can be interviewed fairly
conveniently, and the investigator does not need to travel miles and miles between
each one. Unfortunately though, this very advantage is also its weakness, for if a
particular cluster, from which several items were taken for inclusion in the sample,
was unrepresentative in some way, then there will be a certain amount of systematic
error present.

The man in the street
Students commonly believe that using the 'man in the street', perhaps selected 'at
random' (e.g. by approaching every tenth passer-by), results in a better sample than
if the more usual procedure of roping in all available and willing students is followed.
Also, there is a kind of vague feeling that if 'real' people, as opposed to students, are
to be used in a study, it is perfectly adequate to obtain people - literally - from the
street outside. In fact these beliefs are just as likely to give biased samples of the
'general population' as the easier method of taking students from the dining room
was. The only studies in which people walking the streets would make appropriate
subjects would be those concerning things like the quality of the pavements,
provision of litter bins, aesthetic appearance of street lights, or proposed zebra
crossings. Any investigation involving people who have been plucked from their
wanderings on the streets is going to miss out the tremendous proportion of humans
who are at work all day, or on night shifts, in hospitals, schools, convents or prisons,
at sea, and so on. As these absentees from the street never had much of a chance to
be included in a sample, a sample made up roughly from 'men in the street' would in
fact be very far from random!

The electoral roll in sample selection

If you are faced with the task of having to obtain hundreds of people to make up a
sample which is supposed to be typical of the general population of Britain, you might
wonder where on earth to start. Hopefully, you don't set about it the way some
political opinion pollsters in the States did a few years ago. They had the bright idea
of obtaining people's names from telephone directories. Can you spot the error
there? They didn't - and that year the election predictions were fantastically awry.
Only a select portion of the population has telephones, and especially a few years

ago, it would be a portion tending to reflect a reasonable income and fairly high using only one investigator, or a variety of investigators \\rho have been thoroughly
standard of living - together with certsill political opinions, no doubt. The error of trained in the use of standardised instructions and procedures), or i t may be
using phone directories a s a source for names is a nice example of a procedure likely considered an IV in its own right and scrutinised accordingly.
to give a very biased sample. When particular stimuli are presented to subjects. then for later generalisation the
A much better way of obtaining names of peoplc who residc in Britain, and one far ones selected should be representativc of those encountered in the population to
less likely to result in a biased sample, is to use the electoral roll. It is a list of all voters, which later refercnce will be made. In selecting objects for inclusion, a list of suitable
and there is a legal requirement that evcryone over the age of eighteen shoutd be samples may be drawn up, and from this, random samples for presentation to the
included. Naturally, there will still be a few who escape the net - and you can bet that subjects, or for use in the study, can be drawn. This is known as stimulus sampling.
these people will show some form of bias, and probably in the opposite direction to When an investigator chooses particular values of an IV for an experimcnt, he or
the ones whose names appear in telephone directories. They will be the homeless, she is in effect taking a sample. Once again, if the findings are to be generalised, it is
those making frequent (and perhaps speedy!) moves, the illiterate, and those who important to make sure that this sampling of condifions is done appropriatety.
believe in ignoring official-looking documents. So no prizcs for deciding that the Finally, all investigations and measurements take place at a distinct point in time.
electoral roll is not quite as unbiased a source of names as the kcen researcher would How well the data from a sample can be generalised can depend upon when the
like to have at his or her disposal. It would be particularly important to make sure that sampling took place. The.example I gave earlier, of the first pints of beer drawn in a
the 'oversights' were contacted in research involving things like housing conditions, pub being used as a basis for judgement of all the beer sold at that pub, provides an
rents or low income families. illustration of poor time sampling.

The problem of sample size The general population


A major decision which investigators must make is the number of subjects to include 'General population' is a term to use with care. Strictly speaking, it means 'typical'
in a sample. This problem has no easy o r general answer, and each solution depends members of a society -or at least, the 80% or so who fall into thecentral valuesof all
upon a variety of factors which sometimes cannot be specified in advance. In general, the attributes which we take into consideration. Whilst the term doesn't apply to any
there are three main considerations. The first is the kind of statistical anatysis which group of people who are simply 'not students', neither does it refer to people walking
is planned. This is a complex and controversial question. Often it is possible to about in towns and cities! To do a study on members of the general population would
demonstrate that a significant difference exists between experimental groups when involve a great deal of careful pre-sampling planning, as I have already indicated.
each contains only a very small number of subjects. This is particularly the case when Don't confuse the population of a country with the slatisfical meaning of a
the IV has a dramatic effect upon the DV, o r when fairly precise interval or ratio population though. The statistical concept refers to all members of a group of people,
numbers are used to quantify a more subtle effect. 0" the whole, as an effect items or events sharing some quality - the population of a country refers to the
becomes more subtle, and thus harder to detect, larger groupsof subjects are needed people who actually live in a defined area.
before populations of scores can be distinguished. Financial considerations often
turn out to be a deciding factor in the long run.
Secondly, variability within the samples and results matters. On the basis of
previous experience, the expected variability can be takeninto consideration when
an investigation is planned. In some types of research much greater variation is
anticipated than in others, and so larger numbers of subjects would be used. Single
subject designs have played an important part in psychology though, the classic
example being the work of Ebbinghauson memory - with himself as the sole subject!
Finally, traditions develop in research areas concerning the appropriate numbers
for samples. The traditions are of course based on experience of work in a particular
field, and this in turn will reflect the relative importance of the factors which have just
been described.

Other kinds of samples


It is now realised that the sex, race and physical characteristics of an investigator may
affect the kind of results obtained from subjects; this source of variation brings home
the point that in social sciences, the people carrying out the studies are themselves
part of a sample. This potential source of bias should be controlled (for instance by
15 Correlation

I shall now conclude the practical part of the book with correlation, a statistical
technique which is unusual in that it can be used either in a descriptive capacity, or as
a means of drawing inferences. It is not necessary to have read all the material which
has appeared earlier in the text to be able to grasp what I cover in this chapter,
although an understanding of the difference between descriptive and inferential
techniques will help, and for the final section on probability, chapter 5 will provide a
good foundation.

A measure of association

No doubt you will have come across the word 'correlation' before, and formed a
rough idea that it means 'association', or 'going together'.
'Expenditure is correlated with income.'
'Cancer is correlated with cigarette consumption.'
Statements like these may be familiar to you, and they tell us that the more income
you have, the more you spend, and the more cigarettes you smoke, the more likely
you are to get cancer. Notice that in both cases we are being told that the more of one
thing, then the more of another. Rather than use fairly vague words like 'more' or 'a
little', mathematicians prefer to quantify things by using numbers, and so the
mathematical technique of correlation was devised as a means of specifying precisely
the extent to which two things (variables) are associated. The numbers used to
express correlation, or extent of association, are called correlation coefficients. If two
measures are in perfect association, i.e. a great deal of one thing is always
accompanied by a great deal of another, and when one is absent or nearly so, then the
other also has a low value, we have a perfect positive correlation, and the correlation
coefficient which corresponds is the number +1. If on the other hand there is no
association between two variables, then we speak of there being no correlation, and
assign the number 0 to this situation.
As you may well have realised, most cases of paired variables are probably going
to lie somewhere between the values of 0 and 1. That is, they will be 'a little bit',
'fairly' or 'very well' associated, rather than showing a perfect relationship. Any
correlation which is less than perfect means that some of the pairs of scores from the
two variables don't quite fit the general pattern. All the intermediate positions
between 0 and 1 can be stated numerically, and this method is so satisfactory that it
becomes easier, and more precise, to deal with numerical descriptions rather than the
vaguer verbal labels.
Now let's look at another case of association. Suppose I say that there is a
correlation between the weight of clothing I wear and the temperature outside. The
two variables I am linking are weight of clothing and temperature - and clearly they
are connected. However, they are not connected in such a way that an increase in one
accompanies an increase in another, but just the opposite. This is negative
correlation, and it describes circumstances in which the more there is of one variable,
the less there is of the other. As the term 'negative' suggests, quantification of such an
inverse relationship is indicated by using a minus sign before the coefficient, which will
again be a figure between 0 and 1, but this time, -1 (minus 1), rather than +1 (plus 1).
Intermediate cases of association are given values which lie between 0 and -1, so a
correlation of -0.9 means that the two variables have a very clearly established
negative association; of -0.5, that they are inversely related to a moderate degree;
and of -0.2, that there is only a tendency towards an inverse relationship. The
relative positions of correlation coefficients are shown on the straight line in figure 1.

Figure 1. The relative positions of various correlation coefficients (labels along the
line: completely inverse; no relationship is apparent; variables are perfectly in step)

Next, we can look at the relationships between variables by means of diagrams, but
first, it is necessary to understand how we draw ones which involve two variables.
Taking two axes, and joining them in the traditional manner, at the bottom left
corner, the horizontal axis (or abscissa) will be used for measuring one of the
variables (A), whilst the vertical axis (or ordinate) will be for the other variable (B).
Mathematicians would probably label the horizontal axis X and the vertical axis Y,
but I shall stick to A and B. Figure 2 shows A and B labelled for two variables, 'size
of garden' and 'annual income'. Each axis is divided up into the appropriate units of
measurement for the two variables. It is standard practice to draw graphs and
diagrams with the points of lowest measurement meeting in the bottom left-hand corner.

Figure 2. Axes labelled for correlation (vertical axis: annual income, up to £20 000;
horizontal axis: garden size in acres)
Figure 3. Labelled axes with two points of data plotted (same axes as figure 2)
In correlation, we are always dealing with paired scores, and so values of the two
variables taken together will be used to make a diagrammatic representation.
Suppose our findings tell us that we have a person with an income of £5000 pa, and
whose garden is 0.2 acres. A cross is marked on the diagram where imaginary lines
drawn from the two axes, at these values, meet. It is shown on figure 3, where it has
the number 1 beside it. Another person in our sample has an income of £20 000, and
a garden of 15 acres. The point for this is labelled 2 in figure 3, and it lies where the
imaginary line drawn across from £20 000 crosses that coming up from 15 acres. So,
each point used in determining the correlation is plotted from all the pairs of scores
included in the data, and by using the two scales in conjunction. A completed
diagram for a case of positive correlation is shown in figure 4, where we invariably
have more of A going with more of B.
Point x shows where there is hardly any of either A or B.
Point W where there is an intermediate amount of both A and B.
Point y where there is a good deal of both A and B.

Figure 4. Perfect positive correlation (horizontal axis: Variable A)
Figure 5. Perfect negative correlation (horizontal axis: Variable A)

Now look at the picture which is presented by negative correlation, and shown in
figure 5. A great deal of B goes with hardly any A. An imaginary line linking up the
points plotted from a perfect negative correlation will always start high up on the
left-hand side and slope down to the right.
What does a correlation of zero - no association - look like? You can't draw a line
connecting up the points which have been plotted, because the measures on the two
variables are so unrelated that all we see is a large cluster of points (forming a rough
circle), as in figure 6. The point marked 1 shows where variables A and B both have
high values; 2 is positioned where A is fairly high and B fairly low, and 3 where both
A and B are low.

Figure 6. The cluster of points in zero correlation (horizontal axis: Variable A)

These diagrams, representing visually the association between two variables, are
called scattergrams. When they are constructed, it is fairly unusual for points to lie
in an exactly straight line, and so it is quite rare to see an actual line joining them all,
as in figures 4 and 5. A more frequently met pattern consists of several of the points
plotted lying more or less along a straight line. When such less-than-perfect
associations are shown, it is possible to work out mathematically the exact angle at
which a line should be drawn in, so as to come closest to the majority of points. Such
a line is called a regression line, or line of best fit. Unless you have actually gone
through the appropriate calculations, it is better to leave the line out, rather than
guess where it would go.
For the near perfect positive and negative correlations then, the points lie in
virtually straight lines pointing up or down on the right, according to the relationship.
As correlation becomes further removed from perfect, and the coefficient starts to
approach 0, so the points get rather more spread out, and move through an oval shape
(again pointing up or down), towards a circular shape - the stage at which even the
most ardent optimist has to admit that no line is in the least bit evident! The shape of
a circle is formed because most variables will tend to give values which are normally
distributed. The central cluster of points on variable B will be halfway up the vertical
axis, and of variable A, halfway along the horizontal axis. Extreme scores on either scale
will be relatively rare, and on both variables together even rarer; they will be shown
on the scattergram by points which lie in any of the four corners.
The shapes of scattergram patterns associated with the varying extents of different
correlations are shown in figure 7.
When you write up any report which includes correlated scores, it is always best to
include a diagrammatic representation of the results, in the form of a scattergram
drawn on graph paper. In the case of non-perfect correlations, but in which a line of
points is roughly discernible, it is best not to draw it in, unless its angle has been
properly calculated.

Exercise
1 Draw scattergrams for the following pairs of numbers, and describe in words the degree of
association you consider to exist between the paired variables.
    (a) A   B      (b) A   B      (c) A   B      (d) A   B
        2   2          1   2          1   7         10   1
        4   4          2   2          2   5          8   2
        5   5          4   3          3   9          6   4
        9   9          4   4          5   4          4   5
       12  12          5   6          7   8          3   7
       13  13          7   7          2   3          0  10
9 1
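Where graph paper is not to hand, a rough scattergram can be sketched in plain text. What follows is only a minimal sketch in Python, assuming whole-number scores that are zero or greater. The function name text_scattergram and the four pairs of scores are invented for illustration and are not taken from the exercise.

```python
def text_scattergram(pairs):
    """Return a rough text scattergram: A runs along the bottom, B up the side."""
    points = set(pairs)
    max_a = max(a for a, _ in points)
    max_b = max(b for _, b in points)
    lines = []
    for b in range(max_b, -1, -1):  # highest value of B goes on the top row
        row = " ".join("x" if (a, b) in points else "." for a in range(max_a + 1))
        lines.append(f"{b:>2} | {row}")
    # Bottom scale for A (last digit only, so that the columns stay aligned)
    lines.append("     " + " ".join(str(a % 10) for a in range(max_a + 1)))
    return "\n".join(lines)


# Hypothetical pairs of scores (A, B), invented for this illustration
print(text_scattergram([(1, 1), (2, 3), (3, 2), (4, 4)]))
```

As on graph paper, the lowest values of both variables meet in the bottom left-hand corner, and crosses drifting upwards to the right suggest a positive association.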
Figure 7. The direction and shapes of score clusters shown by various correlation
values (panels: (a) +0.9, (b) +0.6, (c) +0.1, (d) -0.3; horizontal axes: A)

Spearman's rho

Now we move on to the calculation of precise correlation coefficients from given sets
of data. There are a few methods in existence for calculating correlation coefficients
and, needless to say, the coefficients obtained from each are subject to slightly
different interpretations; fortunately, for most of our work in social sciences, we do
not need to be concerned with the mathematical subtleties. We shall start off by
looking at a simple and quick method for obtaining a correlation coefficient,
developed by Spearman, and giving the coefficient known as rho. Rho is pronounced
like 'row' in 'rowing boat', and is the Greek version of the letter r - this letter of the
alphabet having been chosen to indicate a correlation coefficient.
Step by step instructions for obtaining Spearman's rho are given in operation
schedule 14. After you have completed the calculations, you should end up with a
number between -1 and 0, or 0 and +1. If you obtain a coefficient which is larger
than +1 (or smaller than -1), you can be absolutely certain that you have made a
mistake somewhere in your calculations, for the formula will work with any set of
paired scores. Note also that the value of 6 which is used in Step 6 is unchanging, and
does not alter with the size of sample.
In working out Spearman's rho you have to rank the scores, i.e. put them into order
of size. This ranking operation crops up as a preliminary step in several statistics
tests, and precise instructions are given in schedule 6. The value of Spearman's rho is
then calculated from ranks, rather than from the original values of the two sets of
scores. This means that it is quite possible to calculate a coefficient from data which
are originally in the form of grades or ranks, and in fact Spearman's technique really
comes into its own in this respect, for many of the more sophisticated methods of
calculating a correlation coefficient require the variables to be given in rather more
precise units than just grades (i.e. interval or ratio level of measurement must be
achieved - for those of you who have read chapter 10).
An example of a graded scale would be one which measures political affiliation
along a continuum. According to their beliefs, people could be assigned a position on
the scale, ranging from, say, 0 to 10. Grades do not even need to be expressed
numerically on the original scale though, and attitude measurement commonly
involves pinpointing a person's opinions on a continuum ranging from 'strongly
disagree' at one end through 'disagree', 'don't know' and 'agree' to 'strongly agree'
at the other. Scales like this are called Likert scales, and in using any data from them
to calculate Spearman's rho, the five degrees of opinion would be rated from 1 to 5.
Suppose you obtained data which, when plotted on a scattergram, showed a
pattern similar to either of those shown in figure 8. When the plotted data points form
a curved rather than straight line, we have a curvilinear monotonic relationship.

Figure 8. Examples of curvilinear monotonic relationships

Although variables A and B are both increasing or decreasing together (i.e. as in a
positive correlation), the rate of change in the variables is not equal. In the left
scattergram the fastest change in A is at higher values of B. Speed of change in
variable A is indicated by the angle at which the line lies to the vertical B axis. It would
of course be possible for variable B to change at a different rate from A. Can you
work out what the scattergram would look like in that case? Such non-linear
relationships do not mean that a correlation or association between the variables
does not exist, and Spearman's rho can be calculated on scores showing a monotonic
curvilinear relationship without any problems. However, if the curve in the line starts
to change direction, as in figures 9(a) and 9(b), becoming arched or U-shaped, then
even though there may still be a very orderly relationship between the two sets of
scores, it is too complex for Spearman's rho to be used as a means of describing it.
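For readers who like to check their arithmetic by machine, the ranking and calculation can be sketched in a few lines of Python. Schedule 14 is not reproduced in this chapter, so the sketch assumes the usual textbook formula, rho = 1 - 6 x (sum of squared rank differences) / (n x (n squared - 1)), in which the constant 6 is the unchanging value mentioned above. Giving tied scores the average of the ranks they occupy is also an assumption, and the function names and all the scores are invented for illustration.

```python
def average_ranks(scores):
    """Rank scores from 1 (smallest) upwards; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every score tied with the one at position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, counting from 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Assumed formula: rho = 1 - 6 * sum(d squared) / (n * (n squared - 1))."""
    n = len(a)
    d_squared = sum((ra - rb) ** 2
                    for ra, rb in zip(average_ranks(a), average_ranks(b)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores, invented for this illustration
a = [1, 2, 3, 4, 5]
curved = [1, 4, 9, 16, 25]   # rises ever faster, but never changes direction
u_shaped = [4, 1, 0, 1, 4]   # falls, then rises again

print(spearman_rho(a, curved))
print(spearman_rho(a, u_shaped))
```

The monotonic curve gives a coefficient of 1, because the ranks of the two variables agree exactly, whereas the U-shaped scores give a value close to 0 even though the relationship is perfectly orderly.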
Figure 9. Examples of arched and U-shaped relationships ((a) arched relationship;
(b) U-shaped relationship)

Non-linearity means that correlation techniques must be selected with care. The
easiest way of detecting it is usually by means of a scattergram, and so we have an
excellent reason for inspecting the data in diagrammatic form fairly early in the
proceedings. Maybe when you did exercise 1 you noticed how much easier it was to
see a relationship from a diagram rather than a collection of numbers.
When describing a relationship between variables by means of a correlation, it is
necessary to add two pieces of information to the value of the coefficient obtained.
One is the number of paired scores used in the calculations, and the second is the
likelihood of your having obtained a relationship accidentally, from the occurrence
of chance factors. This last value is expressed by a probability, and I shall say more
about this aspect in the final section of the chapter. Meanwhile, although the
operation schedules include steps for obtaining a probability after rho has been
obtained, don't worry about it when you carry out the next exercises.

Exercises
2 Calculate Spearman's rho for the four sets of scores given in exercise 1.
3 The manager of Beastly Breweries has just read a book on interpersonal communication,
and he decides that maybe the recent decline in takings could be attributed to poor
staff-customer interaction. He devises a means of assessing quality of interaction (by such
things as gaze avoidance, proximity, attention) on a scale ranging from 0 to 35. In addition,
in those pubs in which he has spent some time rating the quality of interaction, he also asks
customers to describe the place on a six-point scale ranging from 'abominable' to 'superb'.
He converts the verbal category to a number, and uses the median category obtained to
describe each pub. So finally, he has the pairs of scores given in table 1 at his disposal.

Table 1

Public house          Quality of interaction    Customer assessment
Rose and Trilby                30                        5
Dog and Pups                   27                        1
Magnificent Motel               5                        0
llusty Plough                  22                        4
Blink's Bar                    20                        2
Bent Walking Stick             12                        0
King Canute                    32                        3

From the basis of a correlation coefficient, Spearman's rho, what conclusion do you think
the manager would reach?

Pearson's product-moment

I have already told you that one of the great assets of Spearman's rho is that it can be
used to calculate a correlation coefficient on data which are originally expressed only
in grades. Therein lies its weakness though, for when a technique operates on grades
rather than more precise scores, the lack of precision means that detailed
mathematical tricks cannot be called into play, and sometimes, the lack of detail means
that a subtle, even if definite, association is not discerned. However, when pairs of
variables are both measured by means of scales which use precise numbers (the
interval or ratio scales - like inches, minutes, grams), different methods of
calculating a correlation coefficient become available. Pearson's product-moment,
giving the statistic r, is one such method, and steps for obtaining it are given in
schedule 15. Unfortunately, even with a calculator, it takes much longer to calculate
than Spearman's rho - although many would say that this disadvantage is
compensated by the fact that it supplies a more precise coefficient. However, there are
also other snags attached to obtaining Pearson's r. Not only must we have scores in the
form of precise numbers (and not, for instance, data from Likert scales), but certain
other restrictions exist concerning the distributions of the scores involved. This is
because the product-moment belongs to the group of statistical operations known as
parametric techniques. More information on parametric techniques and tests, their
strengths, weaknesses and the restrictions concerning their use, will be found in
chapter 10.
Yet another drawback to Pearson's r is that it is not a meaningful figure if it has
been obtained from a set of scores which shows any curvilinear relationship
whatsoever. Thus if you are considering calculating a product-moment coefficient, it
is essential to draw a scattergram and make sure that the data fall into an
unambiguous linear pattern. Social scientists, who often work with variables which
are expressed in ranks or grades along a continuum, and who often have cause to
suspect the linearity of their data, will find that Spearman's rho is not only the
correlation coefficient most appropriate for their data, but that it also gives a
perfectly satisfactory degree of association.
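The warning about curvilinear data can be seen in a small calculation. Schedule 15 is not reproduced here, so the Python sketch below assumes the standard definition of the product-moment coefficient: the sum of the products of the paired deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and all the scores are invented for illustration.

```python
from math import sqrt


def pearson_r(a, b):
    """Product-moment r from paired scores (standard definition, assumed here)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of paired deviations from the two means
    products = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    squares_a = sum((x - mean_a) ** 2 for x in a)
    squares_b = sum((y - mean_b) ** 2 for y in b)
    return products / sqrt(squares_a * squares_b)


# Hypothetical scores, invented for this illustration
a = [1, 2, 3, 4, 5]
straight = [3, 5, 7, 9, 11]   # a perfectly straight line
curved = [1, 4, 9, 16, 25]    # an orderly curve
u_shaped = [4, 1, 0, 1, 4]    # falls, then rises again

print(round(pearson_r(a, straight), 3))
print(round(pearson_r(a, curved), 3))
print(round(pearson_r(a, u_shaped), 3))
```

Only the straight line yields a coefficient of 1. The curved scores give about 0.98 and the U-shaped scores give 0, although in both cases B is determined exactly by A, which is why a scattergram should be inspected before r is trusted.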
Exercises
4 Use a scattergram to decide what kind of relationship is shown by the following pairs of
scores. Then state whether Pearson's product-moment or Spearman's rho would provide
the more appropriate correlation coefficient.
(a) Variable A Variable B (b) Variable A Variable B
1 4 8 1
3 5 7 3
5 8 5 7
5 11 3 11
8 12 1
15
11 14
4 4 (c) Variable A Variable B
7
(d) Variable A Variable B (e) Variable A Variable B
1 1 1 2
4 8 5
3 6 4
6
11 6
6 13 7
8
15 8
9
1 Variable A Variable B
7
2
9
2
5 3
6
9 4
5
10 7
7
11 10
5
12 14
3
5 Calculate the Pearson product-moment r for the sets of scores in exercise 1.

Correlation and causation

It is commonly believed, when there is a strong correlation between two variables,
and hence a high degree of association, that one of the variables causes the other.
Don't let yourself be numbered among the many who fall into this trap!
ASSOCIATION DOES NOT MEAN CAUSATION.
In exercise 3, as you would see from the answer (you have done the exercise,
haven't you?), the manager of Beastly Breweries was quite mistaken when he
concluded that the poor staff-customer interaction was responsible for the decline in
alcohol sales - even though it seemed a plausible explanation. Let's look at some
more situations taken from 'real life', to see how the common misconception that
correlation can imply causation arises. The third example given below shows that
variables which can be associated statistically may not necessarily interact at all.

Example 1: Length of education and income are highly correlated
The correlation (which we assume to be a positive one) tells us that the more
education a person receives, then the more money that person will earn. It seems
plausible to deduce from this that it is education itself which directly determines
income - but a moment's thought will reveal this to be an erroneous conclusion. It so
happens that in our society, people who have studied longest tend to have the better
paid jobs. However, it may be the responsibility attached to the job which determines
pay, rather than training, although undoubtedly the latter will help! Intelligence is
another factor to consider, particularly as our society tends to base its understanding
of intelligence on academic (and hence educational) criteria. The so-called clever
person, good at exams, staying longer at school and then proceeding to university,
will tend to end up in a more highly paid job than a person who 'dropped out' of
school at the first possible opportunity. In our society, education does play a part in
determining income, but as this in turn is affected by such things as personal
circumstances, intelligence and luck, it cannot be seen as the single, or even major
determinant. The high correlation simply describes the relationship which exists at
the time of measurement between the two variables, education and income.

Example 2: TV viewing and the birth rate are negatively correlated
This interesting phenomenon could generate some fascinating research. As an
example of the correlation trap it is a little more transparent than the first one, for we
would not seriously consider that mysterious rays emitted by television sets have
contraceptive properties! We might speculate though, about people's activities when
there is no TV available, and a massive bulge in the birth rate which occurred nine
months after a wide-spread TV blackout in the States suggests that there is some
connection between the two variables, albeit an indirect one! This example was of a
negative correlation. A positive correlation which is frequently bandied around
concerns the association between violence on TV and levels of aggression. It is quite
clear that in the minds of many people there is a definite causal relationship here. In
fact the relationship may or may not be direct, but it is certainly unlikely to turn out
to be a simple one. As with TV and the birth rate, the problem provides good grist
for experimental work and research, but cannot be solved on the basis of strong
correlation alone.

Example 3: The strong correlation between left- and right-arm length
Perfectly true, but an example in which you can see quite clearly that the two
variables do not interact at all, but were both determined by other factors, such as
genetic make-up or diet, which affected both arms equally. You would laugh if
someone suggested to you that the length of your left arm determined the length of
your right arm. Yet this is exactly the kind of mistake people make when they impute
any degree of causation to a correlation or association.
It might help you to avoid the trap if you remember the way in which a coefficient
is calculated. The two columns of paired scores could have been written either way
round for arithmetical purposes; there is absolutely nothing in the mathematical
procedure which can tell us that the values in one of the columns in some way give
rise to corresponding values in the other. So we obtain no more than a precise
description of the relationship which exists between two sets of numbers. Variables
might interact in a direct manner, there may be an indirect link (as for instance
between ice cream sales and the number of fainting guardsmen, or as in the previous
arm length example), or there may be no connection at all between them, any
correlation obtained being coincidental. You must consider all these possibilities
when interpreting a coefficient.

Exercise
6 Examine the variables described in the following correlational studies, and decide whether
they interact directly, indirectly, or not at all.
(a) The positive correlation between pints of beer sold in a pub and temperature centigrade.
(b) The negative correlation between the number of people seen waiting at bus stops and
the number of buses in evidence.
(c) The slight positive correlation between premature births and later school difficulties.
(d) The positive relationship between motorway accidents and aircraft accidents.
(e) The negative correlation between orders for fresh milk taken by the local dairy, and the
number of trifles appearing for sale in nearby shops.
(f) The negative correlation found between the growth rate of a plant and the incidence of
car horn-blowing in the nearby streets.

Misleading coefficients

I have already told you that it is incorrect to use Pearson's product-moment on
curvilinear data, as it will give a misleading value of r. Of course there is absolutely
nothing to stop you taking any set of paired scores and obtaining a coefficient from
them, regardless of the shape they make on the scattergram. When you carry out the
calculations there is no built-in warning device which indicates that you are violating
assumptions, and that your final coefficient is misleading. It is up to you to know what
assumptions underlie each kind of coefficient, and to respect them. Two more data
patterns, besides linearity, provide aspects for consideration. One concerns outliers
and the other partial samples.
Outliers were mentioned in chapter 2, when we looked at them in connection with
dispersion. An outlier is an atypical extreme score. If, in correlation, one of the
values of a pair of scores is atypical, then the position of the intersection between the
two values on the scattergram will be away from the general line of points, as
illustrated in figure 10. When the sample under consideration is small, the existence
of an outlier can have a very misleading effect on data interpretation. It might suggest
that the data are really curvilinear (and in which case the product-moment would be
an inappropriate technique to use), or, if included in the calculations, bring about a
small coefficient, when in fact a strong linear relationship exists. So, should outliers
be excluded from analysis on the grounds that chance events caused an atypical score,
or included as valid data? There is no easy solution I'm afraid. If you exclude outliers
you will be accused of 'fiddling' your results, but if you include them, you risk losing
a strong relationship which may really exist (in other words, you commit a Type II
error). If a strong relationship does emerge when outliers are ignored, then you must
at least compensate for their omission from the coefficient calculations by mentioning
such scores in your report, and giving an account of why you suspect each one arose.
As the number of scores in a sample increases, so it becomes easier to judge whether
outliers are either atypical scores, or indicative of a curvilinear relationship. If the
former, they can be included in the analysis, for as part of a large sample they will not
be able to exert as much influence on the coefficient obtained as they would if they
were part of a small sample. Note: this is another reason why it is better to avoid
basing correlations on small samples, if possible.

Figure 10. The effect of an outlier
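The influence which a single outlier can exert on a small sample is easy to demonstrate. The Python sketch below assumes the standard definition of Pearson's r, since schedule 15 is not reproduced here. The function name and every score are invented for illustration: six pairs lying on a perfectly straight line, to which one atypical pair is then added.

```python
from math import sqrt


def pearson_r(a, b):
    """Product-moment r from paired scores (standard definition, assumed here)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    products = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    squares_a = sum((x - mean_a) ** 2 for x in a)
    squares_b = sum((y - mean_b) ** 2 for y in b)
    return products / sqrt(squares_a * squares_b)


# Hypothetical small sample in which B is always twice A
a = [1, 2, 3, 4, 5, 6]
b = [2, 4, 6, 8, 10, 12]

# The same sample with one atypical pair added: a high A with a very low B
a_with_outlier = a + [7]
b_with_outlier = b + [1]

print(round(pearson_r(a, b), 2))
print(round(pearson_r(a_with_outlier, b_with_outlier), 2))
```

With only seven pairs, the one atypical score pulls the coefficient down from 1 to about 0.32. The same pair added to a much larger sample would shift the coefficient far less.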
Finally, we move on to partial sampling. If a researcher undertaking correlational
studies chooses to work with scores taken from only one end of a range of values
obtainable from a particular variable, he or she risks failing to detect an association
which is really present - and once again we see a Type II error. The effect of this kind
of restriction of range is best shown by means of an illustration. Figure 11(a) shows a
scatter of points for the full range of variable A plotted against variable B, in which a
rough line of correlation can be discerned. Clearly there is a fairly strong positive
relationship between the two variables. However, if scores are taken only from the
middle of the range of variable A, as shown in figure 11(b), then exactly the same
scores give rise to the pattern shown by the dotted lines, which indicates to us that
there is no correlation between the two variables.

Figure 11. (panels (a) and (b); horizontal axes: Variable A)

On the other hand, if a researcher includes data from both extremes of a
continuum, and ignores the intermediate range of scores, then a correlation
coefficient will be obtained which is higher than that which would have been obtained
from a complete set of data. This kind of misleading coefficient provides an excellent
example of how to 'lie' with statistics!

Correlation and prediction

As correlation coefficients describe the relationship between two variables, then,
regardless of whether or not one causes the other, we are able to use the technique to
make predictions about scores. Suppose that there is a strong positive correlation
between students' exam marks in chemistry and physics. The coefficient tells us that
if students do well on one of the papers, then they will also do well on the other. The
higher the coefficient is, then the more points on the scattergram will lie along a
straight line, rather than form an ellipse. I mentioned earlier that when points only
roughly form a line it is best to omit it rather than guess where it lies. You can find out
precisely at which angle and where it lies by carrying out the appropriate
mathematical procedures; the line of best fit calculated in this way is also called a
regression line, and it will pass through or close to the maximum number of points
plotted. It is by using such a line that we can find out what values of one variable will
accompany certain values on the other, and the more clearly a line is formed by the
scores (i.e. the higher the correlation coefficient), the more precise we can be about
extrapolating from one variable to the other. Turn back to figure 3, and place a ruler
so that it connects points 1 and 2. If a perfect correlation existed between income and
size of garden, then all instances plotted on the scattergram would fall along the edge
of the ruler, thus forming a regression line. Use it now to find out what garden size you
would anticipate for someone known to earn £10 000 pa or £15 000 pa; what income
would you anticipate for a person whose garden occupies half an acre? If there had
not been a perfect correlation between garden size and income, but the points had
fallen into a band, like the one shown in figure 7(a), then instead of being able to
predict and state a single figure for garden size or income, we could only give a range
of values likely to include the point we are estimating.
In education, exam marks are frequently used to predict performance over a
period of time. For instance A-level results can be linked with performance at
university; indeed the very system of making university entrance dependent upon
A-level grades assumes that there is a distinct relationship between the two: students
who do well at the A-level hurdles are considered likely to succeed in the university
stakes.
Correlation and prediction are often used by psychologists in connection with
personality and intelligence tests. The relationship between different test results
might be examined, or between a test and some aspect of behaviour, education or a
disorder. For instance extraverts (people who get high scores on paper and pencil
tests designed to measure the 'sociability' aspect of personality) have been shown to
take longer to acquire conditioned responses than introverts. A psychologist,
knowing a person's extraversion score, might be able to predict the conditioning rate
for that individual, or vice versa - predict the person's degree of extraversion from
performance during conditioning procedures.
Still on the subject of measuring personality, and by means of paper and pencil
tests, psychologists need to know whether a test which has been developed for a
special purpose can be regarded as reliable. A reliable test is one in which the scores
obtained by subjects are known to be consistent, and unlikely to change because of
factors which are not connected with the test procedure. Assessing the reliability of
tests is quite difficult for a number of methodological reasons, but all the methods
involve carrying out correlations. If a high positive correlation is obtained, then the
test under scrutiny can be regarded as reliable. Tests are also normally assessed for
their validity. A valid test is one which measures what it is supposed to measure, and
not something else! Again, correlation is the statistical technique which provides the
backbone for this aspect of psychological test research and development. If
intelligence were found to be directly related to head size, then to assess it by using a
tape measure, and giving a rating in terms of units of length would be quite sufficient,
and we would have a perfectly valid test of intelligence. Unfortunately, such a
correlation does not appear to exist (perhaps the sayings 'too big for your boots',
'bossy boots' and 'clever clogs' indicate that we should concentrate on foot size,
instead!), and as psychologists disagree over what intelligence really is, we don't have
any test which at present is agreed on by one and all to be a valid measure of
intelligence. Scores on IQ tests might correlate well with such things as subsequent
exam performance, status and income, but this is probably because they all involve
the same mental skills, i.e. they are correlated because they all have common
elements.
The statihtical technique of corrclation is used for prcdiction in fields as diverse as
economics, geography, town planning, sociology, medical and physical sciences,
archaeology, biology and linguistics.

Correlation in experimental work

So far I have only discussed correlation as a descriptive technique, although several
of the examples I have mentioned make it clear that a strong correlation is not simply
a result of coincidence, but that either direct or indirect links exist between the
variables. I have also emphasised that the existence of an association cannot be
interpreted as implying a causal relationship - and I would now like to consider the
extent to which correlation can be used in experimental work, when the whole aim is
to discover laws of cause and effect. At the start of this chapter I quoted the example
of the link between cancer and smoking. Another example is the possible link
between dietary cholesterol and heart disease, and this provides an interesting
example of the way in which correlation can be used in research. Although it appears
that there is an association between heart disease and the fatty deposits on the walls
of blood vessels, and the latter are connected with the type of fat included in the diet,
researchers are plagued by the fact that despite a general positive correlation, there
are always individuals who must be noted as outstanding exceptions. These
exceptions illustrate the complexity of the matter and show that the association
between diet and heart disease is not completely straightforward. However, in recent
studies of dietary habits carried out on a wide range of human groups, the
relationship between animal fat consumption and incidence of heart disease for
societies taken as a whole turns out to be an extremely strong positive correlation. So
even though it is still impossible to make a detailed prediction for any individual, our
knowledge of existing associations means that we can predict trends within given
societies.
Much of the controversy surrounding the issues of diet, smoking and diseases
arises from the very fact that the evidence has been largely drawn from correlation
studies, so that the argument can always be advanced that other factors, such as
personality or physical make-up, are really the cause of the disorder. However,
experiments can be designed in which the proposed linking factors are subject to
scrutiny, and, still by means of correlational techniques, the various relationships
become better understood. An improved understanding of the role of hereditary
factors in schizophrenia has come about partly through correlational studies of the
incidence of this mental disorder in related people - identical twins, fraternal twins,
ordinary brothers and sisters, parents and children - when the extent of differing
hereditary components can be scrutinised. Correlational studies have also provided
the basis for many studies aimed at investigating the extent to which intelligence is
acquired through hereditary factors, rather than as a result of early training or
exposure to a stimulating environment. It is mainly because ethical considerations
prevent direct manipulation of children and environments to produce dullards and
geniuses that correlational studies - in which existing relationships are assessed -
predominate. So although a single correlational study cannot throw any light on what
causes what, several correlational studies built around the various variables thought
to be involved can comprise a satisfactory experimental programme of investigation.

Probability and correlation

Much earlier in this chapter I told you that after the numerical value of a correlation
has been determined, it is normal to state the number of pairs which were used in the
calculations, and to give a probability value to your coefficient. These steps are
carried out so that we (and other people) can know just how much confidence can be
placed in a particular coefficient. It is fairly obvious, if a description is based on only
five numbers, that such a small sample is not likely to be representative of the
populations from which they were drawn, and that chance events, if they have
occurred, will have a big influence on the evaluation. If something is coincidentally
related in one pair of readings out of five, this comprises a fifth of the total sample,
whereas the same coincidence would only affect one-hundredth of the sample if there
had been a hundred pairs of measures. A simple practical exercise will give you good
insight into the role of chance in correlation.
Take a piece of paper and tear it into eight pieces, each piece about the size of a bus
ticket. Number the pieces 1 to 4, twice. You should now have four sets of paired
numbers. Divide the papers into two sets, each containing four different numbers,
turn them over and shuffle them. Now take a clean piece of paper and head two
columns A and B. Without looking at the numbers on the papers, draw one from the
first pile, write its number down in column A, and then draw a number from the
second pile. This number will go into the corresponding B column. Do this three
times more, without putting any numbers back, taking numbers from alternate piles,
and pairing them off in the two columns. Now pretend that these numbers are
material for a correlation coefficient, and inspect them. Their order may be quite
haphazard - without any discernible pattern - or there might be evidence of some sort
of trend in the two columns (either separately, or looked at in conjunction with each
other). You know yourself that the numbers were selected at random, and that there
was no link of any sort between the first and second numbers of each pair, yet there
is a reasonable chance that just by coincidence you have obtained a good positive or
negative correlation. The order in which the pairs of numbers emerged would not
matter, but the pairs themselves would have to be either 4/4, 3/3, 2/2 and 1/1 for a
perfect positive correlation, or 4/1, 3/2, 2/3 and 1/4 for a negative one. There is just a
very slight chance that one of these sets of pairing may have occurred. On a slightly
bigger scale then, perhaps you can see that positive and negative correlations can be
partially determined by chance factors, and that some kind of allowance must be
made for this. The larger the sample, the less likely it is that a positive or negative
16 In the last analysis ...


correlation will be attributable to such chance elements. Fortunately, stating the
chance element which will be associated with any particular correlation coefficient
derived from variously sized sets of scores is quite simple - a matter of looking up the
probability values on the appropriate table. In this book, table S7 is for use with
Spearman's rho, and table S8 for Pearson's r. The only information you need in order
to ascertain the possible role of chance factors (called the level of significance) for a
stated coefficient value is the number of pairs of scores making up a sample. The
minimum level of significance which is acceptable, before an association can be
regarded as a 'real' one, is the one in twenty, or 5% level, which was mentioned in
connection with the acceptability of statistical test results in chapter 8. Exactly the
same levels apply, the one in a hundred (1% or 0.01) and one in a thousand (0.1% or
0.001) levels being regarded as stronger evidence of a relationship than the 5% (0.05)
level.
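
For very small samples the role of chance can be worked out exhaustively rather than looked up. The following is a minimal sketch in Python, assuming untied ranks and the usual formula for Spearman's rho, 1 - 6Σd²/(N(N² - 1)); the function names are invented for illustration and are not taken from tables S7 or S8. It lists every way in which the slips of paper in the exercise could pair off, and counts how often a coefficient at least as large as a chosen value turns up.

```python
from fractions import Fraction
from itertools import permutations


def spearman_rho(a, b):
    """Spearman's rho for two sets of untied ranks: 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    n = len(a)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - Fraction(6 * sum_d_squared, n * (n * n - 1))


def all_rhos(n):
    """Rho for every possible random pairing of the ranks 1..n with themselves."""
    ranks = tuple(range(1, n + 1))
    return [spearman_rho(ranks, shuffled) for shuffled in permutations(ranks)]


def chance_of_rho_at_least(n, threshold):
    """One-tailed: chance that a random pairing gives rho >= threshold."""
    rhos = all_rhos(n)
    return Fraction(sum(rho >= threshold for rho in rhos), len(rhos))


def chance_of_abs_rho_at_least(n, threshold):
    """Either direction: chance that a random pairing gives |rho| >= threshold."""
    rhos = all_rhos(n)
    return Fraction(sum(abs(rho) >= threshold for rho in rhos), len(rhos))


# The paper-slip exercise: four pairs drawn at random.
print(chance_of_rho_at_least(4, 1))      # perfect positive correlation: 1/24
print(chance_of_abs_rho_at_least(4, 1))  # perfect positive or negative: 1/12

# With more pairs, a perfect correlation arising by chance becomes far less likely.
print(chance_of_rho_at_least(7, 1))      # 1/5040
```

Thresholds other than 1 are best passed as exact fractions, for example Fraction(9, 10), so that a coefficient exactly equal to the threshold is counted. Listing every pairing is only practical for small N, since the number of pairings grows very quickly; for larger samples the printed tables remain the sensible route.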
In addition, as in looking for a difference between two samples and using a test for
analysis, in correlation we also need to state in advance whether we are establishing
a one- or two-tailed hypothesis (see chapter 7). If we predict either a positive or a
negative correlation, our prediction is directional, i.e. one-tailed; if we are more
vague, and merely say we expect a significant correlation in either direction, then our
hypothesis is two-tailed. Needless to say two-tailed hypotheses are relatively rare in
correlational studies!

Exercises

7 Give one-tailed probability levels for the following values of Spearman's rho.
  (a) N = 7, r_s = 0.84   (b) N = 9, r_s = -0.6   (c) N = 10, r_s = 0.65
  (d) N = 11, r_s = -0.91 (Take the value as lying between those for N = 10 and N = 12.)
  (e) N = 20, r_s = 0.58   (f) N = 28, r_s = 0.30
8 Using Pearson's r, and one-tailed values, what coefficients would be required for significance
  at the 0.05 level when (a) N - 2 = 8, (b) N - 2 = 6, (c) N - 2 = 30, (d) N - 2 = 15, and
  at the 0.01 level with Spearman's rho when (e) N = 5, (f) N = 8, and (g) N = 20?

In this chapter I shall provide some hints for writing experimental reports which will
be particularly useful to social science students. Then we move on to material of a
more light-hearted nature, which I imagine you will be able to appreciate without
having read the entire book beforehand. I hope it provides evidence that people can
emerge from scientific and statistics courses with unbroken spirits!

General guidelines for report writing

There are really only four vital questions to bear in mind:

WHY?
HOW?
WHAT?
SO WHAT?

The key to successful report writing lies in answering them in an appropriate manner.
The first point to bear in mind is that there is no single format which is regarded as
'correct' by everyone. Rather there are many variations, and almost everyone
(including your teacher or lecturer) has his or her favourite.
It is through writing and presenting reports that most scientific communication
takes place, and so it is important that you learn to deal adequately with this aspect
of your course. Listed below are some general guidelines on writing style.
1 Use complete sentences in reports, and do not present material in note form.
2 Avoid using the first person (either singular or plural) in your writing.
3 Although the convention has, until recently, been to use the masculine pronoun in
  writing about a person of unspecified sex, it is increasingly acceptable (and
  desirable) to either include the female pronoun and its derivatives, or write in a way
  which avoids mentioning gender. For instance, 'When the subject arrived, he was
  asked to . . .' could become either 'When subjects arrived, they were asked
  to . . .' or, 'On arrival, subjects were asked to . . .'.
4 Avoid using slang expressions.
5 It is quite acceptable to use the abbreviations S, E, DV and IV for subject,
  experimenter, dependent variable and independent variable respectively.
Always make the attempt to write up your report as soon as possible after you have
conducted the work - and certainly within days, rather than weeks. Otherwise you
will confirm for yourself the amazing rapidity with which unrehearsed and
unorganized material is forgotten.
Sections of reports

1 Title
This should indicate in a precise manner the nature of the topic being investigated.
Take care over the detail of whether you call the report an 'experiment', 'survey' or
just an 'investigation', for each term implies a slightly different kind of study.

2 Abstract (or summary)
Although it is rather inconvenient when you are actually writing your report, the
custom established for papers published in academic journals is that the summary
appears at the beginning! It is done so that potentially interested readers can quickly
discover whether they wish to read the main body of the article, without having to
wade through it all, looking for a summary towards the end. It should comprise a very
brief statement - no longer than 100 words - of what you did, the method you used,
and what you found.

3 Introduction
In this section, attempt to answer the following questions:
(a) What is the general nature of the problem?
(b) What have other people had to say about the problem, and why are their
    findings open to doubt, ambiguous, or in some way inadequate?
(c) Why is this particular experiment being undertaken?
(d) What is your experimental hypothesis - and is it one- or two-tailed? If you like,
    you can also state the null hypothesis at this point.

4 Method
Use the past tense throughout this section. There are several sub-headings which are
commonly used in the 'Method' section. These include:

(a) Subjects
Mention the number of subjects who participated - or 'ran' - in the experiment, their
sex, ages and general educational background or occupations. Include any other
relevant information concerning the subjects here. Were the subjects volunteers, or
selected according to some criteria?

(b) Apparatus
If your apparatus has a recognised technical name, state it, and give a very brief
description of its purpose; if not, describe it as accurately as possible, using diagrams
where necessary.

(c) Procedure
Describe what happened, very carefully, and - note this - what actually happened,
not what should have happened! You should provide enough information for a
reader to be able to go away and carry out an exact replication of your experiment.
If the experiment included specific instructions given to the subjects, then put these
in this section, as a direct quotation. If the instructions are very long, then write them
out on a separate sheet of paper, which is headed 'Appendix I. Instructions to
subjects' and put this right at the back of the report. In the 'Procedure' section, state
'The exact instructions which were given to subjects are included in Appendix I.' An
appendix is an extra bit, providing information which not everyone might want, but
which would certainly be needed by someone who took a serious interest in your
findings, and perhaps wished to replicate the experiment precisely.
If the seating arrangement of the experimenter and subject(s), or subject and
apparatus, is not straightforward, use diagrams to convey the information. Such
diagrams do not need to be backed up by long verbal descriptions.

(d) Experimental design
Make a formal statement of the design used in the experiment, and the statistical
analysis which is planned. If you decided beforehand which level of significance you
would accept as the minimum, then give it here. There should also be reference to all
procedures used to identify, isolate or control all variables considered relevant to the
study.

5 Results
This section should contain a condensed version of the data you obtained in your
experiment. The original, or 'raw' data are those observations which you made
during the experiment, and will only be included in this section in unchanged form if
they are fairly brief. The reader wants to see fairly swiftly what happened, and so it
is the usual practice to put raw data into an appendix at the back, and to include only
'derived' data - such as descriptive summaries, means, standard deviations, etc.,
often in the form of a table, in this section. Derived data include figures, graphs,
statistics and summaries compiled from the raw data. Any or all of these can be given
in this section, but the main criterion to be met is that a reader should be able to
understand quickly what the results were, without having to read any other part of
the report.
It is not correct to include statistical calculations, formulae or details of computation
in this section. Again, all this type of clutter should be tucked away in an
appendix at the back, where it is available, should a reader wish to refer to it.
It is quite common, and acceptable, to open your 'Results' section with the
following statements . . .
'Raw data are included in Appendix I, and statistical calculations in Appendix II. A
summary of the mean reaction times obtained from subjects in the experimental and
control groups is shown below in table 1. The difference in reaction times between the
two groups was significant at the 0.05 level (t = 2.56; df = 10; independent t test,
two-tailed hypothesis) and the null hypothesis was rejected. It was concluded that . . .'
All tables, figures and graphs should have a title, and in the case of diagrams, there
should be labels so that the information they convey can be quickly assimilated by the
reader. Take care over the detail of whether a figure is a graph, histogram,
scattergram or whatever - they are all slightly different.
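
The specimen statement quoted earlier in this section reports a statistic, its degrees of freedom and a significance level. A minimal sketch in Python of how such figures might be produced and worded is given here, assuming an independent t test with pooled variance. The scores are made up, the function names are illustrative, and the critical value of 2.228 (df = 10, two-tailed, 0.05 level) is assumed to have been read from a standard t table.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for an independent-samples t test using pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


def results_sentence(t, df, critical, level):
    """Word the outcome in the style of the specimen 'Results' statement."""
    verdict = "significant" if abs(t) >= critical else "not significant"
    return (f"The difference between the two groups was {verdict} at the {level} "
            f"level (t = {t:.2f}; df = {df}; independent t test, two-tailed hypothesis).")


# Critical value of t for df = 10, two-tailed, 0.05 level (assumed, from a t table).
CRITICAL_T_DF10 = 2.228

# Made-up scores for two groups of six subjects, for illustration only.
experimental = [3, 4, 5, 6, 7, 8]
control = [1, 2, 3, 4, 5, 6]

t, df = independent_t(experimental, control)
print(results_sentence(t, df, CRITICAL_T_DF10, 0.05))

# The figures quoted in the specimen statement, checked against the same critical value.
print(results_sentence(2.56, 10, CRITICAL_T_DF10, 0.05))
```

In a report, only the finished sentence would appear in the 'Results' section; the working itself belongs in an appendix.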
Although brevity is the keynote throughout this section, take care to include all
results (summarised if necessary) to which reference is made in the next section - the
'Discussion'.

6 Discussion
Use the past/present/future tenses as appropriate.
The first question which needs answering is 'To what extent do your results support
the experimental hypothesis?' It is normal to start this section off by repeating the
main verbal findings and conclusions which you gave in the 'Results', but omitting
details of the statistical test used, statistic value and df. Do your findings suggest any
further hypotheses which could be investigated experimentally? Are there any faults
in the design of the experiment (which were not evident at the design stage) which
could or should be eliminated?
The second question to deal with, if your experimental hypothesis was not
supported, is 'Why not?' In this section you can suggest reasons as to why the
experiment didn't work, and again, suggest outlines for future experiments which
might clear the problem up.
To conclude, and possibly in the form of a brief 'Conclusions' subsection, which
must not be confused with the 'Summary' or 'Abstract', make an attempt to provide
an answer to the original problem which was described in the 'Introduction', and spell
out the ways in which your experiment and findings have moved our knowledge along
from that point.
In the 'Discussion' section it is quite acceptable to present conclusions which you
have drawn which were not anticipated in the original formulation of the problem and
its likely outcomes.
7 References
Give detailed references of all work you mention in your report. There are several
standard ways to present references, and I am giving just one of them below. If you
mention 'Watson (1913)' somewhere in the body of your report (or in an essay), then
the reference should appear in the correct place in an alphabetical listing:
    Watson, J. B. Psychology as a behaviorist views it. Psychol. Rev., 1913, 20,
    158-177
and for the book 'Woodworth and Sheehan (1964)'.
    Woodworth, R. S., and Sheehan, M. R. Contemporary schools of psychology.
    (3rd ed.) New York: Ronald Press, 1964.
Whichever system of referencing you use, stick to it throughout the whole list. If
you quote the results of a survey or experiment from a text book, rather than from
the original article, then list the name of the investigator, but follow it with the source
of your information:
    Goldstein (1957), as reported in Hilgard, E. R., Atkinson, R. C., and Atkinson,
    R. L. Introduction to Psychology. (6th ed.) New York: Harcourt, Brace,
    Jovanovich, 1975.
Although references seem unnecessary, and are a bore to compile and write out, it
is useful to acquire the habit of keeping tabs on information you refer to fairly early
in your scientific career. Also, when it comes to revision time, you may well need to
find the sources of your material quite rapidly, and it is extremely frustrating to have
to waste time searching out long-forgotten items.
The language of report writing

Now that we have dealt with the structure of scientific reports, it is time to examine
in more detail the language of which they are composed. The following phrases are
examples of technical vocabulary commonly used in reports - together with their
'real-life' equivalents.

Introduction

It is generally agreed that . . .             We decided during the coffee break . . .

It has long been known that . . .             I have lost the original reference!

It would appear to be consistent with         My excuse for doing this work is . . .
previously reported findings to
speculate that . . .

In a pilot study . . .                        We started to carry out the experiment,
                                              but it was a disaster, and so we were
                                              forced to make a fresh start.

A survey showed that . . .                    I asked a couple of the secretaries what
                                              their views were.

Method

The experimental animals were                 The rats all escaped from their cages
randomised.                                   one day.

The procedure used was based on that          We have just pinched Smith's idea.
first developed by Smith (1968).

Subjects were permitted to familiarise        It took ages, and many trials, before we
themselves with the task.                     realised that the subjects hadn't
                                              understood the instructions properly.

Subjects were trained to the appropriate      I can't remember exactly what the
criterion.                                    subjects had to do.

Results

Some subjects were found to have              Some subjects went to sleep during this
unexpectedly long reaction times.             long and boring experiment.

Computer analysis of the data was             I haven't the faintest idea what he does
conducted by Dr Bloggs.                       with my data, but I don't ask questions!

The results shown in table 2 were             I fiddled my results.
adjusted to allow for variations arising
from errors in the sampling technique.

However, when plotted on a log scale,         The last resort in order to present
the effect is apparent.                       hopeless data in an impressive way.

. . . was nevertheless significant.           . . . was the only minor deviation from
                                              complete randomness in the entire
                                              experiment.

Typical results are shown.                    The best results are shown.

The results show a clear trend in the         Our supply of subjects dried up,
predicted direction which might be            or,
expected to reach significance with a         I couldn't be bothered to run any more.
larger sample.

Nonparametric statistics were felt to be      We couldn't interpret the complex
highly appropriate for analysing these        patterns of statistics which emerged
data.                                         from a computer-run parametric
                                              analysis.

Discussion

The results corresponded well with            Whilst I was running the experiment I
theoretical predictions.                      worked out what sounded like good
                                              predictions.

It might be argued that . . .                 I have a really good answer for this
                                              criticism, so shall raise it now.

Further work is required to elucidate         I don't understand the results of the
this finding.                                 experiment.

(Personal communication)                      (according to this chap I met in the
                                              pub . . .)

It is hoped that this study will stimulate    That is about the only respect in which
further work in the field.                    this report might be useful!

These phenomena would seem to be              The whole area is rubbish, but I am
worthy of further investigation.              committed to it for two more years.

Acknowledgements

I would like to acknowledge the               Bill Smith did all the work, and Bloggs
assistance of Bill Smith, and valuable        explained what it meant.
advice from Dr Bloggs.

. . . and my wife for her speedy and          I had to have the Department Secretary
efficient typing.                             retype her manuscript.
Some observations on the diseases of Brunus edwardii (species nova)

These extracts are taken from a paper which appeared in The Veterinary Record, 1
April 1972. My thanks go to the editor of the journal for kindly agreeing to my
reproducing part of the paper in this book. The authors of the paper are D. K.
Blackmore, D. G. Owen and C. M. Young. Notice how only the most formal style of
writing is used throughout the paper, as is correct for scientific prose.

The correct specific and generic terminology for Brunus edwardii is discussed, and the
results given of a survey involving 1599 complete specimens and 539 miscellaneous
appendages. These results indicate that primary infectious agents do not occur, and
that the species is safe for children to handle. Suggestions as to the future role of the
profession in relation to this species are made.

Introduction

For more than a century, the species Brunus edwardii has been commonly kept in
homes in the UK and other countries in Europe and North America. Although there
have been numerous publications concerning the behaviour of individuals (Milne,
1924, 1928; Daily Express (numerous editions)), there have been no serious scientific
contributions, and a careful search of the literature, using abstracting journals and
computerised data retrieval systems, has failed to reveal any comprehensive survey
of the diseases of these creatures. A few of the previous publications include
references to certain disease syndromes, and Milne (1928) refers to obesity
associated with the excessive intake of honey, and to psychological disturbances
associated with territorial disputes with Tiggers, Heffalumps and even small
children. One publication (Bond, 1958) concerning a certain individual known as
Paddington, refers to the animal receiving treatment from medical practitioners
without a veterinary qualification. These records emphasise two disturbing factors,
firstly, the obvious need for treatment of diseased individuals, and secondly, the
infringement of the Veterinary Surgeon's Act of 1966 that would presumably be
involved if such animals were treated by any person not on the Veterinary Register.

Commonly-found syndromes included coagulation and clumping of stuffing,
resulting in conditions similar to those described as 'bumblefoot' and ventral rupture
in the pig and cow respectively, alopecia, and ocular conditions which varied from
mild squint to intermittent nystagmus and luxation of the eyeball. Micropthalmus
and macropthalmus were frequently recorded in animals which had received
unsuitable ocular prostheses.
The following case notes illustrate the complexity of both the causes and resulting
manifestations of disease in the species.

Case 1. A six-month-old bear, owned by a four-year-old male, was found to be
suffering from acute dyslalia, torticollis and loss of one lower limb. The general
condition of the animal was good, with a normal pelage. The injury had been the
result of disputed ownership. The dumbness was the result of a ruptured acoustic
membrane, and complete renewal of the voice box was necessary. This involved
laparotomy, removal of the damaged organ from the surrounding viscera, and
careful positioning of the replacement so that the acoustic membrane faced ventrally
to prevent the development of muffled speech.

[Illustration: Case 1. Torticollis and loss of limb]

Case 2. A young bear owned by a child of six months was found to be suffering from
'soggy ear' when removed from the owner's cot one morning. Oedema of the pinna
was a commonly occurring condition in bears belonging to children under 18 months
age, who slept with an ear clamped firmly in their mouths. Treatment consisted in
removal from the owner, lavage and drying in an airing cupboard.

Case 3. A ten-year-old bear, which had been owned successively by three siblings.
The normal yellow coat colour had changed to a dirty grey, there was extensive
alopecia which had progressed to 'threadbareness' over the ears, nose and limb
extremities. The axillary and inguinal seams were weak, resulting in an intermittent
dislocation of limbs, but there was no herniation of stuffing. Old age, and persistent
handling with transport by one limb were the main reasons for the chronic debility,
for which there is no satisfactory treatment.

[Illustration: Case 3. Alopecia, discoloration]

Case 4. A sixteen-year-old bear, with an asymmetrical expression and obvious
emotional disturbance, was found at the back of a cupboard. After the removal of
superficial dust, the coat condition was seen to be good, but the animal had a
permanent squint, due to careless replacement of the right eye with a shoe button.
In the last analysis. .. In the last analysis . . .

Case 4.Lopsided squint A case of emotional disturbance,


hypertension

implications of this fact are obvious, and it is imperative that more be known about
their diseases, particularly zoonoses o r other conditions which might be associated
Tracing of the case history revealed that this bear had suffered recurrent ocular
prolapse, which had progressed to total rupture of the filamentous orbital attachments,
and the loss of the eye. It was hoped that a new owner might be found for this
animal, and that with newly-matched eyes, his expression and psychological state
might improve.
Case 5. An aged, cobweb-covered bear, found in an attic. Its general condition was
poor, with loss of forelimb, and herniation of stuffing. The frontal seam was ruptured,
exposing a rusted voice box with helical weakness. The animal was heavily infested
with commensals, which included a pair of Mus musculus with two generations of
young, a total of 23 individuals . . . Treatment of this case included vigorous shaking,
dusting with pyrethrum, a stuffing transfusion, and a forelimb graft.

[Figure: Case 5. Attic bear and mice]

Pet ownership surveys have shown that 63.8% households are inhabited by one or
more of these animals, and that there is a statistically significant relationship between
their population and the number of children in the household. The public health
with their close contact with man . . .
The importance of avoiding the use of colloquial names in scientific contributions
has been stressed by Keymer et al. (1969), but previous publications have apparently
used the term 'teddy bear'. Preliminary studies have suggested that this term might
include several different strains, if not species. However, it was found that teddy
bears will accept cutaneous, and even limb grafts from other bears without showing
signs of rejection. These findings suggest that all teddy bears are genetically
homozygous and of the same species. We therefore consider the correct generic and
specific terminology to be Brunus edwardii.

Materials and methods
(i) Source of material for survey. A total of 1598 specimens of Brunus edwardii was
examined. Of the 1600 owners approached, 1599 agreed to examination of their bear,
and the majority were able to provide a comprehensive case history. One specimen
was eventually unavailable, as it was in quarantine because its owner was affected
with rubella.
A further 539 miscellaneous appendages were made available for examination by
nurseries, schools and children's hospitals in the London area. These specimens were
in a dilapidated condition, but careful grafting restored 136 intact bears, with only
one surplus ear, which has been stored in liquid nitrogen for future use.
(ii) Examination technique. Examinations were carried out as quickly as possible,
because many owners were reluctant to be parted from their bears for very long. No
restraint was necessary, as the bears showed no apprehension and were obviously
used to being handled. An attempt was made to record body temperature, but this
was abandoned, as all specimens appeared to be homoiothermic. Each bear was
given a thorough external examination, and data were collected on approximate age,
weight, condition and colour of coat and physical disabilities. Stuffing and internal
condition were assessed by careful palpation. Where necessary, radiographs were
taken, and biopsies obtained to identify the stuffing material. Sub-cutaneous and
deeper tissues often protruded from superficial abrasions, and where necessary, a
small seam incision was made, a sample taken, and the opening sutured with Coates
Machine twist 30, using a standard Millwards darning needle. Voice boxes, where
present, were tested by percussion and auscultation.
In the last analysis. ..
The psychological state of the bear was assessed by the facial expression, and also
by investigating the case history with special reference to the frequency and duration
of association with children.

Results of the survey
Classification of the results of the survey carried out on 1598 intact specimens, plus
miscellaneous appendages was attempted, but almost all cases were of multi-factorial
aetiology, and it was impossible to determine the primary agent. Similar lesions
appeared at many sites, making systematic tabulation of results impracticable. No
primary pathogens were isolated, and the predominant cause of pathological change
was external mechanical trauma, which was either severe and sudden in onset, causing
loss of limbs and appendages, or more insidious, giving rise to chronic wear and tear.
Commonly-found syndromes included coagulation and clumping of stuffing,
resulting in conditions similar to those described as bumblefoot and ventral rupture.

Discussion
This survey has revealed many facts of interest to both the comparative pathologist
and the clinician. It is with considerable relief that it can be recorded that Brunus
edwardii appears to be resistant to any pathogenic organisms and cannot, therefore
be affected by any zoonotic condition. However, this species can be involved in a
variety of commensal relationships, as was described in Case 5 . . .
Teddy bears can act as transitory mechanical vectors of human pathogens.
Although superficial contamination with rubella virus has no direct effect on the
bear, the unskilled treatment of carrier teddies can result in serious secondary
disease. Examples found included a singed integument caused by over-heating
during decontamination in a domestic oven, and coat discolouration due to
treatment with an unsuitable disinfectant.
True diseases of Brunus edwardii can therefore be classified as either traumatic or
emotional. Acute traumatic conditions, characterised by loss of appendages, are
often the result of disputed ownership. Chronic traumatic conditions are usually
associated with normal wear and tear, and are not necessarily detrimental, as there
appears to be a statistical relationship between the presence of such lesions, the lack
of emotional disturbance, and the affection given by the owner.
Emotional disturbances are either apparent or inapparent. Apparent emotional
disturbances are recognised by changes in facial expression, and in almost all cases
the condition is the result of unskilled remedial surgery. Inapparent emotional
disturbances are not fully understood, but seem to be related to the fact that an
unloved teddy is an unhappy teddy. Few adults (except perhaps the present authors)
have any real affection for the species, and as children mature, their teddy bears may
be neglected and relegated to an attic or cupboard, where severe emotional
disturbances develop.
The authors consider it significant that B. edwardii appears to be classless in both
the taxonomic and socio-economic sense.

References
Bond, M. (1958). A bear called Paddington. Collins, London.
Milne, A. A. (1924). When we were very young. Methuen & Co. Ltd, London.
Milne, A. A. (1928). The house at Pooh Corner. Methuen & Co. Ltd, London.

Operation schedules

Operation schedule 1. The mean
Data requirements
Numbers used to calculate a mean must be of at least interval status (see chapter 10).
Example
Six rose bushes bear the following numbers of flowers: 5, 26, 13, 12, 19, 21. The mean
number of flowers per bush will be calculated.
Step 1: List the numbers in a vertical column.
 5
26
13
12
19
21
Step 2: Add the numbers together.
5 + 26 + 13 + 12 + 19 + 21 = 96
Step 3: Count the number of items making up the list, to obtain N.
The list comprises 6 items; N = 6.
Step 4: Divide the total of Step 2 by the value of N found in Step 3.
$\frac{96}{6} = 16$
The mean number of flowers per bush is 16.
Abbreviations
The formula for calculating the mean is
$\bar{X} = \frac{\Sigma X}{N}$
where X = the individual score, N = the total number of scores,
$\Sigma$ = the sum of (pronounced 'sigma').
The symbol $\bar{X}$ (pronounced 'bar X') is often used to represent the mean. If the means
of lists which had been labelled A or Y had been found, then the means would be
denoted by the symbols $\bar{A}$ and $\bar{Y}$, respectively.
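The four steps of schedule 1 (the mean) can also be checked by machine. What follows is only an illustrative sketch in Python, not part of the original schedule; the names `mean` and `flowers` are invented for the example.

```python
def mean(scores):
    """Return the mean: the total of the scores (Step 2) divided by N (Steps 3 and 4)."""
    n = len(scores)  # Step 3: count the items to obtain N
    if n == 0:
        raise ValueError("at least one score is needed")
    return sum(scores) / n  # Step 4: total divided by N


flowers = [5, 26, 13, 12, 19, 21]  # the rose-bush example
print(mean(flowers))  # 16.0
```

The same function can be applied to any list of interval-status scores.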
Operation schedule 2. The median
Data requirements
Can be used with all types of numbers, except those of nominal status.
Example
The median of the numbers 13, 21, 12, 4, 26, 19.
Step 1: List all the scores in ascending order, in a straight line.
4 12 13 19 21 26
Step 2: Count the number of items making up the list to determine where the
mid-point will lie. It will lie on a whole number if the list comprises an odd number of
items, and something-and-a-half if the list is even-numbered.
There are 6 items on the list; the mid-point of 6 is 3½.
Odd-numbered lists only
Step 3: Count along the items from the left-hand side until you arrive at the number
lying in the position found to be at the mid-point in Step 2. This is the median.
Even-numbered lists only
Step 3: Count along the list until you arrive at the item which is in the position of the
whole number to which the ½ was added (here, the third position).
4 12 13
13 is in the third position.
Step 4: Take the value arrived at in Step 3, and add to it the value of the next number
to the right from the original list. Add the two numbers together and divide by two.
This gives the median.
$\frac{13 + 19}{2} = 16$
Check: If Steps 3 and 4 are re-calculated, but the counting started on the right-hand
side rather than the left, the median should have the same value.
Abbreviations
As the median is obtained by counting, no symbols are used to describe its
computation. The symbol for the median itself is Md.

Operation schedule 3. The mean deviation
Data requirements
Scores must be of at least interval status.
Example
The mean deviation of the numbers 5, 26, 13, 12, 19, 21.
Step 1: Count the number of items on the list, to obtain N.
There are 6 items on the list, and so N = 6.
Step 2: Find the mean of the numbers.
$\frac{5 + 26 + \ldots + 21}{6} = \frac{96}{6} = 16$
Step 3: Compare each score with the mean. Subtract whichever value of the two is
smaller from the larger. Record a value of 0 when a score has the same value as the
mean.
16 - 5 = 11     16 - 13 = 3     19 - 16 = 3
26 - 16 = 10    16 - 12 = 4     21 - 16 = 5
Step 4: Add together the differences found in Step 3.
11 + 10 + 3 + 4 + 3 + 5 = 36
Step 5: Divide the value obtained in Step 4 by the value found in Step 1.
$\frac{36}{6} = 6$
The mean deviation for the set of scores is 6.
Formula
mean deviation = $\frac{\Sigma |X - \bar{X}|}{N}$
where X = the individual score
$\bar{X}$ = the mean for all the scores
| | = find the difference
$\Sigma$ = the sum of
N = the number of scores in the list

Operation schedule 4. The standard deviation
Data requirements
Scores must be of at least interval status.
Example
The standard deviation of the numbers 18, 20, 22, 24, 26.
Step 1: Count the number of items on the list, to obtain N.
There are 5 scores making up the list, and so N = 5.
Step 2: Find the mean for the scores.
$\frac{18 + 20 + 22 + 24 + 26}{5} = \frac{110}{5} = 22$
Step 3: Subtract the mean (Step 2) from every observation.
18 - 22 = -4    22 - 22 = 0     26 - 22 = +4
20 - 22 = -2    24 - 22 = +2
Step 4: Square each of the differences obtained in Step 3.
$(-4)^2 = 16$, $(-2)^2 = 4$, $(0)^2 = 0$, $(+2)^2 = 4$, $(+4)^2 = 16$
Step 5: Obtain the sum of the squares by adding together all the squared differences
obtained in Step 4.
16 + 4 + 0 + 4 + 16 = 40
Step 6: Subtract 1 from the number of observations in the group (Step 1).
5 - 1 = 4
Step 7: Divide the sum of squares (Step 5) by the value found in Step 6, to obtain the
variance estimate.
$\frac{40}{4} = 10$
Step 8: Take the square root of the variance estimate, found in Step 7.
$\sqrt{10} = 3.16$
Note
These steps are the ones normally followed in obtaining a standard deviation when a
sample is used. They will give a slightly larger value than that obtained by the method
appropriate for a complete population, and which is rarely used. To obtain the
standard deviation of a complete population, instead of subtracting the value of 1
from the number of items on the list (Step 6 above), you simply divide the sum of
squares by the actual number of items, N, making up the list (Step 1).
Formulae
(a) for a sample estimate: standard deviation = $\sqrt{\frac{\Sigma (X - \bar{X})^2}{N - 1}}$
(b) for the population value: standard deviation = $\sqrt{\frac{\Sigma (X - \bar{X})^2}{N}}$
where X = the individual score
$\bar{X}$ = the mean of the scores
N = the number of scores used
$\Sigma$ = the sum of

Operation schedule 5. The standard deviation (alternative method)
It is more convenient to use this method of obtaining the standard deviation when
you have large sets of numbers, or when the mean is an 'awkward' number - for
instance with several decimal places. The method will give precisely the same result
as that given in schedule 4.
Example
The standard deviation of the sample 18, 20, 22, 24, 26.
Step 1: Count the number of items on the list, to obtain N.
There are 5 items on the list, and so N = 5 for this sample.
Step 2: Find the total of the scores.
18 + 20 + 22 + 24 + 26 = 110
Step 3: Square each of the original observations.
$18^2 = 324$, $20^2 = 400$, $22^2 = 484$, $24^2 = 576$, $26^2 = 676$
Step 4: Add together the squares obtained in Step 3, to get the uncorrected sum of
squares.
324 + 400 + 484 + 576 + 676 = 2460
Step 5: Take the total of the scores (Step 2), and square it.
$110^2 = 12\,100$
Step 6: Divide the squared total (Step 5) by the number of items in the list (Step 1),
to get the correction term.
$\frac{12\,100}{5} = 2420$
Step 7: Subtract the correction term (Step 6) from the uncorrected sum of squares
(Step 4), to obtain the corrected sum of squares.
2460 - 2420 = 40
Step 8: Divide the corrected sum of squares (Step 7) by the number of observations
(Step 1) minus 1, or the number of observations in the case of numbers comprising a
population rather than a sample. This gives the variance estimate or the variance
respectively.
$\frac{40}{5 - 1} = 10$
Step 9: Take the square root of the value obtained in Step 8.
$\sqrt{10} = 3.16$
The value of the standard deviation for the sample is 3.16.
Formulae
(a) for a sample estimate: standard deviation = $\sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N - 1}}$
(b) for a population value: standard deviation = $\sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}}$
where X = the individual score
$\Sigma X$ = the sum of the scores
$\Sigma X^2$ = the sum of the squared scores
$(\Sigma X)^2$ = the square of the sum of the scores
N = the number of scores used

Operation schedule 6. How to rank sets of scores
The purpose of ranking is to give your scores a number, according to their size. It is
rather like 'numbering off', with the smallest score normally given the number 1.
Example
Rank the scores 10, 15, 13, 22, 21, 9, 22, 14, 8, 14, 12, 17, 22, 22, 9, 14.
Step 1: On a piece of scrap paper, write the scores down in order of size, starting
off with the smallest at the left-hand side.
8  9  9  10  12  13  14  14  14  15  17  21  22  22  22  22
Step 2: Count the number of scores which are to be ranked.
The group comprises 16 scores.
Step 3: Below each score, write a number. Start at the left with number 1 and proceed
to number off each score until you finish with the final score at the right-hand side.
The final score should have the same number as that found in Step 2.
8  9  9  10  12  13  14  14  14  15  17  21  22  22  22  22
1  2  3   4   5   6   7   8   9  10  11  12  13  14  15  16
Step 4: Look along the top line of the numbers for any which are the same. These are
called tied, and their ranks, on the bottom line, must reflect the fact.
There are two scores of 9, three of 14 and four of 22.
Step 5: If there are two numbers in a tie, then the rank given to these two will be
halfway between; i.e. the ranks will not be a whole number, but something-and-a-half.
Three tied numbers will each take the value of the middle rank, i.e. a whole
number, as will any uneven numbered group of tied scores.
The two values of 9 tie for ranks 2 and 3, and each will take the value 2½. The three values of
14, occupying ranks 7, 8 and 9, will each take rank 8. The four values of 22, taking ranks 13,
14, 15 and 16, will each have the rank value of 14½.
Step 6: When all the ranks are sorted out, arrange the scores and the ranks in two
vertical columns, headed 'Score' and 'Rank' respectively.

Score   Rank
8       1
9       2½
9       2½
10      4
12      5
13      6
14      8
14      8
14      8
15      10
17      11
21      12
22      14½
22      14½
22      14½
22      14½

Hint
In several nonparametric tests and Spearman's rho calculations you have to work with the
ranks, rather than the actual scores. It is easy to get ranks and scores muddled, and particularly
when you are working with more than one set of scores. You can avoid this by writing the ranks
out in a different coloured pen, so that they can be distinguished at a glance.

Table S1. Percentage area under the standard normal curve from the mean 0 to z scores

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   00.00  00.40  00.80  01.20  01.60  01.99  02.39  02.79  03.19  03.59
0.1   03.98  04.38  04.78  05.17  05.57  05.96  06.36  06.75  07.14  07.53
0.2   07.93  08.32  08.71  09.10  09.48  09.87  10.26  10.64  11.03  11.41
0.3   11.79  12.17  12.55  12.93  13.31  13.68  14.06  14.43  14.80  15.17
0.4   15.54  15.91  16.28  16.64  17.00  17.36  17.72  18.08  18.44  18.79
0.5   19.15  19.50  19.85  20.19  20.54  20.88  21.23  21.57  21.90  22.24
0.6   22.57  22.91  23.24  23.57  23.89  24.22  24.54  24.86  25.17  25.49
0.7   25.80  26.11  26.42  26.73  27.04  27.34  27.64  27.94  28.23  28.52
0.8   28.81  29.10  29.39  29.67  29.95  30.23  30.51  30.78  31.06  31.33
0.9   31.59  31.86  32.12  32.38  32.64  32.89  33.15  33.40  33.65  33.89
1.0   34.13  34.38  34.61  34.85  35.08  35.31  35.54  35.77  35.99  36.21
1.1   36.43  36.65  36.86  37.08  37.29  37.49  37.70  37.90  38.10  38.30
1.2   38.49  38.69  38.88  39.07  39.25  39.44  39.62  39.80  39.97  40.15
1.3   40.32  40.49  40.66  40.82  40.99  41.15  41.31  41.47  41.62  41.77
1.4   41.92  42.07  42.22  42.36  42.51  42.65  42.79  42.92  43.06  43.19
1.5   43.32  43.45  43.57  43.70  43.82  43.94  44.06  44.18  44.29  44.41
1.6   44.52  44.63  44.74  44.84  44.95  45.05  45.15  45.25  45.35  45.45
1.7   45.54  45.64  45.73  45.82  45.91  45.99  46.08  46.16  46.25  46.33
1.8   46.41  46.49  46.56  46.64  46.71  46.78  46.86  46.93  46.99  47.06
1.9   47.13  47.19  47.26  47.32  47.38  47.44  47.50  47.56  47.61  47.67
2.0   47.72  47.78  47.83  47.88  47.93  47.98  48.03  48.08  48.12  48.17
2.1   48.21  48.26  48.30  48.34  48.38  48.42  48.46  48.50  48.54  48.57
2.2   48.61  48.64  48.68  48.71  48.75  48.78  48.81  48.84  48.87  48.90
2.3   48.93  48.96  48.98  49.01  49.04  49.06  49.09  49.11  49.13  49.16
2.4   49.18  49.20  49.22  49.25  49.27  49.29  49.31  49.32  49.34  49.36
2.5   49.38  49.40  49.41  49.43  49.45  49.46  49.48  49.49  49.51  49.52
2.6   49.53  49.55  49.56  49.57  49.59  49.60  49.61  49.62  49.63  49.64
2.7   49.65  49.66  49.67  49.68  49.69  49.70  49.71  49.72  49.73  49.74
2.8   49.74  49.75  49.76  49.77  49.77  49.78  49.79  49.79  49.80  49.81
2.9   49.81  49.82  49.82  49.83  49.84  49.84  49.85  49.85  49.86  49.86
3.0   49.87  49.87  49.87  49.88  49.88  49.89  49.89  49.89  49.90  49.90

Source: F. C. Powell, Cambridge mathematical and statistical tables, page 71, 1976.
Cambridge University Press.

Operation schedule 7. The Wilcoxon matched-pairs signed ranks test
Data requirements
Scores must be paired off in some way, and of at least ordinal status.
Example
The Wilcoxon test can be carried out on the rating scores obtained for the two brands of
washing up liquid, Gresego and Kwikclene (see pages 68-70). The null hypothesis (H0) is that
the two sets of scores do not differ, and the experimental hypothesis (H1 or HA) is that
Gresego gives better results than Kwikclene.

Subject   Rating                   Subject   Rating
          Gresego   Kwikclene                Gresego   Kwikclene
1         8         5              6         7         6
2         7         5              7         9         5
3         9         2              8         6         5
4         7         6              9         5         6
5         8         9

Step 1: The data will be paired off, but in doing so, it does not matter which column
comes first. Subtract each number in the second column from its partner in the first.
Record the differences, being sure to note the sign.
8 - 5 = +3, 7 - 5 = +2, ..., 5 - 6 = -1
The differences are: +3, +2, +7, +1, -1, +1, +4, +1, -1
Step 2: Rank the differences according to size, giving the smallest difference rank
1. Do not rank any values of 0 which occur. Ignore the signs during the ranking
procedure. (See schedule 6 for ranking instructions.)
Differences   +3  +2  +7  +1  -1  +1  +4  +1  -1
Ranks          7   6   9   3   3   3   8   3   3
Step 3: Add the ranks for all the differences which are positive, and all the negative
differences. Keep the two totals separate.
Positive difference ranks 7 + 6 + 9 + 3 + 3 + 8 + 3 = 39
Negative difference ranks 3 + 3 = 6
Step 4: Whichever total of the two obtained in Step 3 is the smaller is the value of the
test statistic T.
The smaller total is 6, and so T = 6.
Step 5: To determine N, in order to decide whether a given value of T is significant
using table S2, count the number of pairs of scores used, but subtract any pairs whose
difference was found to be 0 in Step 1.
There are 9 pairs of scores, and no values of 0. N = 9.

Table S2. Critical values of T for the Wilcoxon matched-pairs signed ranks test. T must
be equal to or less than the stated value to be significant

      Level of significance for one-tailed test
      0.05   0.025  0.01   0.005          0.05   0.025  0.01   0.005
      Level of significance for two-tailed test
N     0.10   0.05   0.02   0.01     N     0.10   0.05   0.02   0.01
5     1      -      -      -        28    130    117    101    92
6     2      1      -      -        29    141    127    111    100
7     4      2      0      -        30    152    137    120    109
8     6      4      2      0        31    163    148    130    118
9     8      6      3      2        32    175    159    141    128
10    11     8      5      3        33    188    171    151    138
11    14     11     7      5        34    201    183    162    149
12    17     14     10     7        35    214    195    174    160
13    21     17     13     10       36    228    208    186    171
14    26     21     16     13       37    242    222    198    183
15    30     25     20     16       38    256    235    211    195
16    36     30     24     19       39    271    250    224    208
17    41     35     28     23       40    287    264    238    221
18    47     40     33     28       41    303    279    252    234
19    54     46     38     32       42    319    295    267    248
20    60     52     43     37       43    336    311    281    262
21    68     59     49     43       44    353    327    297    277
22    75     66     56     49       45    371    343    313    292
23    83     73     62     55       46    389    361    329    307
24    92     81     69     61       47    408    379    345    323
25    101    90     77     68       48    427    397    362    339
26    110    98     85     76       49    446    415    380    356
27    120    107    93     84       50    466    434    398    373

Source: F. Wilcoxon and R. A. Wilcox, Some rapid approximate statistical procedures, page
28, table 2, 1964. New York: Lederle Laboratories. Reproduced with the permission of
American Cyanamid Company

Operation schedule 7a. Evaluation of Wilcoxon's T statistic
Example
The data analysed using Wilcoxon's test, in schedule 7, will be evaluated for significance,
using table S2. From the data, T = 6, and N = 9.
Step 1: Decide whether the experimental hypothesis (H1) was a one-tailed or
two-tailed hypothesis.
In the experiment on washing up liquid, it had been hypothesised that scores from Gresego
would be higher than those from Kwikclene. Therefore H1 is a one-tailed hypothesis.
Step 2: Locate the appropriate value for N in table S2, using the column headed N.
Opposite N = 9, you should read the values 8, 6, 3, 2.
Step 3: Move along the row until you arrive at the tabled value which is equal to or
just larger than the value of T which you have just obtained.
The second value of 6 is equal to the T just obtained.
Step 4: Read the appropriate significance level off from the top of the column reached
in Step 3. Take the one- or two-tailed probability level stated, according to the
decision made in Step 1.
The obtained value of 6 is significant at the 0.025 level for a one-tailed hypothesis.
Step 5: State the conclusion.
The results were significant at the p = 0.025 level, using Wilcoxon's test (N = 9, T = 6,
one-tailed hypothesis). Therefore the null hypothesis can be rejected, and it is concluded
that the performance of Gresego was superior to that of Kwikclene.
Note
If the obtained value of T falls between tabled values, both of which are significant,
then the most conservative (i.e. the least significant, and with the highest value of p)
is quoted.

Operation schedule 8. The sign test
Data requirements
The scores must be paired off in some way, but can be the simplest type of data (i.e.
nominal).
Example
The data derived from the washing up experiment and analysed in schedule 7 will be used
once more.

Subject   Rating                   Subject   Rating
          Gresego   Kwikclene                Gresego   Kwikclene
1         8         5              6         7         6
2         7         5              7         9         5
3         9         2              8         6         5
4         7         6              9         5         6
5         8         9

Step 1: Put the data into a table, as above. It does not matter which column of scores
comes first. Mentally subtract each value in the second column from its partner in the
first. If the answer is positive, give the pairs a plus (+) sign, and if negative, a minus
(-) sign. Note 0 if the two values are equal.

S   Gresego   Kwikclene   Sign     S   Gresego   Kwikclene   Sign
1   8         5           +        6   7         6           +
2   7         5           +        7   9         5           +
3   9         2           +        8   6         5           +
4   7         6           +        9   5         6           -
5   8         9           -

Step 2: Count the number of times the less frequent sign occurs, to obtain the statistic
S.
The minus signs are less frequent, and occur twice. So S = 2.
Step 3: Count the total number of pluses and minuses, to obtain N. (Note that any
differences of 0 will not be included in N.)
There are 9 signs, and so N = 9.
Step 4: Decide whether your test comprises a one- or two-tailed test.
The hypothesis under consideration is a one-tailed hypothesis.

Table S3. Critical values of S for the sign test. S must be equal to or less than the stated
value to be significant

      Level of significance for one-tailed test
      0.05   0.025  0.01   0.005  0.0005
      Level of significance for two-tailed test
N     0.10   0.05   0.02   0.01   0.001
5     0      -      -      -      -
6     0      0      -      -      -
7     0      0      0      -      -
8     1      0      0      0      -
9     1      1      0      0      -
10    1      1      0      0      -
11    2      1      1      0      0
12    2      2      1      1      0
13    3      2      1      1      0
14    3      2      2      1      0
15    3      3      2      2      1
16    4      3      2      2      1
17    4      4      3      2      1
18    5      4      3      3      1
19    5      4      4      3      2
20    5      5      4      3      2
25    7      7      6      5      4
30    10     9      8      7      5
35    12     11     10     9      7

Step 5: Use table S3 to evaluate the statistic S (from Step 2), in conjunction with N
(determined in Step 3). The table is used in just the same manner as table S2 for the
Wilcoxon test (see schedule 7a).
Opposite the value 9, we read 1, 1, 0, 0, -. Our obtained value of S = 2 exceeds all these
tabled values, and so we conclude that as the probability exceeds that given for the 0.05 level,
one-tailed test, our results are not significant.
Step 6: State the conclusion.
The results of the analysis were nonsignificant (S = 2, N = 9, sign test), and so the null
hypothesis cannot be rejected. It is concluded that the two washing up liquids do not differ
in their performance.
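The washing up liquid example can be reworked by machine for both the Wilcoxon test (schedule 7) and the sign test (schedule 8). The Python below is a rough sketch only: the function names `rank_with_ties`, `wilcoxon_t` and `sign_test_s` are invented for illustration, and the significance of T or S must still be looked up in tables S2 and S3 by hand.

```python
def rank_with_ties(values):
    """Schedule 6: rank from the smallest (rank 1); tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The tied group occupies positions i+1 to j; give each the middle rank.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def wilcoxon_t(first, second):
    """Schedule 7: return (T, N) for paired scores, ignoring zero differences."""
    diffs = [a - b for a, b in zip(first, second) if a != b]  # Step 1
    ranks = rank_with_ties([abs(d) for d in diffs])  # Step 2: signs ignored
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)  # Step 3
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(positive, negative), len(diffs)  # Steps 4 and 5


def sign_test_s(first, second):
    """Schedule 8: return (S, N), where S is the count of the less frequent sign."""
    plus = sum(1 for a, b in zip(first, second) if a > b)
    minus = sum(1 for a, b in zip(first, second) if a < b)
    return min(plus, minus), plus + minus


gresego = [8, 7, 9, 7, 8, 7, 9, 6, 5]
kwikclene = [5, 5, 2, 6, 9, 6, 5, 5, 6]
print(wilcoxon_t(gresego, kwikclene))  # (6.0, 9)
print(sign_test_s(gresego, kwikclene))  # (2, 9)
```

Swapping the two lists changes the signs of the differences but leaves T, S and N unchanged, which is why it does not matter which column comes first.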
Operation schedule 9. The Mann-Whitney U test
Data requirements
Scores need to be of at least ordinal status.
Example
The results gathered whilst investigating a new memorising technique, described in chapter
1, will be used for analysis. They are:
Set 1: 30, 35, 45, 50, 75, 80 (the unaided group)
Set 2: 45, 50, 58, 62, 69, 70 (the aided group)
The experimental hypothesis was two-tailed, and the null hypothesis states that there will be
no difference between the two sets of scores.
Step 1: Put the two sets of scores into two vertical columns, with a good gap between
them. If one of the lists is smaller than the other, put it first, and call it 'list A'. If the
lists are the same size, then it doesn't matter which comes first. Call the second list
'list B'.

List A   List B
30       45
35       50
45       58
50       62
75       69
80       70

Step 2: Count the number of scores in each list, to obtain NA and NB. Then multiply
the two values together. This gives NANB.
NA = 6, NB = 6; NANB = 6 × 6 = 36
Step 3: Take the number of scores in list A, and add the value of 1.
6 + 1 = 7
Step 4: Multiply the value found in Step 3 by the number of scores in list A, and divide
the answer by 2.
$\frac{7 \times 6}{2} = 21$
Step 5: Rank all the numbers in both groups (see schedule 6 for ranking procedure),
taking both sets of numbers together, and giving the smallest score rank 1. Write the
ranks out in two more columns, headed RA and RB, and put each to the immediate
right of the original lists of scores A and B.

List A   RA     List B   RB
30       1      45       3½
35       2      50       5½
45       3½     58       7
50       5½     62       8
75       11     69       9
80       12     70       10

Step 6: Add the ranks which were given to the items on list A (i.e. RA).
1 + 2 + 3½ + 5½ + 11 + 12 = 35
Step 7: Add the value found in Step 2 to that found in Step 4, and subtract the value
of Step 6.
36 + 21 - 35 = 22
Step 8: Subtract the value of Step 7 from the result of Step 2.
36 - 22 = 14
Step 9: The values of Steps 7 and 8 give two values for the statistics U and U'
(pronounced 'U prime'). The smaller value will be U, and the larger U'.
U = 14 and U' = 22
Step 10: Use the two values of NA and NB, found in Step 2, in conjunction with table
S4 (overleaf), to determine whether the obtained value of U is significant. If U is
equal to or smaller than the critical values listed, then the null hypothesis can be
rejected. Two significance levels are given in the table, for the 0.05 and 0.01
probabilities (two-tailed), and the 0.025 and 0.005 probabilities (one-tailed tests).
Looking in the table at the point where NA = 6 and NB = 6 intersect, we read off the two
critical values of 5 and 2. Our value of U (14) exceeds both numbers, and so we cannot reject
the null hypothesis.
Step 11: State the conclusion.
The results of the analysis were nonsignificant (U = 14, NA = NB = 6; Mann-Whitney U
test), and so the null hypothesis cannot be rejected. It is concluded that the scores obtained
from subjects who were using the memory aid do not differ from those obtained by subjects
without the aid.
Note
A useful check is to rework all the calculations, but call the original list A, list B. You
should obtain precisely the same values for U and U'. Again, the smaller of the two
values is labelled U, and evaluated.
Formulae
The formula for U is either
$U = N_A N_B + \frac{N_A (N_A + 1)}{2} - R_A$
or
$U = N_A N_B + \frac{N_B (N_B + 1)}{2} - R_B$
where NA = the number of items in list A
NB = the number of items in list B
RA = the sum of the ranks given to items in list A
RB = the sum of the ranks given to items in list B
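Steps 2 to 9 of schedule 9 can be followed in a few lines of Python. This is a sketch under the assumption that both lists are plain lists of numbers; `rank_with_ties` and `mann_whitney_u` are hypothetical names, and the critical values still have to be read from table S4.

```python
def rank_with_ties(values):
    """Schedule 6: rank from the smallest (rank 1); tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # middle of positions i+1 to j
        i = j
    return [rank_of[v] for v in values]


def mann_whitney_u(list_a, list_b):
    """Return (U, U') following Steps 2 to 9 of schedule 9."""
    n_a, n_b = len(list_a), len(list_b)
    ranks = rank_with_ties(list(list_a) + list(list_b))  # Step 5: rank both lists together
    r_a = sum(ranks[:n_a])  # Step 6: sum of the ranks for list A
    first = n_a * n_b + n_a * (n_a + 1) / 2 - r_a  # Steps 2, 3, 4 and 7
    second = n_a * n_b - first  # Step 8
    return min(first, second), max(first, second)  # Step 9


unaided = [30, 35, 45, 50, 75, 80]
aided = [45, 50, 58, 62, 69, 70]
print(mann_whitney_u(unaided, aided))  # (14.0, 22.0)
```

Calling the function with the two lists exchanged carries out the check described in the Note, and should return the same pair of values.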

Table S4. Critical values of U for the Mann-Whitney test. For each value of NA and NB
there are two numbers. The top one is the value of U which must not be exceeded for
significance at the 0.005 level for a one-tailed test (0.01, two-tailed test); the lower one
gives the value for the 0.025 level for a one-tailed test (0.05, two-tailed)

NA
2 - - - - - - - - - - - - - - - - - - 0 0
- - - - - U 0 0 0 1 1 1 1 1 2 2 2 2
3 0 ( ] 0 1 1 1 2 2 2 2 3 3
- - - - 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8
4 - - - - - 0 0 1 1 2 2 3 3 4 5 5 6 6 7 8
- - - 0 1 2 3 4 4 5 6 7 8 9 10 11 11 12 13 14
5 - - - - 0 1 1 2 3 4 5 6 7 7 8 9 10 11 12 13
- 0 1 2 3 5 6 7 8 9 11 12 13 14 15 17 18 19 20
6 - - - 0 1 2 3 4 5 6 7 8 10 11 12 13 15 16 17 18
- - 1 2 3 5 6 8 10 11 13 14 16 17 19 21 22 24 25 27
7 - - - 0 1 3 4 6 7 9 10 12 13 15 16 18 19 21 22 24
- - 1 3 5 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
8 - - - 1 2 4 6 7 9 11 13 15 17 18 20 22 24 26 28 30
- 0 2 4 6 8 10 13 15 17 19 22 24 26 29 31 34 36 38 41
9 - - 0 1 3 5 7 9 11 13 16 18 20 22 24 27 29 31 33 36
- 0 2 4 7 10 12 15 17 20 23 26 28 31 34 37 39 42 45 48
10 - - 0 2 4 6 9 11 13 16 18 21 24 26 29 31 34 37 39 42
- 0 3 5 8 11 14 17 20 23 26 29 33 36 39 42 45 48 52 55
11 - - 0 2 5 7 10 13 16 18 21 24 27 30 33 36 39 42 45 48
- 0 3 6 9 13 16 19 23 26 30 33 37 40 44 47 51 55 58 62
12 - - 1 3 6 9 12 15 18 21 24 27 31 34 37 41 44 47 51 54
- 1 4 7 11 14 18 22 26 29 33 37 41 45 49 53 57 61 65 69
13 - - 1 3 7 10 13 17 20 24 27 31 34 38 42 45 49 53 57 60
- 1 4 8 12 16 20 24 28 33 37 41 45 50 54 59 63 67 72 76
14 - - 1 4 7 11 15 18 22 26 30 34 38 42 46 50 54 58 63 67
- 1 5 9 13 17 22 26 31 36 40 45 50 55 59 64 69 74 78 83
15 - - 2 5 8 12 16 20 24 29 33 37 42 46 51 55 60 64 69 73
- 1 5 10 14 19 24 29 34 39 44 49 54 59 64 70 75 80 85 90
16 - - 2 5 9 13 18 22 27 31 36 41 45 50 55 60 65 70 74 79
- 1 6 11 15 21 26 31 37 42 47 53 59 64 70 75 81 86 92 98
17 - - 2 6 10 15 19 24 29 34 39 44 49 54 60 65 70 75 81 86
- 2 6 11 17 22 28 34 39 45 51 57 63 69 75 81 87 93 99 105
18 - - 2 6 11 16 21 26 31 37 42 47 53 58 64 70 75 81 87 92
- 2 7 12 18 24 30 36 42 48 55 61 67 73 80 86 93 99 106 112
19 - 0 3 7 12 17 22 28 33 39 45 51 57 63 69 74 81 87 93 99
- 2 7 13 19 25 32 38 45 52 58 65 72 78 85 92 99 106 113 119
20 - 0 3 8 13 18 24 30 36 42 48 54 60 67 73 79 86 92 99 105
- 2 8 14 20 27 34 41 48 55 62 69 76 83 90 98 105 112 119 127

Source: J. G. Snodgrass, The numbers game, table C.7, 1978. Oxford University Press

Operation schedule 10. The t test for related samples
Data requirements
1 The pairs of scores must be related to each other.
2 The scores must be of at least interval status.
3 The scores in each group must be normally distributed.
4 The two sets of scores must have similar variances.
Example
A family of hares challenges a family of tortoises to a race. The tortoises agree, provided that
the hares wear lead boots. It so happens that each family comprises eight individuals;
parents and one grandparent, children aged 7, 6, 5, 3 years, and baby. Thus individuals can
be paired off for statistical comparison purposes. The tortoises also stipulated that
performance for both families taken as a whole should be used for judging purposes, not, for
instance, a count-up of the number of winners in each family. The time to complete the
course is measured in hours and minutes, but as the adjudicator, Statman, isn't too good
with a time clock, it is decided to round each time up to the nearest complete hour. No one
stated who the likely winners would be; the hypothesis is thus a two-tailed one. The results
appear below.
Operation schedules
10 The t test for related samples

Individual     Hare times (hr)   Tortoise times (hr)

Granny               15                14
Dad                   4                 4
Mum                   9                10
Offspring 1           9                 8
Offspring 2          10                10
Offspring 3          10                 9
Offspring 4          12                10
Baby                 17                15

Step 1: Count the number of pairs of scores involved. This gives N.
There are 8 pairs of scores. N = 8.
Step 2: Subtract 1 from the value of N (Step 1). This gives the df.
df = 8 - 1 = 7
Step 3: Multiply the values from Steps 1 and 2 together.
8 × 7 = 56
Step 4: Find the mean for each set of scores.
Hare mean time = (15 + 4 + 9 + 9 + 10 + 10 + 12 + 17)/8 = 86/8 = 10.75 hr
Tortoise mean time = (14 + 4 + 10 + 8 + 10 + 9 + 10 + 15)/8 = 80/8 = 10 hr
Step 5: Subtract the smaller of the two values found in Step 4 from the larger, to give the difference between the two means.
10.75 - 10 = 0.75
Step 6: Subtract each score in the second column from its partner in the first. Note the signs (+ or -).
15 - 14 = +1     10 - 10 =  0
 4 -  4 =  0     10 -  9 = +1
 9 - 10 = -1     12 - 10 = +2
 9 -  8 = +1     17 - 15 = +2
Step 7: Square each difference found in Step 6, and total the values.
1² + 0² + (-1)² + ... + 2² + 2² = 1 + 0 + 1 + 1 + 0 + 1 + 4 + 4 = 12
Step 8: Add up the differences found in Step 6, taking sign into account.
1 + 0 + (-1) + 1 + 0 + 1 + 2 + 2 = 6
Step 9: Square the value found in Step 8, and divide by N (Step 1).
6²/8 = 36/8 = 4.5
Step 10: Subtract the value found in Step 9 from that of Step 7, and divide by the value which was found in Step 3.
(12 - 4.5)/56 = 7.5/56 = 0.1339
Step 11: Take the square root of the value obtained in Step 10.
√0.1339 = 0.366
Step 12: Divide the value of Step 5 (the difference between the two means) by that found in Step 11. This gives the value of t.
t = 0.75/0.366 = 2.049
Step 13: Find the significance of t, using table S5 (overleaf), and the df obtained in Step 2. The obtained t must exceed the values stated on the table for significance at the various probability levels.
Opposite df = 7, we read the values 1.895, 2.365, 2.998, 3.499, and 5.408. Our value of 2.049 exceeds the first value, which is significant at the 0.05 level for a one-tailed test only. Although the hares immediately claimed that all along they had known that they would win, and so a one-tailed evaluation was appropriate, Statman ruled that in the absence of a written statement beforehand, the hypothesis was two-tailed. So the null hypothesis could not be rejected; a draw was declared and everyone retired to the Bull and Bush, exhausted!

Note
The tortoises were sorry that they had stipulated that family times should be taken as a whole. Although the hares were sickened over their failure to establish a one-tailed hypothesis, the tortoises were equally dismayed, during the 'post-mortem', to realise that when matched animals were compared, their family could boast five winning times, as opposed to the hares' one winning time!

Formula
\[ t = \frac{|\bar{X} - \bar{Y}|}{\sqrt{\dfrac{\sum D^2 - \dfrac{(\sum D)^2}{N}}{N(N-1)}}} \]
where $\bar{X}$ = the mean of the first list
$\bar{Y}$ = the mean of the second list
$D$ = the difference between each pair of X and Y scores
$N$ = the number of pairs of scores
$|\bar{X} - \bar{Y}|$ = the difference between the two means
$\sum D^2$ = the squared differences, totalled
$\sum D$ = the differences, totalled
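The arithmetic of operation schedule 10 can be checked with a short Python sketch. This is only an illustration: the function name `related_t` and the variable names are assumptions made here, not part of the schedule, and the scores are the hare and tortoise times listed in Step 4.

```python
import math

def related_t(first, second):
    """Related-samples t, following Steps 1-12 of the schedule (illustrative helper)."""
    n = len(first)                                       # Step 1: number of pairs, N
    df = n - 1                                           # Step 2: degrees of freedom
    mean_diff = abs(sum(first) / n - sum(second) / n)    # Steps 4-5: difference between means
    d = [x - y for x, y in zip(first, second)]           # Step 6: signed differences
    sum_d_sq = sum(v * v for v in d)                     # Step 7: total of squared differences
    sum_d = sum(d)                                       # Step 8: total of differences
    variance_term = (sum_d_sq - sum_d ** 2 / n) / (n * df)  # Steps 3, 9 and 10
    return mean_diff / math.sqrt(variance_term), df      # Steps 11-12

hare = [15, 4, 9, 9, 10, 10, 12, 17]
tortoise = [14, 4, 10, 8, 10, 9, 10, 15]
t, df = related_t(hare, tortoise)
print(round(t, 3), df)  # 2.049 7
```

The significance of t still has to be read from table S5 against the df, as in Step 13; the sketch does not reproduce the table.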
Table S5. Critical values of t. t must be equal to or more than the stated value to be significant

Level of significance for one-tailed test
       0.05    0.025   0.01    0.005   0.0005
Level of significance for two-tailed test
df     0.10    0.05    0.02    0.01    0.001

Source: Powell, page 72

Operation schedule 11. The t test for unrelated samples

Data requirements
1 The scores must be of at least interval status.
2 The scores in each group must be normally distributed.
3 The two sets of scores must have similar variances.

Example
A driver is able to follow two routes home, but is unsure which is the quicker, and so records his journey time each day. On alternate weeks he follows the Spaghetti Junction Special route (S) and the Traffic Light Trail (T) route, and obtains a mean score for the week. He started and finished with an S week, thus getting an uneven number of times for comparison. The null hypothesis states that there is no difference in time between the two journey speeds. The data, in minutes, are given below.

Week   Mean S route    Week   Mean T route
       times (min)            times (min)

  1        32            2        24
  3        35            4        28
  5        36            6        30
  7        37            8        32
  9        36           10        35
 11        42

Step 1: Count the number of scores in each set, to obtain NS and NT; then add the two together.
NS = 6, NT = 5, 6 + 5 = 11
Step 2: Divide the total obtained in Step 1 by NS and NT multiplied together.
11/(6 × 5) = 11/30 = 0.3667
Step 3: Add the scores in list S.
32 + 35 + 36 + 37 + 36 + 42 = 218
Step 4: Square every score in the S group, and total all the squares.
(32)² + (35)² + (36)² + (37)² + (36)² + (42)² = 1024 + 1225 + 1296 + 1369 + 1296 + 1764 = 7974
Step 5: Square the value obtained in Step 3, and then divide the result by the value of NS (Step 1).
(218)²/6 = 47 524/6 = 7920.667
Step 6: Subtract the value found in Step 5 from that of Step 4.
7974 - 7920.667 = 53.333
Step 7: Add the scores in list T.
24 + 28 + 30 + 32 + 35 = 149
Step 8: Square every score in the T group, and total all the squares.
(24)² + (28)² + (30)² + (32)² + (35)² = 576 + 784 + 900 + 1024 + 1225 = 4509
Step 9: Square the value obtained in Step 7, and then divide the result by the value of NT (Step 1).
(149)²/5 = 22 201/5 = 4440.2
Step 10: Subtract the value found in Step 9 from that found in Step 8.
4509 - 4440.2 = 68.8
Step 11: Add together the values of Steps 6 and 10.
53.333 + 68.8 = 122.133
Step 12: Subtract 2 from the value of Step 1. This gives the df.
11 - 2 = 9, df = 9
Step 13: Divide the value of Step 11 by the df (Step 12), and then multiply the result by the value of Step 2.
(122.133/9) × 0.3667 = 13.57 × 0.3667 = 4.976
Step 14: Take the square root of the value obtained in Step 13.
√4.976 = 2.231
Step 15: Obtain the mean of list S (Step 3 divided by NS), and for list T (Step 7 divided by NT). Then subtract the smaller from the larger.
mean of list S = 218/6 = 36.333, mean of list T = 149/5 = 29.8
36.333 - 29.8 = 6.533
Step 16: Divide the value of Step 15 by that of Step 14 to obtain t.
t = 6.533/2.231 = 2.928
Step 17: Evaluate the significance of t, using table S5, and the df obtained in Step 12. The obtained t value must exceed the values stated on the table for significance.
Opposite df = 9, we read the values 1.833, 2.262, 2.821, 3.250, and 4.781. Our value of 2.928 just exceeds that of the third along, 2.821, and so reading the probability level off from the top of the column, we conclude that it is significant at the 0.02 level for a two-tailed test. The null hypothesis can be rejected.
Step 18: State the conclusion.
The time taken to travel route T was found to be less than that for route S. The difference was significant at the p = 0.02 level (two-tailed t test; t = 2.928, df = 9).

Formula
\[ t = \frac{|\bar{X} - \bar{Y}|}{\sqrt{\dfrac{\left(\sum X^2 - \dfrac{(\sum X)^2}{N_x}\right) + \left(\sum Y^2 - \dfrac{(\sum Y)^2}{N_y}\right)}{N_x + N_y - 2} \times \dfrac{N_x + N_y}{N_x N_y}}} \]
where $\bar{X}$ = the mean of the first group of scores, list X
$\bar{Y}$ = the mean of the second group of scores, list Y
$|\bar{X} - \bar{Y}|$ = the difference between the two
$\sum X^2$ = the sum of the squared values on list X
$\sum Y^2$ = the sum of the squared values on list Y
$(\sum X)^2$ = the square of the total of list X
$(\sum Y)^2$ = the square of the total of list Y
$N_x$ = the number of scores in list X
$N_y$ = the number of scores in list Y
$\times$ = multiply

Operation schedule 12. Simple chi-square

Data requirements
1 Data must be in frequency form, i.e. counted, rather than scores.
2 Entries in each cell must be independent.
3 The expected number for each cell must not be less than 5.

Example
One hundred and fifty-six boys and two hundred and four girls were asked whether they liked vanilla slices. Ninety-four boys said that they did, and one hundred and seventy-five girls. The remainder did not like them. Is there any evidence that there is a sex difference in preference for vanilla slices?

Step 1: Plot the data into a large 2 × 2 table. The number in each cell is the observed frequency. Obtain row and column totals, and a grand total (N).

                  Column 1        Column 2          Totals
                  Like slices     Dislike slices

Row 1   Boys      Cell A   94     Cell B   62        156
Row 2   Girls     Cell C  175     Cell D   29        204

Totals                    269              91        360

Step 2: Multiply the row 1 total with the column 1 total, and divide the answer by the overall total, N. This gives the expected frequency for cell A.
By multiplying the row and column totals relating to each cell together, and dividing the answer by the overall total, N, you obtain the expected frequencies for each cell. Write the value obtained for each cell below the obtained frequency (with a different coloured pen).
Cell A: (156 × 269)/360 = 116.6
Cell B: (156 × 91)/360 = 39.4
Cell C: (204 × 269)/360 = 152.4
Cell D: (204 × 91)/360 = 51.6

Step 3: Find the difference between each observed and expected frequency for each cell, always taking the smaller from the larger. Then, for each cell value, subtract 0.5.
Cell A: 116.6 - 94 - 0.5 = 22.1
Cell B: 62 - 39.4 - 0.5 = 22.1
Cell C: 175 - 152.4 - 0.5 = 22.1
Cell D: 51.6 - 29 - 0.5 = 22.1

Step 4: Square each of the cell values obtained in Step 3, and divide the answer by the expected frequency for that particular cell.
Cell A: (22.1)²/116.6 = 4.189
Cell B: (22.1)²/39.4 = 12.396
Cell C: (22.1)²/152.4 = 3.205
Cell D: (22.1)²/51.6 = 9.465

Step 5: Add together the four values obtained in Step 4 to get χ².
4.189 + 12.396 + 3.205 + 9.465 = 29.255

Step 6: Evaluate the value of χ², using table S6. In a simple chi-square test the df will always be 1.
The obtained value of 29.255 exceeds the tabulated value of 10.83 which is significant at the 0.001 level (two-tailed test). The null hypothesis can be rejected.
Step 7: State the conclusion.
A chi-square test carried out on the data obtained was significant at the 0.001 level (χ² = 29.255, df = 1), and so it is concluded that there is a difference between the sexes in liking for vanilla slices.

Formula
The formula for chi-square, in which Yates' correction is incorporated, is
\[ \chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E} \]
where $E$ = the expected frequencies
$O$ = the observed frequencies
$\sum$ = the sum of

Table S6. Critical values of χ². χ² must be equal to or more than the stated value to be significant

Level of significance for one-tailed test
       0.05    0.025   0.005   0.0005
Level of significance for two-tailed test
df     0.1     0.05    0.01    0.001

Source: Powell, page 73
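As a cross-check on operation schedule 12, the steps for a 2 × 2 table with Yates' correction can be sketched in Python. The function name `yates_chi_square` and its argument order (cells A, B, C, D) are assumptions made for this illustration. Unlike the hand calculation, the sketch does not round the expected frequencies to one decimal place, so it gives roughly 29.16 rather than 29.255; the conclusion is unchanged.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for a 2 x 2 table of observed counts, with Yates' correction."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d                                     # grand total, N
    chi_sq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # Step 2: expected frequency
            diff = abs(observed[i][j] - expected) - 0.5   # Step 3: difference less 0.5
            chi_sq += diff ** 2 / expected                # Steps 4-5: square, divide, total
    return chi_sq

# Boys: 94 like, 62 dislike; girls: 175 like, 29 dislike
chi_sq = yates_chi_square(94, 62, 175, 29)
print(round(chi_sq, 2))
```

With df = 1 the result is compared with the tabulated 10.83 for the 0.001 level, exactly as in Step 6.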
Operation schedule 13. Complex chi-square

Data requirements
1 Data must be in frequency form, i.e. counted, rather than actual scores.
2 Entries in each cell must be independent.
3 The expected number for each cell must not be less than 5.

Example
The different degree classifications obtained by finalists at Wetwang University (page 99) provide material for a complex chi-square analysis. The null hypothesis states that there is no difference between the three groups in terms of the proportions obtaining the different classifications.

Step 1: Put the data for the three groups of students and five degree classes into a 3 × 5 contingency table. The numbers in each cell stand for the numbers of students obtaining the particular degree and degree class, and are the observed frequencies. Add together the numbers across the rows to get row totals, and down the columns to give column totals. The row and column totals added together should agree, and that total will be the overall total, N. Leave plenty of space in the table, for you will later be writing the expected values in each cell.

Subjects             Degree classification             Totals

              I      IIi     IIii    III     Pass

Languages     5      10      20      35      30         100
Maths        35      40      80      25      20         200
Economics     0      10      10      20      20          60

Totals       40      60     110      80      70         360

Step 2: To compute the expected frequency for individual cells, follow this procedure. Take the row sum and the column sum which intersect at the cell, and multiply the two values together. Then divide the answer by the grand total, N.
For the top left-hand cell, Languages, and a class I degree, it will be
Row 1, Col. 1: (100 × 40)/360 = 4000/360 = 11.111
Don't round off the decimal places at less than three.
Row 2, Col. 1: (200 × 40)/360 = 22.222
Row 3, Col. 1: (60 × 40)/360 = 6.667
Row 1, Col. 2: (100 × 60)/360 = 16.667
Row 2, Col. 2: (200 × 60)/360 = 33.333
Row 3, Col. 2: (60 × 60)/360 = 10.000
Row 1, Col. 3: (100 × 110)/360 = 30.556
Row 2, Col. 3: (200 × 110)/360 = 61.111
Row 3, Col. 3: (60 × 110)/360 = 18.333
Row 1, Col. 4: (100 × 80)/360 = 22.222
Row 2, Col. 4: (200 × 80)/360 = 44.444
Row 3, Col. 4: (60 × 80)/360 = 13.333
Row 1, Col. 5: (100 × 70)/360 = 19.444
Row 2, Col. 5: (200 × 70)/360 = 38.889
Row 3, Col. 5: (60 × 70)/360 = 11.667
Step 3: When the expected values have been calculated, put each one into the table, below the appropriate obtained frequency. Check that they all exceed 4.

Subjects             Degree classification                     Totals

              I        IIi      IIii     III      Pass

Languages     5        10       20       35       30            100
             11.111   16.667   30.556   22.222   19.444
Maths        35        40       80       25       20            200
             22.222   33.333   61.111   44.444   38.889
Economics     0        10       10       20       20             60
              6.667   10.000   18.333   13.333   11.667

Totals       40        60      110       80       70            360

Step 4: To obtain the values which will be added together to get χ², the following procedure must be undertaken for each cell. The difference between the obtained and expected frequency for every cell is found, i.e. by subtracting the smaller value from the larger. This is then squared, and the value divided by the expected frequency for that particular cell.
Row 1, Col. 1: (11.111 - 5)²/11.111 = (6.111)²/11.111 = 37.344/11.111 = 3.361
Row 2, Col. 1: (35 - 22.222)²/22.222 = (12.778)²/22.222 = 163.277/22.222 = 7.348
Row 3, Col. 1: (6.667 - 0)²/6.667 = (6.667)²/6.667 = 44.449/6.667 = 6.667
Row 1, Col. 2: (16.667 - 10)²/16.667 = (6.667)²/16.667 = 44.449/16.667 = 2.667
Row 2, Col. 2: (40 - 33.333)²/33.333 = (6.667)²/33.333 = 44.449/33.333 = 1.333
Row 3, Col. 2: (10 - 10)²/10 = (0)²/10 = 0/10 = 0
Row 1, Col. 3: (30.556 - 20)²/30.556 = (10.556)²/30.556 = 111.429/30.556 = 3.647
Row 2, Col. 3: (80 - 61.111)²/61.111 = (18.889)²/61.111 = 356.794/61.111 = 5.838
Row 3, Col. 3: (18.333 - 10)²/18.333 = (8.333)²/18.333 = 69.439/18.333 = 3.788
Row 1, Col. 4: (35 - 22.222)²/22.222 = (12.778)²/22.222 = 163.277/22.222 = 7.348
Row 2, Col. 4: (44.444 - 25)²/44.444 = (19.444)²/44.444 = 378.069/44.444 = 8.507
Row 3, Col. 4: (20 - 13.333)²/13.333 = (6.667)²/13.333 = 44.449/13.333 = 3.334
Row 1, Col. 5: (30 - 19.444)²/19.444 = (10.556)²/19.444 = 111.429/19.444 = 5.731
Row 2, Col. 5: (38.889 - 20)²/38.889 = (18.889)²/38.889 = 356.794/38.889 = 9.175
Row 3, Col. 5: (20 - 11.667)²/11.667 = (8.333)²/11.667 = 69.439/11.667 = 5.952

Step 5: Obtain the value of χ² by adding all the values obtained in Step 4.
3.361 + 7.348 + 6.667 + ... + 9.175 + 5.952 = 74.696

Step 6: Obtain the df which will be needed to evaluate χ². It will always be the number of rows minus 1 multiplied by the number of columns minus 1, unless you are using the steps for a one-sample chi-square, when the df is always the number of cells minus 1.
Here there are 3 rows and 5 columns: (3 - 1) × (5 - 1) = 8, df = 8.

Step 7: Use table S6 to evaluate the significance of the value obtained.
When df = 8, we read off the values 13.36, 15.51, 20.09 and 26.12. Our value exceeds all these, and so we conclude that it is significant at the 0.001 level for a two-tailed test.

Step 8: State the conclusion.
The different degree classifications for finalists in languages, maths and economics were analysed, using chi-square. The value of χ² was found to be significant at the 0.001 level (χ² = 74.696, df = 8), and so it was concluded that the proportions of students obtaining the five classes of degree vary substantially between subjects.

Note
The test only tells us that the numbers come from distributions having different shapes. We have to decide how the shapes differ, and where the most substantial discrepancies lie, by inspection of the data. In addition, take care over the interpretation of this significant result. We cannot say that 'brighter' students take maths etc., for the results may be dependent upon factors other than students' abilities, such as departmental marking policies. Chi-square is only a measure of association.

Formula
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where $E$ = the expected frequencies
$O$ = the observed frequencies
$\sum$ = the sum of
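The fifteen cell calculations of operation schedule 13 follow one pattern, which the Python sketch below loops over for a table of any size. The function name `chi_square` and the list-of-rows layout are assumptions made for this illustration. Because expected frequencies are kept at full precision rather than three decimal places, the total comes out near 74.69 instead of exactly 74.696.

```python
def chi_square(table):
    """Chi-square and df for a rows x columns table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)                                   # overall total, N
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # Step 2
            chi_sq += (observed - expected) ** 2 / expected  # Steps 4-5
    df = (len(table) - 1) * (len(table[0]) - 1)           # Step 6
    return chi_sq, df

degrees = [
    [5, 10, 20, 35, 30],    # Languages
    [35, 40, 80, 25, 20],   # Maths
    [0, 10, 10, 20, 20],    # Economics
]
chi_sq, df = chi_square(degrees)
print(round(chi_sq, 2), df)
```

No Yates' correction is applied here, matching the formula given for the complex case.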
Operation schedule 14. Spearman's rho

Data requirements
1 Scores must be of at least ordinal status.
2 Scores for comparison must be paired off in some manner. Usually they will have been obtained from one source, but this may not always be the case.
3 When plotted on a scattergram, the points can be in a linear or slightly curved pattern.

Example
It was noted that residents at the Heyshott Asylum frequently referred to Tennyson's poem Maud. Enquiries revealed that those patients most seriously afflicted with insanity (as assessed by the Heyshott Lunacy Scale, HLS) appeared to have read the poem rather often. All inmates were given a rating on the HLS, and asked how many times they had read Maud. The results were then used to obtain Spearman's rho.

Subject     HLS score     Maud readings

Because the staff anticipated that degree of insanity would be positively associated with number of Maud readings, they have established a one-tailed prediction.

Step 1: Draw a scattergram for the two sets of data; the vertical axis measuring one variable, and the horizontal axis the other.

[Figure: horizontal axis, number of Maud readings]
Figure 1. Scattergram showing the relationship between HLS scores and number of Maud readings

Step 2: Now cast the data into a table with seven columns. These will be:
S    The label for each subject, animal or source of paired scores, using numbers or letters of the alphabet.
A    The scores on one variable.
B    The scores on the second variable.
RA   The scores from A, given ranks (see schedule 6).
RB   The scores from B, given ranks.
D    Each value in list RB subtracted from its partner in RA.
D²   Each value in D squared.

Step 3: Count the number of paired scores in the sample, to obtain N.
There are 7 pairs of scores; N = 7.
Step 4: Multiply N by its own value, twice, and then subtract its own value.
(7 × 7 × 7) - 7 = 343 - 7 = 336
Step 5: Total the values in column D².
0 + 2¼ + 6¼ + 4 + 4 + 1 + 1 = 18½
Step 6: Multiply the value obtained in Step 5 by the number 6; then divide the result by the value found in Step 4.
(18.5 × 6)/336 = 111/336 = 0.3304
Step 7: To find rho, subtract the value of Step 6 from the number 1. Retain the sign. The result should always lie between -1 and +1.
1 - 0.3304 = +0.6696
Step 8: Evaluate the significance of rho, using table S7. Follow the numbers along the row opposite the appropriate N, to find the value of rho which must be exceeded for the various significance levels given.
When N = 7, we read 0.714, 0.786, 0.893 and 0.929 for the one-tailed significance levels. Our obtained value of rho, 0.6696, does not exceed any of these, and so the null hypothesis cannot be rejected.
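Steps 3 to 7 of the Spearman schedule can be sketched in Python as follows. The helper names (`average_ranks`, `rho_from_sum`, `spearman_rho`) are assumptions made for this illustration, and the two short score lists at the end are hypothetical data chosen only to exercise the code; they are not the asylum scores. Tied scores share the mean of the ranks they occupy, which is how fractional values such as 2¼ arise in the D² column.

```python
def average_ranks(scores):
    """Rank scores from 1 upwards; tied scores share the mean of their ranks."""
    ordered = sorted(scores)
    return [ordered.index(s) + (ordered.count(s) + 1) / 2 for s in scores]

def rho_from_sum(sum_d_sq, n):
    """Steps 4, 6 and 7: rho from the total of D squared and N."""
    return 1 - 6 * sum_d_sq / (n ** 3 - n)

def spearman_rho(first, second):
    """Rank both lists, total the squared rank differences, then apply the formula."""
    n = len(first)                                        # Step 3
    ranks_a = average_ranks(first)
    ranks_b = average_ranks(second)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))  # Step 5
    return rho_from_sum(sum_d_sq, n)

# Values from the schedule: total of D squared = 18.5, N = 7
print(round(rho_from_sum(18.5, 7), 4))  # 0.6696

# Hypothetical scores, for illustration only
print(spearman_rho([3, 1, 4, 2], [30, 10, 20, 40]))
```

The tabled critical values in Step 8 are still needed to judge significance.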
Step 9: State the conclusion.
From the results of a correlation carried out on the number of times a patient had read the poem Maud, and their rating on the HLS, it was found that there did not appear to be a significant association between the two variables (rho = +0.67, N = 7). However, the association was positive, and the relationship may be found to be a significant one if a larger sample were used.

Formula and abbreviations
Spearman's rho is sometimes written $\rho$ or $r_s$. The formula used for calculations is
\[ r_s = 1 - \frac{6 \sum D^2}{N^3 - N} \]
where $\sum D^2$ = the squared values of the differences between the ranked scores, totalled
$N$ = the number of paired scores

Table S7: Critical values of Spearman's rho. Rho must be equal to or more than the stated value to be significant

Level of significance for one-tailed test
       0.05    0.025   0.01    0.005
Level of significance for a two-tailed test

N = the number of paired scores used.
Treat a negative value of rho as if it were positive, when using the table, but when interpreting it don't forget that it will indicate an inverse relationship.

Source: Snodgrass, table C.6

Operation schedule 15. The Pearson product-moment correlation (r)

Data requirements
1 Scores for comparison must be paired off in some manner. Usually they will have been obtained from the same source, but this may not always be the case.
2 The relationship between the two variables must be a linear one. This can be assessed from a scattergram.
3 The scores must be of at least interval status.
4 The scores must be normally distributed.
5 The two sets of scores must have similar variances.

Example
A schoolgirl, recently annoyed by the antics of her small sisters, vowed that she would never inflict siblings on any child she might have when she was older. Curious to know her classmates' opinions, she asked six of them how many children they thought was a good number to have, and compared this with the number of children under the age of fifteen actually in each one's family.
Bearing in mind her own experiences, she imagined that the more siblings a person had, the fewer they would think was desirable! Thus she has predicted a negative correlation: this directional prediction constitutes a one-tailed hypothesis. The data she collected are given below:

Subject       No. of children in     No. of children
              own family             thought ideal

Christine           1
David               2
Eric                3
Lynne               4
Margaret            4
Tom                 5
Step 1: Draw a scattergram for the two sets of data, the vertical axis measuring one variable, the horizontal the other. At this stage, check for linearity.

[Figure: horizontal axis, ideal number of children]
Figure 2. Scattergram showing the relationship between the number of children in a young person's family and the number that they would consider to be 'ideal'

Step 2: Now cast the data into a table, with six columns. These will be:
S    The label for each subject, animal or source of paired scores, using numbers or letters of the alphabet.
A    The scores on one variable (and which are added to give ΣA).
A²   Each A score squared (and which added gives ΣA²).
B    The scores on the second variable (and which are added to give ΣB).
B²   Each B score squared (and which added gives ΣB²).
AB   Each A score multiplied by its matching B score. (Added gives ΣAB.)
Add all the columns, apart from the first one. This is a good stage to check that the data meets all the requirements other than linearity.

Step 3: Count the number of scores which are paired off, to obtain N.
There are 6 pairs of scores: N = 6.
Step 4: Multiply the total of the column for A² by N to obtain NΣA².
71 × 6 = 426
Step 5: Obtain (ΣA)² by squaring the total of the column for A.
19 × 19 = 361
Step 6: Subtract the value of Step 5 from that found in Step 4.
426 - 361 = 65
Step 7: Multiply the total of the column for B² by N to obtain NΣB².
56 × 6 = 336
Step 8: Obtain (ΣB)² by squaring the total of the column for B.
16 × 16 = 256
Step 9: Subtract the value of Step 8 from that found in Step 7.
336 - 256 = 80
Step 10: Multiply together the values obtained in Steps 6 and 9.
65 × 80 = 5200
Step 11: Take the square root of the value found in Step 10.
√5200 = 72.111
Step 12: Multiply the total of the column for AB by the value of N (Step 3).
40 × 6 = 240
Step 13: Multiply together the totals of the columns for A and B.
19 × 16 = 304
Step 14: Subtract the value of Step 13 from that of Step 12, making sure you keep the appropriate sign.
240 - 304 = -64
Step 15: Divide the value of Step 14 by the value of Step 11, to get r.
r = -64/72.111 = -0.8875
Step 16: Evaluate the significance of r, using table S8. First obtain the value of N - 2 which will be used for the table, by subtracting 2 from the value of N found in Step 3.
N - 2 = 4
Opposite N - 2 = 4, we read the values 0.7293, 0.811, 0.9172 and 0.9741 for the 0.05, 0.025, 0.005 and 0.0005 probability levels (one-tailed hypothesis). Our value of 0.8875 exceeds that of 0.811 given for the 0.025 level, but not the figure of 0.9172 for the 0.005 level. Therefore we conclude that we have a negative correlation, which is significant at the 0.025 level.
Step 17: State the conclusion.
From the results of the correlation carried out on the number of children in a young person's family, and the number of children they considered ideal, a significant negative association was found (r = -0.887, N = 6, p = 0.025). It was concluded that the more siblings there are in a family, the less likely a child of school-age is to appreciate their numbers, whilst only children appeared to consider brothers or sisters more of an asset!
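Steps 4 to 15 of the Pearson schedule amount to a single expression in the six column totals, which the Python sketch below evaluates. The function name `pearson_r` and its parameter names are assumptions made for this illustration; the totals passed in are those used in the steps above (N = 6, ΣA = 19, ΣA² = 71, ΣB = 16, ΣB² = 56, ΣAB = 40).

```python
import math

def pearson_r(n, sum_a, sum_a_sq, sum_b, sum_b_sq, sum_ab):
    """Pearson r from the column totals of the six-column table."""
    spread_a = n * sum_a_sq - sum_a ** 2          # Steps 4-6
    spread_b = n * sum_b_sq - sum_b ** 2          # Steps 7-9
    denominator = math.sqrt(spread_a * spread_b)  # Steps 10-11
    numerator = n * sum_ab - sum_a * sum_b        # Steps 12-14, sign retained
    return numerator / denominator                # Step 15

r = pearson_r(6, 19, 71, 16, 56, 40)
print(round(r, 4))  # -0.8875
```

The sign is kept throughout, so a negative result indicates an inverse relationship; its size is then compared with the table S8 values for N - 2.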
Chapter 1
1 Using statistics tests, the results of evaluating the scores are:
Experiment 1: There is such a clear-cut difference between the sets of scores that we can be almost 100% certain that the memory technique works.
Experiment 2: The two groups appear to be reasonably different - it seems we can have some faith in the memory technique.
Experiment 3: There appears to be very little difference between the two groups, and we must conclude that the memory aid does not have a beneficial effect.
Experiment 4: The group using the new technique are definitely poorer at recall than the control group - the memory 'aid' not only fails to aid recall, but actually appears to impair it!
Later, you will learn about the statistical tests used to arrive at these conclusions. Meanwhile, ask yourself how you arrived at your decisions - and whether they agreed with the 'statistical' ones. Did you glance through the numbers, getting a 'feel' for the scores, or did you work out averages? You might also have looked at the total spread of scores, i.e. considered the lowest and highest in each group, and determined the extent of any overlapping which occurred. There will be more about averages and spreads anon.
2 (a) The man should use descriptive statistics. He won't tell his wife what the ten cars cost individually, but present her with an average; maybe with an indication of the type of price variation which exists. Of course, he has selected his 'examples' carefully, so that the price of his intended purchase compares very favourably with the average price.
(b) They could use descriptive statistics (averages) to arrive at a typical journey time for each route, and then inferential techniques to decide whether the two sets of journey times really differed.
(c) Descriptive statistics. The five days' weight loss would be averaged, and would be presented in such a way that the average would be seen to be equal to that of the two days' weight gain.
(d) Descriptive and inferential techniques. Average yields of tomatoes could be obtained for each bush, together with an indication of the range of variation encountered. Inferential techniques could then be used to compare the descriptive statistics obtained for the two sets of plants.
(e) Descriptive statistics. An average would be obtained for each 'sample' shelf, possibly with the range of cost. Estimation for the other shelves would then be on the basis of the averages, but taking a high value, rather than a low one, which might result in an overall under-estimation.

Chapter 2
1 (a) 27.0 (b) 370.4 (c) 3.5875 - rounded off to 3.59, because means are usually given to one decimal place more than the original data (d) 21 (e) 74.8 (f) 60.
2 Comments: (a) Quite well. (b) Not too well. The scores seem to have higher and lower values rather than be around the mean. (c) As for (b). (d) The scores are evenly spread, the 40 being balanced by the 0, and with the mean in the centre of the group. (e) A disaster. No score is anywhere near 74.8. (f) Another disaster!
3 (a) Yes. (b) and (c) Just about. (d) As the scores are so widely spread, the value of the mean as a descriptive statistic is dubious. (e) No.
4 (a) Mean 12.5. No. It is rather a high value, when most of the scores are 10. There are only two 'typical' scores, 10 and 20. This is not near either. (b) Mean 10.142857 - rounded off to 10.14. There is only one atypical value fairly close to 10, and so this figure, or the mean, would both be appropriate. (c) Again we have a single atypical score, but this time so wildly different that it raises the mean to 10.396. Better to give a value of 10 in this case, and indicate that there is one atypical score which has not been included in the computation of the mean. (d) 11.58 - fine.
5 (a) 28 (b) 381 (c) 385 (d) 3.4 (e) 20 (f) 60
6 5(a) and 1(a): They are so similar that it doesn't really matter. 5(b) and 1(b): Again, close, but neither value particularly good. 5(d) and 1(c), also 5(e) and 1(d): Median and means very close in values, but not ideal descriptive figure. 5(f) and 1(f): Both figures are as poor.
7 (a) Mean 13.7, median 13, mode 13. (b) Mean 13.7, median 13.5, no mode. (c) Mean 13.583, median 15, modes 10 and 19. (d) Mean 21.69, median 23, mode 25.
8 (a) All the averages are close, so none would be too misleading, although the shape of the distribution is too 'flat' to be labelled normal, with any confidence. (b) The distribution is again a rather flat one, and the values of the mean and median should only be used with qualifying statements. (c) This is a bimodal distribution, and so both modes must be given. There is no score which resembles the mean or median. (d) This is a skewed distribution. The modal value is probably the best to give, and perhaps the median.
10 A - normal; B - negatively skewed; C - positively skewed; D - bimodal
11 From the way the data are grouped it is not possible to obtain precise values for the mean, median and mode. As the distribution is skewed, an indication of all three would be relevant. The mode will be in the block 60-79; the median at the top end of 40-59; the mean, whose approximate value can be calculated from the mid-point of each block (i.e. 1 × 10, 8 × 30 etc.) will be around 57.

[Figure: horizontal axis, exam marks obtained]
Figure 7. The distribution of students' exam marks
Answers Answers

Chapter 3
1 (a) 3.33 (b) 8.86 (c) 6.4 (d) 0.83
2 (a) Range 16 - fine. (b) Range 22 - a well-scattered bimodal distribution. (c) Range 22 - a poor measure, as there is an outlier, and without it the range would only be 4. (d) Range 4 - fine.
3 Mode 1, median 1.5, range 51, mean deviation 12.2. The mode, range and mean deviation are fairly appropriate, but the median and mean are not good descriptive statistics to use, as the distribution is skewed. Best to give all the measures of central tendency.
4 (a) 112, 18.67, 4.32 (b) 586, 83.71, 9.15 (c) 330, 66, 8.12 (d) 14, 1.167, 1.080
5 The standard deviations are all larger than the mean deviations. They would be larger still if N - 1 instead of N were used.
6 (a) 4.73 (b) 9.88 (c) 9.08 (d) 1.128.
7 (a) Normal distribution, mean, median and mode all 5. SD 0.707, range 2. (b) Normal distribution, mean, median and mode all 4. SD 2.05, range 8. (c) Bimodal distribution, modes 0 and 10, range 10. SD not appropriate.
8 (a) SD 0.739 (b) SD 2.16.

Chapter 4
2 (a) No. The official name for a distribution with this shape is rectangular. (b) Positively skewed (c) Near enough to normal (d) Bimodal (e) Negatively skewed
3 Fairly different, as we know that 68% will vary between 12 and 20.
4 (a) About 1.14 dwarfs, i.e. 1. A score of 8 is two SDs below the mean, and so in a normal curve only 2.28% would have scores below this. Our sample comprises 50, so we halve the percentage. (b) and (c) 12 and 20 are both one SD away from the mean. We would expect 15.87% to be below and above these values, i.e. about 7.93, or 8 dwarfs from the sample below and 8 above. (d) 28 is three SDs above the mean. 0.13% would be expected, so in our sample of only 50 we would not anticipate anyone obtaining this score.
5 In a sample of 100 we would expect to find 0.13 in the extreme position of three SDs above the mean. To get one score in this portion then, 0.13 has to be multiplied by 8, to give 1.04. So a sample of 100, multiplied by 8 becomes 800. Our original sample comprised 50, so we would have to multiply it sixteenfold before we might expect to come across such a very very kind dwarf!
6 That his sample is not typical (assuming that the test had been properly constructed). This sample shows a skewed distribution. Alternatively, the dwarfs might have been deceiving him by presenting themselves in a favourable light - a ploy not uncommon in human subjects!
7 Mean pie consumption is 1000 pies. 1600 pies is three SDs above the mean, and so this consumption would only be anticipated on 0.13% of occasions - provided the pies were available, of course!
8 (a) 22.5 and 7.5 inches (b) £650 and £350 (c) 12 and 4.8 seconds (d) 95 and 65 elephants.
11 (a) 33 (b) 62 (c) 58 (d) 48 (e) 16
12 All values are rounded off to whole numbers: (a) 100 (b) 31 (c) 16 (d) 7 (e) 26 (f) 29 (g) 33 (h) 38
13 (a) 0.75% (b) 3.01% (c) 35.94% (d) 50% (e) 81.59% (f) 92.92% (g) 99.55%
14 (a) 0 (b) +0.44 (50% + 17%) (c) +2.41 (50% + 49.2%) (d) +0.70 (50% + 25.8%) (e) -0.44 (50% - 17%) (f) -2.33 approx. (50% - 49%)
Remember for (e) and (f) that the tabled values only cover 50% of the distribution, from the mean upwards. Percentages will be the same, but the z scores need a minus sign before them, to indicate that they are below the mean.
15 (a) 68 (b) 95.44 (c) virtually 50 (d) 0.13 (e) 15.74 (13.59 + 2.15) (f) 15.86 (twice 7.93) (g) 74.92 (36.43 + 38.49) (h) 2.28 (50 - 47.72).
16 (a) 6.68% (b) 78.88% (c) 0.09% The number of applicants for Brillia in Never-Never Land would be 30.85% of 30 million, i.e. 9 255 000
17 (a) 34.13% of 10 000 = 3413 (b) 34.13% of 10 000 = 3413 (c) 95.44% of 10 000 = 9544 (d) 99.74% of 10 000 = 9974.

Chapter 5
2 (a) 3.3% (b) 1% (c) 66% (d) 5% (e) 0.1% (f) 0.2% (g) 100% (h) 10%
3 (a) 0.033 (b) 0.01 (c) 0.66 (d) 0.5 (e) 0.05 (f) 0.001 (g) 0.002 (h) 0.1

Chapter 6
1 (a) Domestic cats kept in Britain. (b) IQ scores of adult or teenage people. (c) A population of virtually infinite size, all raindrops. (d) All tinned sardines - or perhaps, to name a sub-population, all tinned sardines of a particular brand. (e) All adult/teenage males/females of a particular nationality. (f) In the early morning, it is likely that the third person you saw would be an adult, and perhaps more specifically, someone going to work. Therefore we might hazard a guess at teenage/adult males/females in employment, resident in a particular town or locality. (g) Again, in broad terms, humans, but in the late afternoon less likely to be commuters. Shoppers, people out walking, jogging, taking dogs for walks, children playing, etc., would all make up a likely afternoon population. (h) Responses on a particular task; sub-population, responses after alcohol intake. (i) Population, cemeteries; sub-population, London cemeteries; sub-sub-population, Victorian London cemeteries. (j) Population, women; sub-population, female members of Royal families.

Chapter 7
1 (a) IV = punishment; DV = child's personality; one-tailed
(b) IV = number of heads; DV = quality of intellectual output; one-tailed
(c) IV = presence or absence of someone; DV = affection; one-tailed
(d) IV = presence or absence of someone; DV = thoughts of that person; one-tailed
(e) IV = stone movement; D V = amount of moss present; one-tailed
9 Although MS Wink has a slightly lower mean, she shows grcatcr variability of performance.
(f) D V = quality of laughter; IV = order of laughter; one tailed
Shc will have good days which ;Ire much better than MS Swcctie's good days, but on the other
hand, hcr 'off' days will be worsc. MS Scroogc's decision will rather depend on whether she 2 (a) one-tailed (b) two-tailed (c) two-tailed (d) one-tailed (e) two-tailed (f) one-
prefers consistency to variability. tailed
l 0 ( a ) Mean 60; 65.03,70.06,75.09, 54.97.49.94.44.91 4 (a) one-tailed (b) null (c) two-tailed (d) one-tailed (e) null (f) two-tailed (g) one-
( b ) Mean 7; 9.25, 11.50, 13.75,4.75,2.50,0.25 tailed (h) two-tailed (i) one-tailed (j) one-tailed
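
The percentages quoted in the Chapter 4 answers come from the table of areas under the normal curve. As a rough cross-check, and only as a sketch, they can be recomputed with Python's standard-library `statistics.NormalDist`. The helper names below are illustrative assumptions and are not taken from the book; figures that the book builds from rounded table entries (such as 95.44) may differ from the computed ones in the last decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, SD 1


def pct_below(z):
    """Percentage of the normal curve lying below z standard deviations."""
    return 100 * Z.cdf(z)


def pct_between(z_low, z_high):
    """Percentage of the curve lying between two z values."""
    return 100 * (Z.cdf(z_high) - Z.cdf(z_low))


# Chapter 4, answer 4(a): a score two SDs below the mean, sample of 50.
below_two_sd = pct_below(-2)               # about 2.28 per cent
expected_dwarfs = 50 * below_two_sd / 100  # about 1.14, i.e. 1 dwarf

# Answer 4(b) and (c): more than one SD away from the mean, on one side.
beyond_one_sd = pct_below(-1)              # about 15.87 per cent

# Answer 4(d): three SDs above the mean.
above_three_sd = 100 - pct_below(3)        # about 0.13 per cent

print(round(below_two_sd, 2), round(expected_dwarfs, 2))
print(round(beyond_one_sd, 2), round(above_three_sd, 2))
print(round(pct_between(-1, 1), 2))
```

Working from the cumulative area keeps every answer to one or two subtractions, which mirrors how the printed table is used.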

Chapter 9
1 The IV is the type of washing up liquid used. It has two 'values', either Gresego or Kwikclene. The DV is the rating score given for each performance in washing up by the person who carried out the operation, and to each of the two separate performances. The experimental hypothesis is that Gresego scores will exceed those from Kwikclene (i.e. G scores > K scores). This is a one-tailed hypothesis. The null hypothesis is that the ratings for the performance of the two washing up liquids will not differ, i.e. that their numerical scores have been derived from only one parent population, not two separate, distinguishable, ones.
2 (a) The ranks of the less frequently occurring sign give T = 3. When N = 8, the null hypothesis can be rejected, and the significance level quoted is p ≤ 0.025. (b) T = 9, N = 10. Significant at the 0.05 level, one-tailed test.
3 (a) p ≤ 0.025 (b) p > 0.05 (i.e. nonsignificant) (c) p ≤ 0.005 (d) p ≤ 0.01 (e) p ≤ 0.025 (f) p > 0.05 (i.e. nonsignificant) (g) p ≤ 0.025 (h) p ≤ 0.05
4 (a) p ≤ 0.05 (b) p > 0.05 (i.e. nonsignificant) (c) p ≤ 0.01 (d) p ≤ 0.02 (e) p ≤ 0.05 (f) p > 0.05 (i.e. nonsignificant) (g) p ≤ 0.05 (h) p > 0.05 (nonsignificant)
5 (a) N = 8, S = 2 (nonsignificant) (b) N = 10, S = 2 (nonsignificant)
6 (a) p ≤ 0.05, one-tailed; p > 0.05 (nonsignificant), two-tailed (b) nonsignificant, both one- and two-tailed (c) p ≤ 0.025, one-tailed; p ≤ 0.05, two-tailed (d) p ≤ 0.005, one-tailed; p ≤ 0.01, two-tailed (e) p ≤ 0.025, one-tailed; p ≤ 0.05, two-tailed (f) p ≤ 0.0005, one-tailed; p ≤ 0.001, two-tailed (g) p ≤ 0.05, one-tailed; p > 0.05 (nonsignificant), two-tailed (h) This could only be significant at the p ≤ 0.05 level for a one-tailed test. N is too small to obtain a higher significance level.
8 (a) U = 3½, U' = 31½; N1 = 5, N2 = 7; one-tailed, p ≤ 0.025; two-tailed, p ≤ 0.05
(b) U = ½, U' = 120½; N1 = N2 = 11; one-tailed, p ≤ 0.005; two-tailed, p ≤ 0.01
(c) U = 10½, U' = 14½; N1 = N2 = 5; the value is too large to be significant, even with a one-tailed test.
9 (a) p > 0.05, nonsignificant (b) p > 0.05, nonsignificant (c) p ≤ 0.025, one-tailed; p ≤ 0.05, two-tailed (d) p ≤ 0.025, one-tailed; p ≤ 0.05, two-tailed (e) p ≤ 0.005, one-tailed; p ≤ 0.01, two-tailed (f) p > 0.05, nonsignificant
10 (a) Wilcoxon (b) Mann-Whitney (c) Wilcoxon (d) Wilcoxon (e) Wilcoxon

Chapter 10
(a) nominal (b) ratio (c) ordinal (d) ratio (e) ordinal (The Beaufort scale is over a century old, and the intervals developed were based on personal observation, and calculated according to the effect the wind was creating and its estimated speed. Thus on land, a force 4 breeze moves small branches; at sea, a force 8 (fresh gale) causes 'smacks to make for harbour'! An ordered system, but not precise enough to be categorised at higher level than ordinal.) (f) ordinal (g) ordinal, although some psychologists treat IQ scores as interval (h) ordinal (i) ratio (j) nominal

Chapter 11
1 (a) related (b) independent (c) independent, unless people were matched on a one-to-one basis, using variables such as height or weight (d) unrelated, unless, again, there is some basis for pairing the scores off (e) related.
2 (a) t = 2.493 (b) t = 6.240 (c) t = 3.002 (d) t = 1.6804. The value of t is smaller, and thus less likely to be significant, when the data are not matched.
3 (a) t = 2.493; df = 12 (i) p ≤ 0.025 (ii) p ≤ 0.05
(b) t = 6.240; df = 12 (i) p ≤ 0.0005 (ii) p ≤ 0.001
(c) t = 3.002; df = 6 (i) p ≤ 0.01 (ii) p ≤ 0.02
(d) t = 1.6804; df = 12 (i) not significant (ii) not significant (And notice the difference between these conclusions and those of 3(c), when the same data were matched.)
4 (a) Nonparametric - unless you subscribe to the view that IQ scores achieve an interval level of measurement. (b) Nonparametric, as the scores show a bimodal distribution. (c) Nonparametric, on the grounds that attitude scale measurements cannot possibly give scores which are sophisticated enough to attain interval status. (d) Grams attain the ratio level of measurement; physical characteristics tend to show normal distributions, so provided the variances were similar, a parametric analysis would be fine. (e) Nonparametric, as there is too much of a discrepancy between the variances of 10 and 100 sec.
5 (a) nonsignificant (b) 0.05 (c) 0.05 (d) 0.01

Chapter 12
1 (a) simple, 2 × 2 (b) complex, 3 × 2 (c) complex, 3 × 3 (d) one sample, 1 × 2 (e) one sample, 1 × 6
2 χ² = 30.217, which is significant at the 0.001 level (two-tailed test). It is concluded that there is a difference between the two groups, with the Bymor customers, indeed, buying more - or, at least, spending more! If it had been predicted that the Bymor customers would spend more than the Ripoff patrons, then we would have established a one-tailed hypothesis, and the results would have been significant at the 0.0005 level. However, if the opposite prediction had been made, then strictly speaking, we could not have rejected the null hypothesis.
3 χ² = 5.6516, df = 5. This value does not exceed the minimum value of 11.07 (for a two-tailed hypothesis), and so we can conclude that the pattern of sickness just over a decade ago does not differ from that shown by the mediaeval monks. Did you notice that the figures given were percentages? Ideally, the statistical comparison should be based on the original numbers, not percentages, but as the samples were known to be fairly large, and the value of chi-square a long way off significance, this violation of the rule can be overlooked.
4 χ² = 7.810, df = 3. This is significant at the 0.05 level for a two-tailed test, and so it is concluded that there is a real difference between the towns when it comes to cycle safety!
5 Using the expected classes of 5, 25, 40, 25 and 5, we get a χ² value of 26.9544. When df = 4, this exceeds the value of 18.47 given for the highest significance level of 0.001 for a two-tailed test. We conclude therefore that the pattern of degree classifications differs from the pattern of the normal distribution.
6 (a) p ≤ 0.025 and 0.05 (one- and two-tailed) (b) 0.05 and not significant (c) neither value significant (d) 0.0005 and 0.001 (e) 0.005 and 0.01 (f) neither probability value significant.

Chapter 13
1 IV - use of tranquillisers; DV - exam marks
2 The students and their previous exam results, IQ, sex, medical record, subject, pre-exam drug intake
3 Anxiety, fatigue, meals eaten prior to exam, body weight
4 Bias (explained in the text)
5 One-tailed
6 That there will be no difference in exam marks between students who do and those who do not take tranquillisers before exams
7 Rejected
8 Certainly a Mann-Whitney U test, but possibly an unrelated t test if the exam marks are
considered to be interval data
9 There is a one in twenty chance that the difference between the two sets of results is due to
the effect of chance factors.
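
The Mann-Whitney values given for Chapter 9, answer 8 can be sanity-checked with the identity U + U' = N1 × N2 (for example 3½ + 31½ = 35 = 5 × 7). The sketch below shows one way of arriving at U and U', by counting pairs. The function name and the two score lists are hypothetical illustrations; they are not the book's data, and pair counting is not necessarily the step-by-step procedure the book teaches.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U, U') by counting pairs; a tied pair counts one half to each."""
    wins_a = 0.0  # pairs in which the score from sample_a is the larger
    wins_b = 0.0  # pairs in which the score from sample_b is the larger
    for a in sample_a:
        for b in sample_b:
            if a > b:
                wins_a += 1
            elif b > a:
                wins_b += 1
            else:
                wins_a += 0.5
                wins_b += 0.5
    # U is conventionally the smaller of the two totals, U' the larger.
    return min(wins_a, wins_b), max(wins_a, wins_b)


# Hypothetical scores for two independent groups (not taken from the book).
group_1 = [3, 5, 6, 8, 9]
group_2 = [7, 9, 10, 12, 13, 14, 15]

u, u_prime = mann_whitney_u(group_1, group_2)
print(u, u_prime, len(group_1) * len(group_2))
```

Because every pair is counted exactly once, the two totals must add up to the number of pairs, which is why a half-value of U is always matched by a half-value of U'.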

Chapter 15
1 [Figure 2: scattergrams, not reproduced]

2 (a) +1 (b) +0.97 (c) -0.21 (d) -1

3 rs = 0.741. The probability, if you decide to obtain it, for a one-tailed hypothesis, when N = 7, is 0.05. Note that the results would not have been significant if a one-tailed prediction had not been established.
Well, the manager of Beastly Breweries pats himself on the back here, and is so impressed by what he considers to be a correct diagnosis, that he enrols for Psychology A level! During the course he discovers that correlation need not mean direct causality; closer inspection of his staff and their working conditions reveal that the height of the counter is a crucial factor in determining the quality of staff-customer interaction. Additionally, variables other than ones directly relating to the pubs might have caused a drop in sales, e.g. stricter drinking and driving regulations, price increases, rise in unemployment.

[Figure 3: six scattergrams, each with Variable A on the horizontal axis, not reproduced. The panel captions read:]
(a) Good positive, but slightly curvilinear. Therefore Spearman's rho would be used.
(b) Perfect negative, linear. Pearson's product-moment is appropriate - and we have here a rare occasion when a line can be drawn in!
(c) Spearman's rho can be used for this curvilinear association.
(d) For this pattern, which is U-shaped, neither Pearson's nor Spearman's measures of association would be appropriate.
(e) Almost perfect positive linear relationship. Again, Pearson's product-moment could be used, and a line could also be drawn in.
(f) Almost two separate events here. As the line does not bend back on itself, we can use Spearman's rho here; however, the sharp bend in it precludes Pearson's r.

5 (a) r = +1 (b) r = +0.937 (c) r = -0.35s (d) r = -0.987


It is interesting to compare these values with the ones obtained using Spearman's technique. The drop from -1 to -0.99 in the final pair of scores reflects the small 'kink' evident in the scattergram.
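
The comparison between Pearson's and Spearman's values can be illustrated with a small sketch. It assumes a made-up set of scores in which Variable B rises steadily with Variable A but not in a straight line; the data and function names are hypothetical, and the ranking helper is a simplification that assumes no tied scores.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation from sums of squares and products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest); this simple version assumes no tied scores."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rho from rank differences: 1 - 6*sum(d^2) / (N(N^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores: B always increases with A, but along a curve.
variable_a = [1, 2, 3, 4, 5, 6, 7]
variable_b = [a ** 2 for a in variable_a]

print(spearman_rho(variable_a, variable_b))
print(round(pearson_r(variable_a, variable_b), 3))
```

With these scores the ranks agree perfectly, so rho is exactly 1, while Pearson's r falls a little short of 1 because the points do not lie on a straight line.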
6 (a) An indirect relationship, in that it is warm weather which makes people consume more beer. It might also be that during the summer months folk get out and about more, and visit pubs whilst on their travels.
(b) A direct link. The bus's appearance or non-appearance is a causal agent.
(c) Very difficult to tease out. It might be that events associated with being born prematurely also cause a certain amount of brain damage. However it might also be that agents which cause prematurity (rather than ones acting at the time of birth) also give rise to brain conditions which underlie later schooling difficulties.
(d) Indirect. Factors connected with standard of living come into this. The more people have cars and travel on motorways, the more likely it is that their society is one which fosters trips by plane. There may be a more direct link here, additionally. The weather conditions could affect the accident rate for cars and planes simultaneously.
(e) Not too indirect, in that surplus milk (a result of low sales) is converted into trifles.
(f) Difficult to see any connection at all between these two variables, although the more imaginative reader may be able to devise some link which explains what looks like a coincidental relationship.
7 All values are of one-tailed probabilities:
(a) p ≤ 0.025 (b) p ≤ 0.05 (c) p ≤ 0.025 (d) p ≤ 0.005 (e) p ≤ 0.01 (f) nonsignificant
8 (a) 20.549 (b) f 0.621 (c) 20.296 (d) +0.412 (e) f1.000 (f) 20.833 (g) 20.534.

[Figure: Areas under the normal curve. The horizontal axis is marked in standard deviations (3, 2, 1, mean, 1, 2, 3) and the figure also shows the cumulative area under the curve. Not reproduced.]


Appendix

Experimental design and statistical tests


Related (or matched) samples

Matched subjects design
Subjects are paired off, so that each score in one group can be matched or compared specifically with a particular score in the second group. Pairing is in terms of relevant variables. Identical twins are often used.

Repeated measures design
Two scores are obtained from every subject, and these are then analysed. 'Before and after' types of studies are a common example of this design, which is also known as the within subjects design.

Unrelated (or independent) samples

Independent subjects design
All available subjects are divided into two groups which are then given different treatments. A basic assumption is that the groups are comparable at the outset on all the variables considered relevant. Because there may be a great deal of variation between all the subjects in the two groups, this design is less sensitive to slight changes which may occur as a result of the experimental treatment. It is also known as the between subjects design.

Statistical tests for related samples
                          Minimum data level
Nonparametric
  Sign test               Nominal
  Wilcoxon                Ordinal
Parametric
  Related t test          Interval

Statistical tests for unrelated samples
                          Minimum data level
Nonparametric
  Mann-Whitney U test     Ordinal
Parametric
  Unrelated t test        Interval

Notes
1 The division of tests is into two - related and unrelated - despite the division of designs into three. Repeated measures and matched subjects designs are given identical statistical treatment, e.g. the related t test if the data meet the requirements for parametric tests.
2 All the tests answer the question: 'Do the sets of scores come from one or two underlying populations?' Those listed above answer the question on the basis of the actual values of the scores, whilst the chi-square test answers on the basis of the pattern of the distribution which the scores make.
3 Don't confuse correlation with testing. In correlation we are simply describing the relationship between two sets of numbers, and sometimes using the information to make predictions.
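
The table of tests in this appendix can be read as a simple lookup: pick the type of samples, then keep every test whose minimum data level is reached. The sketch below is only an illustration of that reading. The names `LEVELS`, `TESTS` and `candidate_tests` are hypothetical, and it assumes that ratio data also satisfy the interval requirement. It does not check the further requirements for parametric tests (normally distributed scores and similar variances).

```python
# Levels of measurement, from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# (test name, family, minimum data level), as listed in the appendix table.
TESTS = {
    "related": [
        ("Sign test", "nonparametric", "nominal"),
        ("Wilcoxon", "nonparametric", "ordinal"),
        ("Related t test", "parametric", "interval"),
    ],
    "unrelated": [
        ("Mann-Whitney U test", "nonparametric", "ordinal"),
        ("Unrelated t test", "parametric", "interval"),
    ],
}


def candidate_tests(samples, data_level):
    """List the tests from the table whose minimum data level is met."""
    reached = LEVELS.index(data_level)
    return [
        name
        for name, _family, minimum in TESTS[samples]
        if LEVELS.index(minimum) <= reached
    ]


print(candidate_tests("related", "ordinal"))
print(candidate_tests("unrelated", "interval"))
```

An empty list, as for unrelated samples with nominal data, simply means that the table lists no test for that combination.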
