
CHAPTER 12

Quantitative data analysis: hypothesis testing

Topics discussed

Type I errors, type II errors, and statistical power
Choosing the appropriate statistical technique
Testing a hypothesis about a single mean
Testing hypotheses about two related means
Testing hypotheses about two unrelated means
Testing hypotheses about several means
Regression analysis
- Standardized regression coefficients
- Regression with dummy variables
- Multicollinearity
- Testing moderation using regression analysis: interaction effects
Other multivariate tests and analyses
Excelsior Enterprises - hypothesis testing
Data warehousing, data mining, and operations research
Some software packages useful for data analysis

CHAPTER OBJECTIVES

After completing Chapter 12, you should be able to:
1. Describe the process followed in hypothesis testing.
2. Describe the concepts type I error, type II error, and statistical power.
3. Describe how to choose the appropriate statistical technique to test hypotheses.
4. Explain when and how to use the most important statistical techniques to examine hypotheses.
5. Explain how to use regression analysis to test moderation and mediation.

Introduction

In Chapter 4 we discussed the steps to be followed in hypothesis development and testing. These steps are:
1. State the null and the alternate hypotheses.
2. Determine the level of significance desired (p = 0.05, or more, or less).
3. Choose the appropriate statistical test depending on the type of scales that have been used (nominal, ordinal, interval, or ratio).
4. See if the output results from computer analysis indicate that the significance level is met. When the resultant value is larger than the critical value, the null hypothesis is rejected, and the alternate accepted. If the calculated value is less than the critical value, the null hypothesis is accepted and the alternate hypothesis rejected.

In this chapter we will discuss hypothesis testing. First, we will pay attention to type I errors, type II errors, and statistical power. Then, we will discuss various univariate and bivariate statistical tests that can be used to test hypotheses. Finally, we will come back to the Excelsior Enterprises case and test the hypotheses that were developed in the previous chapter.

Type I errors, type II errors, and statistical power

In Chapter 4 we explained that the hypothetico-deductive method requires hypotheses to be falsifiable. For this reason, null hypotheses are developed. These null hypotheses (H0) are thus set up to be rejected in order to support the alternate hypothesis, termed HA.

The null hypothesis is presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise. The required statistical evidence is provided by inferential statistics, such as regression analysis or MANOVA. Inferential statistics help us to draw conclusions (or to make inferences) about the population from a sample.

The purpose of hypothesis testing is to determine accurately if the null hypothesis can be rejected in favor of the alternate hypothesis. Based on the sample data the researcher can reject the null hypothesis (and therefore accept the alternate hypothesis) with a certain degree of confidence: there is always a risk that the inference that is drawn about the population is incorrect.

There are two kinds of errors (or two ways in which a conclusion can be incorrect), classified as type I errors and type II errors. A type I error, also referred to as alpha (α), is the probability of rejecting the null hypothesis when it is actually true. In the Excelsior Enterprises example introduced in Chapter 11, a type I error would occur if we concluded, based on the data, that burnout affects intention to leave when, in fact, it does not. The probability of type I error, also known as the significance level, is determined by the researcher. Typical significance levels in business research are 5% (0.05) and 1% (0.01).

A type II error, also referred to as beta (β), is the probability of failing to reject the null hypothesis given that the alternate hypothesis is actually true; e.g., concluding, based on the data, that burnout does not affect intention to leave when, in fact, it does. The probability of type II error is inversely related to the probability of type I error: the smaller the risk of one of these types of error, the higher the risk of the other type of error.

A third important concept in hypothesis testing is statistical power (1 − β). Statistical power, or just power, is the probability of correctly rejecting the null hypothesis. In other words, power is the probability that statistical significance will be indicated if it is present. Statistical power depends on:
1. Alpha (α): the statistical significance criterion used in the test. If alpha moves closer to zero (for instance, if alpha moves from 5% to 1%), then the probability of finding an effect when there is an effect decreases. This implies that the lower the α (that is, the closer α moves to zero) the lower the power; the higher the alpha, the higher the power.
2. Effect size: the effect size is the size of a difference or the strength of a relationship in the population: a large difference (or a strong relationship) in the population is more likely to be found than a small difference (or a weak relationship).
3. The size of the sample: at a given level of alpha, increased sample sizes produce more power, because increased sample sizes lead to more accurate parameter estimates. Thus, increased sample sizes lead to a higher probability of finding what we were looking for. However, increasing the sample size can also lead to too much power, because even very small effects will be found to be statistically significant.

Along these lines, there are four interrelated components that affect the inferences you might draw from a statistical test in a research project: the power of the test, the alpha, the effect size, and the sample size. Given the values for any three of these components, it is thus possible to calculate the value of the fourth. Generally, it is recommended to establish the power, the alpha, and the required precision (effect size) of a test first, and then, based on the values of these components, determine an appropriate sample size.

The focus of business research is usually on type I error. However, power (e.g., to determine an appropriate sample size) and in some situations, type II error (e.g., if you are testing the effect of a new drug) must also be given serious consideration.
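For readers who want to turn power considerations into a required sample size, packages outside SPSS can do the arithmetic. The following is a minimal sketch in Python (statsmodels); the effect size, alpha, and power values are illustrative assumptions, not values from this chapter:

from statsmodels.stats.power import TTestIndPower

# Solve for the sample size per group needed to detect a medium effect
# (Cohen's d = 0.5) with alpha = 0.05 and power = 0.80.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # roughly 64 per group

Leaving any one of the four components unspecified lets solve_power compute it from the other three.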

Choosing the appropriate statistical technique


After you have selected an acceptable level of statistical significance to test your hypotheses, the next step is to decide on the appropriate method to test the hypotheses. The choice of the appropriate statistical technique largely depends on the number of (independent and dependent) variables you are examining and the scale of measurement (metric or nonmetric) of your variable(s). Other aspects that play a role are whether the assumptions of parametric tests are met and the size of your sample.

Univariate statistical techniques are used when you want to examine two-variable relationships. For instance, if you want to examine the effect of gender on the number of candy bars that students eat per week, univariate statistics are appropriate. If, on the other hand, you are interested in the relationships between many variables, such as in the Excelsior Enterprises case, multivariate statistical techniques are required. The appropriate univariate or multivariate test largely depends on the measurement scale you have used, as Figure 12.1 illustrates.

Univariate techniques:
- Testing a hypothesis about a single mean
  - Metric data: One sample t-test
  - Nonmetric data: Chi-square
- Testing hypotheses about two means
  - Independent samples
    - Metric data: Independent samples t-test
    - Nonmetric data: Chi-square; Mann-Whitney U-test
  - Related samples
    - Metric data: Paired samples t-test
    - Nonmetric data: Chi-square; Wilcoxon; McNemar
- Testing hypotheses about several means
  - Metric data: One-way analysis of variance
  - Nonmetric data: Chi-square

Multivariate techniques:
- One metric dependent variable: Analysis of variance and covariance; Multiple regression analysis; Conjoint analysis
- One nonmetric dependent variable: Discriminant analysis; Logistic regression
- More than one metric dependent variable: Multivariate analysis of variance; Canonical correlation

Figure 12.1: Overview of univariate and multivariate statistical techniques.

Chi-square analysis was discussed in the previous chapter. This chapter will discuss the other techniques listed in Figure 12.1. Note that some techniques are discussed more elaborately than others. A detailed discussion of all these techniques is beyond the scope of this book.

Testing a hypothesis about a single mean

The one sample t-test is used to test the hypothesis that the mean of the population from which a sample is drawn is equal to a comparison standard. Assume that you have read that the average student studies 32 hours a week. From what you have observed so far, you think that students from your university (the population from which your sample will be drawn) study more. Therefore, you ask twenty classmates how long they study in an average week. The average study time per week turns out to be 36.2 hours, 4 hours and 12 minutes more than the study time of students in general. The question is: is this a coincidence?

In the above example, the sample of students from your university differs from the typical student. What you want to know, however, is whether your fellow students come from a different population than the rest of the students. In other words, did you select a group of motivated students by chance? Or is there a "true" difference between students from your university and students in general?

In this example the null hypothesis is:

H0: The number of study hours of students from our university is equal to the number of study hours of students in general.

The alternate hypothesis is:

HA: The number of study hours of students from our university differs from the number of study hours of students in general.

The way to decide whether there is a significant difference between students from your university and students in general depends on three aspects: the value of the sample mean (36.2 hours); the value of the comparison standard (32 hours); and the degree of uncertainty concerning how well the sample mean represents the population mean (the standard error of the sample mean). Along these lines, the following formula is used to compute the t-value:

t(n−1) = (X̄ − μ) / (s / √n)

Assume that the observed standard deviation is 8. Hence, the t-statistic becomes:

t = (36.2 − 32) / (8 / √20) = 2.348

Having calculated the t-statistic, we can now compare the t-value with a standard table of t-values with n − 1 degrees of freedom to determine whether the t-statistic reaches the threshold of statistical significance. When the t-statistic is larger than the appropriate table value, the null hypothesis (no significant difference) is rejected.

Our t-statistic (2.348) is larger than the appropriate table value (1.729). This means that the difference between 36.2 and 32 is statistically significant. The null hypothesis must thus be rejected: there is a significant difference in study time between students from our university and students in general.

How does this work in SPSS?

Under the Analyze menu, choose Compare Means, then One-Sample T Test. Move the dependent variable into the "Test Variable(s)" box. Type in the value you wish to compare your sample to in the box called "Test Value."
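For readers working outside SPSS, the same test is easy to reproduce. A minimal sketch in Python (SciPy), using only the summary statistics from the example above:

import numpy as np
from scipy import stats

# Summary statistics from the study-hours example
mean, mu0, sd, n = 36.2, 32.0, 8.0, 20

t = (mean - mu0) / (sd / np.sqrt(n))     # ≈ 2.348
p = 2 * stats.t.sf(abs(t), df=n - 1)     # two-sided p-value
print(t, p)

# With the raw observations available, stats.ttest_1samp(sample, popmean=32)
# gives the same result in a single call.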


Testing hypotheses about two related means

We can also do a (paired samples) t-test to examine the differences in the same group before and after a treatment. For example, would a group of employees perform better after undergoing training than they did before? In this case, there would be two observations for each employee, one before the training and one after the training. We would use a paired samples t-test to test the null hypothesis that the average of the differences between the before and after measure is zero.

How does this work in SPSS?

Under the Analyze menu, choose Compare Means, then Paired-Samples T Test. Move each of the two variables whose means you want to compare to the "Paired Variables" list.

EXAMPLE

A university professor was interested in the effect of her teaching program on the performance of her students. For this reason, ten students were given a math test in the first week of the semester and their scores were recorded. Subsequently, the students were given an equivalent test during the last week of the semester. The professor now wants to know whether the students' math scores have increased.

Table 12.1 depicts the scores of students on the math test in the first and in the last week of the semester.

TABLE 12.1 Math scores of ten students in the first and last week of the semester.

Student          Score first week   Score last week   Difference
1                55                 75                +20
2                65                 80                +15
3                70                 75                +5
4                55                 60                +5
5                40                 45                +5
6                60                 55                -5
7                80                 75                -5
8                35                 70                +35
9                55                 75                +20
10               60                 90                +30
Average score    57.5               70                12.5

To find out if there is a significant difference in math scores we need a test statistic. The test statistic is the average difference divided by its standard error: t = average difference / (s_difference / √n).

In this example we get: t = 12.5 / (13.79412 / √10) = 2.866.

Having calculated the t-statistic, we can now compare the t-value with a standard table of t-values with n − 1 degrees of freedom to determine whether the t-statistic reaches the threshold of statistical significance. Again, when the t-statistic is larger than the appropriate table value, the null hypothesis (no significant difference) is rejected.

Our t-statistic is larger than the appropriate table value (1.833). This means that the difference between 70 and 57.5 is statistically significant. The null hypothesis must thus be rejected: there is a significant increase in math score.
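The same example can be reproduced outside SPSS. A minimal sketch in Python (SciPy), using the scores from Table 12.1:

from scipy import stats

# Math scores from Table 12.1
first_week = [55, 65, 70, 55, 40, 60, 80, 35, 55, 60]
last_week  = [75, 80, 75, 60, 45, 55, 75, 70, 75, 90]

t, p = stats.ttest_rel(last_week, first_week)
print(t, p)   # t ≈ 2.87; halve p for the one-sided test of an increase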

The Wilcoxon signed-rank test is a nonparametric test for examining significant differences between two related samples or repeated measurements on a single sample. It is used as an alternative to a paired samples t-test when the population cannot be assumed to be normally distributed.

How does this work in SPSS?

Under the Analyze menu, choose Nonparametric Tests, then Two Related Samples. Move the variables you want to compare into the "Test Pairs" box. Select Wilcoxon from the Test Type group and click OK.
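A minimal sketch of the same test in Python (SciPy), reusing the math scores from Table 12.1:

from scipy import stats

first_week = [55, 65, 70, 55, 40, 60, 80, 35, 55, 60]
last_week  = [75, 80, 75, 60, 45, 55, 75, 70, 75, 90]

# Wilcoxon signed-rank test on the paired differences
stat, p = stats.wilcoxon(first_week, last_week)
print(stat, p)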


McNemar's test is a nonparametric method used on nominal data. It assesses the significance of the difference between two dependent samples when the variable of interest is dichotomous. It is used primarily in before-after studies to test for an experimental effect.

In the following example, a researcher wants to determine whether the use of a new training method (called CARE) has an effect on the performance of athletes. Counts of individual athletes are given in Table 12.2. The performance (average/good) before the treatment (the new training method) is given in the columns (244 athletes delivered an average performance before they trained with the CARE method, whereas 134 athletes delivered a good performance before they adopted this method). You can find the performance after the treatment (average/good) in the rows (190 athletes delivered an average performance after using the new training method; the number of athletes that delivered a good performance increased to 188).

TABLE 12.2 Performance of athletes before and after new training method.

                  Before
After             average   good   totals
average           112       78     190
good              132       56     188
totals            244       134    378

The cells of Table 12.2 can be represented by the letters a, b, c, and d. The totals across rows and columns are marginal totals (a + b, c + d, a + c, and b + d). The grand total is represented by n, as shown in Table 12.3.

TABLE 12.3 A more abstract representation of Table 12.2.

                  Before
After             average   good    totals
average           a         b       a + b
good              c         d       c + d
totals            a + c     b + d   n

McNemar's test is a rather straightforward technique to test marginal homogeneity. Marginal homogeneity refers to equality (or the lack of a significant difference) between one or more of the marginal row totals and the corresponding marginal column totals. In this example, marginal homogeneity implies that the row totals are equal to the corresponding column totals, or a + b = a + c and c + d = b + d.

Marginal homogeneity would mean there was no effect of the treatment. In this case it would mean that the new training method would not affect the performance of athletes.

The McNemar test uses the χ² distribution, based on the formula: χ² = (|b − c| − 1)² / (b + c). This is a statistic with 1 degree of freedom ((#rows − 1) × (#columns − 1)). The marginal frequencies are not homogeneous if the χ² result is significant at p < 0.05.

The χ² value in this example is: (|78 − 132| − 1)² / (78 + 132) = 53² / 210 = 13.376.

The table of the distribution of chi-square, with 1 degree of freedom, reveals that the difference between samples is significant at the 0.05 level: the critical value of chi-square is 3.841. Since the 13.376 computed for the example above exceeds this value, the difference between samples is significant. Hence, we can conclude that the new training method has a positive effect on the performance of athletes.

Note that if b and/or c are small (b + c < 20), then the statistic is not well approximated by the chi-square distribution. Instead, a sign test should be used.
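Outside SPSS, the test can be sketched in Python with statsmodels; the 2×2 table below follows the cell layout of Table 12.3 (rows: after; columns: before), so b = 78 and c = 132:

from statsmodels.stats.contingency_tables import mcnemar

table = [[112, 78],
         [132, 56]]

# Chi-square version with the continuity correction (|b - c| - 1)^2 / (b + c)
result = mcnemar(table, exact=False, correction=True)
print(result.statistic, result.pvalue)   # ≈ 13.376, p < 0.001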

How does this work in SPSS?

Under the Analyze menu, choose Nonparametric Tests, then Two Related Samples. Move the variables you want to compare into the "Test Pairs" box. Select McNemar from the "Test Type" group and click OK.

..". ,.,.,. ,.,.. .... ,.,..... ,.'...'.... .....


I. ,. '--.
".
. .'. -- -y

.~

..
''...
..',.,,...
..'"'..
',.'..

,.
".
,.
,.

,.
,'

,.

.----

o.
0 __
e
-

"

...
..
..
. .. .,..
'... ..'.. ''. ..,.
'.... . '. ..'..
.. . ..,..
. ,
',... ,..'.. ,.''.. ,.,..
,. ,. , ..
. ,.,.. '. ,....
,.,..'.
,'. ,. ,'.. ,.,.'. ,.,.
,.
,.'. '. ,.'. ,.
,. ...'.
,. ,. ,.

.....,. ............

,. ,. ,.
,.
,." ,.,., ,.
,.

,.
,.
,.
--- -,, ,.,. ,.
~
'
[;r! - ! - I ~
,,.
,.
,.
r-1
,.'
,.
,
,
,. ,. ,

". '.. .--...

.,.

',.
,.
--

..

-- ,.'',... "..",. ,.''.. ,.','.. .....,.


'. '.
,.
',.

.
,.
..
.

,.
,. ,. ..
,.
,. .. '.

n
ow

.....I....

(IJI _ M ...

"

"

'

'

.... Ic::!

..,.'. ,..
...,.,. ''..
,.

'

,.'

,.
,.

,
'

'
'

"

"

'

,.

,.

'

Testing hypotheses about two unrelated means

There are many instances when we are interested to know whether two groups are different from each other on a particular interval-scaled or ratio-scaled variable of interest. For example, would men and women press their case for the introduction of flextime at the workplace to the same extent, or would their needs be different? Do MBAs perform better in organizational settings than business students with only a bachelor's degree? Do individuals in urban areas have a different investment pattern for their savings than those in semi-urban areas? Do CPAs perform better than non-CPAs in accounting tasks? To find answers to such questions, an independent samples t-test is carried out to see if there are any significant differences in the means for two groups in the variable of interest. That is, a nominal variable that is split into two subgroups (for example, smokers and nonsmokers; employees in the marketing department and those in the accounting department; younger and older employees) is tested to see if there is a significant mean difference between the two split groups on a dependent variable, which is measured on an interval or ratio scale (for instance, extent of well-being; pay; or comprehension level).
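A minimal sketch of an independent samples t-test in Python (SciPy); the well-being scores for the two groups below are hypothetical:

from scipy import stats

smokers    = [3.1, 2.8, 3.4, 2.9, 3.0, 2.7]
nonsmokers = [3.6, 3.9, 3.2, 3.8, 3.5, 3.7]

t, p = stats.ttest_ind(smokers, nonsmokers)   # assumes equal variances
# Pass equal_var=False for Welch's t-test when variances differ.
print(t, p)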

How does this work in SPSS?

Under the Analyze menu, choose Compare Means, then Independent Samples T Test. Move the dependent variable into the "Test Variable(s)" box. Move the independent variable (that is, the variable whose values define the two groups) into the "Grouping Variable" box. Click "Define Groups" and specify how the groups are defined (for instance 0 and 1, or 1 and 2).

Testing hypotheses about several means

Whereas the (independent samples) t-test indicates whether or not there is a significant mean difference in a dependent variable between two groups, an analysis of variance (ANOVA) helps to examine the significant mean differences among more than two groups on an interval- or ratio-scaled dependent variable. For example, is there a significant difference in the amount of sales by the following four groups of salespersons: those who are sent to training schools; those who are given on-the-job training during field trips; those who have been tutored by the sales manager; and those who have had none of the above? Or is the rate of promotion significantly different for those who have assigned mentors, choose their own mentors, and have no mentors in the organizational system?

The results of ANOVA show whether or not the means of the various groups are significantly different from one another, as indicated by the F statistic. The F statistic shows whether two sample variances differ from each other or are from the same population. The F distribution is a probability distribution of sample variances and the family of distributions changes with changes in the sample size. Details of the F statistic may be seen in Table IV at the end of the book.
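A minimal sketch of a one-way ANOVA in Python (SciPy), using hypothetical sales figures for three of the training groups mentioned above:

from scipy import stats

school   = [24, 27, 25, 30, 28]
on_job   = [22, 21, 25, 24, 23]
no_train = [18, 20, 17, 21, 19]

F, p = stats.f_oneway(school, on_job, no_train)
print(F, p)   # a significant F says at least two group means differ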

How does this work in SPSS?

Under the Analyze menu, choose Compare Means, then One-Way ANOVA. Move the dependent variable into the "Dependent List". Move the independent variable (that is, the variable whose values define the groups) into the "Factor" box. Click OK.

When significant mean differences among the groups are indicated by the F statistic, there is no way of knowing from the ANOVA results alone where they lie; that is, whether the significant difference is between Groups A and B, or between B and C, or A and C, and so on. It is therefore unwise to use multiple t-tests, taking two groups at a time, because the greater the number of t-tests done, the lower the confidence we can place on the results. For example, three t-tests done simultaneously decrease the confidence level from 95% to 86% (0.95³ ≈ 0.86). However, several tests, such as Scheffé's test, Duncan's multiple range test, Tukey's test, and the Student-Newman-Keuls test, are available and can be used, as appropriate, to detect where exactly the mean differences lie.
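Tukey's test, for instance, can be sketched in Python (SciPy 1.8 or later), reusing the hypothetical sales data from the ANOVA sketch above:

from scipy import stats

school   = [24, 27, 25, 30, 28]
on_job   = [22, 21, 25, 24, 23]
no_train = [18, 20, 17, 21, 19]

res = stats.tukey_hsd(school, on_job, no_train)
print(res)   # pairwise confidence intervals show where the differences lie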

Regression analysis

Simple regression analysis is used in a situation where one independent variable is hypothesized to affect one dependent variable. For instance, assume that we propose that the propensity to buy a product depends only on the perceived quality of that product.¹ In this case we would have to gather information on perceived quality and propensity to buy a product. We could then plot the data to obtain some first ideas on the relationship between these variables.

From Figure 12.2 we can see that there is a linear relationship between perceived quality and propensity to buy the product. We can model this linear relationship by a least squares function. A simple linear regression equation represents a straight line. Indeed, to summarize the relationship between perceived quality and propensity to buy, we can draw a straight line through the data points, as in Figure 12.3.

We can also express this relationship in an equation:

Yi = β0 + β1 X1i + εi

The parameters β0 and β1 are called regression coefficients. They are the intercept (β0) and the slope (β1) of the straight line relating propensity to buy (Y) to perceived quality (X1). The slope can be interpreted as the number of units by which propensity to buy would increase if perceived quality increased by a unit. The error term (εi) denotes the error in prediction, or the difference between the estimated propensity to buy and the actual propensity to buy.

In this example the intercept (β0) was not significant whereas the slope (β1) was. The unstandardized regression coefficient β1 was 0.832. This means that if perceived quality is rated 2 (on a five-point scale), the estimated propensity to buy is 1.664. On the other hand, if perceived quality is rated 4 (on a five-point scale), the estimated propensity to buy is 3.328.

¹ In reality, any effort to model the effect of perceived quality on propensity to buy a product without careful attention to other factors that affect propensity to buy would cause a serious statistical problem ("omitted variables bias").

Figure 12.2: Scatter plot of perceived quality versus propensity to buy.


Figure 12.3: Regression of propensity to buy on perceived quality (R² linear = 0.519).


The coefficient of determination, R², provides information about the goodness of fit of the regression model: it is a statistical measure of how well the regression line approximates the real data points. R² is the percentage of variance in the dependent variable that is explained by the variation in the independent variable. If R² is near to 1, most of the variation in the dependent variable can be explained by the regression model. In other words, the regression model fits the data well. On the other hand, if R² is near to 0, most of the data variation cannot be explained by the regression model. In this case, the regression model fits the data poorly. In the aforementioned example, R² for the model is 0.519. This means that almost 52% of the variance in propensity to buy is explained by variance in perceived quality.
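A minimal sketch of such a simple regression in Python (statsmodels); the eight data points below are hypothetical, so they will not reproduce the estimates reported above (slope 0.832, R² 0.519), which came from the authors' own sample:

import numpy as np
import statsmodels.api as sm

quality    = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
propensity = np.array([1.2, 1.9, 2.3, 2.4, 3.1, 3.2, 3.9, 4.1])

X = sm.add_constant(quality)        # adds the intercept term (b0)
model = sm.OLS(propensity, X).fit()
print(model.params)                 # intercept and slope
print(model.rsquared)               # coefficient of determination R²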


The basic idea of multiple regression analysis is similar to that of simple regression analysis. Only in this case, we use more than one independent variable to explain variance in the dependent variable. Multiple regression analysis is a multivariate technique that is used very often in business research. The starting point of multiple regression analysis is, of course, the conceptual model (and the hypotheses derived from that model) that the researcher has developed in an earlier stage of the research process.

Multiple regression analysis provides a means of objectively assessing the degree and the character of the relationship between the independent variables and the dependent variable: the regression coefficients indicate the relative importance of each of the independent variables in the prediction of the dependent variable. For example, suppose that a researcher believes that the variance in performance can be explained by four independent variables, A, B, C, and D (say, pay, task difficulty, supervisory support, and organizational culture). When these variables are jointly regressed against the dependent variable in an effort to explain the variance in it, the sizes of the individual regression coefficients indicate how much an increase of one unit in the independent variable would affect the dependent variable, assuming that all the other independent variables remain unchanged. What's more, the individual correlations between the independent variables and the dependent variable collapse into what is called a multiple r or multiple correlation coefficient. The square of multiple r, R-square, or R² as it is commonly known, is the amount of variance explained in the dependent variable by the predictors.

"

"

-.
".".
".

..
..".

..-.,.
....
..

-.

. ... ..

-'. -,.'. '. .. ..'. -,. -'. ..,.,. .. ''.. ,.,'". ,.'. ,.,.'.

...... .. '.- '. G'.


'F-!. '....,.. ,..,.'.. ..'..'..
_
. -".. = ~ .. '. '. '.
'
.
,.
,. ,. .'
.
..
..
.
..
fJr='
~
.
'
'. ..'. ''.. ..'..
''... ,.''.. ---- G r -''... ..'. '.. ..'.
..--... -,..-. ---- L-Jr
Gr'
.
'
.
'. ..,.'.
.
,
.
,.

.
..
.
..
'

,.
,.
.
'. .
.'. ''.. ..'.. ''.... '.. '... .. ....' ''... ,'... '..'... ,.''... ,.,'..

'. .
~

,
'

~
~

,.

,.
"

,.

'

,.

"-

I-I

,.

,.

r(EJ

,.

'

J
~~

'

'
'
'

'

'

,.

'

,,
'
'

'

,.

,.

,.

,.

'

'

..::=

,'

'
'

,.

,'

'
'
,.

'm

'

"

,.

,.

How does this work in SPSS?

Under the Analyze menu, choose Regression, then Linear. Move the dependent variable into the "Dependent" box. Move the independent variables into the "Independent(s)" list and click OK.

Standardized regression coefficients

Standardized regression coefficients or beta coefficients are the estimates resulting from a multiple regression analysis performed on variables that have been standardized (a process whereby the variables are transformed into variables with a mean of 0 and a standard deviation of 1). This is usually done to allow the researcher to compare the relative effects of independent variables on the dependent variable when the independent variables are measured in different units of measurement (for example, income measured in dollars and household size measured in number of individuals).
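A minimal sketch of how beta coefficients can be obtained in Python (statsmodels), assuming hypothetical income and household-size data:

import numpy as np
import statsmodels.api as sm
from scipy.stats import zscore

rng = np.random.default_rng(0)
income  = rng.normal(50_000, 10_000, 100)        # measured in dollars
hh_size = rng.integers(1, 7, 100).astype(float)  # measured in persons
spend   = 0.3 * income / 10_000 + 0.5 * hh_size + rng.normal(0, 1, 100)

# z-score every variable, then fit OLS; the slopes are beta coefficients
Xz = sm.add_constant(np.column_stack([zscore(income), zscore(hh_size)]))
betas = sm.OLS(zscore(spend), Xz).fit().params
print(betas)   # effects are now comparable despite the different units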

Regression with dummy variables

A dummy variable is a variable that has two or more distinct levels, which are coded 0 or 1. Dummy variables allow us to use nominal or ordinal variables as independent variables to explain, understand, or predict the dependent variable.

Suppose that we are interested in the relationship between work shift and job satisfaction. In this case, the variable "work shift", which has three categories (see the Excelsior Enterprises case), would have to be coded in terms of two dummy variables, since one of the three categories should serve as the reference category. This might be done as shown in Table 12.4. Note that the third shift serves as the reference category.

TABLE 12.4 Recoding work shift into dummy codes.

Work shift     Original code   Dummy D1   Dummy D2
First shift    1               1          0
Second shift   2               0          1
Third shift    3               0          0

Next, the dummy variables D1 and D2 have to be included in a regression model. This would look like this:

Yi = β0 + β1 D1i + β2 D2i + εi

In this example, workers from the third shift have been selected as the reference category. For this reason, this category has not been included in the regression equation. For workers in the third shift, D1 and D2 assume a value of 0, and the regression equation thus becomes:

Yi = β0 + εi

For workers in the first shift the equation becomes:

Yi = β0 + β1 + εi

The coefficient β1 is the difference in predicted job satisfaction for workers in the first shift, as compared to workers in the third shift. The coefficient β2 has the same interpretation. Note that any of the three shifts could have been used as a reference category.
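A minimal sketch of dummy-coded regression in Python (pandas and statsmodels), with hypothetical job-satisfaction scores and the third shift as the reference category:

import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "shift": ["first", "second", "third", "first", "second", "third",
              "first", "second", "third"],
    "satisfaction": [3.2, 2.8, 3.9, 3.4, 2.6, 4.1, 3.1, 2.9, 4.0],
})

# Dummy-code the shifts, keeping the third shift as the reference category
dummies = pd.get_dummies(df["shift"])[["first", "second"]].astype(float)
X = sm.add_constant(dummies)
print(sm.OLS(df["satisfaction"], X).fit().params)
# The "first" and "second" coefficients are differences from the third shift.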
Now do Exercises 12.1 and 12.2.

EXERCISE 12.1

Provide the equation for workers in the second shift.

EXERCISE 12.2

Use the data of the Excelsior Enterprises case to estimate the effect of work shift on job satisfaction.

Multicollinearity

Multicollinearity is an often encountered statistical phenomenon in which two or more independent variables in a multiple regression model are highly correlated. In its most severe case (if the correlation between two independent variables is equal to 1 or −1) multicollinearity makes the estimation of the regression coefficients impossible. In all other cases it makes the estimates of the regression coefficients unreliable.

The simplest and most obvious way to detect multicollinearity is to check the correlation matrix for the independent variables. The presence of high correlations (most people consider correlations of 0.70 and above high) is a first sign of sizeable multicollinearity. However, when multicollinearity is the result of complex relationships among several independent variables, it may not be revealed by this approach. More common measures for identifying multicollinearity are therefore the tolerance value and the variance inflation factor (VIF, the inverse of the tolerance value). These measures indicate the degree to which one independent variable is explained by the other independent variables. A common cutoff value is a tolerance value of 0.10, which corresponds to a VIF of 10.
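A minimal sketch of how tolerance values and VIFs can be computed in Python (statsmodels), on deliberately collinear hypothetical data:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)   # deliberately collinear
x3 = rng.normal(size=200)

X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
for i, name in enumerate(X.columns[1:], start=1):  # skip the constant
    vif = variance_inflation_factor(X.values, i)
    print(name, "VIF:", round(vif, 2), "tolerance:", round(1 / vif, 2))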
How does this work in SPSS?

Under the Analyze menu, choose Regression, then Linear. Move the dependent variable into the "Dependent" box. Move the independent variables into the "Independent(s)" list. Select "Statistics" by clicking the button on the right-hand side. Select "Collinearity diagnostics" and click Continue. Then click OK.


Note that multicollinearity is not a serious problem if the purpose of the study is to predict or forecast future values of the dependent variable, because even though the estimations of the regression coefficients may be unstable, multicollinearity does not affect the reliability of the forecast. However, if the objective of the study is to reliably estimate the individual regression coefficients, multicollinearity is a problem. In this case, we may use one or more of the following methods to reduce it:
- Reduce the set of independent variables to a set that are not collinear (note that this may lead to omitted variable bias, which is also a serious problem).
- Use more sophisticated ways to analyze the data, such as ridge regression.
- Create a new variable that is a composite of the highly correlated variables.

Testing moderation using regression analysis: interaction effects

Earlier in this book we described a moderating variable as a variable that modifies the original relationship between an independent variable and the dependent variable. This means that the effect of one variable (X1) on Y depends on the value of another variable, the moderating variable (X2). Such interactions are included as the product of two variables in a regression model.

Suppose that we have developed the following hypothesis:

H1: The students' judgment of the university's library is affected by the students' judgment of the computers in the library.

Now suppose that we also believe that, even though this relationship will hold for all students, it will be nonetheless contingent on computer ownership. That is, we believe that the relationship between the judgment of computers in the library and the judgment of the library is affected by computer ownership (indeed, computer ownership is a dummy variable). Therefore, we hypothesize that:

H2: The relationship between the judgment of the library and the judgment of computers in the library is moderated by computer ownership.

The relationship between the judgment of the library and the judgment of computers in the library can be modelled as follows:

Yi = β0 + β1 X1i + εi    (1)

We have also hypothesized that the effect of X1 on Y depends on X2. This can be modelled as follows:

β1 = γ0 + γ1 X2i    (2)

Substituting the second equation into the first one leads to the following model:

Yi = β0 + γ0 X1i + γ1 (X1i × X2i) + εi    (3)

Model (3) states that the slope of model (1) is a function of variable X2. Although this model allows us to test moderation, the following model is better:

Yi = β0 + γ0 X1i + γ1 (X1i × X2i) + γ2 X2i + εi    (4)
You may have noticed that model (4) includes a direct effect of X2 on Y. This allows us to differentiate between pure moderation and quasi moderation (compare Sharma, Durand and Gur-Arie, 1981), explained next.

If γ1 = 0 and γ2 ≠ 0, X2 is not a moderator but simply an independent predictor variable. If γ1 ≠ 0, X2 is a moderator. Model (4) allows us to differentiate between pure moderators and quasi moderators as follows: if γ1 ≠ 0 and γ2 = 0, X2 is a pure moderator; that is, X2 moderates the relationship between X1 and Y, but it has no direct effect on Y. If γ1 ≠ 0 and γ2 ≠ 0, X2 is a quasi moderator; that is, X2 moderates the relationship between X1 and Y, but it also has a direct effect on Y.

Suppose that data analysis leads to the following model:

Yi = 4.3 + 0.4 X1i − 0.01 X2i − 0.2 (X1i × X2i)    (5)

where β0 ≠ 0, γ0 ≠ 0, γ1 ≠ 0, and γ2 = 0.

Based on the results we can conclude that (1) the judgment of computers in the library has a positive effect on the judgment of the library and (2) this effect is moderated by computer possession: if a student has no computer (X2i = 0) the marginal effect is 0.4; if the student has a computer (X2i = 1) the marginal effect is 0.4 − 0.2 = 0.2. Thus, computer possession has a negative moderating effect.
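A minimal sketch of such a moderation test in Python (statsmodels), on hypothetical data generated to mimic model (5):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 150
computers = rng.uniform(1, 5, n)        # X1: judgment of the computers
owns_pc   = rng.integers(0, 2, n)       # X2: computer ownership dummy
library   = (4.3 + 0.4 * computers - 0.2 * computers * owns_pc
             + rng.normal(0, 0.5, n))   # built to mimic model (5)

df = pd.DataFrame({"library": library, "computers": computers,
                   "owns_pc": owns_pc})
fit = smf.ols("library ~ computers + owns_pc + computers:owns_pc",
              data=df).fit()
print(fit.params)   # the interaction coefficient tests moderation (γ1)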
Now do Exercises 12.3, 12.4, and 12.5.

EXERCISE 12.3

Why could it be important to differentiate between quasi moderators and pure moderators?

EXERCISE 12.4

Is computer possession a pure moderator or a quasi moderator? Explain.

EXERCISE 12.5

Provide a logical explanation for the negative moderating effect of computer possession.
The previous example shows that dummy variables can be used to allow the effect of one independent variable on the dependent variable to change depending on the value of the dummy variable. It is, of course, also possible to include metric variables as moderators in a model. In such cases, the procedure to test moderation is exactly the same as in the previous example.

In this section, we have explained how moderation can be tested with regression analysis. Note that it is also possible to test mediation with regression analysis. We will explain this later on in this chapter using the Excelsior Enterprises data.

Other multivariate tests and analyses

We will now briefly describe five other multivariate techniques: discriminant analysis, logistic regression, conjoint analysis, multivariate analysis of variance (MANOVA), and canonical correlation.

Discriminant analysis

Discriminant analysis helps to identify the independent variables that discriminate a nominally scaled dependent variable of interest; say, those who are high on a variable from those who are low on it. The linear combination of the independent variables forms the discriminant function, which shows the difference that exists between the two group means. In other words, the independent variables, measured on an interval or ratio scale, discriminate between the groups of interest to the study.

How does this work in SPSS?

Under the Analyze menu, choose Classify, then Discriminant. Move the dependent variable into the "Grouping" box. Move the independent variables into the "Independent(s)" list and click OK.

Logistic regression

Logistic regression is also used when the dependent variable is nonmetric. However, when the dependent variable has only two groups, logistic regression is often preferred, because it does not face the strict assumptions that discriminant analysis faces and because it is very similar to regression analysis. Although regression analysis and logistic regression analysis are very different from a statistical point of view, they are very much alike from a practical viewpoint. Both methods produce prediction equations and in both cases the regression coefficients measure the predictive capability of the independent variables. Thus, logistic regression allows the researcher to predict a discrete outcome, such as "will purchase the product/will not purchase the product", from a set of variables that may be continuous, discrete, or dichotomous.
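A minimal sketch of a logistic regression in Python (statsmodels), predicting a hypothetical binary purchase outcome:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
income = rng.normal(0, 1, 200)
age    = rng.normal(0, 1, 200)
p_buy  = 1 / (1 + np.exp(-(0.8 * income - 0.3 * age)))
buys   = rng.binomial(1, p_buy)   # 1 = will purchase, 0 = will not

X = sm.add_constant(np.column_stack([income, age]))
fit = sm.Logit(buys, X).fit(disp=0)
print(fit.params)   # coefficients on the log-odds scale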
How does this work in SPSS?

Under the Analyze menu, choose Regression, then Binary Logistic. Move the dependent variable into the "Dependent" box. Move the independent variables into the "Covariate(s)" list and click OK.


Conjoint analysis

Conjoint analysis is a statistical technique that is used in many fields including marketing, product management, and operations research. Conjoint analysis requires participants to make a series of trade-offs. In marketing, conjoint analysis is used to understand how consumers develop preferences for products or services. Conjoint analysis is built on the idea that consumers evaluate the value of a product or service by combining the value that is provided by each attribute. An attribute is a general feature of a product or service, such as price, product quality, or delivery speed. Each attribute has specific levels. For instance, for the attribute "price", levels might be 249, 279, and 319. Along these lines, we might describe a mobile telephone using the attributes "memory", "battery life", "camera" and "price". A specific mobile phone would be described as follows: memory, 12 Mbytes; battery life, 24 hours; camera, 5 megapixels; and price, 249.

Conjoint analysis takes these attribute and level descriptions of products and services and uses them by asking participants to make a series of choices between different products. For instance:

Would you choose phone X or phone Y?

                Phone X        Phone Y
Memory          12 Mbytes      16 Mbytes
Battery life    24 hours       12 hours
Camera          5 megapixels   8 megapixels
Price           249            319

By asking for enough choices, it is possible to establish how important each of the levels is relative to the others; this is known as the utility of the level. Conjoint analysis is traditionally carried out with some form of multiple regression analysis. More recently, the use of hierarchical Bayesian analysis has become widespread to develop models of individual consumer decision-making behavior.
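A minimal sketch of a traditional regression-based conjoint estimation in Python (statsmodels); the eight rated profiles below are hypothetical:

import pandas as pd
import statsmodels.formula.api as smf

# Eight hypothetical profiles rated by one respondent
profiles = pd.DataFrame({
    "memory":  ["12", "16", "12", "16", "12", "16", "12", "16"],
    "battery": ["24", "24", "12", "12", "24", "24", "12", "12"],
    "price":   ["249", "249", "249", "249", "319", "319", "319", "319"],
    "rating":  [7, 8, 5, 6, 5, 6, 2, 4],
})

fit = smf.ols("rating ~ C(memory) + C(battery) + C(price)",
              data=profiles).fit()
print(fit.params)   # deviations from the reference levels = part-worths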

Two-way ANOVA

Two-way ANOVA can be used to examine the effect of two nonmetric independent variables on a single metric dependent variable. Note that, in this context, an independent variable is often referred to as a factor, and this is why a design that aims to examine the effect of two nonmetric independent variables on a single metric dependent variable is often called a factorial design. The factorial design is very popular in the social sciences. Two-way ANOVA enables us to examine main effects (the effects of the independent variables on the dependent variable) but also interaction effects that exist between the independent variables (or factors). An interaction effect exists when the effect of one independent variable (or one factor) on the dependent variable depends on the level of the other independent variable (factor).
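A minimal sketch of a two-way ANOVA with an interaction term in Python (statsmodels), on hypothetical balanced data:

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "training": ["yes", "yes", "no", "no"] * 6,
    "region":   ["north", "south"] * 12,
    "sales":    [23, 25, 18, 17, 24, 26, 19, 16, 22, 27, 20, 18,
                 25, 24, 17, 19, 23, 28, 18, 17, 26, 25, 19, 18],
})

fit = smf.ols("sales ~ C(training) * C(region)", data=df).fit()
print(anova_lm(fit, typ=2))   # main effects plus the interaction term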

MANOVA

MANOVA is similar to ANOVA, with the difference that ANOVA tests the mean differences of more than two groups on one dependent variable, whereas MANOVA tests mean differences among groups across several dependent variables simultaneously, by using sums of squares and cross-product matrices. Just as multiple t-tests would bias the results (as explained earlier), multiple ANOVA tests, using one dependent variable at a time, would also bias the results, since the dependent variables are likely to be interrelated. MANOVA circumvents this bias by simultaneously testing all the dependent variables, cancelling out the effects of any intercorrelations among them.

In MANOVA tests, the independent variable is measured on a nominal scale and the dependent variables on an interval or ratio scale.

The null hypothesis tested by MANOVA is:

H0: μ1 = μ2 = μ3 = ... = μn

The alternate hypothesis is:

HA: The means are not all equal.

Canonical correlation

Canonical correlation examines the relationship between two or more dependent variables and several independent variables; for example, the correlation between a set of job behaviors (such as engrossment in work, timely completion of work, and number of absences) and their influence on a set of performance factors (such as quality of work, the output, and rate of rejects). The focus here is on delineating the job behavior profiles associated with performance that result in high-quality production.

In sum, several univariate, bivariate, and multivariate techniques are available to analyze sample data. Using these techniques allows us to generalize the results obtained from the sample to the population at large. It is, of course, very important to use the correct statistical technique to test the hypotheses of your study. We have explained earlier in this chapter that the choice of the appropriate statistical technique depends on the number of variables you are examining, on the scale of measurement of your variable(s), on whether the assumptions of parametric tests are met, and on the size of your sample.

Excelsior Enterprises - hypothesis testing

The following hypotheses were generated for this study, as stated earlier:

H1: Job enrichment has a negative effect on intention to leave.

H2: Perceived equity has a negative effect on intention to leave.

H3: Burnout has a positive effect on intention to leave.

H4: Job satisfaction mediates the relationship between job enrichment, perceived equity, and burnout on intention to leave.

How does this work in SPSS?

Under the Analyze menu, choose General Linear Model, then Multivariate. Move the dependent variables into the "Dependent" box. Move the independent variables into the "Fixed Factor(s)" list. Select any of the dialog boxes by clicking the buttons on the right-hand side.


These hypotheses call for the use of mediated regression analysis (all the variables are measured at an interval level). The results of these tests and their interpretation are discussed below.

To test the hypothesis that job satisfaction mediates the effect of perceived justice, burnout, and job enrichment on employees' intentions to leave, three regression models were estimated, following Baron and Kenny (1986): model 1, regressing job satisfaction on perceived justice, burnout, and job enrichment; model 2, regressing intention to leave on perceived justice, burnout, and job enrichment; and model 3, regressing employees' intentions to leave on perceived justice, burnout, job enrichment, and job satisfaction. Separate coefficients for each equation were estimated and tested. To establish mediation the following conditions must hold: perceived justice, burnout, and job enrichment must affect job satisfaction in model 1; perceived justice, burnout, and job enrichment must be shown to impact employees' intention to leave in model 2; and perceived justice, burnout, and job enrichment must affect employees' intention to leave in model 3 (while controlling for job satisfaction). If these conditions all hold in the predicted direction, then the effect of perceived justice, burnout, and job enrichment must be less in model 3 than in model 2. Perfect mediation holds if perceived justice, burnout, and job enrichment have no effect when the effect of job satisfaction is controlled for (model 3).

The R square of the first regression model (model 1) was 0.165 and the model was statistically significant. In this model, perceived justice and burnout were significant predictors of job satisfaction, whereas job enrichment was not. The R square of the second regression model (model 2) was 0.393 and this model was also statistically significant. Model 2, as depicted in Table 12.5, indicated that perceived justice, burnout, and job enrichment affected employees' intention to leave. The R square of the last model (model 3) was 0.487 and again the model was statistically significant. Perceived justice and burnout were significant predictors of intention to leave when job satisfaction was controlled for. The effect of perceived justice and burnout on intention to leave was less in the third model than in the second model. Thus, all conditions for perfect mediation were met for perceived justice and burnout. Job enrichment was related to neither job satisfaction nor intention to leave (when job satisfaction was controlled for).

We performed follow-up analyses to test for the indirect effect of perceived justice and burnout on intention to leave via job satisfaction. Baron and Kenny (1986) provide an approximate significance test for the indirect effect of perceived justice and burnout on employees' intentions. The path from, respectively, perceived justice and burnout to job satisfaction is denoted a and its standard error Sa; the path from job satisfaction to intention to leave is denoted b and its standard error Sb. The product ab is the estimate of the indirect effect of perceived justice and burnout on employees' intentions to leave. The standard error of ab is:

SEab = √(b² Sa² + a² Sb² + Sa² Sb²)

The ratio ab/SEab can be interpreted as a z statistic. Indirect effects of perceived justice (2.175, p < 0.05) and burnout (2.985, p < 0.01) were both significant.
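A minimal sketch of this approximate z-test in Python; a and b are taken from Table 12.5, but the standard errors are hypothetical placeholders, because the table reports only coefficients and p-values:

import math
from scipy.stats import norm

def indirect_effect_z(a, s_a, b, s_b):
    """Baron & Kenny (1986) approximate z-test for an indirect effect ab."""
    se_ab = math.sqrt(b**2 * s_a**2 + a**2 * s_b**2 + s_a**2 * s_b**2)
    z = (a * b) / se_ab
    return z, 2 * norm.sf(abs(z))   # two-sided p-value

# a and b from Table 12.5 (justice -> satisfaction, satisfaction -> ITL);
# the standard errors 0.120 and 0.050 are hypothetical placeholders.
z, p = indirect_effect_z(a=0.302, s_a=0.120, b=-0.253, s_b=0.050)
print(z, p)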

Overall interpretation and recommendations to the president

From the results of the hypothesis tests, it is clear that perceived justice and burnout affect employees' intentions to leave through job satisfaction. From the descriptive results, we have already seen that the mean on perceived equity is rather low (2.32 on a five-point scale), as is the mean on experienced burnout (2.55). Hence, if retention of employees is a top priority for the president, it is important to formulate policies and practices that help to enhance justice perceptions and to further reduce or prevent burnout. Whatever is done to improve employees' perceptions of justice and to either prevent or reduce burnout will improve job satisfaction and thus help employees to think less about leaving and induce them to stay.

The president would therefore be well advised to rectify inequities in the system if they really exist or to clear misperceptions of inequities if that is actually the case. Preventing or remedying burnout may require both individual and organizational change. To solve the problem of burnout, the president may need to change the work environment and educate workers on how to adapt and cope better with the stresses of the workplace.

TABLE 12.5 Mediation analysis.

Step 1 model, with job satisfaction as the dependent variable
                     Coefficient   p-value
Constant             3.575         0.000
Perceived justice    0.302         0.015
Burnout              -0.538        0.000
Job enrichment       0.120         0.332
Model fit (R²) = 0.165

Step 2 model, with intention to leave (ITL) as the dependent variable
                     Coefficient   p-value
Constant             1.840         0.000
Perceived justice    -0.307        0.000
Burnout              0.643         0.000
Job enrichment       -0.165        0.039
Model fit (R²) = 0.393

Step 3 model, including job satisfaction as an independent variable and with ITL as the dependent variable
                     Coefficient   p-value
Constant             2.744         0.000
Perceived justice    -0.231        0.003
Burnout              0.507         0.000
Job enrichment       -0.134        0.068
Job satisfaction     -0.253        0.000
Model fit (R²) = 0.487

Note. Parameters are unstandardized regression weights, with significance levels of t-values. Two-sided tests. N = 174.

Preventing or remedying burnout may require both individual and organizational change. To solve the
problem of burnout, the president may need to change the work environment and educate
workers on how to adapt to and cope better with the stresses of the workplace.
The fact that only 50% of the variance in "intention to leave" was explained by the four
independent variables considered in this study still leaves 50% unexplained. In other words,
there are additional variables that are important in explaining ITL that have not been
considered in this study. Further research might be necessary to explain more of the variance
in ITL, if the president wishes to pursue the matter further.
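
For readers who wish to reproduce this kind of three-step mediation analysis, a minimal Python sketch (using the statsmodels package) is given below. The data file and column names are hypothetical; the three models correspond to the three steps reported in Table 12.5.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data file with one row per employee
    df = pd.read_csv("excelsior.csv")

    # Step 1: predictors -> mediator (job satisfaction)
    step1 = smf.ols("job_satisfaction ~ perceived_justice + burnout"
                    " + job_enrichment", data=df).fit()

    # Step 2: predictors -> outcome (intention to leave)
    step2 = smf.ols("intention_to_leave ~ perceived_justice + burnout"
                    " + job_enrichment", data=df).fit()

    # Step 3: predictors and mediator -> outcome
    step3 = smf.ols("intention_to_leave ~ perceived_justice + burnout"
                    " + job_enrichment + job_satisfaction", data=df).fit()

    for model in (step1, step2, step3):
        print(model.params, model.pvalues, model.rsquared, sep="\n")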

Now do Exercises 12.6, 12.7, and 12.8

EXERCISE
Discuss: what do the unstandardized coefficients and their p-values in the first model
imply? In other words, what happens to job satisfaction if perceived justice, burnout,
and job enrichment change by one unit? (A sketch of the implied prediction equation follows below.)
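
As a starting point, the Step 1 estimates in Table 12.5 imply the prediction equation sketched below; the predictor values used in the call are made up for illustration.

    # Prediction equation implied by Step 1 of Table 12.5:
    # job satisfaction = 3.575 + 0.302*justice - 0.538*burnout + 0.120*enrichment
    def predicted_job_satisfaction(justice, burnout, enrichment):
        return 3.575 + 0.302 * justice - 0.538 * burnout + 0.120 * enrichment

    print(predicted_job_satisfaction(justice=3.0, burnout=2.5, enrichment=3.5))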

EXERCISE
Provide the tolerance values and the variance inflation factors for all the independent
variables in model 1. Discuss: do we have a multicollinearity problem? (A sketch of how such diagnostics can be obtained follows below.)
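
One way to obtain these diagnostics is sketched below, again assuming the hypothetical data file and column names used earlier; tolerance is simply the reciprocal of the VIF.

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    df = pd.read_csv("excelsior.csv")  # hypothetical data file
    X = sm.add_constant(df[["perceived_justice", "burnout", "job_enrichment"]])

    for i, name in enumerate(X.columns):
        if name == "const":
            continue  # the intercept has no meaningful VIF
        vif = variance_inflation_factor(X.values, i)
        print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")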

EXERCISE
Does work shift moderate the relationship between job satisfaction and intention to
leave for Excelsior Enterprises employees? (A sketch of a moderated regression follows below.)
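
A moderated regression of the kind this exercise asks for can be sketched as follows; work_shift is an assumed column name, and the product term carries the moderation test.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("excelsior.csv")  # hypothetical data file

    # The * operator adds job_satisfaction, the work-shift dummies, and
    # their interaction; a significant interaction indicates moderation.
    model = smf.ols("intention_to_leave ~ job_satisfaction * C(work_shift)",
                    data=df).fit()
    print(model.summary())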
We have now seen how different hypotheses can be tested by applying the appropriate
statistical tests in data analysis. Based on the interpretation of the results, the research report is
then written, making necessary recommendations and discussing the pros and cons of each,
together with a cost-benefit analysis.

Data warehousing, data mining, and operations research


Data warehousing and data mining are aspects of information systems. Most companies are now
aware of the benefits of creating a data warehouse that serves as the central repository of all data
collected from disparate sources, including those pertaining to the company's finance, manufacturing, sales, and the like. The data warehouse is usually built from data collected through the
different departments of the enterprise and can be accessed through various online analytical
processing (OLAP) tools to support decision making. Data warehousing can be described as the
process of extracting, transferring, and integrating data spread across multiple external databases and even operating systems, with a view to facilitating analysis and decision making.
Complementary to the functions of data warehousing, many companies resort to data
mining as a strategic tool for reaching new levels of business intelligence. Using algorithms to
analyze data in a meaningful way, data mining more effectively leverages the data warehouse
by identifying hidden relations and patterns in the data stored in it. For instance, data mining
makes it possible to trace retail sales patterns by ZIP code and the time of day of the purchases,
so that optimal stocking of items becomes possible. Such "mined" data pertaining to the vital
areas of the organization can be easily accessed and used for different purposes. For example,
staffing for different times of the day can be planned, as can the number of check-out counters
that need to be kept open in retail stores, to ensure efficiency as well as effectiveness. We can
see that data mining helps to clarify the underlying patterns in different business activities,
which in turn facilitates decision making.
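
As a simple illustration of the retail example above, the following Python sketch aggregates (hypothetical) transaction records by ZIP code and hour of purchase; the file and column names are assumptions.

    import pandas as pd

    sales = pd.read_csv("transactions.csv", parse_dates=["timestamp"])
    sales["hour"] = sales["timestamp"].dt.hour

    # Total sales per ZIP code and hour of day, as a ZIP-by-hour table
    pattern = (sales.groupby(["zip_code", "hour"])["amount"]
                    .sum()
                    .unstack(fill_value=0))
    print(pattern.head())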
Operations research (OR) or management science (MS) is another sophisticated tool used to
simplify and thus clarify certain types of complex problem that lend themselves to quantification. OR uses higher mathematics and statistics to identify, analyze, and ultimately solve
intricate problems of great complexity faced by the manager. It provides an additional tool to the
manager by using quantification to supplement personal judgment. Areas of problem solving
that easily lend themselves to OR include those relating to inventory, queuing, sequencing,
routing, and search and replacement. OR helps to minimize costs and increase efficiency by
resorting to decision trees, linear programming, network analysis, and mathematical models.
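
As a flavor of what linear programming involves, the sketch below solves a small, made-up product-mix problem with SciPy; all numbers are illustrative.

    from scipy.optimize import linprog

    # Maximize profit 3x + 5y subject to two resource constraints
    # (linprog minimizes, so the objective is negated).
    result = linprog(
        c=[-3, -5],
        A_ub=[[1, 2],   # resource 1 used per unit of x and y
              [3, 1]],  # resource 2 used per unit of x and y
        b_ub=[14, 18],  # available amounts of each resource
        bounds=[(0, None), (0, None)],
    )
    print(result.x, -result.fun)  # optimal mix and maximum profit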
Other information systems such as the management information system (MIS), the decision
support system, the executive information system, and the expert system are good decision-making
aids, but are not necessarily involved with data collection and analysis in the strict sense.
In sum, a good information system collects, mines, and provides a wide range of pertinent
information relating to aspects of both the external and internal environments of the
organization. By using the wide variety of tools and techniques available for solving problems
of differing magnitude, executives, managers, and others entrusted with responsibility for
results at various levels of the organization can find solutions to various concerns merely by
securing access to the data available in the system and analyzing them.
It should be ensured that the data in the information system are error-free and frequently
updated. After all, decision making can only be as good as the data made available to managers.

Some software packages useful for data analysis


There is a wide variety of analytical software that may help you to analyze your data. Based on
your specific needs, your research problem, and/or conceptual model you might consider the
following software packages:
LISREL: from Scientific Software International;
MATLAB: from The MathWorks, Inc.;
SAS/STAT: from SAS Institute;
SPSS: from SPSS Inc.;
Stata: from Stata Corporation.
LISREL is designed to estimate and test structural equation models. Structural equation
models are complex statistical models of linear relationships among latent (unobserved)
variables and manifest (observed) variables. You can also use LISREL to carry out exploratory
factor analysis and confirmatory factor analysis.
MATLAB is a computer program that was originally designed to simplify the implementation of numerical linear algebra routines. It is used to implement numerical algorithms for a
wide range of applications.
SAS is an integrated system of software products, capable of performing a broad range of
statistical analyses such as descriptive statistics, multivariate techniques, and time series
analyses. Because of its capabilities, it is used in many disciplines, including the medical sciences,
biological sciences, social sciences, and education.
SPSS (Statistical Package for the Social Sciences) is a data management and analysis
program designed to do statistical data analysis, including descriptive statistics such as plots,
frequencies, charts, and lists, as well as sophisticated inferential and multivariate statistical
procedures like analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis.
Stata is a general purpose statistical software package that supports various statistical and
econometric methods, graphics, and enhanced features for data manipulation, programming,
and matrix manipulation.

In this chapter we covered the procedure for hypothesis testing. We have discussed type I
errors, type II errors, and statistical power. We have observed various statistical analyses and
tests used to examine different hypotheses to answer research questions. We discussed the
use of dummy variables, multicollinearity, and moderated regression analysis. Through the
example of the research on Excelsior Enterprises, we observed hypothesis testing using
mediated regression analysis and learned how the computer results are interpreted.

1. What kinds of biases do you think could be minimized or avoided during the data
analysis stage of research?
2. When we collect data on the effects of treatment in experimental designs, which
statistical test is most appropriate to test the treatment effects?
3. A tax consultant wonders whether he should be more selective about the class of clients he
serves so as to maximize his income. He usually deals with four categories of clients: the
very rich, rich, upper middle class, and middle class. He has records of each and every
client served, the taxes paid by them, and how much he has charged them. Since many
particulars in respect of the clients vary (number of dependants, business deductibles,
etc.), irrespective of the category they belong to, he would like an appropriate analysis to be
done to see which among the four categories of clientele he should choose to continue to
serve in the future. What kind of analysis should be done in this case and why?

Now do Exercises 12.9 and 12.10

EXERCISE
Open the file "resmethassignment1" (you created this file doing the exercise from the
previous chapter). Answer the following questions.
a. Is the exam grade significantly larger than 75?
b. Are there significant differences in the exam grade for men and women?
c. Is there a significant difference between the exam grade and the paper grade?
d. Are there significant differences in the paper grade for the four year groups?
e. Is the sample representative for the IQ level, for which it is known that 50% of the
population has an IQ below 100, and 50% has an IQ of 100 or higher?
f. Obtain a correlation matrix for all relevant variables and discuss the results.
g. Do a multiple regression analysis to explain the variance in paper grades using the
independent variables of age, sex (dummy coded), and IQ, and interpret the
results.

EXERCISE
Below are Tables 12A to 12D, summarizing the results of data analyses of research conducted
in a sales organization that operates in 50 different cities of the country and employs a total
sales force of about 500. The number of salespersons sampled for the study was 150.
a. Interpret the information contained in each of the tables in as much detail as possible.
b. Summarize the results for the CEO of the company.
c. Make recommendations based on your interpretation of the results.

TABLE 12A Means, standard deviations, minimum, and maximum.

Variable                              Mean    Std. deviation    Minimum    Maximum
Sales (in 1000s of $)                 75.1    8.6               45.2       97.3
No. of salespersons                   ...     ...               ...        ...
Population (in 1000s)                 ...     ...               ...        ...
Per capita income (in 1000s of $)     ...     ...               ...        ...
Advertisement (in 1000s of $)         10.3    5.2               6.1        15.7

TABLE 12B Correlations among the variables.

                        Sales    Salespersons    Population    Income    Advertisement
Sales                   1.0
No. of salespersons     ...      1.0
Population              ...      ...             1.0
Income                  0.56     0.21            0.11          1.0
Ad. expenditure         0.68     0.16            0.36          0.23      1.0

All figures above 0.15 are significant at p = 0.05.
All figures above 0.35 are significant at p <= 0.001.

TABLE 12C Results of oneway ANOVA: sales by level of education.

Source of variation    Sums of squares    df     Mean squares    F      Significance of F
Between groups         ...                ...    ...             ...    ...
Within groups          ...                ...    ...
Total                  552.5              150

TABLE 12D Results of regression analysis.

Multiple R           0.65924
R square             ...
Adjusted R square    ...
Standard error       0.41173
df                   (5, 144)
F                    5.278
Sig.                 0.000

Variable                     Beta    t       Sig.
Training of salespersons     ...     ...     ...
No. of salespersons          ...     ...     ...
Population                   ...     ...     ...
Per capita income            ...     ...     0.089
Advertisement                0.47    4.54    0.00001
