Note from Section Editor James Comer: Professor Churchill has long been acknowledged as one of the pioneers of measurement research in sales management. His willingness to initiate this section with a portrayal of the key measurement issues is much appreciated. Thank you, Gil.

Manuscript Submissions: The reader who wishes to have a manuscript reviewed for possible publication in this section should examine page ix of the Fall 1991 issue of the Journal of Personal Selling and Sales Management for details on section objectives and guidelines, or send for guidelines to: Professor James M. Comer, University of Cincinnati, College of Business Administration, Department of Marketing, Lindner Hall (#145), Cincinnati, Ohio 45221.
The new section on "Scaling and Measurement" is a welcome addition to the Journal of Personal Selling and Sales Management, in that the study of personal selling and sales management shares a characteristic common to other areas of scientific inquiry. If knowledge and understanding of its phenomena are to advance, then its measurement tools must also. The two go hand-in-hand. One only has to think of the value of the telescope to astronomy and the microscope to biology and chemistry to appreciate the relationship between knowledge development and good measurement. As Torgerson suggested more than thirty years ago:

The various scientific disciplines differ from one another in a variety of ways ... One such [way] ... is according to the degree to which theoretical procedures or explanations are used as contrasted with correlational procedures or explanations ... The distinction here is between a science that consists largely of statements describing the degree of relationship among more or less directly observable variables and a science that attempts (successfully) to derive, account for, or explain these relationships from principles that are not immediately given, but lie beyond straight empirical knowledge. Although no science is all correlational or entirely theoretical in this sense, ... it is clear that sciences do differ in the degree to which they rely on one or the other level of explanation. It may be also noted that

Gilbert A. Churchill, Jr., is the Arthur C. Nielsen, Jr., Chair of Marketing Research, University of Wisconsin-Madison, 1155 Observatory Drive, Madison, WI 53706.

Journal of Personal Selling & Sales Management, Volume XII, Number 2 (Spring 1992).
Now the interesting thing about most psychological constructs is that we cannot rely on visual comparisons to either confirm or refute a measure. We cannot see a salesperson's attitude, a rep's personality characteristic, a salesperson's knowledge about a particular product, or other psychological characteristics such as intelligence, mental anxiety, or whatever. These characteristics are all part of the representative's black box. Their magnitude must be inferred from our measurements. Since we cannot resort to a visual check on the accuracy of our measures, we must rely on evaluating the procedures used to determine the measure. Eye color is certainly not height, but can we capture sales representatives' satisfaction with their job if we ask them directly how satisfied they are? Probably not, for reasons that will become obvious as we continue our discussion. The ability to conclude that a salesperson's job satisfaction or any other characteristic of the person has indeed been captured by the measurement depends on understanding measurement error and its assessment using evidence of the reliability and validity of the measure.

Classification and Assessment of Error

The ideal in measurement is to generate a score that reflects true differences in the characteristic one is attempting to measure and nothing else. What we in fact obtain, though, is something else. A measurement, call it X_O, for what is observed can be written as a function of several components:

X_O = X_T + X_S + X_R

where

X_T represents the true score of the characteristic being measured;
X_S represents systematic error; and
X_R represents random error.

The total error of a measurement is given by the sum of X_S and X_R, and it is important to note that it has two components. Systematic error is also known as constant error, because it affects the measurement in a constant way. An example would be the measurement of a man's height with a poorly calibrated wooden yardstick. Random error is not constant error but, rather, is due to transient aspects of the person or measurement situation. A random error manifests itself in the lack of consistency of repeated or equivalent measurements when the measurements are made on the same object or person. An example would be the use of an elastic ruler to measure a man's height. It is unlikely that on two successive measurements the observer would stretch the elastic ruler to the same degree of tautness, and, therefore, the two measures would not agree although the man's height had not changed.

The distinction between systematic error and random error is critical because of the way the validity of measures is assessed. Validity is synonymous with accuracy or correctness. The validity of a measuring instrument is defined as "the extent to which differences in scores on it reflect true differences among individuals on the characteristic we seek to measure, rather than constant or random errors" (Selltiz, Wrightsman, and Cook 1976, p. 169). When a measurement is valid, X_O = X_T, since there is no error.

The basic measurement problem is to develop measures in which the score we observe and record actually represents the true score of the object on the characteristic we are attempting to measure. This is much harder to do than to say. It is not accomplished by simply making up a set of questions or statements to measure a salesperson's job satisfaction. Rather, the burden is on the researcher to establish that the measure accurately captures the characteristic of interest. The relationship between measured score and true score is never established unequivocally but is always inferred. The bases for such inferences are two: (1) direct assessment employing validity measures and (2) indirect assessment via reliability measures (Peter and Churchill 1986). Let us consider each of these sources of evidence in turn.

Validity Measures

As mentioned, a measuring instrument is valid to the extent that differences in scores among objects reflect the objects' true differences on the characteristic that the instrument tries to measure. We normally do not know the true score of an object with respect to a given characteristic. If we did know it, there would be no need to measure the object on that characteristic. What we do, therefore, is infer the validity of the measure by looking for evidence of its predictive, content, and construct validity.
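The decomposition of an observed score into true score, systematic error, and random error lends itself to a small simulation. The sketch below, in Python with invented numbers, contrasts the two error components: a constant miscalibration (the poorly calibrated yardstick) shifts every reading by the same amount, while a varying transient factor (the elastic ruler's stretch) averages out over repeated measurements.

```python
import random

# Hypothetical numbers illustrating X_O = X_T + X_S + X_R.
X_T = 70.0   # true height of the man being measured, in inches
X_S = -0.5   # systematic (constant) error: yardstick always reads 0.5" short

random.seed(1)

def observe():
    # Random error: transient factors (e.g., how tautly the elastic
    # ruler is stretched) differ on each measurement occasion.
    X_R = random.gauss(0.0, 1.0)
    return X_T + X_S + X_R

readings = [observe() for _ in range(10000)]
mean_reading = sum(readings) / len(readings)

# Random error averages out over repeated measurements; systematic
# error does not, so the mean settles near X_T + X_S (69.5), not X_T.
print(round(mean_reading, 1))
```

Averaging many readings removes the random component but leaves the systematic bias untouched, which is why the two error types must be assessed differently.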
Predictive Validity

The predictive approach to validation focuses on the usefulness of the measuring instrument as a predictor of some other characteristic or behavior of the individual; it is thus sometimes called criterion-related validity. Predictive validity is ascertained by how well the measure predicts the criterion, be it another characteristic or a specific behavior. An example would be the Graduate Management Admissions Test. The fact that this test is required by most of the major schools of business attests to its predictive validity; it has proven to be useful in predicting how well a student with a particular score on the exam will do in an accredited MBA program. The test score is used to predict the criterion of performance. An example of an attitude scale might be using scores that sales representatives achieved on an instrument designed to assess their job satisfaction to predict who might quit. The attitude score would again be used to predict a behavior, the likelihood of quitting. Predictive validity is determined strictly by the correlation between the two measures; if the correlation is high, the measure is said to have predictive validity.

Predictive validity is relatively easy to assess. It requires, to be sure, a reasonably valid measure of the criterion with which the scores on the measuring instrument are to be compared. Given that such scores are available (for example, the grades the student actually achieves in an MBA program, the sales representative's quitting or not), all that the researcher needs to do is to establish the degree of relationship, usually in the form of some kind of correlation coefficient, between the scores on the measuring instrument and the criterion variable. Although easy to assess, predictive validity is rarely the most important kind of validity. We are often concerned with "what the measure in fact measures" rather than simply whether it predicts accurately or not.

Content Validity

Content validity focuses on the adequacy with which the domain of the characteristic is captured by the measure. Consider, for example, the characteristic "spelling ability" and suppose that the following list of words was used to assess an individual's spelling ability: quarterback, guard, tackle, end, pass, fumble, punt, touchdown, run. Now, you would probably take issue with this spelling test. Further, the basis for your objection probably would be the fact that all the words relate to the sport of football. Therefore, you could argue that an individual who is basically a very poor speller could do well on this test simply because he or she is a football enthusiast. You would be right, of course. A person with a basic capacity for spelling but with little interest in football might, in fact, do worse on this spelling test than one with less native ability but a good deal more interest in football. The test could be said to lack content validity, since it does not properly sample the domain of all possible words that could be used but is very selective in its emphasis.

The preceding example illustrates how content validity is assessed, although not how it is established. Content validity is sometimes known as "face validity" because it is assessed by examining the measure with an eye toward ascertaining the domain being sampled. If the included domain is decidedly different from the domain of the variable as conceived, the measure is said to lack content validity. Theoretically, to capture a person's spelling ability, we should ask the person to spell all the words in the language. The person who spelled the greatest number of words correctly would be said to have the most spelling ability. This is a completely unrealistic procedure. It would take several lifetimes to complete. We, therefore, resort to sampling the domain of the characteristic by constructing spelling tests that consist of samples of all the possible words that could be used. Different samplings of items can produce different comparative performances by individuals, and consequently whether a particular test assesses "true" spelling ability depends on how well the test samples the domain of the characteristic. This is not only true for spelling ability, but also holds for psychological characteristics in which we have an interest.

How can we ensure that a measure will possess content validity? We can never guarantee it because it is partly a matter of judgment. We may feel quite comfortable with the items included in a measure, for example, while a critic may argue that we have failed to sample from some relevant domain of the characteristic. Although we can never guarantee the content validity of a measure, we can severely diminish the objections of the critics. The key to content validity lies in evaluating the procedures that are used to develop the instrument (Churchill 1979).
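The predictive-validity check described earlier reduces to computing a correlation coefficient between instrument scores and the criterion. A minimal Python sketch, using invented job-satisfaction scores and a hypothetical 0/1 indicator of whether each representative later quit:

```python
# Invented data for illustration only: satisfaction scores from a
# hypothetical instrument, paired with whether the rep later quit (1/0).
satisfaction = [12, 25, 31, 18, 40, 22, 35, 15, 28, 38]
quit_flag    = [ 1,  0,  0,  1,  0,  1,  0,  1,  0,  0]

def pearson_r(x, y):
    """Plain Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(satisfaction, quit_flag)
# A strong negative r supports predictive validity here: lower
# satisfaction scores go with quitting.
print(round(r, 2))  # prints -0.86
```

A high absolute correlation is the whole of the predictive-validity evidence; it says nothing about whether the instrument actually measures satisfaction, which is why content and construct validity must be examined separately.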
reason is that coefficient alpha has a direct relationship to the most accepted and conceptually appealing measurement model, the domain sampling model, which holds that the purpose of any particular measurement is to estimate the score that would be obtained if all the items in the domain were employed (Nunnally 1978, p. 194). The score that any subject would obtain over the whole sample domain is the person's true score X_T.

In practice, though, we do not use all of the items that could be used but rather only a sample of them. To the extent that the sample of items correlates with true scores, it is good. According to the domain sampling model, then, a primary source of measurement error is the inadequate sampling of the domain of relevant items.

Basic to the domain sampling model is the concept of a very large correlation matrix showing all correlations among the items in the domain. No single item is likely to provide a perfect representation of the concept, just as no single word can be used to test for differences in subjects' spelling abilities and no single question can measure a person's intelligence. Rather, each item can be expected to have a certain amount of distinctiveness or specificity even though it relates to the concept.

The average correlation in this large matrix indicates the extent to which some common core exists in the items. The dispersion of correlations about the average indicates the extent to which items vary in sharing the common core. The key assumption in the domain sampling model is that all items, if they belong to the domain of the concept, have an equal amount of common core. This implies that the average correlation in each column of the hypothetical matrix is the same and, in turn, equals the average correlation in the whole matrix. That is, if all the items in a measure are drawn from the domain of a single construct, responses to those items should be internally consistent or highly correlated. Low inter-item correlations, on the other hand, indicate that some items are not drawn from the appropriate domain and are producing error and unreliability. Coefficient alpha provides a summary measure of the internal homogeneity that exists among a set of items. Moreover, its square root equals the estimated correlation of the test with true scores (Nunnally 1978, p. 214). Thus, it is, or should be, routinely calculated.

Using the Evidence

A logical question is how does the nonmeasurement expert use the above information to make judgments. The researcher's burden is different depending on whether the measure is a new one or one borrowed from another study.

New Measure

If the measure is new, then the burden on the researcher is to demonstrate to the scientific and practitioner communities that the measure has desirable reliability and validity properties. One way to do this is to follow a generally accepted paradigm when developing the measure, including: (1) defining the construct and specifying its domain, (2) generating items to sample the domain, and (3) collecting data and calculating the appropriate indexes by which to assess the measure's reliability and validity properties (Churchill 1979).

Borrowed Measures

When borrowing measures from other studies, the researcher needs to at least look at the reliability and validity evidence that has been gathered in support of the measure. The researcher also needs to weigh that evidence carefully. Many researchers borrow scales and simply assume that because the measure had attractive properties in other contexts, it can be used as is. They fail to realize that scales developed in other situations may not display the same desirable properties in the new context. For example, scales developed in other disciplines or in classroom-like situations with student subjects may not translate well at all to corporate survey environments involving salespeople. When they borrow scales, researchers need to check their properties by calculating the appropriate reliability and validity coefficients. Only by doing so can they avoid executing GIGO (garbage in, garbage out) routines and determine whether the conceptual structures relating the constructs make sense given the empirical evidence. The new section on measurement can only help in improving our understanding of the measures with which sales management practitioners and researchers work.
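For readers who want to follow the advice that coefficient alpha "is, or should be, routinely calculated," here is a minimal sketch of the computation in Python, using invented item scores (rows are respondents, columns are items of a hypothetical four-item scale). The square root is reported alongside as the estimated correlation of the test with true scores.

```python
# Invented ratings for illustration: 5 respondents x 4 items.
scores = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 2, 2, 3],
    [4, 4, 5, 5],
]

def variance(xs):
    """Population variance; the same convention must be applied to
    both the item scores and the total scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(rows):
    k = len(rows[0])  # number of items in the scale
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    # alpha = (k / (k - 1)) * (1 - sum of item variances / total variance)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

alpha = cronbach_alpha(scores)
# The square root of alpha estimates the correlation of the test
# with true scores (Nunnally 1978, p. 214).
est_r_with_true = alpha ** 0.5
print(round(alpha, 2), round(est_r_with_true, 2))  # prints 0.94 0.97
```

The high alpha for this invented data reflects items whose inter-item correlations are uniformly strong, which is exactly the internal homogeneity the domain sampling model predicts for items drawn from a single construct's domain.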