BOOK REVIEW

Psychometric Theory (3rd ed.)


by Jum Nunnally and Ira Bernstein
New York: McGraw-Hill, 1994, xxiv + 752 pp.

It is always sad when one of your idols turns out to have clay feet, and even worse when the base substance
extends higher. Unfortunately, this was my experience in reading the third edition of Nunnally's classic
work as revised by Bernstein. Because the first edition (Nunnally, 1967) came out during my graduate
school career and the second (Nunnally, 1978) was useful in a decade of teaching, I looked forward to this
revision and had high expectations for it.
I object at a fundamental level to retaining authorship for revisions of works when the original author
was not part of the effort. This was true of the 1960 and 1972 Stanford-Binet tests, it has been true of
several of the more recent tests produced under Wechsler's name [e.g., the Wechsler Preschool and Primary Scale of Intelligence-Revised (WPPSI-R; Wechsler, 1989) and the Wechsler Intelligence Scale for
Children-Third Edition (WISC-III; Wechsler, 1991)], and it is true of this book, since Nunnally died in
1982. The publisher is probably to blame for trying to capitalize on name familiarity, but Nunnally should
not be held responsible for changes about which he had no say.
Two features characterize this book: a somewhat awkward use of language and an astounding number
of typographical and substantive errors (averaging approximately 1 per page). Ten Berge (1995) has provided a fairly exhaustive list of the errors in the equations, and the others are too numerous to deal with in
any detail. This material is difficult enough for most students when everything is correct. I can only echo
ten Berge: "Students should not be exposed to this book in the present form" (p. 313). Hopefully, McGraw-Hill did not produce too large a first printing and will (1) destroy existing copies and (2) make a major
effort to correct the mistakes before producing more. We need the content that this book uniquely covers,
but not as it has been abused in this revision.
The book retains the basic organization of the earlier editions. An introductory chapter discussing the
role of measurement in science and presenting some basic scaling issues is followed by a chapter on
traditional psychophysical methods. Both of these chapters, because they were not altered very much, are
reasonably clean (there is an unfortunate error in Equation 2-3).
Chapter 3 on validity has a decidedly classic ring, and it is here that the problems begin. There is no
mention of the work of Messick (1989) or Shepard (1993). The discussion of explication of constructs (pp.
104-107) comes from a radical behaviorist perspective that is no longer appropriate. On p. 108 we are
treated to the delightful idea that "... instruments sometimes are often used ...." We also find Campbell &
Fiske calling for "divergent validity" rather than discriminant validity. Basically, this chapter looks like it
is 25 years old and philosophically naive.
Things really fall apart in Chapter 4, which is the first heavily quantitative chapter and probably the
worst in the book. Because the material of this chapter is so fundamental to understanding what follows,
the weakness is particularly destructive. Many of the errors are simple typos that a careful proofreading
should have caught. For example, in Equation 4-7 (p. 121) the verbal description conflicts with the equation. Three lines later there is another critical error, and the entire paragraph on moments is poorly done.

APPLIED PSYCHOLOGICAL MEASUREMENT
Vol. 19, No. 3, September 1995, pp. 303-305
© Copyright 1995 Applied Psychological Measurement Inc.
0146-6216/95/030303-03$1.40
Downloaded from apm.sagepub.com at FLORIDA INTERNATIONAL UNIV on June 13, 2015

Readers sometimes have trouble keeping track of what symbols mean; therefore, using the same letter
for different concepts should be avoided whenever possible. In a paragraph that introduces r as the Pearson
product-moment correlation, to use the same letter to symbolize the "rth moment" verges on the criminal.
A few pages later, after talking about r as a covariance of standard z scores, z is used to represent the
ordinate of the normal curve!
Problems continue on p. 130. Here we are told that "... r² may be expressed as the proportion of variance
in Y (σ_Y) that is accounted for by its linear relation to X (true variance, σ² ...." The statement, of course, is
fine up to the parentheses, but the rest of the paragraph confuses true score variance from classical reliability theory with predictable variance, two quite different concepts. There also seems to be confusion between the standard error of estimate and the standard error of measurement in several places throughout
the book. Chapter 4 deteriorates further in the discussion of regression. Here the errors are so numerous
and the language so imprecise that even readers who are intimately familiar with the concepts and equations may have trouble.
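For readers who want the two pairs of concepts side by side, the distinction can be sketched in conventional classical notation. The symbols below (r_XY for the predictor-criterion correlation, r_XX' for the reliability coefficient, σ_T for true score standard deviation, σ_Ŷ for the standard deviation of predicted scores) are standard textbook notation assumed here for illustration; they are not taken from the book under review.

```latex
% Predictable variance (regression): the share of Y variance
% that is linearly predictable from X, and the standard error of estimate.
r_{XY}^2 = \frac{\sigma_{\hat{Y}}^2}{\sigma_Y^2},
\qquad
\sigma_{\text{est}} = \sigma_Y \sqrt{1 - r_{XY}^2}

% True score variance (classical reliability): the share of observed X variance
% that is due to true scores, and the standard error of measurement.
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2},
\qquad
\sigma_{\text{meas}} = \sigma_X \sqrt{1 - r_{XX'}}
```

The first pair concerns how well one variable predicts another; the second concerns how much of a single variable's observed variance is free of measurement error.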
Chapter 5 introduces linear combinations and multiple correlation. Aside from the usual quota of typos
and a few rather weak explanations of concepts (particularly "shrinkage" on p. 189), the major weakness
of this chapter is that it does not include at least some definitions of matrix terminology. The term
"transpose" is used here and "inverse" occurs later without any definition or explanation. The same happens for matrix multiplication. The final failure in the chapter is the lack of any adequate mention of raw or
deviation score equations for multiple regression. The author states that "The b weights should not be used
when the predictors are in different units since they are highly sensitive to these often arbitrary scale
differences" (p. 192), but sometimes group differences in means or variances are important information,
particularly in cross-validation. Also, units are sometimes meaningful.
Chapter 6 presents the logic and equations for the classical theory of measurement error. Although the
chapter has the usual quantity of typos (approximately a dozen equations and numerous pieces of text are
flawed), the interaction of this chapter with Chapter 4 on the concept of true and error variance is what is most
distressing. σ_est is substituted for σ_YX and σ_meas replaces σ_e. The reason for the inconsistency eludes me.
Chapter 7 on reliability is one of the places where this edition is an improvement. The introduction of
generalizability theory was needed and should have been in the second edition. The treatment generally is
adequate.
Nunnally was always a strong advocate of homogeneous tests, and Chapter 8 on the construction of
conventional tests retains his point of view. Unfortunately, the discussion of defining a content domain and
constructing a test blueprint is weak and leans toward content homogeneity rather than faithfulness to
domain definition. There is also no discussion of improving items by analysis of distractors or rewriting.
The emphasis is on statistical criteria and throwing out weak items.
Chapter 9 on special problems in classical test theory presents an uneven mix of practical advice and
statistical rigor. One has the feeling the author does not know whether he is Lord and Novick or Gronlund
(1985). Chapter 9 is a relocation of the previous Chapter 16, which seems to be a receptacle for a variety of
unrelated topics that the author wanted to include. One topic that should be here but is not is test equating.
Chapter 10, Recent Developments in Test Theory, is a needed addition. Here item response theory (IRT),
differential item functioning, and tailored/computer adaptive testing are presented in a fairly straightforward
and understandable way. Although IRT-like tests have been around since 1926 (Thorndike, Bregman, Cobb,
& Woodyard, 1926), the idea did not catch on until the 1970s; however, it is here to stay. The author missed
an opportunity to use the same six-item example with both classical theory and IRT to show the differences.
In this edition, there are three chapters that discuss various issues in factor analysis. Chapters 11 and 12
are similar to those of the earlier editions, but Chapter 13 introduces "confirmatory factor analysis" (CFA).
Space prohibits listing the errors in these chapters, but a strong point is that this is one of the few texts
in which CFA is not equated with structural equation models. The author missed the chance to make the
distinction between purpose and method as clear as he might, but he is on the right track. Confirmation of
a theorized structure is a purpose that can be accomplished with a variety of analytic methods, including
restricted maximum likelihood factor analysis. That programs to perform structural equation fitting can be
used to test statistical hypotheses about structural fit does not make them any more "confirmatory" than
vanishing tetrads are. The author rightfully points out that there is a continuum in the degree of restriction
that can be placed on a factor model. If only he had deleted the reference to CFA.
Chapter 14 (previously Chapter 12) contains a discussion of discriminant analysis and multidimensional
scaling. It is in this chapter that one feels the lack of matrix terminology most acutely, but previous editions
had the same weakness. There are fewer errors here, and the addition of a discussion of the ALSCAL program is a good one.
The final chapter (Chapter 15) is another loose conglomeration of topics that interest the author. The
main headings are categorical modeling, binary classification, and non-geometric and non-Euclidean models. Although these may interest some readers, many users of the previous editions will miss the coverage
of applications to the measurement of abilities, personalities, and sentiments that formed the final part of
those volumes.
To summarize, a well-done revision of Nunnally's Psychometric Theory has been needed for several
years. Hopefully, we will not have to wait too much longer.
Robert M. Thorndike
Western Washington University
References

Gronlund, N. E. (1985). Measurement and evaluation in teaching (5th ed.). New York: Macmillan.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.; pp. 13-103). New York: ACE/Macmillan.
Nunnally, J. C. (1967). Psychometric theory. New York: McGraw-Hill.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19, 405-450.
ten Berge, J. M. F. (1995). Review of Nunnally and Bernstein's Psychometric theory. Psychometrika, 60, 313-315.
Thorndike, E. L., Bregman, E. O., Cobb, M. V., & Woodyard, E. (1926). The measurement of intelligence. New York: Teachers College Bureau of Publications.
Wechsler, D. (1989). Wechsler preschool and primary scale of intelligence-Revised. San Antonio TX: Psychological Corporation.
Wechsler, D. (1991). Wechsler intelligence scale for children-Third edition. San Antonio TX: Psychological Corporation.
