ratings. Hobson et al. (1981) also compared a psychology department chairperson's captured policy
with subjective estimates of that policy obtained from
19 faculty members. In all cases the goal was to quantify the extent to which subjective perceptions differed from the objectively captured policy.
In addition to computing captured rating policies
and comparing them to subjective policy estimates,
researchers have utilized various statistical clustering
procedures to group captured rating policies in terms
of their similarity (Hobson et al., 1981; Naylor &
Wherry, 1965; Stumpf & London, 1981; Zedeck &
Kafry, 1977). The purpose has been to determine whether there were systematic differences in rating orientation within a given group of raters.
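The clustering step can be sketched in code. The snippet below is a hypothetical illustration, not a reproduction of the procedures used in the cited studies: each rater's captured policy is represented as a vector of dimension weights, and raters whose weight vectors correlate highly are grouped together. The function name, correlation threshold, and example data are all invented for illustration.

```python
import numpy as np

def cluster_policies(weight_vectors, r_threshold=0.9):
    """Greedy grouping of captured policies (rows = raters,
    columns = dimension weights). A rater joins an existing
    cluster when the correlation between its weight vector and
    the cluster's first member exceeds r_threshold."""
    W = np.asarray(weight_vectors, dtype=float)
    clusters = []  # each cluster is a list of rater (row) indices
    for i, w in enumerate(W):
        placed = False
        for c in clusters:
            if np.corrcoef(w, W[c[0]])[0, 1] >= r_threshold:
                c.append(i)
                placed = True
                break
        if not placed:
            clusters.append([i])
    return clusters

# Three raters who emphasize dimension 1, one who emphasizes dimension 3
policies = [[.60, .30, .10], [.55, .35, .10], [.10, .20, .70], [.62, .28, .10]]
print(cluster_policies(policies))  # raters 0, 1, 3 share an orientation
```

A real application would of course use a formal hierarchical or k-means procedure; the point is only that a "rating orientation" is operationalized as similarity among weight vectors.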
When considering the policy capturing literature,
four general conclusions can be reasonably drawn.
(Table 1 provides summary information about the
six studies being reviewed.)
First, the general linear model has worked well in
describing rater policies, as evidenced by the consistently high R²s obtained when regressing
overall ratings on scores for the separate dimensions
of performance (see Table 1). These results are not
surprising in light of the decision tasks given to raters.
These tasks generally possessed characteristics similar
to those asserted by Dawes and Corrigan (1974) to
be necessary for adequate fit of the linear model. This
finding simply adds further confirmation to other
well-documented results obtained in a variety of decision making settings (Slovic & Lichtenstein, 1971;
Slovic et al., 1977).
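The basic policy capturing computation can be sketched as follows, using simulated data (the profiles, the "true" weighting, and the noise level are all hypothetical): overall ratings are regressed on dimension scores, and the fit of the linear model is summarized by R².

```python
import numpy as np

# Each row of X is one ratee profile (scores on 3 performance
# dimensions); y holds the rater's overall rating of that profile.
rng = np.random.default_rng(0)
X = rng.uniform(1, 7, size=(40, 3))           # 40 profiles, 3 dimensions
true_policy = np.array([0.6, 0.3, 0.1])       # hypothetical weighting
y = X @ true_policy + rng.normal(0, 0.2, 40)  # a nearly linear rater

# Regress overall ratings on dimension scores (with intercept)
A = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ b
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.round(b[1:], 2), round(float(r2), 2))
```

Because the simulated rater combines the dimensions almost additively, the recovered weights approximate the true policy and R² is high, mirroring the pattern reported across the studies in Table 1.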
Second, there is consistent evidence indicating that
raters generally possess poor insight into their own
rating policies (Hobson et al., 1981; Taylor &
Wilsted, 1974; Zedeck & Kafry, 1977). Raters exhibit a tendency to perceive a more equal weighting of performance dimensions than is actually the case, underestimating the importance of major dimensions and overestimating the importance of minor ones.
Third, Hobson et al. (1981) demonstrated that subordinate subjective estimates of their supervisor's
captured rating policy reflect the same basic errors.
That is, subordinates tended to believe that their
supervisor used a more equal weighting of performance dimensions in making appraisals than actually was the case.
Finally, in three of four studies utilizing rater
clustering procedures (Hobson et al., 1981; Naylor
[Table 1. Summary information for the six policy capturing studies reviewed; the table's contents did not survive text extraction.]
Methodological Issues/Problems
Major features of the policy capturing methodology detract from its usefulness as a way to
describe and represent actual rating decisions and
seriously undermine the construct validity of captured
rater policies.
Research Purpose
In all of the policy capturing studies, directions to
subjects clearly indicated that ratings were for
research purposes. As Anderson (1977) has noted,
the context in which ratings are made seems to affect the rating process and the results obtained. Landy and Farr (1980), in their review of the performance
rating literature, concluded that the evidence suggests
that there are systematic differences in ratings collected for administrative versus research purposes.
Specifically, ratings for administrative reasons seem
to be more lenient than those made for research purposes. Such findings raise important questions about
the construct validity of rater policies captured when ratings are for research purposes only. In the absence of direct comparisons, the limited available research casts doubt on the adequacy of captured policies as paramorphic representations of actual rating decisions.
Ratee Performance Profile Issues
To the extent that the dimensional intercorrelations in the ratee profiles differ from the actual relationships among dimensions, one simply does not know what captured policies mean. Certainly, the external validity of the ratee profiles and the resultant captured policies is seriously compromised. Thus, in those studies using orthogonal dimensions, one is left with the conclusion that the construct validity of captured policies is highly questionable.
Given the serious problems associated with high
levels of dimensional multicollinearity, what other
options are available besides artificial orthogonalization? Hobson et al. (1981) have discussed the various
methodological (use of behavioral expectation scales
to generate conceptually independent dimensions)
and statistical procedures (principal components
analysis and partialing overall performance from
dimensional scores before factor analysis as proposed
by Landy, Vance, Barnes-Farrell, & Steele, 1980) offered in the literature to deal with this problem. The
key issue, however, is the extent to which the actual
relationships among performance dimensions are
represented accurately in the ratee profiles presented
to subjects. When gross discrepancies exist, the meaning of captured rating policies is hopelessly confused.
Number of Ratee Profiles. An important consideration in the use of multiple regression in any
decision making setting is the number of stimuli or
profiles presented to raters, relative to the number
of cues or dimensions. In situations in which the ratio
of profiles to dimensions is low, two major, related
problems arise: (a) substantial overfitting of the
linear model, resulting in spuriously high obtained R² values, and (b) large sampling error in all computed estimates, both individual regression weights and the overall R² (Cohen & Cohen, 1975; Darlington, 1968).
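A small simulation illustrates both problems at once. With ratings generated completely at random, so that the true R² is zero, a low profiles-to-dimensions ratio still yields a large obtained R²; raising the ratio well past 10:1 drives it back toward zero. The data and function are hypothetical.

```python
import numpy as np

def obtained_r2(n_profiles, n_dims, seed=0):
    """Fit the linear model to ratings that are pure noise and
    return the obtained (in-sample) R-squared."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_profiles, n_dims))
    y = rng.normal(size=n_profiles)          # ratings unrelated to cues
    A = np.column_stack([np.ones(n_profiles), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

print(round(obtained_r2(12, 9), 2))   # low ratio: spuriously high R²
print(round(obtained_r2(120, 9), 2))  # ample ratio: R² near zero
```

With 12 profiles and 9 dimensions the model has nearly as many parameters as observations, so it fits the noise; the obtained R² says almost nothing about the rater's actual policy.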
In policy capturing studies, obtained R²s indicate
the consistency with which raters combine performance information in making an overall judgment.
Regression weights for individual performance
dimensions are used in standard calculations of the
relative importance of each dimension in the rater's
policy (Hoffman's, 1960, relative weight formula is
the one most frequently utilized). In fact, captured
rating policies are nothing more than the pattern of
relative weights associated with the various dimensions of performance. Thus, small ratee profile to
performance dimension ratios resulting in substantial sampling error can lead to unreliable estimates
of an individual's rating policy.
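One common reading of Hoffman's index defines the relative weight of dimension j as β_j·r_j / R², where β_j is the standardized regression weight and r_j the zero-order correlation of that dimension with the overall rating; weights so defined sum to 1. The sketch below assumes that reading and uses invented data; it is not a reproduction of any reviewed study's exact computation.

```python
import numpy as np

def hoffman_relative_weights(X, y):
    """Relative weight of dimension j: beta_j * r_j / R^2, computed
    on standardized scores so that R^2 equals sum(beta_j * r_j)."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    r = Xs.T @ ys / len(y)      # zero-order correlations with the rating
    r2 = float(beta @ r)        # R^2 in standardized form
    return beta * r / r2        # relative weights, summing to 1.0

# Hypothetical rater who leans heavily on the first dimension
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.7, 0.2, 0.1]) + rng.normal(0, 0.3, 60)
w = hoffman_relative_weights(X, y)
print(np.round(w, 2))
```

Since the relative weights are ratios of sampled regression estimates, any sampling error in the β_j propagates directly into the "policy" itself, which is the instability at issue here.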
To index the extent to which sampling error is present in the policy capturing research, the correction formulae outlined by Cattin (1980) for estimating the population cross-validity coefficient were applied to the central tendency measures of obtained R²s in each study. These corrected R² values are provided in the last column of Table 1. As the values in the table indicate, there is substantial variation in the amount of sampling error or shrinkage present. Drops in R² range from a low of .04 (Naylor & Wherry, 1965) to a high of .70 (Stumpf & London, 1981), indicating that the obtained R²s in the latter study were completely spurious. Although there are major
differences across studies in the profiles/dimensions
ratio, in general most researchers have exhibited a
lack of recognition of this important factor and have
violated the minimum standard ratio of 10:1 recommended by Nunnally (1978).
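One of the estimators Cattin (1980) discusses combines a Wherry-type adjustment of the obtained R² with Browne's formula for the expected squared cross-validity. The sketch below assumes that combination; the exact variant behind the corrected values in Table 1 is not specified here, so the numbers are purely illustrative.

```python
def estimated_cross_validity(r2_obtained, n, k):
    """Shrinkage sketch for an obtained R^2 from n profiles and k
    dimensions: adjust toward the population squared multiple
    correlation (Wherry-type), then apply Browne's cross-validity
    formula. Illustrative only."""
    # Wherry-type adjustment for the population squared multiple R
    rho2 = max(0.0, 1 - (1 - r2_obtained) * (n - 1) / (n - k - 1))
    # Browne's formula for the expected squared cross-validity
    num = (n - k - 3) * rho2 ** 2 + rho2
    den = (n - 2 * k - 2) * rho2 + k
    return num / den

# Hypothetical examples: the same obtained R^2 = .90 with 6 dimensions
print(round(estimated_cross_validity(0.90, 30, 6), 2))  # 30 profiles
print(round(estimated_cross_validity(0.90, 12, 6), 2))  # only 12 profiles
```

The same obtained R² shrinks modestly at a 5:1 profiles-to-dimensions ratio but drastically at 2:1, which is precisely why the low ratios in some reviewed studies render their captured policies suspect.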
In those situations in which substantial sampling
error exists, the meaning of captured policies is further obscured. Relative weights used to define rater
policies and computed in terms of estimated regression weights will exhibit a great deal of sampling instability. This poses serious interpretational problems
as to the meaning of captured policies and their accuracy in representing actual rating decisions.
Summary. Given the serious methodological problems discussed above, one cannot reasonably conclude that the evidence supports the construct validity
of captured policies as accurate paramorphic representations of actual rating decisions. The critical differences between the typical research tasks presented
to subjects and actual rating settings force one to admit that what captured policies represent is not really known.
Research Recommendations
Based on this review of the policy capturing literature, two recommendations for future research are
offered. First and foremost, it is recommended that
future policy capturing studies be conducted exclusively in field settings, using actual supervisory
ratings of subordinates for administrative purposes.
In conjunction with this recommendation, a cessation of policy capturing research as designed and implemented to date would seem warranted. It appears
at this point that the differences between the artificial
rating situations characterizing current research and
actual rating settings are of such overwhelming
Practical Implications
Despite the serious methodological problems with
the policy capturing literature, it has served to focus
attention on the need for an explicit definition of
supervisor rating policies and their consistent use in
making overall evaluations. When used in conjunction with a thorough job analysis and rigorously
The third step in the development of this rating system involves the definition of rater policies. Finally,
step four focuses on the explicit communication and
use of rater policies in making overall performance
ratings.
Before discussing the implications of using a rating
system of this type, the issue concerning the source
of supervisor rating policies must be addressed. There
is no logical imperative that they be defined in a purely statistical manner. Certainly the interpretational
problems associated with policies captured in artificial settings are quite evident and serious in nature.
However, even were rater policies captured in actual
field settings, it is still questionable whether one
should rigidly apply such policies in making overall
ratings. Einhorn and Hogarth (1981) introduced the
notion of whether optimal decision models can be
considered "reasonable." This idea is applicable in
considering the "reasonableness" of using a statistically defined policy to make overall ratings. Certainly a captured rating policy first must be acceptable to the individual rater before one could expect
that person to utilize it in making overall evaluations.
Thus, in the final analysis, captured policies should
undergo the scrutiny of the individual rater and be
judged appropriate.
Hobson and Gibson (in press) have outlined three
Subordinate Benefits
References
Anderson, B. L. Differences in teachers' judgement policies for varying numbers of verbal and numerical cues. Organizational Behavior and Human Performance, 1977, 19, 68-88.
Dawes, R. M., & Corrigan, B. Linear models in decision making. Psychological Bulletin, 1974, 81, 95-106.
Hobson, C. J., Mendel, R. M., & Gibson, F. W. Clarifying performance appraisal criteria. Organizational Behavior and Human Performance, 1981, 28, 164-188.
Hoffman, P. J. The paramorphic representation of clinical judgement. Psychological Bulletin, 1960, 57, 116-131.
Ilgen, D. R., Fisher, C. D., & Taylor, M. S. Consequences of individual feedback on behavior in organizations. Journal of Applied Psychology, 1979, 64, 349-371.
Oskamp, S. Clinical judgement from the MMPI: Simple or complex? Journal of Clinical Psychology, 1967, 23, 411-415.
Slovic, P., & Lichtenstein, S. Comparison of Bayesian and regression approaches to the study of information processing in judgement. Organizational Behavior and Human Performance, 1971, 6, 649-744.