
Foundational Models for 21st Century

Program Evaluation

by

Daniel L. Stufflebeam
The Evaluation Center
Western Michigan University

The Evaluation Center


Occasional Papers Series

December 1, 1999
Foundational Models for 21st Century Program Evaluation1,2

As the field moves into a new millennium, it is an opportune time for evaluators to critically
appraise their program evaluation approaches and decide which ones are most worthy of continued
application and further development. It is equally important to decide which approaches are best
abandoned. In this spirit, this paper identifies and assesses 22 approaches often employed to
evaluate programs. These approaches are, in varying degrees, unique, and together they cover most
program evaluation efforts. Two of the approaches, reflecting the political realities of evaluation,
are often used illegitimately to falsely characterize a program's value and are labeled
pseudoevaluations. The remaining 20 approaches are typically used legitimately to judge programs
and are divided into questions/methods-oriented approaches, improvement/accountability approaches,
and social agenda/advocacy approaches. The best program evaluation approaches appear to be Outcomes
Monitoring/Value-Added Assessment, Case Study, Decision/Accountability, Consumer-Oriented,
Client-Centered, Constructivist, and Utilization-Focused, with the new Deliberative Democratic
approach showing promise. The worst bets seem to be Politically Controlled, Public Relations,
Accountability (especially payment by results), Clarification Hearings, and Program Theory-Based.
The rest fall somewhere in the middle. All legitimate approaches are enhanced when keyed to and
assessed against professional standards for evaluations.

1. This paper was prepared for The Evaluation Center's Occasional Papers Series. It is based on a
presentation in the State of the Evaluation Art and Future Directions in Educational Program
Evaluation Invited Symposium at the annual meeting of the American Educational Research
Association; Montreal, Quebec, Canada; April 20, 1999.

2. Appreciation is extended to colleagues who critiqued prior drafts of this paper, especially
Sharon Barbour, Jerry Horn, Tom Kellaghan, Gary Miron, Craig Russon, James Sanders, Sally Veeder,
Bill Wiersma, and Lori Wingate. While their valuable assistance is acknowledged, the author is
responsible for the paper's contents and especially any flaws.

Table of Contents

I. INTRODUCTION
    Overview of the Paper
        Evaluation Models and Approaches
        The Nature of Program Evaluation
        Need to Study Alternative Approaches
    Classifications of Alternative Evaluation Approaches
        Program Evaluation Defined
        Pseudoevaluations
        Questions/Methods-Oriented Approaches
        Improvement/Accountability-Oriented Evaluations
        Social Agenda-Directed (Advocacy) Models
        Caveats

II. PSEUDOEVALUATIONS
    Approach 1: Public Relations-Inspired Studies
    Approach 2: Politically Controlled Studies

III. QUESTIONS/METHODS-ORIENTED EVALUATION APPROACHES
    Approach 3: Objectives-Based Studies
    Approach 4: Accountability, Particularly Payment By Results Studies
    Approach 5: Objective Testing Programs
    Approach 6: Outcomes Monitoring/Value-Added Assessment
    Approach 7: Performance Testing
    Approach 8: Experimental Studies
    Approach 9: Management Information Systems
    Approach 10: Benefit-Cost Analysis Approach
    Approach 11: Clarification Hearing
    Approach 12: Case Study Evaluations
    Approach 13: Criticism and Connoisseurship
    Approach 14: Program Theory-Based Evaluation
    Approach 15: Mixed Methods Studies

IV. IMPROVEMENT/ACCOUNTABILITY-ORIENTED EVALUATION APPROACHES
    Approach 16: Decision/Accountability-Oriented Studies
    Approach 17: Consumer-Oriented Studies
    Approach 18: Accreditation/Certification Approach

V. SOCIAL AGENDA-DIRECTED (ADVOCACY) APPROACHES
    Approach 19: Client-Centered Studies (or Responsive Evaluation)
    Approach 20: Constructivist Evaluation
    Approach 21: Deliberative Democratic Evaluation
    Approach 22: Utilization-Focused Evaluation

VI. BEST APPROACHES FOR 21ST CENTURY EVALUATIONS
    Table 19: Ratings of Strongest Program Evaluation Approaches
    Conclusions
    Recommendations

Notes

Bibliography

Appendix
    Checklist for Rating Evaluation Approaches in Relationship to
    the Joint Committee Program Evaluation Standards

Editor’s Note:

The Occasional Paper Series is published by The Evaluation Center on the campus of
Western Michigan University. Its purpose is to advance the theory and practice of
evaluation by reporting on new developments in the profession. Authors who contribute
to the series retain copyright of their work. This allows them to publish early drafts of a
paper, obtain feedback from readers, make necessary modifications, and go on to publish
in other venues.

In this volume of the Series, published on the eve of a new millennium, Daniel
Stufflebeam reviews the evaluation models that have emerged and identifies those
that offer the greatest prospects for future success. Few in the profession are better able to
do this than Stufflebeam. During his career, which spans nearly four decades, he
developed nearly 100 standardized tests, authored the CIPP evaluation model, served as
the first Chair of the Joint Committee on Standards for Educational Evaluation, and
pioneered the concept of metaevaluation.

The reader is invited to join the ranks of authors who have published in the Occasional
Paper Series, including Donald Campbell, Gene Glass, Arnold Love, James Sanders,
Michael Scriven, Lori Shephard, Robert Stake, and Daniel Stufflebeam. Manuscripts
should be 50-100 pages in length and significant to the field of evaluation. All
submissions are reviewed for acceptability by the editorial team, made up of the staff of
The Evaluation Center.

Craig Russon, Ph.D.


Editor,
The Occasional Paper Series

I. INTRODUCTION

Overview of the Paper

Evaluators today have at their disposal many more evaluation approaches than in 1960. As
evaluators prepare to surmount the Y2K challenges and cross into the next century, it is an
opportune time to consider which 20th century evaluation developments are best to take along and
which ones would best be left behind. I have, in this paper, attempted to sort 22 alternative
evaluation approaches into what fishermen sometimes call the "keepers" and the "throwbacks." More
importantly, I have attempted to characterize each approach; identify its strengths and
weaknesses; and consider whether, when, and how each approach is best applied. The reviewed
approaches emerged mainly in the U.S. between 1960 and 1999.

Following a period of relative inactivity in the 1950s, a succession of international and national
forces stimulated the development of evaluation theory and practice. Main influences were the
efforts to vastly strengthen the U.S. defense system spawned by the Soviet Union's 1957 launching
of Sputnik I; the new U.S. laws in the 1960s to equitably serve persons with disabilities and
minorities; the federal evaluation requirements of the Great Society programs initiated in 1965;
the U.S. movement begun in the 1970s to hold educational and social organizations accountable for
both prudent use of resources and achievement of objectives; the stress on excellence in the 1980s
as a means of increasing U.S. international competitiveness; and the trend in the 1990s for
various organizations, both inside and outside the U.S., to employ evaluation to assure quality,
competitiveness, and equity in delivering services. Education has consistently been at the heart
of societal reforms in the U.S., and U.S. society has repeatedly pressed educators to show through
evaluation whether or not improvement efforts were succeeding.

The development of program evaluation as a field of professional practice was also spurred by a
number of seminal writings. These included, in chronological order, publications by Tyler (1942,
1950), Campbell and Stanley (1963), Cronbach (1963), Stufflebeam (1966), Tyler (1966), Scriven
(1967), Stake (1967), Stufflebeam (1967), Suchman (1967), Alkin (1969), Guba (1969), Provus
(1969), Stufflebeam et al. (1971), Parlett and Hamilton (1972), Eisner (1975), Glass (1975),
Cronbach and Associates (1980), House (1980), and Patton (1980). These and other authors/scholars
began to project alternative approaches to program evaluation. In the ensuing years a rich
literature on a wide variety of alternative program evaluation approaches developed [see, for
example, Cronbach (1982); Guba and Lincoln (1981, 1989); Nave, Misch, and Mosteller (1999); Nevo
(1993); Patton (1982, 1990, 1994, 1997); Rossi and Freeman (1993); Schwandt (1984); Scriven (1991,
1993, 1994a, 1994b, 1994c); Shadish, Cook, and Leviton (1991); Smith, M. F. (1989); Smith, N. L.
(1987); Stake (1975b, 1988, 1995); Stufflebeam (1997); Stufflebeam and Shinkfield (1985); Wholey,
Hatry, and Newcomer (1995); Worthen and Sanders (1987, 1997)].

Evaluation Models and Approaches

The paper uses the term evaluation approach rather than evaluation model because, for one reason,
the former is broad enough to cover illicit as well as laudatory practices. Also, beyond covering
both creditable and noncreditable approaches, some authors of evaluation approaches say that the
term model is too demanding to cover their published ideas about how to conduct program
evaluations. But for these two considerations, the term model would have been used to encompass
most of the evaluation proposals discussed in this paper. This is so because most of the presented
approaches are idealized or "model" views for conducting program evaluations according to the
beliefs and experiences of their authors.

The Nature of Program Evaluation

The paper employs a broad view of program evaluation. It encompasses evaluations of any
coordinated set of activities directed at achieving goals. Examples are assessments of ongoing,
cyclical curricular programs; time-bounded projects; and regional or state systems of services.
Such program evaluations overlap with, and yet are distinguishable from, other forms of
evaluation, especially student evaluation, teacher evaluation, materials evaluation, and school
evaluation. The program evaluation approaches that are considered cut across a wide array of
programs and services, e.g., curriculum innovations, school health services, counseling, adult
education, preschool, state systems of education, school-to-work projects, adult literacy, and
parent involvement in schools. Clearly, program evaluation applies importantly to a broad array of
activities.

Need to Study Alternative Approaches

The study of alternative evaluation approaches is vital for the professionalization of program
evaluation and for its scientific advancement and operation. Professionally, careful study of the
approaches being employed in the name of program evaluation can help evaluators legitimize
approaches that comport with sound principles of evaluation and discredit those that don't.
Scientifically, such a review can help evaluation researchers identify, examine, and address
conceptual and technical issues pertaining to the development of the evaluation discipline.
Operationally, a critical view of alternatives can help evaluators consider and assess optional
frameworks for planning and conducting particular studies. On this point, the author has found
that different approaches may work differentially well, depending on the evaluation's context.
Often it is advantageous to borrow strengths of different approaches to create a "best fit"
approach for specific evaluation projects. Thus, it behooves evaluators to develop a repertoire of
different legitimate approaches they can use, plus the ability to discern which approaches work
best under what circumstances. However, a main value in studying alternative program evaluation
approaches is not to enshrine any of them. On the contrary, the purposes are to discover their
strengths and weaknesses, decide which ones merit substantial use, determine when and how they are
best applied, and obtain direction for improving these approaches and devising better
alternatives.

Classifications of Alternative Evaluation Approaches

In analyzing the 22 alternative evaluation approaches, prior assessments regarding program
evaluation's state of the art were consulted. Stake's analysis of 9 program evaluation approaches
provided a useful application of advance organizers (the types of variables used to determine
information requirements) for ascertaining different types of program evaluations.1 Hastings'
review of the growth of evaluation theory and practice helped to place the evaluation field in a
historical perspective.2 Guba's presentation and assessment of six major philosophies in
evaluation was provocative.3 House's (1983) analysis of different approaches illuminated important
philosophical and theoretical distinctions. Finally, Scriven's (1991, 1994a) writings on the
transdiscipline of evaluation helped to sort out different evaluation approaches; they were also
invaluable in seeing program evaluation approaches in the broader context of evaluations focused
on various objects other than programs. Although the paper does not always agree with the
conclusions put forward in these publications, all of the prior assessments helped sharpen the
issues addressed.

Program Evaluation Defined

In characterizing and assessing different evaluation approaches, careful consideration was given
to the various kinds of activities conducted in the name of program evaluation. These activities
were classified based on their degree of conformity to a particular definition of evaluation. This
paper defines evaluation as a study designed and conducted to assist some audience to assess an
object's merit and worth. This definition should be widely acceptable because it agrees with
common dictionary definitions of evaluation; also, it is consistent with the definition of
evaluation that underlies published sets of professional standards for evaluations (Joint
Committee 1981, 1994). However, it will become apparent that many studies done in the name of
program evaluation either do not conform to this definition or directly oppose it.

The above definition of an evaluation study was used to classify program evaluation approaches
into four categories. The first category includes approaches that promote invalid or incomplete
findings (referred to as pseudoevaluations), while the other three include approaches that agree,
more or less, with the employed definition of evaluation (i.e., Questions/Methods-Oriented,
Improvement/Accountability, and Social Agenda/Advocacy).

Pseudoevaluations

This paper's first group of program evaluation approaches includes what I have termed
pseudoevaluations. These promote a positive or negative view of a program, irrespective of its
actual merit and worth. Such studies often are motivated by political objectives, e.g., persons
holding or seeking authority may present unwarranted claims about their achievements and/or the
faults of their opponents or hide potentially damaging information. These objectionable approaches
are presented because they deceive through evaluation and can be used by those in power to mislead
constituents or to gain and maintain an unfair advantage over others, especially persons with
little power. If evaluators acquiesce to and support pseudoevaluations, they help promote and
support injustice, mislead decision making, lower confidence in evaluation services, and discredit
the evaluation profession. Thus, the paper discusses pseudoevaluations in order to sensitize
professional evaluators and their clients to the prevalence of and harm caused by such
inappropriate studies and to convince them to oppose such invalid evaluation practices.

Questions/Methods-Oriented Approaches

The second category of approaches includes studies that are oriented to (1) address specified
questions whose answers may or may not be sufficient to assess a program's merit and worth and/or
(2) use some preferred method(s). These Questions/Methods-Oriented Approaches include studies that
employ as their starting points operational objectives, standardized measurement devices, cost
analysis procedures, expert judgment, a theory or model of a program, case study procedures,
management information systems, designs for controlled experiments, and/or a commitment to employ
a mixture of qualitative and quantitative methods. Most of them emphasize technical quality and
posit that it is usually better to answer a few pointed questions well than to attempt a broad
assessment of something's merit and worth. Since these approaches tend to concentrate on
methodological adequacy in answering given questions rather than determining a program's value,
the set of these approaches may be referred to as quasi-evaluation approaches. While they are
typically labeled as evaluations, they may or may not meet the requirements of a sound evaluation.

Improvement/Accountability-Oriented Evaluations

The third set of approaches involves studies designed primarily to assess and/or improve a
program's merit and worth. These are labeled Improvement/Accountability-Oriented Evaluations. They
are expansive and seek comprehensiveness in considering the full range of questions and criteria
needed to assess a program's value. Often they employ the assessed needs of a program's
stakeholders as the foundational criteria for assessing the program's merit and worth. They seek
to examine the full range of appropriate technical and economic criteria for judging program plans
and operations. They also look for all relevant outcomes, not just those keyed to program
objectives. Such studies sometimes are overly ambitious in trying to provide broad-based
assessments leading to definitive, documented, and unimpeachable judgments of merit and worth.
Typically, they must use multiple qualitative and quantitative assessment methods to provide
cross-checks on findings. In general, these approaches conform closely to this paper's definition
of evaluation.

Social Agenda-Directed (Advocacy) Models

The fourth category of approaches is labeled Social Agenda-Directed (Advocacy) Models. The
approaches in this group are quite heavily oriented to employing the perspectives of stakeholders
as well as experts in characterizing, investigating, and judging programs. Mainly, they eschew the
possibility of finding right or best answers and reflect the philosophy of postmodernism, with its
attendant stress on cultural pluralism, moral relativity, and multiple realities. Typically, these
evaluation approaches favor a constructivist orientation and the use of qualitative methods. They
emphasize the importance of democratically engaging stakeholders in obtaining and interpreting
findings, and they stress serving the interests of underprivileged groups. A worry about these
approaches is that they might concentrate so heavily on serving a social mission that they fail to
meet the standards of a sound evaluation. For example, if an evaluator is so intent on serving the
underprivileged, empowering the disenfranchised, and/or righting educational and/or social
injustices, he or she might compromise the independent, impartial perspective needed to produce
valid findings. In the extreme, an advocacy evaluation could compromise the integrity of the
evaluation process in order to achieve social objectives and thus devolve into a pseudoevaluation.
The particular social agenda/advocacy approaches presented in this paper seem to have sufficient
safeguards to walk the fine line between sound evaluation services and politically corrupted
evaluations. Worries about bias control in these approaches increase the importance of subjecting
advocacy evaluations to metaevaluations grounded in standards for sound evaluations.

Of the 22 program evaluation approaches discussed, 2 are classified as pseudoevaluations, 13 as
questions/methods-oriented approaches, 3 as improvement/accountability-oriented approaches, and 4
as social agenda/advocacy-directed approaches. The analysis of the 20 legitimate approaches is
preceded by a discussion of the 2 approaches that often are used to distort findings and
conclusions. The latter group is considered because evaluators and clients should be alert to and
reject approaches that often are masqueraded as sound evaluations but in reality lack truthfulness
and integrity.

Each approach is analyzed in terms of ten descriptors: (1) advance organizers, that is, the main
cues that evaluators use to set up a study; (2) main purpose(s) served; (3) sources of questions
addressed; (4) questions that are characteristic of each study type; (5) methods typically
employed; (6) persons who pioneered in conceptualizing each study type; (7) other persons who have
extended development and use of each study type; (8) key considerations in determining when to use
each approach; (9) strengths of the approach; and (10) weaknesses of the approach. Using these
descriptors, comments on each of the 22 program evaluation approaches are presented. These
assessments are then used to reach conclusions about which approaches should be avoided, which are
most meritorious, and under what circumstances the worthy approaches are best applied.

Caveats

I acknowledge, without apology, that the assessments of approaches and the entries in the charts
throughout the paper are mainly my best judgments. I have taken no poll, and no definitive
research exists to represent a consensus on the characteristics and strengths and weaknesses of
the different approaches. My analyses reflect 35 years of experience in applying and studying
different evaluation approaches. As parochial as these might be, I hope they will be useful to
evaluators and evaluation students, at least in the form of working hypotheses to be tested.

Also, I have mainly looked at the approaches as relatively discrete ways to conduct evaluations.
In reality, there are many occasions when it is functional to mix and match different approaches.
A careful analysis of such combinatorial applications no doubt would produce several hybrid
approaches for analysis. Unfortunately, that step is beyond the scope of what I have attempted
here.

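The ten descriptors above serve as a fixed template applied to every approach in the sections that
follow. For readers who like to see such a template in schematic form, the following minimal
sketch renders it as a simple record type. The Python rendering and the field names are shorthand
introduced here for illustration; they are not part of the framework itself.

    from dataclasses import dataclass

    @dataclass
    class ApproachProfile:
        """One record per evaluation approach, mirroring the ten descriptors."""
        name: str                      # e.g., "Objectives-Based Studies"
        advance_organizers: list[str]  # (1) main cues evaluators use to set up a study
        main_purposes: list[str]       # (2) main purpose(s) served
        question_sources: list[str]    # (3) sources of the questions addressed
        typical_questions: list[str]   # (4) questions characteristic of the study type
        typical_methods: list[str]     # (5) methods typically employed
        pioneers: list[str]            # (6) persons who pioneered the study type
        developers: list[str]          # (7) others who extended its development and use
        use_considerations: list[str]  # (8) key considerations in deciding when to use it
        strengths: list[str]           # (9) strengths of the approach
        weaknesses: list[str]          # (10) weaknesses of the approach

Each of the 22 approach discussions below can be read as filling in one instance of this record.
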
II. PSEUDOEVALUATIONS

Because this paper is focused on describing and assessing the state of the art in evaluation, it
is necessary to discuss bad and questionable practices, as well as the best efforts. Evaluations
can be viewed as threatening or approached in opportunistic ways. In such cases, evaluators and
their clients are sometimes tempted to shade, selectively release, or even falsify findings. While
such efforts may look like sound evaluations, they are judged in this analysis to be
pseudoevaluations if they do not forthrightly attempt to produce and report to all right-to-know
audiences valid assessments of merit and worth. The first type of pseudoevaluation considered, the
Public Relations approach, may meet the standard for addressing all right-to-know audiences but
fails as a legitimate evaluation approach because typically it presents a program's strengths (or
an exaggerated view of them) but not its weaknesses. The second pseudoevaluation approach, the
Politically Controlled evaluation, may be quite strong in obtaining valid information but fail as
a sound evaluation by either withholding information from right-to-know audiences or releasing
only those parts that are advantageous to the client.

Approach 1: Public Relations-Inspired Studies

The public relations approach begins with an intention to use data to convince constituents that a
program is sound and effective. Other names for the approach are "ideological marketing" (see
Ferguson, June 1999), advertising, and infomercial.

The advance organizer is the propagandist's information needs. The study's purpose is to help the
program director/public relations official project a convincing, positive public image for a
program, project, process, organization, leadership, etc. The guiding questions are derived from
the public relations specialists' and administrators' conceptions of which questions would be most
popular with their constituents. In general, the public relations study seeks information that
would most help an organization confirm its claims of excellence and secure public support. From
the start, this type of study seeks not a valid assessment of merit and worth but information
needed to help the program "put its best foot forward." Such studies avoid gathering or releasing
negative findings.

Typical methods used in public relations studies are biased surveys; inappropriate use of norms
tables; biased selection of testimonials and anecdotes; "massaging" of obtained information;
selective release of only the positive findings; cover-up of embarrassing incidents; and the use
of "expert," advocate consultants. In contrast to the "critical friends" employed in Australian
evaluations, public relations studies use "friendly critics." A pervasive characteristic of the
public relations evaluator's use of dubious methods is a biased attempt to nurture a good picture
for the program being evaluated. The fatal flaw of built-in bias to report only good things
offsets any virtues of this approach. If an organization substitutes biased reporting of only
positive findings for balanced evaluations of strengths and weaknesses, it soon will demoralize
evaluators who are trying to conduct and report valid evaluations and may discredit its overall
practice of evaluation.

By disseminating only positive information on a program's performance while withholding
information on shortcomings and problems, evaluators and clients may mislead taxpayers,
constituents, and other stakeholders concerning the program's true value. The possibility of such
positive bias in advocacy evaluations underlies the longstanding policy of Consumers Union not to
include, in its Consumer Reports magazine, advertising by the owners of the products and services
being evaluated. In order to maintain credibility with consumers, Consumers Union has steadfastly
maintained an independent perspective and a commitment to identify and report both strengths and
weaknesses in the items evaluated and not to supplement this information with biased ads.

A contact with an urban school district illustrates the public relations type of study. A
superintendent requested a community survey for his district. The superintendent said,
straightforwardly, that he wanted a survey that would yield a positive report on the district's
performance and his leadership. He said such a positive report was desperately needed at the time
so that the community would restore its confidence in the school district and in him. The
superintendent did not get the survey and positive report, and it soon became clear why he thought
one was needed: several weeks after making the request, he was summarily fired. Another example
occurred when a large urban school district used one set of national norms to interpret pretest
results and a different norms table for the posttest. The result was a spurious portrayal, and the
attendant wrong conclusion, that the students' test performance had vastly improved between the
first and second test administrations. Still another example was seen when an evaluator gave her
superintendent a sound program evaluation report showing both strengths and weaknesses of the
targeted program. The evaluator was surprised and dismayed one week later, when the superintendent
released to the public a revised version showing only the program's strengths.

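The mismatched-norms trick in the second example above deserves a concrete illustration. The
sketch below uses two invented norm tables and invented raw scores, purely to show the mechanism:
when the posttest is referenced to a more lenient norms table, identical raw performance
masquerades as dramatic growth.

    # Hypothetical raw-score -> national-percentile lookup tables.
    # NORMS_A (used for the pretest) reflects a stronger norm group;
    # NORMS_B (used for the posttest) reflects a weaker one, so the same
    # raw score maps to a higher percentile.
    NORMS_A = {20: 30, 25: 42, 30: 55, 35: 68}
    NORMS_B = {20: 48, 25: 61, 30: 74, 35: 85}

    raw_scores = [20, 25, 25, 30, 30, 35]  # same students, same raw results both times

    pre = [NORMS_A[s] for s in raw_scores]
    post = [NORMS_B[s] for s in raw_scores]  # no actual learning gain is involved

    print(f"mean pretest percentile:  {sum(pre) / len(pre):.1f}")    # -> 48.7
    print(f"mean posttest percentile: {sum(post) / len(post):.1f}")  # -> 67.2, a spurious "gain"
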
Evaluators need to be cautious in how they relate to the public relations activities of their
sponsors, clients, and supervisors. Certainly, public relations documents will reference
information from sound evaluations. Evaluators should persuade their audiences to make honest use
of evaluation findings. Evaluators should not be party to misuses, especially in cases where
erroneous reports are issued that predictably will mislead readers to believe that a seriously
flawed program is good. As one safeguard, evaluators can promote and help their clients arrange to
have independent metaevaluators examine the organization's production and use of evaluation
findings against professional standards for evaluations.

Approach 2: Politically Controlled Studies

The politically controlled study is an approach that can be either defensible or indefensible. A
politically controlled study is illicit if the evaluator and/or client (a) withhold the full set
of evaluation findings from audiences who have express, legitimate, and legal rights to see the
findings; (b) abrogate their prior agreement to fully disclose the evaluation findings; or (c)
bias the evaluation message by releasing only part of the findings. It is not legitimate for a
client first to agree to make the findings of a commissioned evaluation publicly available and
then, having previewed the results, to release none or only part of the findings. If and when a
client or evaluator violates the formal written agreement on disseminating findings, or applicable
law, then the other party has a right to take appropriate actions and/or seek an administrative or
legal remedy.

However, clients sometimes can legitimately commission covert studies and keep the findings
private, while meeting applicable laws and adhering to an appropriate advance agreement with the
evaluator. This is especially the case in the U.S. for private organizations not governed by
public disclosure laws. Also, an evaluator, under legal contractual agreements, can plan, conduct,
and report an evaluation for private purposes, while not disclosing the findings to any outside
party. The key to keeping client-controlled studies in legitimate territory is to reach
appropriate, legally defensible, advance, written agreements and to adhere to the contractual
provisions concerning release of the study's findings. Such studies also have to conform to
applicable laws on release of information.

The advance organizers for a politically controlled study include implicit or explicit threats
faced by the client for a program evaluation and/or objectives for winning political contests. The
client's purpose in commissioning such a study is to secure assistance in acquiring, maintaining,
or increasing influence, power, and/or money. The questions addressed are those of interest to the
client and special groups that share the client's interests and aims. The main questions of
interest to the client are these: What is the truth, as best can be determined, surrounding the
particular dispute or political situation? What information would be advantageous in a potential
conflict situation? What data might be used advantageously in a confrontation? Typical methods of
conducting the politically controlled study include covert investigations, simulation studies,
private polls, private information files, and selective release of findings. Generally, the client
wants obtained information to be as technically sound as possible. However, he or she may also
want to withhold findings that do not support his or her position. The approach's strength is that
it stresses the need for accurate information. However, because the client might release
information selectively to create or sustain an erroneous picture of a program's merit and worth,
might distort or misrepresent the findings, might violate a prior agreement to fully release the
findings, or might violate a "public's right to know" law, this type of study can degenerate into
a pseudoevaluation.

For obvious reasons, persons have not been nominated to receive credit as pioneers or developers
of the illicit, politically controlled study. To avoid the inference that this type of study is
imaginary, consider the following examples.

A superintendent of one of the nation's largest public school districts once confided that he
possessed an extensive notebook of detailed information about each school building in his
district. The information included student achievement, teacher qualifications, racial mix of
teachers and students, average per-pupil expenditure, socioeconomic characteristics of the student
body, teachers' average length of tenure in the system, and so forth. The data revealed a highly
segregated district with uneven distribution of resources and markedly different achievement
levels across schools. When asked why all the notebook's entries were in pencil, the
superintendent replied that it was absolutely essential that he be kept informed about the current
situation in each school, but he said it was also imperative that the community-at-large, the
board, and, in particular, special interest groups in the community not have access to the
information, for any of these groups might point to the district's inequities as a basis for
protest and even removing the superintendent. Hence, one special assistant kept the document
up-to-date; only one copy existed, and the superintendent kept that locked in his desk. The point
of this example is not to negatively judge the superintendent's behavior. Instead, the point is
that the superintendent's ongoing covert investigation and selective release of information was
decidedly not a case of true evaluation, for what he disclosed to the right-to-know audiences did
not fully and honestly inform them about the observed situation in the district. This example may
appropriately be termed a pseudoevaluation because it both underinformed and misinformed the
school district's stakeholders.

Cases like this undoubtedly led to the federal and state sunshine laws in the United States. Under
current U.S. federal and state freedom of information provisions, most information obtained
through the use of public funds must be made available to interested and potentially affected
citizens. Thus, there exist legal deterrents to and remedies for illicit, politically controlled
evaluations that use public funds.

While it would be unrealistic to recommend that administrators and other evaluation users not
obtain and selectively employ information for political gain, they should not misrepresent their
politically controlled information-gathering and reporting activities as sound evaluation.
Evaluators should not lend their names and endorsements to evaluations presented by their clients
that misrepresent the full set of relevant findings, that present falsified reports aimed at
winning political contests, or that violate applicable laws and/or prior formal agreements on
release of findings.

Before addressing the next group of study types, a few additional comments are in order concerning
pseudoevaluation studies. These approaches have been considered because they are a prominent part
of the evaluation scene. Sometimes "evaluators" and their clients are co-conspirators in
performing a purposely misleading study. On other occasions, evaluators, believing they are doing
an assessment that is impartial, technically sound, and contracted to inform the public, discover
that their client had other intentions or decides to abrogate prior evaluation agreements. When
the time is right, the client is able to subvert the study in favor of producing the desired
biased picture or none at all. It is imperative that evaluators be more alert than they often are
to these kinds of potential conflicts. Otherwise, they will be unwitting accomplices in efforts to
mislead through evaluation.

Such instances of misleading constituents through purposely biased reports or cover-up of findings
to which the public has a right underscore the importance of having professional standards for
evaluation work, faithfully applying them, and periodically engaging outside evaluators to assess
one's evaluation work. It is also prudent to develop advance contracts and memoranda of agreement
to ensure that the sponsor and evaluator agree on procedures and safeguards to assure that the
evaluation will comply with canons of sound evaluation and pertinent legal requirements. Despite
these warnings, it can be legitimate for evaluators to give private evaluative feedback to
clients, provided that applicable laws, statutes, and policies are met and sound contractual
agreements on release of findings are reached and honored.

III. QUESTIONS/METHODS-ORIENTED EVALUATION APPROACHES

Questions/methods-oriented program evaluation approaches are so labeled because they start with
particular questions and then move to the methodology appropriate for answering the questions.
Only subsequently do they consider whether the questions and methodology are appropriate for
developing and supporting value claims. These studies can be called quasi-evaluation studies
because sometimes they happen to provide evidence that fully assesses a program's merit and worth,
while in other cases their focus is too narrow or is only tangential to questions of merit and
worth. Quasi-evaluation studies have legitimate uses apart from their relationship to program
evaluation, since they can focus on important questions, even though they are narrow in scope. The
main caution is that these types of studies not be uncritically equated with evaluation.

Approach 3: Objectives-Based Studies

The objectives-based study is the classic example of a questions/methods-oriented evaluation
approach (Madaus & Stufflebeam, 1988). In this approach, some statement of objectives provides the
advance organizer. The objectives may be mandated by the client, formulated by the evaluator, or
specified by the service providers. The usual purpose of an objectives-based study is to determine
whether the program's objectives have been achieved. Program developers, sponsors, and managers
are typical audiences for such a study. These audiences want to know the extent to which each
stated objective was achieved.

The methods used in objectives-based studies essentially involve specifying operational objectives
and collecting and analyzing pertinent information to determine how well each objective was
achieved. A wide range of objective and performance assessments may be employed.
Criterion-referenced tests are especially relevant to this evaluation approach.

Ralph Tyler is generally acknowledged to be the pioneer in the objectives-based type of study,
although Percy Bridgman and E. L. Thorndike probably should be credited along with Tyler.4 Many
people have furthered the work of Tyler by developing variations of his objectives-based
evaluation model. A few of them are Bloom et al. (1956), Hammond (1972), Metfessel and Michael
(1967), Popham (1969), Provus (1971), and Steinmetz (1983).

The objectives-based approach is especially applicable in assessing tightly focused projects that
have clear, supportable objectives. Even then, such studies can be strengthened by judging project
objectives against the intended beneficiaries' assessed needs, searching for side effects, and
studying the process as well as the outcomes.

Undoubtedly, the objectives-based study has been the most prevalent approach used in the name of
program evaluation. It has good common sense appeal; program administrators have had a great
amount of experience with it; and it makes use of technologies of behavioral objectives and both
norm-referenced and criterion-referenced testing. Common criticisms are that such studies lead to
terminal information that is of little use in improving a program or other enterprise; that this
information often is far too narrow in scope to constitute a sufficient basis for judging the
object's merit and worth; relatedly, that they do not uncover positive and negative side effects;
and that they may credit unworthy objectives.

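To make the objectives-based logic concrete before turning to the next approach, here is a minimal
sketch of its core computation, with made-up objectives, student results, and an assumed 80
percent mastery criterion. It also hints at the criticisms above: nothing in the tally asks
whether the objectives were worth pursuing or whether the program had effects beyond them.

    # Hypothetical criterion-referenced results: for each stated objective,
    # one boolean per student indicating whether that student met the criterion.
    results = {
        "obj-1: reads grade-level text": [True, True, True, False, True],
        "obj-2: computes two-digit sums": [True, False, False, True, True],
    }

    MASTERY_CRITERION = 0.80  # objective "achieved" if >= 80% of students master it

    for objective, outcomes in results.items():
        rate = sum(outcomes) / len(outcomes)
        verdict = "achieved" if rate >= MASTERY_CRITERION else "not achieved"
        print(f"{objective}: {rate:.0%} mastery -> {verdict}")

    # Note what is NOT asked: whether the objectives were worth pursuing,
    # and whether the program produced effects beyond the stated objectives.
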
Approach 4: Accountability, Particularly Payment By Results Studies

The accountability study became prominent in the early 1970s. Its emergence seems to have been
connected to widespread disenchantment with the persistent stream of evaluation reports indicating
that almost none of the massive state and federal investments in educational and social programs
were making any positive, statistically discernible difference. One proposed solution posited that
accountability systems could be initiated to ensure both that service providers would carry out
their responsibilities to improve services and that evaluators would do a thorough job of
identifying the effects of improvement programs and determining which persons and groups were
succeeding and which were not.

The advance organizers for the accountability study are the persons and groups responsible for
producing results, the service providers' work responsibilities, and the expected outcomes. The
study's purposes are to provide constituents with an accurate accounting of results, to ensure
that the results are primarily positive, and to pinpoint responsibility for good and bad outcomes.
Sometimes accountability programs administer both sanctions and rewards to the responsible service
providers, depending on the extent and quality of their services and achievement.

The questions addressed in accountability studies come from the program's constituents and
controllers, such as taxpayers; parent groups; school boards; and local, state, and national
funding organizations. The main question that these groups want answered concerns whether each
involved service provider and organization charged with responsibility for delivering and
improving services is carrying out its assignments and achieving all it should, given the
investments of resources to support the work.

A wide variety of methods have been used to ensure and assess accountability. These include
performance contracting; Program Planning and Budgeting System (PPBS); Management By Objectives
(MBO); Zero Based Budgeting; mandated "program drivers" and indicators; program input, process,
and output databases; independent goal achievement auditors; procedural compliance audits; peer
review; merit pay for individuals and/or organizations; collective bargaining agreements; mandated
testing programs; institutional report cards; self-studies; site visits by expert panels; and
procedures for auditing the design, process, and results of self-studies. Also included are
mandated goals and standards, decentralization and careful definition of responsibility and
authority, payment by results, awards and recognition, sanctions, takeover/intervention authority
by oversight bodies, and competitive bidding.

Lessinger (1970) is generally acknowledged as a pioneer in the area of accountability. Some of the
people who have extended Lessinger's work are Stenner and Webster, in their development of a
handbook for conducting auditing activities,5 and Kearney, in providing leadership to the Michigan
Department of Education in developing the first statewide educational accountability system. A
recent major attempt at accountability, involving sanctions and rewards, was the ill-fated,
heavily funded Kentucky Instructional Results Information System (Koretz & Barron, 1998). The
failure of this program was clearly associated with fast-paced implementation in advance of
validation; reporting and later retraction of flawed results; results that were not comparable to
those in other states; payment by results that fostered teaching to tests and other cheating in
the schools; and heavy expense, associated with performance assessments, that could not be
sustained over time. Kirst (1990) analyzed the history and diversity of attempts at accountability
in education within the following six broad types of accountability: performance reporting,
monitoring and compliance with standards or regulations, incentive systems, reliance on the
market, changing locus of authority or control of schools, and changing professional roles.

Accountability approaches are applicable to organizations and professionals funded and charged to
carry out public mandates, deliver public services, implement specially funded programs, etc. It
behooves these program leaders to maintain a dynamic baseline of information needed to demonstrate
fulfillment of responsibilities and achievement of positive results. They should focus
accountability mechanisms especially on those program elements that can be changed with the
prospect of improving outcomes. They should also focus accountability to enhance staff cooperation
toward achievement of collective goals rather than to stimulate counterproductive competition.
Moreover, accountability studies that compare different programs should fairly consider the
programs' different contexts, including especially beneficiaries' characteristics and needs, local
support, available resources, and external forces.

The main advantages of accountability studies are that they are popular among constituent groups
and politicians and are aimed at improving public services. Also, they can provide program
personnel with clear expectations against which to plan, execute, and report on their services and
contributions. They can also be designed to give service providers both freedom to innovate on
procedures and clear expectations and requirements for producing and reporting on sound outcomes.
In addition, setting up healthy, fair competition between comparable programs can result in better
services and products for consumers.

A main disadvantage is that accountability studies often issue invidious comparisons and thereby
produce unhealthy competition and much political unrest and acrimony among educators and between
them and their constituents. Also, accountability studies often focus too narrowly on outcome
indicators and can undesirably narrow the range of services provided. Another disadvantage is that
politicians tend to force the implementation of accountability efforts before the needed
instruments, scoring rubrics, assessor training, etc. can be planned, developed, field-tested, and
validated. Furthermore, prospects for rewards and threats of sanctions have often led service
providers to cheat in order to assure positive evaluation reports. For example, in schools,
cheating to obtain rewards and avoid sanctions has frequently generated bad teaching, bad press,
and turnover in leadership.

Approach 5: Objective Testing Programs

Since the 1930s, American education has been inundated with standardized, multiple-choice,
norm-referenced testing programs. Probably every school district in the United States has some
type of standardized testing program of this kind. Such tests are administered annually by local
school districts and/or state education departments to inform students, parents, educators, and
the public at large about the achievements of children and youth. Their main purposes are to
assess the achievements of individual students and groups of students compared to norms and/or
standards. Typically, these tests are administered to all students in applicable grade levels.
Because these test results focus on student outcomes and are conveniently available, many
educators have tried to use the results to evaluate the quality of special projects and specific
school programs by inferring that high scores reflect successful efforts and that low scores
reflect poor efforts. Such inferences can be erroneous if the tests were not targeted on
particular project or program objectives or on the needs of particular target groups of students,
and if the students' background characteristics were not taken into account.

Advance organizers for standardized educational tests include areas of the school curriculum and
specified norm groups. The main purposes of testing programs are to compare the test performance
of individual students and groups of students to that of selected norm groups and/or to diagnose
shortfalls related to particular objectives. Additionally, standardized test results are often
used to compare the performance of different programs, schools, etc., and to examine achievement
trends across years. Metrics used to make the comparisons typically are standardized individual
and mean scores for the total test and subtests.

The sources of questions addressed by testing programs are usually test publishers and test
development/selection committees. The typical question addressed by these tests concerns whether
the test performance of individual students is at or above the average performance of local,
state, and national norm groups. Other questions may concern the percentages of students who
surpassed one or more cut-score standards, where the group of students ranks in comparison with
other groups, or whether the current year's achievement is better than that in prior years. The
main process involved in using testing programs is to select, administer, score, interpret, and
report the tests.

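The core computation behind such norm-referenced reporting is locating each examinee's score
within the norm group's distribution. A minimal sketch follows; the norm-group parameters (mean
100, SD 15) and the normality assumption are illustrative stand-ins, not any published test's
actual norms.

    from statistics import NormalDist

    # Hypothetical norm-group parameters for a standardized total-test score.
    NORM_MEAN, NORM_SD = 100.0, 15.0
    norms = NormalDist(mu=NORM_MEAN, sigma=NORM_SD)

    def percentile_rank(score: float) -> float:
        """Percent of the norm group scoring at or below this score."""
        return 100.0 * norms.cdf(score)

    for score in (85, 100, 115, 130):
        z = (score - NORM_MEAN) / NORM_SD
        print(f"score {score}: z = {z:+.1f}, percentile rank = {percentile_rank(score):.0f}")
    # score 85 gives z = -1.0 and percentile ~16; score 115 gives z = +1.0 and percentile ~84.
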
Lindquist (1951), a major pioneer in this area, indicator of the socioeconomic levels of the
was instrumental in developing the Iowa testing students in a given program, school, or school
Questions/Methods-Oriented Evaluation Approaches 15

Stake (1971) and others have argued effectively that standardized tests often are poor approximations of what teachers actually teach. Moreover, as has been patently clear in evaluations of programs for both disadvantaged students and gifted students, norm-referenced tests often do not measure achievements well for the low and high scoring students. Unfortunately, program evaluators often have made uncritical uses of standardized test results to judge a program's outcomes, just because the results are conveniently available and have face validity to the public. Many times the contents of such tests do not match the program's objectives. Also, they may measure well the differences between students in the middle of the achievement distribution but poorly for the slow learners often targeted by special education programs and for high achievers.

Approach 6: Outcomes Monitoring/Value-Added Assessments

Recurrent outcomes/value-added assessment is a special case of the use of standardized testing to evaluate the effects of programs and policies. The emphasis here is on annual testing in order to assess trends and partial out effects of the different levels and components of an educational system. Characteristic of this approach is the cyclical collection of outcome measures based on standardized indicators, analysis of results in relation to policy questions, and reporting of overall results plus specific policy-relevant analyses. The main interest is in aggregate, not individual, performance. A state education department may regularly collect achievement data from all students (at selected grade levels), as is the case in the Tennessee Value-Added Assessment System. The evaluator may analyze the data to look at contrasting results related to particular objectives for schools using and not using particular programs. These results may be further broken out to make comparisons between classes, curricular areas, grade levels, teachers, schools, different size and resource classifications of schools, districts, and different areas of a state. This approach differs from the typical standardized achievement testing program in its emphasis on uncovering and analyzing policy issues rather than only reporting on students' progress. Otherwise, the two approaches have much in common.

The advance organizers in monitoring outcomes and employing value-added analysis are the indicators of expected and possible outcomes and the scheme for classifying results to examine policy issues and/or program effects. The purposes of Outcomes Monitoring/Value-Added Assessment systems are direction for policymaking, accountability to constituents, and feedback for improving programs and services. This approach also ensures standardization of data for assessment and improvement throughout a system. The questions to be addressed by such monitoring systems originate from funding organizations, policymakers, the system's professionals, and constituents.

Illustrative questions addressed by Outcomes Monitoring/Value-Added Assessment systems are: To what extent are particular programs adding value to students' achievement? What are the cross-year trends in outcomes? In what sectors of the system is the program working best and poorest? What are key, pervasive shortfalls in particular program objectives that require further study and attention? To what extent are program successes and failures associated with the system's different organizational levels?

Developers of the Outcomes Monitoring/Value-Added Assessment approach include especially William Sanders and Sandra Horn (1994); William Webster (1995); Webster, Mendro, and Almaguer (1994); and Peter Tymms (1995).
These developers have used census data on student achievement trends to diagnose areas for improvement and look for effects of programs and policies. What distinguishes the Outcomes Monitoring/Value-Added Assessment approach from the traditional standardized testing program is sophisticated analysis of data to partial out effects of programs and policies and to identify areas where new policies and programs are needed. In contrast to these applications, the typical standardized testing program is focused more on providing feedback on the performance of individual students and groups of students, without the attendant policy-oriented analysis. Probably the Outcomes Monitoring/Value-Added Assessment approach is mainly feasible for well-endowed state education departments and large school districts where there is strong support from policy groups, administrators, and service providers to make the approach work. It requires systemwide buy-in; politically effective leaders to continually explain and sell the program; a smoothly operating, dynamic, computerized baseline of relevant input and output information; highly skilled technicians to make it run efficiently and accurately; complicated statistical analysis; and high-level commitment to use the results for purposes of policy development, accountability, program evaluation, and improvement at all levels of the system.

The central advantage of Outcomes Monitoring/Value-Added Assessment is in the systematization and institutionalization of a database of outcomes that can be used over time and in a standardized way to study and find means to improve outcomes. Also, Outcomes Monitoring/Value-Added Assessment is conducive to using a standard of continuous progress across years for every student as opposed to employing static cut scores. The latter, while prevalent in accountability programs, basically fail to take into account meaningful gains by low or high achieving students, since these gains usually are far removed from the static, cut score standards. Also, Sanders and Horn (1994) have shown that use of static cut scores may produce a “shed pattern,” in which students who began below the cut score make the greatest gains while those who started above the cut score standard make little progress. Like the sloping roof of a tool shed, the gains are greatest for previously low scoring students and progressively lower for the higher achievers. This suggests that teachers are concentrating mainly on getting students to the cut score standard but not beyond it and thus “holding back the high achievers.” This approach makes efficient use of standardized tests; is amenable to analysis of trends at state, district, school, and classroom levels; uses students as their own controls; and emphasizes service to every student.

A major disadvantage of this approach is that it is politically volatile, since it is used to identify responsibility for successes and failures down to the levels of schools and teachers. Also, it is constrained mainly to use quantitative information such as that coming from standardized, multiple choice achievement tests. Consequently, the complex and powerful analyses are based on a limited scope of outcome variables. Nevertheless, Sanders (1989) has argued that a strong body of evidence supports the use of well-constructed, standardized, multiple choice achievement tests. Beyond the issue of outcome measures, the approach does not provide in-depth documentation of program inputs and processes and makes little if any use of qualitative methods. Despite the advancements in objective measurement and the employment of hierarchical mixed models to defensibly partial out effects of a system's organizational components and individual staff members, critics of the approach argue that causal factors are so complex that no measurement and analysis system can fairly fix responsibility to the level of teachers for the academic progress of individuals and collections of students.
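The gain-score logic underlying this approach can be made concrete with a small computational sketch. The fragment below is a minimal illustration only, not the hierarchical mixed-model methodology of Sanders and Horn; the records, school names, and cut score are all hypothetical. It shows students serving as their own controls, aggregation of gains by school, and a simple check for the “shed pattern” around a static cut score.

```python
# Minimal sketch of value-added gain analysis (hypothetical data).
# Each student serves as his or her own control: the "value added"
# is the gain from last year's score to this year's score.

records = [  # (school, prior_score, current_score) -- illustrative only
    ("Maple", 38, 52), ("Maple", 45, 58), ("Maple", 71, 74),
    ("Oak",   40, 44), ("Oak",   66, 69), ("Oak",   80, 81),
]
CUT_SCORE = 60  # hypothetical static proficiency standard

def mean(xs):
    return sum(xs) / len(xs)

# Aggregate, not individual, performance: mean gain per school.
schools = {}
for school, prior, current in records:
    schools.setdefault(school, []).append(current - prior)
for school, gains in sorted(schools.items()):
    print(f"{school}: mean gain = {mean(gains):.1f}")

# "Shed pattern" check: are gains concentrated below the cut score?
below = [c - p for _, p, c in records if p < CUT_SCORE]
above = [c - p for _, p, c in records if p >= CUT_SCORE]
print(f"mean gain below cut: {mean(below):.1f}, above cut: {mean(above):.1f}")
```

An operational system would replace these simple means with hierarchical mixed models in order to partial out effects of the system's organizational components and staff, as the developers cited above have done.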
Approach 7: Performance Testing

In the 1990s, there were major efforts to offset the limitations of the typical multiple choice tests by employing performance or authentic measures. These are devices that require students to demonstrate the performance being assessed by producing authentic responses, such as written or spoken answers, musical or psychomotor presentations, portfolios of work products, or group solutions to defined problems. Arguments given for such performance tests are that they have high face validity and model and reinforce the skills that students should be acquiring through their studies. For example, students are not being taught so that they will do well in choosing best answers from a list, but so that they will master the underlying understandings and skills and effectively apply them to real life problems.

The advance organizers in performance assessments are life skill objectives and content-related performance tasks plus ways that their achievement can be demonstrated in practice. The main purpose of performance tests is to compare the test performance of individual students and groups of students to model performance on the assessment tasks. Grades assigned to each respondent's performance, using set rubrics, enable assessment of the quality of achievements represented and comparisons across groups.

The sources of questions that performance tests address are analyses of selected life skill tasks and content specifications in curricular materials. The typical questions addressed by performance tests concern whether individual students can effectively write, speak, figure, analyze, lead, work cooperatively, and solve given problems up to the level of acceptable standards. The main process involved in using performance tests is to define areas of skills to be assessed; select the type of assessment device; construct the assessment tasks; determine scoring rubrics; define standards for assessing performance; train and calibrate scorers; validate the measures; and administer, score, interpret, and report the test results.

In speaking of licensing tests, Flexner (1910) called for tests that ascertain students' practical ability to successfully confront and solve problems in concrete cases. Some of the pioneers in applying performance assessment to state education systems were the state education departments in Vermont and Kentucky (Kentucky Department of Education, 1993; Koretz, 1986, 1996; Koretz & Barron, 1998). Other sources of information about the general approach and issues in performance testing include Baker, O'Neil, and Linn (1993); Herman, Gearhart, and Baker (1993); Linn, Baker, and Dunbar (1991); Mehrens (1972); Messick (1994); Stillman, Haley, Regan, Philbin, Smith, O'Donnell, and Pohl (1991); Swanson, Norman, and Linn (1995); Torrance (1993); and Wiggins (1989).

Often it is difficult to obtain the conditions necessary to employ the performance testing approach. It requires a huge outlay of time and resources for development and application. Typically, state education departments and school districts should use this approach very selectively and only when they can make the investment needed to produce valid results that are worth the large, required investment. On the other hand, students' writing ability is best assessed and nurtured through obtaining, assessing, and providing critical feedback on students' writing samples.

The main advantages of performance testing programs are that they require students to construct responses to assessment tasks that are akin to what they will have to do in real life. They eliminate guessing from the testing task. They also reinforce life skills, such as being able to write or otherwise construct responses rather than pass multiple choice tests.
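Because trained scorers apply set rubrics to constructed responses, a routine technical step is verifying that calibrated scorers actually agree. The sketch below is a hypothetical illustration of two common checks, exact and adjacent agreement between two raters; the rubric scale and ratings are invented for the example, and operational programs typically supplement such checks with more formal reliability indices.

```python
# Sketch: checking calibration of two trained scorers (hypothetical data).
# Each student response is scored 1-4 on the same rubric by both raters.

rater_a = [3, 2, 4, 1, 3, 3, 2, 4, 2, 3]
rater_b = [3, 2, 3, 1, 3, 2, 2, 4, 2, 3]

exact = sum(a == b for a, b in zip(rater_a, rater_b))
adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))

print(f"exact agreement:    {exact / len(rater_a):.0%}")
print(f"adjacent agreement: {adjacent / len(rater_a):.0%}")
# Low agreement would signal a need to retrain or recalibrate scorers
# before the scores are used to judge a program.
```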
Major disadvantages of the approach are heavy time requirements for administration; high costs of scoring; difficulty in achieving reliable scores; narrow scope of skills that can feasibly be assessed; and lack of norms for comparisons, especially at the national level. In general, performance tests are inefficient, costly, and often of dubious reliability. Moreover, compared with multiple choice tests, performance tests, in the same amount of testing time, can cover only a much narrower range of questions.

Approach 8: Experimental Studies

In using controlled experiments, program evaluators randomly assign subjects or groups of subjects to experimental and control groups and then contrast the outcomes when the experimental group receives a particular intervention and the control group receives no special treatment or some different treatment. This type of study was quite prominent in program evaluation during the late 1960s and early 1970s, when there was a federal requirement to assess the effectiveness of federally funded innovations. However, experimental program evaluations subsequently fell into disfavor and disuse. (In the 1990s, controlled experiments in education have been rare [Nave, Misch, & Mosteller, 1999].) Apparent reasons for this decline are that evaluators rarely can meet the required experimental conditions and assumptions and that the prevalent finding has been “no statistically significant result.”

This approach is labeled as a questions-oriented or quasi-evaluation strategy because it starts with questions and methodology that may address only a narrow set of the questions needed to assess a program's merit and worth. In the 1960s, Campbell and Stanley (1963) and others hailed the true experiment as the only sound means of evaluating interventions. This piece of evaluation history reminds one of Kaplan's (1964) famous warning against the so-called “law of the instrument,” whereby a given method is equated to a field of inquiry. In such a case, the field of inquiry is restricted to the questions that are answerable by the given method. Fisher (1951) specifically warned against equating his experimental methods with science. Similarly, experimental design is a method that can contribute importantly to program evaluation, as Nave, Misch, and Mosteller (1999) have demonstrated, but by itself it is often insufficient to address a client's full range of evaluation questions.

The advance organizers in experimental studies are problem statements, competing treatments, hypotheses, investigatory questions, and randomized treatment and comparison groups. The usual purpose of the controlled experiment is to determine causal relationships between specified independent and dependent variables, such as a given instructional method and student standardized-test performance. It is particularly noteworthy that the sources of questions investigated in the experimental study are researchers, program developers, and policy figures, and not usually a program's constituents and practitioners.

The frequent question in the experimental study is, What are the effects of a given intervention on specified outcome variables? Typical methods used are experimental and quasi-experimental designs. Pioneers in using experimentation to evaluate programs are Campbell and Stanley (1963), Cronbach and Snow (1969), and Lindquist (1953). Other persons who have developed the methodology of experimentation substantially for program evaluation are Boruch (1994); Glass and Maguire (1968); Nave, Misch, and Mosteller (1999); Suchman (1967); and Wiley and Bock (1967).
Evaluators should consider conducting a controlled experiment only when its required conditions and assumptions can be met. Often this requires substantial political influence, substantial funding, and widespread agreement (e.g., among the targeted educators, parents, and teachers) to submit to the requirements of the experiment. Such requirements typically include, among others, a stabilized program that will not have to be studied and modified during the evaluation; the ability to establish and sustain comparable program and control groups; the ability to keep the program and control conditions separate and uncontaminated; and the ability to obtain the needed criterion measures from all or at least a representative group of the members of the program and comparison groups. Evaluability assessment was developed as a particular methodology for determining the feasibility of moving ahead with an experiment (Smith, 1989; Wholey, 1995).

Controlled experiments have a number of advantages. They focus on results and not just intentions or judgments. They provide strong methods for establishing relatively unequivocal causal relationships between treatment and outcome variables; this ability can be especially significant when program effects are small but important. Moreover, because of the prevalent use and success of experiments in such fields as medicine and agriculture, the approach has widespread credibility.

The above advantages are offset by serious objections to experimenting on school students and other subjects. It is often considered unethical or even illegal to deprive the control group of the benefits of special funds for improving services. Likewise, many parents don't want schools to experiment on their children by applying unproven interventions. Typically, schools find it impractical and unreasonable to randomly assign students to treatments and to hold treatments constant throughout the study period. Also, experimental studies provide a much narrower range of information than schools or other organizations often need to assess and strengthen their programs. On this point, experimental studies tend to provide terminal information that is not useful for guiding the development and improvement of programs and in fact need to thwart ongoing modifications of the treatments.

Approach 9: Management Information Systems

The management information system is like the politically controlled approaches, except that it supplies managers with the information they need to conduct and report on their programs, as opposed to supplying them with the information they need to win a political advantage. The management information approach is also like the decision/accountability-oriented approach, which will be discussed later, except that the decision/accountability-oriented approach provides information needed to both develop and defend a program's merit and worth, which goes beyond providing information that managers need to implement and report on their management responsibilities.

The advance organizers in most management information systems include program objectives, specified activities, and projected program milestones or events. A management information system's purpose, as already implied, is to continuously supply managers with the information they need to plan, direct, control, and report on their programs or spheres of responsibility.

The sources of questions addressed are the management personnel and their superiors. The main questions they typically want answered are, Are program activities being implemented according to schedule, according to budget, and with the expected results?
To provide ready access to information for addressing such questions, these systems regularly store and make accessible up-to-date information on the operations, staff, program organization, operations, expenditures, threats, problems, publicity, achievements, etc.

Methods employed in management information systems include system analysis, Program Evaluation and Review Technique (PERT), Critical Path Method, Program Planning and Budgeting System (PPBS), Management by Objectives, computer-based information systems, periodic staff progress reports, and regular budgetary reporting.

Cook (1966) introduced the use of PERT in education, and Kaufman (1969) wrote about the use of management information systems in education. Business schools and programs in computer information systems regularly provide courses in management information systems. Mainly, these focus on how to set up and employ computerized information banks for use in organizational decision making.

W. Edwards Deming (1986) argued that managers should pay close attention to process rather than being preoccupied with outcomes. He advanced a systematic approach for monitoring and continuously improving an enterprise's process, arguing that close attention to the process will result in increasingly better outcomes. It is commonly said that, in paying attention to this and related advice from Deming, Japanese car makers and later the Americans greatly increased the quality of automobiles (Aguaro, 1990). Bayless and Massaro (1992) applied Deming's approach to program evaluations in education. Based on this writer's observations, the approach was not well suited to assessing the complexities of educational processes—possibly because, unlike the manufacture of automobiles, educators have no definitive, standardized models for linking exact educational processes to specified outcomes.

Nevertheless, given modern database technology, program managers often can and should employ management information systems in multiyear projects and programs. Program databases can provide information not only for keeping programs on track, but also for assisting in the broader study and improvement of program processes and outcomes.

A major advantage of the use of management information systems is in giving managers information they can use to plan, monitor, control, and report on complex operations. A major difficulty with the application of this industry-oriented type of system to education and social services is that the products of many such programs are not amenable to a narrow, precise definition as is the case with a corporation's profit and loss statement. Moreover, processes in educational and social programs often are complex and evolving rather than straightforward and standardized like those of manufacturing and business. The information gathered in management information systems typically lacks the scope of context, input, process, and outcome information required to assess a program's merit and worth.

Approach 10: Benefit-Cost Analysis Approach

Benefit-cost analysis as applied to program evaluation is a set of largely quantitative procedures used to understand the full costs of a program and to determine and judge what those investments returned in objectives achieved and broader social benefits. The aim is to determine costs associated with program inputs, determine the monetary value of the program outcomes, compute benefit-cost ratios, compare the computed ratios to those of similar programs, and ultimately judge the program's productivity in economic terms.
The benefit-cost analysis approach to program evaluation may be broken down into three levels of procedures: (1) cost analysis of program inputs, (2) cost-effectiveness analysis, and (3) benefit-cost analysis. These may be looked at as a hierarchy. The first type, cost analysis of program inputs, may be done by itself. Such analyses entail an ongoing accumulation of a program's financial history. These analyses are of use in controlling program delivery and expenditures. The program's financial history can be used to compare the program's actual costs to the projected costs in the original budget and to the costs of similar programs. Also, cost analyses can be extremely valuable to outsiders who might be interested in replicating the program.

Cost-effectiveness analysis necessarily includes cost analysis of program inputs to determine the cost associated with the progress toward achieving each objective. Such analyses might compare two or more programs' costs and successes in achieving the same objectives. A program could be judged superior on cost-effectiveness grounds if it had the same costs as similar programs but superior outcomes. Or the program could still be judged superior on cost-effectiveness grounds if it achieved the same objectives as more expensive programs. Cost-effectiveness analyses do not require conversion of outcomes to monetary terms but must be keyed to clear, measurable program objectives.

Benefit-cost analyses typically build on a cost analysis of program inputs and a cost-effectiveness analysis. But the benefit-cost analysis goes further. It seeks to identify a broader range of outcomes than just those associated with program objectives. It examines the relationship between the investment in a program and the extent of positive and negative impacts on the program's environment. In doing so, it ascertains and places a monetary value on program inputs and each identified outcome. It identifies a program's benefit-cost ratios and compares these to similar ratios for competing programs. Ultimately, benefit-cost studies seek conclusions about the comparative benefits and costs of the examined programs.

Advance organizers for the overall benefit-cost approach are associated with cost breakdowns for both program inputs and program outputs. Program input costs may be delineated by line items (e.g., personnel, travel, materials, equipment, communications, facilities, contracted services, overhead, etc.), by program components, by year, etc. In cost-effectiveness analysis, a program's costs are examined in relation to each program objective, and these must be clearly defined and assessed. The more ambitious benefit-cost analyses look at costs associated with main effects and side effects, tangible and intangible outcomes, positive and negative outcomes, and short-term and long-term outcomes—both inside and outside the program. Frequently, they also may break down costs by individuals and groups of beneficiaries. One may also estimate the costs of foregone opportunities and, sometimes, political costs. Even then, the real value of benefits associated with human creativity or self-actualization is nearly impossible to estimate. Consequently, the benefit-cost equation rests on dubious assumptions and uncertain realities.

The purposes of these three levels of benefit-cost analysis are to gain clear knowledge of what resources were invested, how they were invested, and with what effect. In popular vernacular, cost-effectiveness and benefit-cost analyses seek to determine the program's “bang for the buck.” There is great interest in answering this type of question. Policy boards, program planners, and taxpayers are especially interested to know whether program investments are paying off in positive results that exceed or are at least as good as those produced by similar programs.
Authoritative information on the benefit-cost approach may be obtained by studying the writings of Kee (1995), Levin (1983), and Tsang (1997).

Benefit-cost analysis is potentially important in most program evaluations. Evaluators are advised to discuss this matter thoroughly with their clients, to reach appropriate advance agreements on what should and can be done to obtain the needed cost information, and to do as much cost-effectiveness and benefit-cost analysis as can be done well and within reasonable costs.

Benefit-cost analysis is an important but problematic consideration in program evaluations. Most program evaluations are amenable to analyzing the costs of program inputs and maintaining a financial history of expenditures. The main impediment to this is that program authorities often do not want anyone other than the appropriate accountants and auditors looking into the financial books. If cost analysis, even at only the input levels, is to be done, this must be clearly provided for in the initial contractual agreements covering the evaluation work. Performing cost-effectiveness analysis can be feasible if cost analysis of inputs is agreed to; if there are clear, measurable program objectives; and if comparable cost information can be obtained from competing programs. Unfortunately, it is usually hard to meet all these conditions needed for a successful cost-effectiveness analysis. Even more unfortunate is the fact that it is usually impractical to conduct a thorough benefit-cost analysis. Not only must it meet all the conditions of the analysis of program inputs and cost-effectiveness analysis, but it must also place monetary values on identified outcomes, both those anticipated and those not expected.

Approach 11: Clarification Hearing

The clarification hearing is one label for the judicial approach to program evaluation. This approach essentially puts a program on trial. Role-playing evaluators competitively implement both a damning prosecution of the program—arguing that it failed—and a defense of the program—arguing that it succeeded. A judge hears these arguments within the framework of a jury trial and controls the proceedings according to advance agreements on rules of evidence and trial procedures. The actual proceedings are preceded by the collection and sharing of evidence by both sides. The prosecuting and defending evaluators may call witnesses and place documents and other exhibits into evidence. A jury hears the proceedings and ultimately makes and issues a ruling on the program's success or failure. Ideally, the jury is composed of persons representative of the program's stakeholders. By videotaping the proceedings, the administering evaluator can, after the trial, compile a condensed videotape as well as printed reports to disseminate what was learned through the process.

The advance organizers for a clarification hearing are criteria of program effectiveness that both the prosecuting and defending sides agree to apply. The judicial approach's main purpose is to ensure that the evaluation's audience will receive balanced evidence on the program's strengths and weaknesses. The key questions essentially are, Should the program be judged a success or failure? Is it as good as or better than alternative programs that address the same objectives?

Robert Wolf (1975) pioneered the judicial approach to program evaluation. Others who applied, tested, and further developed the approach include Levine (1974), Owens (1973), and Popham and Carlson (1983).
Based on the past uses of this approach, it can be judged as only marginally relevant to program evaluation. By its adversarial nature, the approach prods the evaluators to present biased arguments in order to win their cases. The approach subordinates truth seeking to winning. Accuracy suffers in this process. The most effective debaters are likely to convince the jury of their position even when it is poorly founded. Also, the approach is politically problematic, since it generates considerable acrimony. Despite the attractiveness of using the law as a metaphor for program evaluation, with the law's attendant rules of evidence, the promise of this application has not been fulfilled. There are few occasions in which it makes practical sense for evaluators to apply this approach.

Approach 12: Case Study Evaluations

A case-study-based program evaluation is a focused, in-depth description, analysis, and synthesis of a particular program or other object. The investigators do not control the program in any way. Instead, they look at it as it is occurring or as it occurred in the past. The study looks at the program in its geographic, cultural, organizational, and historical contexts. It closely examines the program's internal operations and how it uses inputs and processes to produce outcomes. It examines a wide range of intended and unexpected outcomes. It looks at the program's multiple levels and also holistically at the overall program. It characterizes both central, dominant themes and variations and aberrations. It defines and describes the program's intended and actual beneficiaries. It examines beneficiaries' needs and to what extent the program effectively addressed the needs. It employs multiple methods to obtain and integrate multiple sources of information. While it breaks apart and analyzes a program along various dimensions, it also provides an overall characterization of the program.

The main thrust of the case study approach is to delineate and illuminate a program, not necessarily to guide its development or to assess and judge its merit and worth. Hence, this paper characterizes the case study approach as a questions/methods-oriented approach rather than an improvement/accountability approach.

The advance organizers in case studies include the definition of the program, characterization of its geographic and organizational environment, the historical period in which it is to be examined, the program's beneficiaries and their assessed needs, the program's underlying logic of operation and productivity, and the key roles involved in the program. A case study program evaluation's main purpose is to provide stakeholders and their audiences with an authoritative, in-depth, well-documented explication of the program.

The case study should be keyed to the questions of most interest to the evaluation's main audiences. The evaluator must therefore identify and interact with the program's stakeholders. Along the way stakeholders will be engaged in helping to plan the study and interpret findings. Ideally, the audiences include the program's oversight body, administrators, staff, financial sponsors, beneficiaries, and potential adopters of the program.

Typical questions posed by some or all of the above audiences are, What is the program in concept and practice? How has it evolved over time? How does it actually operate to produce outcomes? What has it produced? What are the shortfalls and negative side effects? What are the positive side effects? In what ways and to what degrees do various stakeholders value the program? To what extent did the program effectively meet beneficiaries' needs? What were the most important reasons for the program's successes and failures? What are the program's most important unresolved issues?
How much has it cost? What are the costs per beneficiary, per year, etc.? What parts of the program have been successfully transported to other sites? How does this program compare with what might be called critical competitors? The above questions only illustrate the range of questions that a case study might address, since each case study will be tempered by the interests of the client and other audiences for the study and by the evaluator's interests.

To conduct effective case studies, evaluators need to employ a wide range of qualitative and quantitative methods. These may include analysis of archives; collection of artifacts, such as work samples; content analysis of program documents; both independent and participant observations; interviews; logical analysis of operations; focus groups; tests; questionnaires; rating scales; hearings; forums; and maintenance of a program database. Reports may incorporate in-depth descriptions and accounts of key historical trends; focus on critical incidents, photographs, maps, testimony, relevant news clippings, logic models, and cross-break tables; and summarize main conclusions. The case study report may include papers on key dimensions of the case, as determined with the audience, as well as an overall holistic presentation and assessment. Case study reports may involve audio and visual media as well as printed documents.

Case study methods have existed for many years and have been applied in such areas as clinical psychology, law, the medical profession, and social work. Pioneers in applying the method to program evaluation include Campbell (1975), Lincoln and Guba (1985), Platt (1992), Stake (1995), and Yin (1992).

The case study approach is highly conducive to program evaluation. It requires no controls of treatments and subjects and looks at programs as they naturally occur and evolve. It addresses accuracy issues by employing and triangulating multiple perspectives, methods, and information sources. It employs all relevant methods and information sources. It looks at programs within relevant contexts and describes contextual influences on the program. It looks at programs holistically and in depth. It examines the program's internal workings and how it produces outcomes. It includes clear procedures for analyzing qualitative information. It can be tailored to focus on the audience's most important questions. It can be done retrospectively or in real time. It can be reported to meet given deadlines and subsequently updated based on further developments.

The main limitation of the approach is that some evaluators may mistake its openness and lack of controls as an excuse for approaching it haphazardly and bypassing steps to assure that findings and interpretations possess rigor as well as relevance. Also, because of a preoccupation with descriptive information, the case study evaluator may not collect sufficient judgmental information to permit a broad-based assessment of a program's merit and worth. Users of this approach might slight quantitative analysis in favor of qualitative analysis. By trying to produce a comprehensive description of a program, the case study evaluator may not produce the timely feedback needed to help in program development. To overcome these potential pitfalls, evaluators using the case study approach should fully address the principles of sound evaluation as related to accuracy, utility, feasibility, and propriety.

Approach 13: Criticism and Connoisseurship

The connoisseur-based approach was developed pursuant to the methods of art criticism and literary criticism. This approach assumes that certain experts in a given substantive area are capable of in-depth analysis and evaluation that could not be done in other ways.
While a national survey of wine drinkers could produce information concerning their overall preferences for types of wines and particular vineyards, it would not provide the detailed, creditable judgments of the qualities of particular wines that might be derived from a single connoisseur who has devoted a professional lifetime to the study and grading of wines and whose judgments are highly and widely respected.

The advance organizer for the connoisseur-based study is the evaluator's special expertise and sensitivities. The study's purpose is to describe, critically appraise, and illuminate a particular program's merits. The evaluation questions addressed by the connoisseur-based evaluation are determined by expert evaluators—the critics and authorities who have undertaken the evaluation. Among the major questions they can be expected to ask are, What are the program's essence and salient characteristics? What merits and demerits distinguish the particular program from others of the same general kind?

The methodology of connoisseurship includes the critics' systematic use of their perceptual sensitivities, past experiences, refined insights, and abilities to communicate their assessments. The evaluator's judgments are conveyed in vivid terms to help the audience appreciate and understand all of the program's nuances.

Eisner (1975, 1983) has pioneered this strategy in education.6 A dozen or more of Eisner's students have conducted research and development on the connoisseurship approach, e.g., Vallance (1973) and Flinders and Eisner (1994).

This approach obviously depends on the qualifications of the particular expert chosen to do the program evaluation. The approach also requires an audience that has confidence in and is willing to accept and use the connoisseur's report. The author of this paper would willingly accept and use any evaluation that Dr. Elliott Eisner agreed to present, but there are not many Eisners out there.

The main advantage of the connoisseur-based study is that it exploits the particular expertise and finely developed insights of persons who have devoted much time and effort to the study of a precise area. They can provide an array of detailed information that the audience can then use to form a more insightful analysis than otherwise might be possible. The approach's disadvantage is that it is dependent on the expertise and qualifications of the particular expert doing the program evaluation, leaving room for much subjectivity.

Approach 14: Program Theory-Based Evaluation

Program evaluations based on program theory begin with either (1) a well-developed and validated theory of how programs of a certain type within similar settings operate to produce outcomes or (2) an initial stage to approximate such a theory within the context of a particular program evaluation. The former of these conditions is much more reflective of the implicit promises in a theory-based program evaluation, since the existence of a sound theory means that a substantial body of theoretical development has produced and tested a coherent set of conceptual, hypothetical, and pragmatic principles, plus associated instruments to guide inquiry in the particular area. Then, the theory can aid a program evaluator to decide what questions, indicators, and assumed linkages between and among program elements should be used to evaluate a program covered by the theory.

Some well-developed theories for use in evaluations exist, which gives this approach some measure of viability.
For example, health education/behavior change programs are sometimes founded on validated theoretical frameworks, such as the Health Belief Model (Becker, 1974; Mullen, Hersey, & Iverson, 1987; Janz & Becker, 1984). Other examples are the PRECEDE-PROCEED Model for health promotion planning and evaluation (Green & Kreuter, 1991), Bandura's (1977) Social Cognitive Theory, the Stages of Change Theory by Prochaska and DiClemente (1992), and Peters and Waterman's (1982) theory of successful organizations. When such frameworks exist, their use probably can enhance a program's effectiveness and provide a structure for validly evaluating the program's functioning. Unfortunately, however, few program areas are buttressed by well-articulated and tested theories.

Thus, most theory-based evaluations begin by setting out to develop a theory that appropriately could be used to guide the particular program evaluation. As will be discussed later in this characterization, such ad hoc theory development efforts and their linkage to program evaluations are problematic. In any case, let us look at what the theory-based evaluator attempts to achieve.

The point of the theory development or selection effort is to identify advance organizers to guide the evaluation. Essentially, these are the mechanisms by which program activities are understood to produce or contribute to program outcomes, along with the appropriate description of context, specification of independent and dependent variables, and portrayal of key linkages. The main purposes of the theory-based program evaluation are to determine the extent to which the program of interest is theoretically sound, to understand why it is succeeding or failing, and to provide direction for program improvement.

Questions for the program evaluation are derived from the guiding theory. Example questions include, Is the program grounded in an appropriate, well-articulated, and validated theory? Is the employed theory up to date and reflective of recent research? Are the program's targeted beneficiaries, design, operation, and intended outcomes consistent with the guiding theory? How well does the program address and serve the full range of pertinent needs of the targeted beneficiaries? If the program is consistent with the guiding theory, are the expected results being achieved? Are program inputs and operations producing outcomes in the ways the theory predicts? What changes in the program's design or implementation might produce better outcomes? What elements of the program are essential for successful replication? Overall, was the program theoretically sound, did it operate in accordance with an appropriate theory, did it produce the expected outcomes, were the hypothesized causal linkages confirmed, is the program worthy of continuation and/or dissemination, and what program features are essential for successful replication?

The nature of these questions suggests that the success of the theory-based approach is dependent on a foundation of sound theory development and validation. This, of course, entails sound conceptualization of at least a context-dependent theory, formulation and rigorous testing of hypotheses derived from the theory, development of guidelines for practical implementation of the theory based on extensive field trials, and independent assessment of the theory. Unfortunately, not many program areas in education and the social sciences are grounded in sound theories. Moreover, evaluators wanting to employ a theory-based evaluation often find it infeasible to conduct the full range of theory development and validation steps and still get the evaluation done on time.
Thus, in claiming to conduct a theory-based evaluation, evaluators often seem to promise much more than they can deliver.

The main procedure typically used in these “theory-based program evaluations” is a model of the program's logic. This may be a detailed flowchart of how inputs are thought to be processed to produce intended outcomes. It may also be a grounded theory like those advocated by Glaser and Strauss (1967). The network analysis of the former approach is typically an armchair theorizing process involving the evaluators and persons who are supposed to know how the program is expected to operate and produce results. They discuss, scheme, discuss some more, network, discuss further, and finally produce networks in varying levels of detail of what is involved in making the program work and how the various elements are linked to produce the desired outcomes. The more demanding grounded theory requires a systematic, empirical process of observing events or analyzing materials drawn from operating programs, followed by an extensive modeling process.

Pioneers in applying theory development procedures to program evaluation include Glaser and Strauss (1967) and Weiss (1972, 1995). Other developers of the approach are Bickman (1990), Chen (1990), and Rogers (in press).

In any program evaluation assignment, it is reasonable for the evaluator to examine the extent to which program plans and operations are grounded in an appropriate theory or model. Also, it can be useful to engage in a modicum of effort to network the program and thereby seek out key variables and linkages. As noted previously, in the enviable but rare situation where a relevant, validated theory exists, the evaluator can beneficially apply it in structuring the evaluation and analyzing findings.

However, if a relevant, defensible theory of the program's logic does not exist, evaluators need not develop one. In fact, if they attempt to do so they will incur many threats to their evaluation's success. Rather than evaluating the program and its underlying logic, the evaluators might usurp the program staff's responsibility for program design. They might do a poor job of theory development, given limitations on time and resources to develop and test an appropriate theory. They might incur the conflict of interest associated with having to evaluate the theory they developed. They might pass off an unvalidated model of the program as a theory, when it meets almost none of the requirements of a sound theory. They might bog down the evaluation in too much effort to develop a theory for the program. They might also focus attention on a theory developed early in a program and later discover that the program has evolved to be a quite different enterprise than what was theorized at the outset. In this case the initial theory could become a “Procrustean bed” for the program evaluation.

Overall, there really isn't much to recommend theory-based program evaluation, since doing it right is usually not feasible and since failed or misrepresented attempts can be highly counterproductive. Nevertheless, modest attempts to model programs—labeled as such—can be useful for identifying measurement variables, so long as the evaluator doesn't spend too much time on this and so long as the model is not considered fixed or a validated theory. Also, in the rare case where an appropriate theory already exists, the evaluator can make beneficial use of the theory to help structure and guide the evaluation and interpret the findings.
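To illustrate what such a modest, clearly labeled modeling effort might look like, the sketch below encodes a hypothetical program logic model as a simple data structure and uses it only to identify measurement variables and assumed linkages. It is an illustration of the modeling idea, not a validated theory and not a procedure drawn from the authors cited above; all element names are invented.

```python
# Sketch: a hypothetical program logic model used only to identify
# measurement variables -- not a validated theory.

logic_model = {
    "inputs":     ["tutor hours", "reading materials"],
    "activities": ["weekly small-group tutoring"],
    "outputs":    ["sessions delivered", "students served"],
    "outcomes":   ["reading fluency", "reading comprehension"],
}

# Assumed linkages between elements (each is a hypothesis to examine,
# not a confirmed causal claim).
links = [
    ("tutor hours", "weekly small-group tutoring"),
    ("weekly small-group tutoring", "sessions delivered"),
    ("sessions delivered", "reading fluency"),
    ("reading fluency", "reading comprehension"),
]

# Every element named in a linkage should appear somewhere in the model;
# unmatched names flag gaps in the evaluation design.
elements = {e for group in logic_model.values() for e in group}
for src, dst in links:
    for name in (src, dst):
        if name not in elements:
            print(f"unmeasured element in linkage: {name}")
print(f"measurement variables: {sorted(elements)}")
```

Used in this limited way, the model simply yields a checklist of variables to measure and linkages to probe, without being mistaken for a tested theory of the program.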
Approach 15: Mixed Methods Studies

In an attempt to resolve the longstanding debate about whether program evaluations should employ quantitative or qualitative methods, some authors have proposed that evaluators should regularly combine these methods in given program evaluations (for example, see the National Science Foundation's 1997 User-Friendly Handbook for Mixed Method Evaluations). Such recommendations, along with practical guidelines and illustrations, are no doubt useful to many program staff members and to evaluators. But in the main, the recommendation for a mixed method approach only highlights a large body of longstanding practice of mixed-methods program evaluation rather than proposing a new approach. All seven approaches discussed in the remainder of the paper employ both qualitative and quantitative methods. What sets them apart from the mixed method approach is that their first considerations are not the methods to be employed but either the assessment of value or the social mission to be served. The mixed methods approach is included in this section on questions/methods approaches, because it is preoccupied with using multiple methods rather than using whatever methods are needed to comprehensively assess a program's merit and worth. As with the other approaches in this section, the mixed methods approach may or may not fully assess a program's value; thus, it is classified as a quasi-evaluation approach.

The advance organizers of the mixed methods approach are formative and summative evaluations, qualitative and quantitative methods, and intra-case or cross-case analysis. Formative evaluations are employed to examine a program's development and assist in improving its structure and implementation. Summative evaluations basically look at whether objectives were achieved, but may look for a broader array of outcomes. Qualitative and quantitative methods are employed in combination to assure depth, scope, and dependability of findings. This approach also applies to carefully selected single programs or to comparisons of alternative programs.

The basic purposes of the mixed method approach are to provide direction for improving programs as they are evolving and to assess their effectiveness after they have had time to produce results. Use of both quantitative and qualitative methods is intended to assure dependable feedback on a wide range of questions; depth of understanding of particular programs; a holistic perspective; and enhancement of the validity, reliability, and usefulness of the full set of findings. Investigators look to quantitative methods for standardized, replicable findings on large data sets. They look to qualitative methods for elucidation of the program's cultural context, dynamics, meaningful patterns and themes, deviant cases, diverse impacts on individuals as well as groups, etc. Qualitative reporting methods are applied to bring the findings to life, making them clear, persuasive, and interesting. By using both quantitative and qualitative methods, the evaluator secures cross-checks on different subsets of findings and thereby instills greater stakeholder confidence in the overall findings.

The sources of evaluation questions are the program's goals, plans, and stakeholders. The stakeholders often include skeptical as well as supportive audiences. Among the important stakeholders are program administrators and staff, policy boards, financial sponsors, beneficiaries, taxpayers, and program area experts.
The approach may pursue a wide range of questions. Examples of formative evaluation questions are

• To what extent do program activities follow the program plan, time line, and budget?
• To what extent is the program achieving its goals?
• What problems in design or implementation need to be addressed?

Examples of summative evaluation questions are

• To what extent did the program achieve its goals?
• Was the program appropriately effective for all beneficiaries?
• What interesting stories emerged?
• What are program stakeholders' judgments of program operations, processes, and outcomes?
• What were the important side effects?
• Is the program sustainable and transportable?

The approach employs a wide range of methods. Among the quantitative methods employed are surveys using representative samples, both cohort and cross-sectional samples, norm-referenced tests, rating scales, quasi experiments, significance tests for main effects, and a posteriori statistical tests. The qualitative methods may include ethnography, document analysis, narrative analysis, purposive samples, single cases, participant observers, independent observers, key informants, advisory committees, structured and unstructured interviews, focus groups, case studies, study of outliers, diaries, logic models, grounded theory development, flow charts, decision trees, matrices, and performance assessments. Reports may include abstracts, executive summaries, full reports, oral briefings, conference presentations, and workshops. They should include a balance of narrative and numerical information.

Considering his book on service studies in higher education, Ralph Tyler (Tyler et al., 1932) was certainly a pioneer in the mixed method approach to program evaluation. Other authors who have written cogently on the mixed methods approach are Guba and Lincoln (1981), Kidder and Fine (1987), Lincoln and Guba (1985), Miron (1998), Patton (1990), and Schatzman and Strauss (1973).

Basically, it is almost always appropriate to consider using a mixed methods approach. Certainly, the evaluator should take advantage of opportunities to obtain any and all potentially available information that is relevant to assessing a program's merit and worth. Sometimes a study can be mainly or only qualitative or quantitative, but usually such studies would be strengthened by including both types of information. The key point is to choose methods because they can effectively address the study's questions, not because they are either qualitative or quantitative.

Key advantages of using both qualitative and quantitative methods are that they complement each other in ways that are important to the evaluation's audiences. Information from quantitative methods tends to be standardized, efficient, amenable to standard tests of reliability, easily summarized and analyzed, and accepted as “hard” data. Information from qualitative approaches adds depth; can be delivered in interesting, story-like presentations; and provides a means to explore and understand the more superficial quantitative findings. Using both types of methods affords important cross-checks on findings.
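The cross-checking described here can be as simple as laying the two strands of evidence side by side. The fragment below is a hypothetical illustration: a quantitative comparison of outcome scores for program and comparison groups is triangulated against tallies of themes coded from interview transcripts. All data are invented for the example, and an actual study would of course apply appropriate significance tests and systematic qualitative analysis.

```python
# Sketch: triangulating quantitative and qualitative findings
# (all data invented for illustration).

program_scores    = [74, 81, 69, 77, 83, 71]
comparison_scores = [70, 72, 65, 74, 69, 68]

def mean(xs):
    return sum(xs) / len(xs)

diff = mean(program_scores) - mean(comparison_scores)
print(f"quantitative: mean outcome difference = {diff:+.1f} points")

# Tallies of themes coded from interview transcripts.
themes = {"more engaged students": 14, "scheduling burden": 9,
          "better writing habits": 11}
for theme, count in sorted(themes.items(), key=lambda kv: -kv[1]):
    print(f"qualitative: '{theme}' mentioned in {count} interviews")

# Agreement between the two strands (higher scores AND reports of
# engagement) strengthens confidence; contradictions flag questions
# for further study.
```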
The main pitfall in pursuing the mixed methods approach is using multiple methods because this is the popular thing to do rather than because the selected methods best respond to the evaluation questions. Moreover, sometimes evaluators let the combination of methods compensate for a lack of rigor in applying them. Also, using a mixed methods approach can produce a schizophrenic evaluation if the investigator uncritically mixes positivistic and postmodern paradigms. Along this line, quantitative and qualitative methods are derived from different theoretical approaches to inquiry and reflect different conceptions of knowledge; and many evaluators do not possess the requisite foundational knowledge in both the sciences and humanities to effectively combine quantitative and qualitative methods. The approaches in the remainder of this paper place proper emphasis on mixed methods, making choice of the methods subservient to the approach's dominant philosophy and to the particular evaluation questions to be addressed.

The mixed methods approach to evaluation concludes this paper's discussion of the questions/methods approaches to evaluation. These 13 approaches tend to concentrate on selected questions and methods and thus may or may not fully address an evaluation's fundamental requirement to assess a program's merit and worth. The array of these approaches suggests that the field has advanced considerably since the 1950s, when program evaluations were rare and mainly used approaches grounded in behavioral objectives, standardized tests, and/or accreditation visits.

Tables 1 through 6 summarize the similarities and differences between the models in relationship to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses.
Table 1: Comparison of the 13 Quasi-Evaluation Approaches on Most Common ADVANCE ORGANIZERS

Evaluation approaches compared (by identification number)*: 3 through 15.

Advance organizers: program content/definition; program rationale; context; treatments; time period; beneficiaries; comparison groups; norm groups; assessed needs; problem statements; objectives; independent/dependent variables; indicators/criteria; life skills; performance tasks; questions/hypotheses/causal factors; policy issues; tests in use; formative & summative evaluation; qualitative & quantitative methods; program activities/milestones; employee roles & responsibilities; costs; evaluator expertise & sensitivities; intra-case/cross-case analysis.

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.
Table 2: Comparison of the 13 Quasi-Evaluation Approaches on Primary EVALUATION PURPOSES

Evaluation approaches compared (by identification number)*: 3 through 15.

Evaluation purposes: determine whether program objectives were achieved; provide constituents with an accurate accounting of results; assure that results are positive; assess learning gains; pinpoint responsibility for good & bad outcomes; compare students' test scores to norms; compare students' test performance to standards; diagnose program shortcomings; compare performance of competing programs; examine achievement trends; inform policymaking; direction for program improvement; ensure standardization of outcome measures; determine cause and effect relationships in programs; inform management decisions & actions; assess investments and payoffs; provide balanced information on strengths & weaknesses; explicate & illuminate a program; describe & critically appraise a program; assess a program's theoretical soundness.

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Table 3: Comparison of the 13 Quasi-Evaluation Approaches on Characteristic EVALUATION QUESTIONS

Evaluation Approaches (by identification number)*
Evaluation Questions    3 4 5 6 7 8 9 10 11 12 13 14 15

To what extent was each program objective achieved? U U U
Did the program effectively discharge its responsibilities? U U
Did tested performance meet or exceed pertinent norms? U
Did tested performance meet or exceed standards? U U
Where does a group’s tested performance rank compared with other groups? U U
Is a group’s present performance better than past performance? U U U
What sectors of a system are performing best and poorest? U
Where are the shortfalls in specific curricular areas? U
At what grade levels are the strengths & shortfalls? U
What value is being added by particular programs? U
To what extent can students effectively speak, write, figure, analyze, lead, work cooperatively, & solve problems? U
What are a program’s effects on outcomes? U U
Are program activities being implemented according to schedule, budget, & expected results? U
What is the program’s return on investment? U
Is the program sustainable & transportable? U U U
Is the program worthy of continuation and/or dissemination? U U U U U
Is the program as good or better than others that address the same objectives? U U U
What is the program in concept & practice? U U
How has the program evolved over time? U
How does the program produce outcomes? U U
What has the program produced? U U
What are the program’s shortfalls & negative side effects? U U
What are the program’s positive side effects? U U
How do various stakeholders value the program? U U
Did the program meet all the beneficiaries’ needs? U U U
What were the most important reasons for the program’s success or failure? U U
What are the program’s most important unresolved issues? U
How much did the program cost? U U
What were the costs per beneficiary, per year, etc.? U U
What parts of the program were successfully transported to other sites? U
What are the program’s essence & salient characteristics? U U
What merits & demerits distinguish the program from similar programs? U U
Is the program grounded in a validated theory? U
Are program operations consistent with the guiding theory? U
Were hypothesized causal linkages confirmed? U U
What changes in the program’s design or implementation might produce better outcomes? U U U U U U U
What program features are essential for successful replication? U U U
What interesting stories emerged? U U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Table 4: Comparison of the 13 Quasi-Evaluation Approaches on Characteristic EVALUATION METHODS

Evaluation Approaches (by identification number)*
Evaluation Methods    3 4 5 6 7 8 9 10 11 12 13 14 15

Operational objectives U U
Criterion-referenced tests U U U U
Performance contracting U
Program Planning & Budgeting System U U
Program Evaluation & Review Technique U
Management by objectives U U U
Staff progress reports U
Financial reports & audits U
Zero Based Budgeting U
Cost analysis, cost-effectiveness analysis, & benefit-cost analysis U
Mandated “program drivers” & indicators U
Input, process, output databases U U
Independent goal achievement auditors U U
Procedural compliance audits U
Peer review U
Merit pay for individuals and/or organizations U
Collective bargaining agreements U
Trial proceedings U
Mandated testing U U
Institutional report cards U
Self-studies U
Site visits by experts U
Program audits U
Standardized testing U U U U
Performance measures U U U
Computerized or other database U U U
Hierarchical mixed model analysis U
Policy analysis U
Experimental & quasi-experimental designs U U
Study of outliers U U U
System analysis U
Analysis of archives U U
Collection of artifacts U U
Log diaries U
Content analysis U U
Independent & participant observers U U
Key informants U U
Advisory committees U
Interviews U U
Operational analysis U
Focus group U U
Questionnaires U U
Rating scales U U
Hearings & forums U U
In-depth descriptions U
Photographs U
Critical incidents U
Testimony U U U
Flow charts U
Decision trees U
Logic models U U U
Grounded theory U U
News clippings analysis U U
Cross-break tables U U U U
Expert critics U U U U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.
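
To make Table 4’s “hierarchical mixed model analysis” entry (a method associated with outcomes monitoring/value-added assessment) more concrete, one common form of such a layered gain model can be sketched. The notation here is illustrative only and is not drawn from any particular operational system: $y_{ist}$ is student $i$’s test score in school or classroom $s$ in year $t$, and $\theta_s$ is the unit effect that is read as the value added.

\[
y_{ist} \;=\; \mu_t \;+\; \gamma\, y_{is(t-1)} \;+\; \theta_s \;+\; \varepsilon_{ist},
\qquad
\theta_s \sim N(0, \tau^{2}), \quad \varepsilon_{ist} \sim N(0, \sigma^{2})
\]

Because each student’s prior-year score enters the model, students serve as their own controls, and stable background influences are partially absorbed before $\theta_s$ is estimated; this is the sense in which Table 5 credits the approach with considering student background by using students as their own controls.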

Table 5: Comparison of the 13 Quasi-Evaluation Approaches on Prevalent STRENGTHS

Evaluation Approaches (by identification number)*
Strengths    3 4 5 6 7 8 9 10 11 12 13 14 15

Common sense appeal U U U U U U U
Widely known & applied U U U U
Employs operational objectives U
Employs the technology of testing U U U U U U
Efficient use of standardized tests U U
Popular among constituents & politicians U U U U
Focuses on improving public services U
Can focus on audience’s most important questions U U U U
Defines obligations of service providers U
Requires production of and reporting on positive outcomes U
Seeks to improve services through competition U U
Efficient means of data collection U U U
Stresses validity & reliability U U U U
Triangulates findings from multiple sources U U U
Uses institutionalized database U
Monitors progress on each student U U
Emphasizes service to every student U
Hierarchical analysis of achievement U
Conducive to policy analysis U U
Employs trend analysis U
Strong provision for analyzing qualitative information U U U
Rejects use of artificial cut scores U U
Considers student background by using students as their own controls U
Considers contextual influences U U U
Uses authentic measures U U
Eliminates guessing U
Reinforces life skills U
Focuses on outcomes U U U U U U U
Focuses on a program’s strengths & weaknesses U U U
Determines causes & effects U
Examines program’s internal workings & how it produces outcomes U U
Guides program management U
Helps keep programs on track U
Guides broad study & improvement of program processes & outcomes U U
Can be done retrospectively or in real time U U U U
Documents costs of program inputs U
Maintains a financial history for the program U
Contrasts program alternatives on both costs & outcomes U
Employs rules of evidence U
Requires no controls of treatments & participants U U
Examines programs as they naturally occur U U
Examines programs holistically & in depth U U
Engages experts to render refined descriptions & judgments U U
Yields in-depth, refined, effectively communicated analysis U U
Employs all relevant information sources & methods U U
Stresses complementarity of qualitative & quantitative methods U U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

Table 6: Comparison of the 13 Quasi-Evaluation Approaches on Prevalent WEAKNESSES/LIMITATIONS

Evaluation Approaches (by identification number)*
Weaknesses/Limitations    3 4 5 6 7 8 9 10 11 12 13 14 15

May credit unworthy objectives U
May define a program’s success in terms that are too narrow and mechanical and not attuned to beneficiaries’ various needs U
May employ only lower-order learning objectives U U U
Relies almost exclusively on multiple-choice test data U U
May indicate mainly socioeconomic status, not quality of teaching & learning U
May reinforce & overemphasize multiple-choice test-taking ability to the exclusion of writing, speaking, etc. U U
May poorly test what teachers teach U U
Yields mainly terminal information that lacks utility for program improvement U U
Provides data only on student outcomes U U U U
Narrow scope of skills that can feasibly be assessed U
May provide too narrow an information basis for judging a program’s merit & worth U U U U U U U U
May employ many methods because it is the thing to do rather than because they are needed U
May inappropriately & counterproductively mix positivistic & postmodern paradigms U
May oversimplify the complexities involved in assigning responsibility for student learning gains to individual teachers U
May miss important side effects U U U U U
May rely too heavily on the expertise & judgment of a single evaluator U
May issue invidious comparisons U U U U
May produce unhealthy competition U U U U U
May provoke political unrest U U U U U
Accuracy suffers in the face of competing evaluations U
May undesirably narrow the range of program services U U
Politicians tend to press for premature implementation U U U
Granting rewards & sanctions may produce cheating U U U
Inordinate time requirements for administration & scoring U
High costs of scoring U
Difficulty in achieving reliability U
High cost U
Low feasibility U U U
May inappropriately deprive control group subjects of entitlements U
Carries a connotation of experimenting on children or other subjects using unproven methods U
Requirement of random assignment is often not feasible U
Tends to stifle continual improvement of the program U
Vital data may be inaccessible to evaluators U
Investigators may mistake the approach’s openness & lack of controls as license to ignore rigor U
Evaluators might usurp the program staff’s responsibilities for program design U
Might ground an evaluation in a hastily developed, inadequate program theory U
Might develop a conflict of interest to defend the evaluation-generated program theory U
Might bog down the evaluation in a seemingly endless process of program theory development U
Might create a theory early in a program and impede the program from redefinition and refinement U

* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-based, 15. Mixed methods.

IV. IMPROVEMENT/ACCOUNTABILITY-ORIENTED
EVALUATION APPROACHES

This paper turns next to a set of approaches that stress the need to fully assess a program’s merit and worth, whatever the required questions and methods. These are the improvement/accountability-oriented evaluation approaches, labeled Decision/Accountability, Consumer-Orientation, and Accreditation. Respectively, these three approaches emphasize improvement through serving program decisions, providing consumers with assessments of optional programs and services, and helping consumers to gain assurances that given programs are professionally sound.

Approach 16: Decision/Accountability-Oriented Studies

The decision/accountability-oriented approach emphasizes that program evaluation should be used proactively to help improve a program as well as retroactively to judge its merit and worth. As mentioned previously, the decision/accountability-oriented approach should be distinguished from management information systems and from politically controlled studies because of the emphasis in decision/accountability-oriented studies on questions of merit and worth. The approach’s philosophical underpinnings include an objectivist orientation to finding best answers to context-limited questions and subscription to the principles of a well-functioning democratic society, especially human rights, equity, excellence, conservation, and accountability. Practically, the approach is oriented to engaging stakeholders in focusing the evaluation; addressing their most important questions; providing timely, relevant information to assist decision making; and producing an accountability record.

Decision makers, decision situations, and program accountability requirements provide useful advance organizers for decision/accountability-oriented studies. The approach emphasizes that decision makers include not just top managers but stakeholders at all organizational levels of a program. From the bottom up, such stakeholders may include beneficiaries, parents and guardians, service providers, administrators, support personnel, policy boards, funding authorities, taxpayers, etc. The generic decision situations to be served may include formulation of goals and priorities, identification and assessment of competing approaches, planning and budgeting program operations, staffing programs, carrying out planned activities, judging outcomes, determining how best to use programs, recycling program operations, etc. Key classes of needed evaluative information are assessments of needs, problems, and opportunities; identification and assessment of competing program approaches; assessment of program plans; assessment of staff qualifications and performance; assessment of program facilities and materials; monitoring and assessment of program implementation; assessment of intended and unintended and short-range and long-range outcomes; and assessment of cost-effectiveness.

Basically, the purpose of decision/accountability studies is to provide a knowledge and value base for making and being accountable for decisions that result in developing, delivering, and making informed use of cost-effective services. Serving this purpose requires that evaluators interact with representative members of their audiences and supply them with relevant, timely, efficient, and accurate evaluative feedback. A theme of this approach
is that the most important purpose of evaluation is not to prove but to improve.

The sources of questions addressed by the decision/accountability-oriented approach are the concerned and involved stakeholders. These may include all persons and groups who must make choices related to initiating, planning, implementing, and using a program’s services. Main questions addressed are, What beneficiary needs should be addressed? What are the available alternatives for addressing these needs, and what are their comparative merits? What plan of services should be operationalized and delivered? What facilities, materials, and equipment are needed? Who should conduct the program? What roles should the different participants carry out? Is the program working and should it be revised in any way? Is the program effectively reaching all the targeted beneficiaries and meeting their needs? Were the program staff members responsible and effective in carrying out their responsibilities to implement the program and meet the beneficiaries’ needs? Is the program better than competing alternatives? Is it sustainable? Is it transportable? Is the program worth the required initial investment? Answers to these and related questions are to be based on the underlying standard of good programs, i.e., they must effectively reach and serve the beneficiaries’ targeted needs at a reasonable cost and do so as well as or better than reasonably available alternatives.

Many methods may be used in decision/accountability-oriented program evaluations. Among others, these include surveys, needs assessments, case studies, advocate teams, observations, interviews, resident evaluators, and quasi-experimental and experimental designs. The point needs to be underscored that this approach involves the evaluator and a representative body of stakeholders in regular exchanges about the evaluation. Typically, the evaluator should establish and regularly interact with an evaluation advisory or review panel in order to help define evaluation questions, shape evaluation plans, review draft reports, and help disseminate findings. This panel should include representatives of all stakeholder groups. The evaluator’s exchanges with this group involve conveying evaluation feedback that may be of use in program improvement and use, as well as planning what future evaluation activities and reports would be most helpful to program personnel and other stakeholders. Interim reports may also assist beneficiaries, program staff, and others to obtain feedback on the program’s merits and worth. By maintaining a dynamic baseline of evaluation information and ways that the information was applied, the evaluator can use this information to develop a comprehensive summative evaluation report, to present periodic feedback to the broad group of stakeholders, and to supply program personnel with information they need to make their own accountability reports.

Involvement of stakeholders, as a key feature of this approach, is consistent with a key principle of the change process. An enterprise—read evaluation here—can best help bring about change in a target group’s behavior if that group was involved in planning, monitoring, and assessing outcomes of the enterprise. By involving stakeholders throughout the evaluation process, decision-oriented evaluators lay the groundwork for bringing stakeholders to understand and value the evaluation process and apply the findings.

Cronbach (1963) first introduced educators to the idea that evaluation should be reoriented from its objectives-based history to a concern for helping program personnel make better decisions about how to deliver effective services. While he did not use the terms formative and summative evaluation, he essentially defined the underlying concepts. In discussing the distinctions between the constructive, proactive orientation on the one hand and the retrospective, judgmental orientation on the other, he argued for placing
more emphasis on the former—in contrast to the evaluation tradition of stressing retrospective outcomes evaluation. Later, I (Stufflebeam, 1966, 1967) introduced a conceptualization of evaluation based on the idea that evaluation should help program personnel make and defend decisions that are in the best interest of meeting beneficiaries’ needs. While I argued for an improvement orientation to evaluation, I also emphasized that evaluators must both inform decisions and provide an informational basis for accountability. I also emphasized that the approach should interact with and serve the full range of stakeholders who need to make judgments and choices about a program. Other persons who have contributed to the development of a decision/accountability orientation to evaluation are Alkin (1969) and Webster.7

The decision/accountability-oriented approach is applicable in cases where program staffs and other stakeholders want and need both formative and summative evaluation. It can provide the evaluation framework for both internal and external evaluation. When used for internal evaluation, it is usually important to commission an independent metaevaluation of the inside evaluator’s work. In addition to program evaluations, this approach has proved useful in evaluating personnel, students, projects, facilities, and products.

A main advantage of the decision/accountability-oriented approach is that it encourages program personnel to use evaluation continuously and systematically in their efforts to plan and implement programs that meet beneficiaries’ targeted needs. It aids decision making at all levels of a system and stresses program improvement. It also presents a rationale and framework of information for helping program personnel to be accountable for their decisions and actions in implementing a program. It is heavily geared to involving the full range of stakeholders in the evaluation process to assure that their evaluation needs are well addressed and to encourage and support them to make effective use of evaluation findings. It is comprehensive in attending to context, inputs, process, and outcomes. It balances the use of quantitative and qualitative methods. It is keyed to professional standards for evaluations. Finally, the approach emphasizes that evaluations must be grounded in the democratic principles of a free society.

A main limitation is that the collaboration required between an evaluator and stakeholders introduces opportunities for impeding the evaluation and/or biasing its results, especially when the evaluative situation is politically charged. Also, when evaluators are actively influencing the course of a program, they may identify so closely with it that they lose some of the independent, detached perspective needed to provide objective, forthright reports. Moreover, the approach may overemphasize formative evaluation and give too little attention to summative evaluation. External metaevaluation has been employed to counteract opportunities for bias and to assure the proper balance of formative and summative evaluation. Though the charge is erroneous, this approach carries the connotation that only top decision makers are served.

Approach 17: Consumer-Oriented Studies

In the consumer-oriented approach, the evaluator is the “enlightened surrogate consumer.” He or she must draw direct evaluative conclusions about the program being evaluated. Evaluation is viewed as the process of determining something’s merit and worth, with evaluations being the products of that process. The approach regards a consumer’s welfare as a program’s primary justification and accords that welfare the same primacy in program evaluation. Grounded in a deeply reasoned view of ethics and the common good plus skills in obtaining and synthesizing pertinent, valid, and reliable information, the
evaluator should help developers produce and deliver products and services that are of excellent quality and of great use to consumers (e.g., students, their parents, teachers, and taxpayers). More importantly, the evaluator should help consumers identify and assess the merit and worth of competing programs, services, and products.

Advance organizers include societal values, consumers’ needs, costs, and criteria of goodness in the particular evaluation domain. The purpose of a consumer-oriented program evaluation is to judge the relative merits and worths of the products and services of alternative programs and, thereby, to help taxpayers, practitioners, and potential beneficiaries make wise choices. This approach is objectivist in assuming an underlying reality and positing that it is possible, although often extremely difficult, to find best answers. This approach looks at a program comprehensively in terms of its quality and costs, functionally regarding the assessed needs of the intended beneficiaries, and comparatively considering reasonably available alternative programs. Evaluators are expected to subject their program evaluations to evaluations, what Scriven termed metaevaluation.

The approach employs a wide range of assessment topics. These include program description, background and context, client, consumers, resources, function, delivery system, values, standards, process, outcomes, costs, critical competitors, generalizability, statistical significance, assessed needs, bottom-line assessment, practical significance, recommendations, reports, and metaevaluation. The evaluation process begins with consideration of a broad range of such topics, continuously compiles information on all of them, and ultimately culminates in a super-compressed judgment of the program’s merit and worth.

Questions for the consumer-oriented study are derived from society, from program constituents, and especially from the evaluator’s frame of reference. The general question addressed is, Which of several alternative programs is the best choice, given their differential costs, the needs of the consumer group, the values of society at large, and evidence of both positive and negative outcomes?

Methods include checklists, needs assessments, goal-free evaluation, experimental and quasi-experimental designs, modus operandi analysis, applying codes of ethical conduct, and cost analysis (Scriven, 1974). A preferred method is for an external, independent consumer advocate to conduct and report findings of studies of publicly supported programs. The approach is keyed to employing a sound checklist of the program’s key aspects. Scriven (1991) developed a generic “Key Evaluation Checklist” for this purpose. The main evaluative acts in this approach are grading, scoring, ranking, apportioning, and producing the final synthesis (Scriven, 1994a).

Scriven (1967) was a pioneer in applying the consumer-oriented approach to program evaluation, and his work parallels the concurrent work of Ralph Nader and the Consumers Union in the general field of consumerism. Glass has supported and developed Scriven’s approach.8 Scriven coined the terms formative and summative evaluation. He allowed that evaluations can be divergent in early quests for critical competitors and explorations related to clarifying goals and making programs function well. However, he also maintained that ultimately evaluations must converge on summative judgments about a program’s merit and worth. While accepting the importance of formative evaluation, he argued against Cronbach’s (1963) position that formative evaluation should be given the major emphasis. According to Scriven, the bottom-line aim of a sound evaluation is to judge the
program’s merit, comparative value, and overall worth. Scriven (1991, 1994a) sees evaluation as a transdiscipline encompassing all evaluations of various entities across all applied areas and disciplines and comprised of a common logic, methodology, and theory that transcends specific evaluation domains, which also have their unique characteristics.

The consumer-oriented study requires a highly credible and competent expert plus either sufficient resources to allow the expert to conduct a thorough study or other means to obtain the needed information. Often, a consumer-oriented evaluator is engaged to evaluate a program after its formative stages are over. In these situations, the external consumer-oriented evaluator is often dependent on being able to access a substantial base of information that the program staff had accumulated. If no such base of information exists, the consumer-oriented evaluator will have great difficulty in obtaining enough information to produce a thorough, defensible summative program evaluation.

One of this approach’s main advantages is that it is a hard-hitting, independent assessment intended to protect consumers from shoddy programs, services, and products and instead to guide them to support and use those contributions that best and most cost-effectively address their needs. Also, the approach’s stress on independent/objective assessment yields high credibility with consumer groups. The approach directly attempts to achieve a comprehensive assessment of merit and worth. This is aided by Michael Scriven’s (1991) Key Evaluation Checklist and his Evaluation Thesaurus (in which he presents and explains the checklist). The approach provides for a summative evaluation to yield a bottom-line judgment of merit and worth, preceded by a formative evaluation to help assure that developers’ programs will succeed.

One disadvantage is that the approach can be so independent from practitioners that it may not assist them to do a better job of serving consumers. If summative evaluation is applied too early, it can intimidate developers and stifle their creativity. However, if summative evaluation is applied only near a program’s end, the evaluator may have great difficulty in obtaining sufficient evidence to confidently and credibly judge the program’s basic value. This often iconoclastic approach is also heavily dependent on a highly competent, independent, and “bulletproof” evaluator.

Approach 18: Accreditation/Certification Approach

Most school districts and universities and many professional organizations have periodically been the subject of an accreditation study, and many professionals, at one time or another, have had to meet certification requirements for a given position. Such studies of institutions and personnel are in the realm of accountability-oriented evaluations, and they have an improvement element as well. Institutions, institutional programs, and personnel are studied to prove whether they are fit to serve designated functions in society; typically, the feedback reports include areas for improvement.

The advance organizers used in the accreditation/certification study usually are guidelines and criteria that some accrediting or certifying body has adopted. As previously suggested, the evaluation’s purpose is to determine whether institutions, institutional programs, and/or personnel should be approved to perform specified functions.

The source of questions for accreditation or certification studies is the accrediting or certifying body. Basically, the studies address this question: Are institutions and their programs and personnel meeting minimum standards, and how can their performance be improved?
Typical methods used in the accreditation/certification approach are self-study and self-reporting by the individual or institution. In the case of institutions, panels of experts are assigned to visit the institution, verify a self-report, and gather additional information. The self-studies and the visits by expert panels are usually based on guidelines and criteria that have been specified by the accrediting agency.

Accreditation of education was pioneered by the College Entrance Examination Board around 1901. Since then, the accreditation function has been implemented and expanded, especially by the Cooperative Study of Secondary School Standards, dating from around 1933. Subsequently, the accreditation approach has been developed, further expanded, and administered by the North Central Association of Secondary Schools and Colleges, along with their associated regional accrediting agencies across the United States, and by many other accrediting and certifying bodies. Similar accreditation practices are found in medicine, law, architecture, and many other professions.

Any area of professional service that potentially could put the public at risk if services are not delivered by highly trained specialists in accordance with standards of good practice and safety should consider subjecting its programs to accreditation reviews and its personnel to certification processes. Such use of evaluation services is very much in the public interest and also is a means of getting feedback of use in strengthening capabilities and practices.

The main advantage of the accreditation or certification study is that it aids lay persons in making informed judgments about the quality of organizations and programs and the qualifications of individual personnel. The main difficulties are that the guidelines of accrediting and certifying bodies often emphasize inputs and processes rather than outcome criteria. Also, the self-study and visitation processes used in accreditation offer many opportunities for corruption and inept performance. As has been said for a number of the evaluation approaches described above, it is prudent to subject accreditation and certification processes themselves to independent metaevaluations.

The three improvement/accountability-oriented approaches emphasize the assessment of merit and worth, which is the thrust of the definition of evaluation used to classify the 22 approaches considered in this paper. Tables 7 through 12 summarize the similarities and differences among the three approaches with respect to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses. The paper turns next to the fourth and final set of program evaluation approaches—those concerned with using evaluation to further some social agenda.

Table 7: Comparison of the Three Improvement/Accountability Approaches on Most Common ADVANCE ORGANIZERS

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Advance Organizers
Decision makers/stakeholders U
Decision situations U
Program accountability requirements U U
Needs, problems, opportunities U
Competing program approaches U U
Program operations U U
Program outcomes U U U
Cost-effectiveness U U
Assessed needs U U
Societal values U
Intrinsic criteria of merit U U
Accreditation guidelines & criteria U

Table 8: Comparison of the Primary PURPOSES of the Three Improvement/Accountability Evaluation Approaches

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Purposes
Provide a knowledge and value base for decisions U
Judge alternatives U
Approve/recommend professional services U

Table 9: Comparison of the Improvement/Accountability Evaluation Approaches on Characteristic EVALUATION QUESTIONS

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Characteristic Evaluation Questions
What consumer needs should be addressed? U U
What alternatives are available to address the needs & what are their comparative merits? U U
What plan should guide the program? U
What facilities, materials, and equipment are needed? U
Who should conduct the program & what roles should the different participants carry out? U
Is the program working & should it be revised? U U U
How can the program be improved? U U
Is the program reaching all the rightful beneficiaries? U
What are the outcomes? U U U
Did staff responsibly & effectively discharge their program responsibilities? U
Is the program superior to critical competitors? U U
Is the program worth the required investment? U U
Is the program meeting minimum accreditation requirements? U

Table 10: Comparison of Main METHODS of the Three Improvement/Accountability Evaluation Approaches

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Evaluation Methods
Surveys U
Needs assessments U U
Case studies U
Advocate teams U
Observations U U
Interviews U U U
Resident evaluators U
Quasi experiments U U
Experiments U U
Checklists U U
Goal-free evaluations U
Modus operandi analysis U
Applying codes of ethical conduct U
Cost analysis U
Self-study U
Site visits by expert panels U U
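
Because cost analysis appears in Table 10 as a consumer-oriented method, and Table 9 asks, “Is the program worth the required investment?”, a compact formulation may be helpful. What follows is the conventional benefit-cost framing, offered as a general illustration rather than a procedure prescribed by any of the three approaches: annual program benefits $B_t$ and costs $C_t$ over years $t = 0, \ldots, T$ are discounted at a rate $r$ and compared.

\[
\mathrm{NPV} \;=\; \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^{t}},
\qquad
\mathrm{BCR} \;=\; \frac{\displaystyle\sum_{t=0}^{T} B_t/(1+r)^{t}}{\displaystyle\sum_{t=0}^{T} C_t/(1+r)^{t}}
\]

A program with a positive net present value (equivalently, a benefit-cost ratio greater than 1) returns more than the required investment, and comparing NPV or BCR across alternatives is one defensible way to operationalize the comparison of a program with its critical competitors on both costs and outcomes.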

Table 11: Comparison of the Prevalent STRENGTHS of the Three Improvement/Accountability Evaluation Approaches

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Strengths
Keyed to professional standards U U
Examines context, inputs, process, & outcomes U U
Balances use of quantitative and qualitative methods U U U
Integrates evaluation into management operations U
Targets constituents’ needs U U
Stresses program improvement U
Provides basis for accountability U U U
Involves and addresses the needs of all stakeholders U U
Serves decision making at all system levels U
Promotes & assists uses of evaluation findings U U
Emphasizes democratic principles U U
Stresses an independent perspective U U
Stresses consumer protection U U
Produces a comprehensive assessment of merit & worth U U U
Emphasizes cost-effectiveness U
Provides formative & summative evaluation U U
Grades the quality of programs & institutions U U
Aided by Scriven’s Key Evaluation Checklist & Evaluation Thesaurus U

Table 12: Comparison of the Prevalent WEAKNESSES of the Three Improvement/Accountability Evaluation Approaches

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Weaknesses
Involved collaboration with client/stakeholders may engender interference & bias U U
Influence on program operations may compromise the evaluation’s independence U
May be too independent to help strengthen operations U
Carries connotation that top decision makers are most important U
May overemphasize formative evaluation and underemploy summative evaluation U
Stress on independence may minimize formative assistance U
Summative evaluation applied too early may stifle staffs’ creativity U
Summative evaluation applied too late in a program’s process may be void of much needed information U
Heavily dependent on a highly competent, independent evaluator U
May overstress intrinsic criteria U
May underemphasize outcome information U
Includes many opportunities for evaluatees to coopt & bias the evaluators U
V. SOCIAL AGENDA/
ADVOCACY APPROACHES

The Social Agenda/Advocacy approaches are heavily directed to making a difference in society through program evaluation. These approaches especially are employed to ensure that all segments of society have equal access to educational and social opportunities and services. The approaches even have an affirmative action bent toward giving preferential treatment through program evaluation to the disadvantaged. If—as many persons have stated—information is power, then this set of approaches could be said to be oriented toward employing program evaluation, sometimes in a biased way, to empower the disenfranchised. By giving stakeholders the authority for key evaluation decisions, related especially to interpretation and release of findings, evaluators empower these persons to use evaluation to their best advantage; but they also may make the evaluation vulnerable to bias and other misuse. Nevertheless, there is much to recommend these approaches, since they are strongly oriented to democratic principles of equity and fairness and employ practical procedures for involving the full range of stakeholders.

Approach 19: Client-Centered Studies (or Responsive Evaluation)

The classic approach in this set is the client-centered study, or what Robert Stake (1983) has termed responsive evaluation. The label client-centered evaluation is used here because one pervasive theme is that the evaluator must work with and for the support of a diverse client group including, for example, teachers, administrators, developers, taxpayers, legislators, and financial sponsors. They are clients in the sense that they support, develop, administer, or directly operate the programs under study and seek or need evaluators’ counsel and advice in understanding, judging, and improving the programs. The approach charges evaluators to interact continuously with and respond to the evaluative needs of the various clients.

This approach contrasts sharply with Scriven’s consumer-oriented approach. Stake’s evaluators are not the independent, objective assessors seen in Scriven’s approach. The client-centered study embraces local autonomy and helps people who are involved in a program to evaluate it and use the evaluation for program improvement. The evaluator in a sense is the client’s handmaiden as they strive to make the evaluation serve their needs. Moreover, the client-centered approach rejects objectivist evaluation and instead subscribes to the postmodernist view, wherein there are no best answers or clearly preferable values and wherein subjective information is preferred. In this approach, the program evaluation may culminate in conflicting findings and conclusions, leaving interpretation to the eyes of the beholders. Client-centered evaluation is perhaps the leading entry in the “relativistic school of evaluation,” which calls for a pluralistic, flexible, interactive, holistic, subjective, constructivist, and service-oriented approach. The approach is relativistic because it seeks no final authoritative conclusion, but instead interprets findings against stakeholders’ different and often conflicting values. The approach seeks to examine a program’s full countenance and prizes the collection and reporting of multiple, often conflicting perspectives on the value of a program’s format, operations, and achievements. Side effects and
incidental gains as well as intended outcomes are to be identified and examined.

The advance organizers in client-centered evaluations are stakeholders’ concerns and issues in the program itself, as well as the program’s rationale, background, transactions, outcomes, standards, and judgments. The client-centered program evaluation may serve a wide range of purposes. Some of these are helping people in a local setting gain a perspective on the program’s full countenance; understanding the ways that various groups see the program’s problems, strengths, and weaknesses; and learning the ways affected people value the program plus the ways program experts judge it. The evaluator’s process goal is to carry on a continuous search for key questions and to provide the clients with useful information as it becomes available.

The client-centered/responsive approach has a strong philosophical base: evaluators should promote equity and fairness, help those with little power, thwart the misuse of power, expose the huckster, unnerve the assured, reassure the insecure, and always help people see things from alternative viewpoints. The approach subscribes to moral relativity and posits that for any given set of findings there are potentially multiple, conflicting interpretations that are equally plausible.

Community, practitioner, and beneficiary groups in the local environment plus external program area experts provide the questions addressed by the client-centered study. In general, the groups usually want to know what the program has achieved, how it operated, and the ways in which it is judged by involved persons and experts in the program area. The more specific evaluation questions emerge as the study unfolds, based on the evaluator’s continuing interactions with the stakeholders and their collaborative assessment of the developing evaluative information.

This approach reflects a formalization of the longstanding practice of informal, intuitive evaluation. It requires a relaxed and continuous exchange between the evaluator and clients. The approach is more divergent than convergent. Basically, the approach calls for continuing communication between evaluator and audience for the purposes of discovering, investigating, and addressing a program’s issues. Designs for client-centered program evaluations are relatively open-ended and emergent, building to narrative description rather than aggregating measurements across cases. The evaluator attempts to issue timely responses to clients’ concerns and questions by collecting and reporting useful information, even if the needed information hadn’t been anticipated at the study’s beginning. Concomitant with the ongoing conversation with the clients, the evaluator attempts to obtain and present a rich set of information on the program, including its philosophical foundation and purposes, history, transactions, and outcomes. Special attention is given to side effects, the standards that various persons hold for the program, and their judgments of the program.

Depending on the evaluation’s purpose, the evaluator may legitimately employ a range of different methods. Some of the preferred methods are the case study, expressive objectives, purposive sampling, observation, adversary reports, storytelling to convey complexity, sociodrama, and narrative reports. Client-centered evaluators are charged to check for the existence of stable and consistent findings by employing redundancy in their data-collecting activities and replicating their case studies. Evaluators are not expected to act as a program’s sole or final judges, but should collect, process, and report the opinions and judgments of the full range of the program’s stakeholders plus pertinent experts. In the end, the evaluator makes a comprehensive statement of what the program is observed to be and
references the satisfaction and dissatisfaction that appropriately selected people feel toward the program. Overall, the client-centered/responsive evaluator uses whatever information sources and techniques seem relevant to portraying the program’s complexities and multiple realities and communicates the complexity even if the result instills doubt and makes decision making more difficult.

Stake (1967) is the pioneer of the client-centered/responsive type of study, and his approach has been supported and developed by Denny (1978), MacDonald (1975), Parlett and Hamilton (1972), Rippey (1973), and Smith and Pohland (1974). Guba’s (1978) early development of constructivist evaluation also was heavily influenced by Stake’s writings on responsive evaluation. Stake has expressed skepticism about scientific inquiry as a dependable guide to developing generalizations about human services and pessimism about the potential benefits of formal program evaluations.

The main condition for applying the client-centered approach is a receptive client group and a confident, competent, responsive evaluator. The client must be willing to endorse a quite open, flexible evaluation plan as opposed to a well-developed, detailed, preordinate plan and must be receptive to equitable participation by a representative group of stakeholders. Clients must find qualitative methods acceptable and usually be willing to forego anything like a tightly controlled experimental study, although in exceptional cases a controlled field experiment might be employed. The client and other involved stakeholders need tolerance, even appreciation, for ambiguity and should hold out only modest hopes for obtaining definitive answers to evaluation questions. The clients must be receptive to ambiguous findings, multiple interpretations, employment of competing value perspectives, and heavy involvement of stakeholders in interpreting and using findings. Finally, the clients must be sufficiently patient to allow the program evaluation to unfold and find its direction based on the ongoing interactions between the evaluator and the stakeholders.

A main strength of the responsive/client-centered approach is that it involves action research, in which people funding, implementing, and using programs are helped to conduct their own evaluations and use the findings to improve their understanding, decisions, and actions. The evaluations look deeply into the stakeholders’ main interests and search broadly for relevant information. They also examine the program’s rationale, background, process, and outcomes. They make effective use of qualitative methods and triangulate findings from different sources. The approach stresses the importance of searching widely for unintended as well as intended outcomes. It also gives credence to meaningful participation in the evaluation by the full range of interested stakeholders. Judgments and other inputs from all such persons are respected and incorporated in the evaluations. The approach also provides for effective communication of findings.

A main weakness is the approach’s vulnerability regarding external credibility, since people in the local setting, in effect, have considerable control over the evaluation of their work. Similarly, evaluators working so closely with stakeholders may lose their independent perspectives. Also, the approach is not very amenable to reporting clear findings in time to meet decision or accountability deadlines. Moreover, rather than bringing closure, the approach’s adversary aspects and divergent qualities may generate confusion and contentious relations among the stakeholders. Sometimes, this cascading, evolving approach
may bog down in an unproductive quest for multiple inputs and interpretations.

Approach 20: Constructivist Evaluation

The constructivist approach to program evaluation is heavily philosophical, service oriented, and paradigm driven. The constructivist paradigm rejects the existence of any ultimate reality and employs a subjectivist epistemology. It sees knowledge gained as one or more human constructions, uncertifiable and constantly problematic and changing. It places the evaluators and program stakeholders at the center of the inquiry process, employing all of them as the evaluation’s “human instruments.” The approach insists that evaluators be totally ethical in respecting and advocating for all the participants, especially the disenfranchised. Evaluators are authorized, even expected, to maneuver the evaluation to emancipate and empower involved or affected disenfranchised people. Evaluators do this by raising stakeholders’ consciousness so that they are energized, informed, and assisted to transform their world. The evaluator must respect the participants’ free will in all aspects of the inquiry and should empower them to help shape and control the evaluation activities in their preferred ways. The inquiry process must be consistent with effective ways of changing and improving society. Thus, stakeholders must play a key role in determining the evaluation questions and variables. Throughout the study, the evaluator regularly and continuously informs and consults the stakeholders in all aspects of the study. The approach rescinds any special privilege of scientific evaluators to work in secret and to control or manipulate human subjects. In guiding the program evaluation, the evaluator balances verification with a quest for discovery, balances rigor with relevance, and balances the use of quantitative and qualitative methods. The evaluator also provides rich and deep description in preference to precise measurements and statistics. He or she employs a relativist perspective to obtain and analyze findings, stressing locality and specificity over generalizability. The evaluator posits that there can be no ultimately correct conclusions. He or she exalts openness and the continuing search for more informed and illuminating constructions.

This approach is as much recognizable for what it rejects as for what it proposes. In general, it strongly opposes positivism as a basis for evaluation, with its realist ontology, objectivist epistemology, and experimental method. It rejects any absolutist search for correct answers. It directly opposes the notion of value-free evaluation and attendant efforts to expunge human bias. It rejects positivism’s deterministic and reductionist structure and its belief in the possibility of fully explaining studied programs.

The constructivist approach’s advance organizers are basically the philosophical constraints placed on the study, as seen above, including the requirement of collaborative, unfolding inquiry. A constructivist approach’s main purpose is to determine and make sense of the variety of constructions that exist among stakeholders. The approach keeps the inquiry open to ongoing communication and to the gathering, analysis, and synthesis of further constructions. One construction is not considered more true than others, but some may be judged as more informed and sophisticated than others. All evaluation conclusions are viewed as indeterminate, with the continuing possibility of finding better answers. All constructions are also context dependent. In this respect, the evaluator does define boundaries on what is being investigated.

The questions addressed in constructivist studies cannot be determined apart from the participants’ interactions. Together, the evaluator and stakeholders identify the questions to be addressed. These questions
emerge in the process of formulating and discussing the study’s rationale, planning the schedule of discussions, and obtaining various initial persons’ views of the program to be evaluated. These questions develop further over the course of the approach’s hermeneutic and dialectic processes. The questions may or may not cover the full range of issues involved in assessing something’s merit and worth. Also, the set of questions to be studied is never considered fixed.

The constructivist methodology is first divergent, then convergent. Through the use of hermeneutics, the evaluator collects and describes alternative individual constructions on an evaluation question or issue, assuring that each depiction meets with the respondent’s approval. Communication channels are kept open throughout the inquiry, and all respondents are encouraged and facilitated to make their inputs and keep apprised of all aspects of the study. The evaluator then moves to a dialectical process aimed at bringing the different constructions into as much consensus as possible. Respondents are provided opportunities to review the full range of constructions along with other relevant information. The evaluator engages the respondents in a process of studying and contrasting existing constructions, considering relevant contextual and other information, reasoning out the differences among the constructions, and moving as far as they can toward a consensus. The constructivist evaluation is, in a sense, never ending. There is always more to learn, and finding ultimately correct answers is considered impossible.

Guba and Lincoln (1985, 1989) are pioneers in applying the constructivist approach to program evaluation. Also, Bhola (1998), a disciple of Guba, has extensive experience in applying the constructivist approach to evaluating programs in Africa. Thomas Schwandt (1984), another disciple of Guba, has written extensively about the philosophical underpinnings of constructivist evaluation. Fetterman’s (1994) empowerment evaluation approach is closely aligned with constructivist evaluation, since it seeks to engage and serve all stakeholders, especially those with little influence. However, there is a key difference between the constructivist and empowerment evaluation approaches. While the constructivist evaluator retains control of the evaluation and works with stakeholders to develop a consensus, the empowerment evaluator “gives away” authority for the evaluation to the stakeholders, with the evaluator serving in a technical assistance role.

The constructivist approach can be applied usefully when the evaluator, client, and stakeholders in the program fully agree that the approach is appropriate and that they will cooperate. They should reach such agreements based on an understanding of what the approach can and cannot deliver. They need to accept that the questions and issues to be studied will unfold throughout the process. They also should be willing to receive ambiguous, possibly contradictory findings, reflecting the stakeholders’ diverse perspectives. They should know also that the shelf life of the findings is likely to be short (not unlike any other evaluation approach, but clearly acknowledged in the constructivist approach). They also need to value qualitative information that largely reflects stakeholders’ various perspectives and judgments. On the other hand, they should not expect to receive definitive pre-post measures of outcomes and statistical conclusions about causes and effects. While these persons can hope for achieving a consensus in the findings, they should agree that such a consensus might not emerge and that in any case such a consensus would not generalize to other settings or time periods.

This approach has a number of advantages. It is exemplary in fully disclosing the whole evaluation process and set of findings. It is
consistent with the principle of effective change processes that people are more likely to value and use something (read evaluation here) if they are consulted and involved in its development. It also seeks to directly involve the full range of stakeholders who might be harmed or helped by the evaluation as important, empowered partners in the evaluation enterprise. It is said to be educative for all the participants, whether or not a consensus is reached. It also lowers expectations for what clients can learn about causes and effects. While it doesn’t promise final answers, it does move from a divergent stage, in which it searches widely for insights and judgments, to a convergent stage in which some unified answers are sought. In addition, it uses participants as instruments in the evaluation, thus taking advantage of their relevant experiences, knowledge, and value perspectives; this greatly reduces the burden of developing, field-testing, and validating information collection instruments before using them. The approach makes effective use of qualitative methods and triangulates findings from different sources.

However, the approach is limited in its applicability and has some disadvantages. Because of the need for full involvement and ongoing interaction through both the divergent and convergent stages, it is often difficult to produce the timely reports that funding agencies and decision makers demand. Also, to work well the approach requires the attention and responsible participation of a wide range of stakeholders. The approach seems to be unrealistically utopian in this regard. Widespread, grass-roots interest and participation are often hard to obtain and sustain throughout a program evaluation. This can be exacerbated by a continuing turnover of stakeholders. While the process emphasizes and promises openness and full disclosure, some participants don’t want to tell their private thoughts and judgments to the world. Moreover, stakeholders sometimes are poorly informed about the issues being addressed in an evaluation and thus are poor data sources. It can be unrealistic to expect that the evaluation can and will take the needed time to inform and then meaningfully involve those who begin as basically ignorant of the program being assessed. Also, constructivist evaluations can be greatly burdened by itinerant evaluation stakeholders who come and go and who expect to reopen questions previously addressed and any consensus previously reached. In addition, some evaluation clients don’t take kindly to evaluators who are prone to report competing, perspectivist answers without taking a stand regarding the program’s merit and worth. Also, many clients aren’t necessarily attuned to the constructivist philosophy. Instead, they may value reports that mainly include hard data on outcomes and assessments of statistical significance. Often, they also expect that reports should be based on relatively independent perspectives that are free of program participants’ conflicts of interest. In addition, the constructivist approach is a countermeasure to assigning responsibility for successes and failures in a program to certain individuals or groups; many policy boards, administrators, and financial sponsors might see this rejection of individual and group accountability as unworkable and unacceptable. It is easy to say that all persons in a program should share the glory or the disgrace; but try to tell this to an exceptionally hardworking and effective teacher in a school program where virtually no one else tries or succeeds.

Approach 21: Deliberative Democratic Evaluation

Perhaps the newest entry in the program evaluation models enterprise is the deliberative democratic approach advanced by House and Howe (1998). The approach functions within an explicit democratic framework and charges evaluators to uphold democratic principles in reaching defensible evaluative conclusions. The
approach envisions program evaluation as a principled, influential societal institution, contributing to democratization through the issuing of reliable and valid claims.

The approach’s advance organizers are seen in its three main dimensions: democratic participation, dialogue to examine and authenticate stakeholders’ inputs, and deliberation to arrive at a defensible assessment of the program’s merit and worth. All three dimensions are considered essential in all aspects of a sound program evaluation.

In the democratic dimension, the approach proactively identifies and arranges for the equitable participation of all interested stakeholders throughout the course of the program evaluation. The approach stresses equity and does not tolerate power imbalances in which the message of powerful parties would dominate the evaluation message. In the dialogic dimension the evaluator engages stakeholders and other audiences to assist in compiling preliminary evaluation findings. Subsequently, the collaborators seriously discuss and debate the draft evaluation findings to ensure that no participant’s views are misrepresented. In the culminating deliberative stage, the evaluator(s) honestly considers and discusses with others all inputs obtained but then renders what he or she considers a fully defensible assessment of the program’s merit and worth. All interested stakeholders are given voice in the evaluation, and the evaluator acknowledges their views in the final report, but may express disagreement with some of them. The deliberative dimension sees the evaluator(s) reaching a reasoned conclusion by reviewing all inputs; debating them with stakeholders and others; reflecting deeply on all these inputs; then reaching a defensible, well-justified conclusion.

This approach’s purpose is to employ democratic participation in the process of arriving at a defensible assessment of a program. The evaluator(s) determines the evaluation questions to be addressed but does so through dialogue and deliberation with engaged stakeholders. Presumably, the bottom-line questions concern judgments about the program’s merit and its worth to the stakeholders.

Methods employed may include discussions with stakeholders, surveys, and debates. Inclusion, dialogue, and deliberation are considered relevant in all stages of an evaluation—inception, design, implementation, analysis, synthesis, write-up, presentation, and discussion. House and Howe (1998) presented the following 10 questions for assessing the adequacy of a democratic deliberative evaluation:

Whose interests are represented?
Are major stakeholders represented?
Are any excluded?
Are there serious power imbalances?
Are there procedures to control imbalances?
How do people participate in the evaluation?
How authentic is their participation?
How involved is their interaction?
Is there reflective deliberation?
How considered and extended is the deliberation?

Ernest House originated this approach. He and Kenneth Howe say that many evaluators already implement their proposed principles. Especially, they pointed to an article by Karlsson (1998) to illustrate their approach. Also, they refer to a number of authors who have proposed practices that at least in part are compatible with the democratic dialogic approach.

This approach is applicable when a client agrees to fund an evaluation that requires democratic participation of at least a representative group of stakeholders. Thus, the funding agent must be
willing to give up sufficient power to allow inputs from a wide range of stakeholders, early disclosure of preliminary findings to all interested parties, and opportunities for the stakeholders to play an influential role in reaching the final conclusions. Also, a representative group of stakeholders must be willing to engage in open and meaningful dialogue and deliberation in all stages of the study.

This approach has many advantages associated with any democratic process. It is a direct attempt to make evaluations just. It assures democratic participation of stakeholders in all stages of the evaluation. It strives to incorporate the views of all interested parties, including insiders and outsiders, disenfranchised persons and groups, those who control the purse strings, etc. Meaningful democratic involvement should direct the evaluation to the issues that people care about and incline them to respect and use the evaluation findings. It employs dialogue to examine and authenticate stakeholders’ inputs. A key advantage over some other advocacy approaches is that the democratic deliberative evaluator reserves the right to rule out inputs that are considered incorrect or unethical. The evaluator is open to all stakeholders’ views, carefully considers them, but then renders as defensible a judgment of the program as possible. He or she does not leave the responsibility for reaching a defensible final assessment to a majority vote of stakeholders—some of whom are sure to have conflicts of interest and be uninformed. In rendering a final judgment, the evaluator ensures closure.

As House and Howe have acknowledged, the democratic dialogic approach is, at this time, unrealistic and often cannot be fully applied. This approach—in offering and expecting full democratic participation in order to make an evaluation work—reminds me of a colleague who used to despair of ever changing or improving higher education. He would say that changing any aspect of our university would require getting every professor to withhold her or his veto. In view of the very ambitious demands of the democratic dialogic approach, House and Howe have proposed it as an ideal to be kept in mind even though evaluators will seldom, if ever, be able to achieve this ideal.

Approach 22: Utilization-Focused Evaluation

The utilization-focused approach is explicitly geared to assure that program evaluations make impacts. It is a process for making choices about an evaluation study in collaboration with a targeted group of priority users, selected from a broader set of stakeholders, in order to focus effectively on their intended uses of an evaluation. All aspects of a utilization-focused program evaluation are chosen and applied to help the targeted users obtain and apply evaluation findings to their intended uses and to maximize the possibility that they will do so. Such studies are judged more for the difference they make in improving programs and influencing decisions and actions than for their elegance and technical excellence. No matter how good an evaluation report is, if it only sits on the shelf gathering dust, it contributes little if anything to the evaluation’s success.

The advance organizers of utilization-focused program evaluations are, in the abstract, the possible users and uses to be served. Working from this initial conception, the evaluator moves as directly as possible to identify in concrete terms the actual users to be served. Through careful and thorough analysis of stakeholders, the evaluator identifies the multiple and varied perspectives and interests that should be represented in the study. He or she then selects a group that is willing to pay the price of substantial involvement and that appropriately represents the program’s stakeholders. The evaluator then engages this client group to clarify why they need the evaluation, how they
intend to apply its findings, and how they think it should be conducted. The evaluator facilitates the users’ choices by supplying a menu of possible uses, information, and reports for the evaluation. But this is done not to supply the choices but to help the client group thoughtfully focus and shape the study. The main possible uses of evaluation findings contemplated in this approach are assessment of merit and worth, improvement, and generation of knowledge. The approach also values the evaluation process itself, seeing it as helpful in enhancing shared understandings among stakeholders, bringing support to a program, promoting participation in the program, and developing and strengthening organizational capacity.

In deliberating with the intended users, the evaluator emphasizes that the program evaluation’s purpose must be to give them the information they need to fulfill their objectives. Such objectives include socially valuable aims such as combating problems of illiteracy, crime, hunger, homelessness, unemployment, child abuse, spouse abuse, substance abuse, illness, alienation, discrimination, malnourishment, pollution, bureaucratic waste, etc. However, it is the targeted users who determine the program to be evaluated, what information is required, how and when it must be reported, and how it will be used.

In this approach, the evaluator is no iconoclast, but instead is the intended users’ servant and a facilitator. The evaluation should meet the full range of professional standards for program evaluations, not just utility. The evaluator must therefore be an effective negotiator, standing on principles of sound evaluation, but working hard to gear a defensible program evaluation to the targeted users’ evolving needs. The utilization-focused evaluation is considered situational and dynamic. Depending on the circumstances, the evaluator may play any of a variety of roles—trainer, measurement expert, internal colleague, external expert, analyst, spokesperson, mediator, etc.

The evaluator works with the targeted users to determine the evaluation questions. Such questions are to be determined locally, may address any of a wide range of concerns, and probably will change over time. Example foci are processes, outcomes, impacts, costs, cost-benefits, etc. The chosen questions are kept front and center and provide the basis for information collection and reporting plans and activities, so long as the users continue to value and pay attention to the questions. Often, however, the evaluator and client group will adapt, change, or refine the questions as the evaluation unfolds.

All evaluation methods are fair game in the utilization-focused program evaluation. The evaluator will creatively employ whatever methods are relevant, e.g., quantitative and qualitative, formative and summative, naturalistic and experimental. As much as possible, the utilization-focused evaluator puts the client group in “the driver’s seat” in determining evaluation methods, so that they will make sure the evaluator addresses their most important questions; collects the right information; applies the relevant values; addresses the key action-oriented questions; uses techniques they respect; interprets the findings against a pertinent theory; reports the information in a form and at a time when it can best be used; convinces stakeholders of the evaluation’s integrity and accuracy; and facilitates the users’ study, application, and—as appropriate—dissemination of the findings. The bases for interpreting evaluation findings are the users’ values, with the evaluator engaging in much values clarification to ensure that evaluative information and interpretations serve the users’ purposes. The users are actively involved in interpreting findings. Throughout the evaluation process, the evaluator balances
the concern for utility with provisions for validity and cost-effectiveness.

In general, the method of utilization-focused program evaluation is labeled “active-reactive-adaptive and situationally responsive,” emphasizing that the methodology evolves in response to ongoing deliberations between the evaluator and client group and in consideration of contextual dynamics. Patton (1997) says that “Evaluators are active in presenting to intended users their own best judgments about appropriate evaluation focus and methods; they are reactive in listening attentively and respectfully to others’ concerns; and they are adaptive in finding ways to design evaluations that incorporate diverse interests . . . while meeting high standards of professional practice.”

Patton (1980, 1982, 1994, 1997) is the leading proponent of utilization-focused evaluation. Others who have advocated for utilization-focused evaluations are Alkin (1995), Cronbach and Associates (1980), Davis and Salasin (1975), and the Joint Committee on Standards for Educational Evaluation (1981, 1994).

As defined by Patton, this approach has virtually universal applicability. It is situational and can be tailored to meet any program evaluation assignment. It carries with it the integrity of sound evaluation principles. Within these general constraints, the evaluator negotiates all aspects of the evaluation to serve specific individuals who need to have a program evaluation performed and who intend to make concrete use of the findings. The evaluator selects from the entire range of evaluation techniques those that best suit the particular program evaluation. And the evaluator plays any of a wide range of evaluation and improvement-related roles that fit the local needs. The approach requires a substantial outlay of time and resources by all participants for both conducting the program evaluation and the needed follow-through.

This approach is geared to maximizing evaluation impacts. It comports with a key principle of change: persons who are involved in an enterprise, such as an evaluation, are more likely to understand, value, and use it if they have been meaningfully involved in its development. As Patton says, “ . . . by actively involving primary intended users, the evaluator is training users in use, preparing the groundwork for use, and reinforcing the intended utility of the evaluation . . . ” The approach engages stakeholders to determine the evaluation’s purposes and procedures and uses their involvement to promote use of findings. It takes a more realistic approach to stakeholder involvement than some other advocacy approaches. Instead of trying to reach and work with all stakeholders, Patton’s approach works concretely with a representative group of users. The approach places strong emphasis on values clarification and attends closely to contextual dynamics. The program evaluation may selectively use any and all relevant evaluation procedures and triangulates findings from different sources. Finally, this approach stresses the need to meet all relevant standards for evaluations.

The approach’s main limitation is seen by Patton to be turnover of involved users. Replacement users may require that the program evaluation be renegotiated. This may be necessary to sustain or renew the prospects for evaluation impacts. But it can also derail or greatly delay the process. Also, the approach seems to be vulnerable to corruption by the user groups, since they are given so much control over what will be looked at, which questions will be addressed, and what information will be employed. Stakeholders with conflicts of interest may inappropriately influence the evaluation. Empowered stakeholders may inappropriately limit the evaluation to only a subset of the
important questions. Also, it may be nigh unto impossible to get a representative users group to agree on a sufficient commitment of time and safeguards to assure an ethical, valid process of data collection, reporting, and use. Moreover, effective implementation of this approach requires a highly competent, confident evaluator who can approach any situation flexibly without compromising basic professional standards. Strong skills of negotiation are essential, and the evaluator(s) must possess expertise in the full range of quantitative and qualitative evaluation methods, strong communication and political skills, and working knowledge of all applicable standards for evaluations. Unfortunately, not many evaluators are sufficiently trained and experienced to meet these requirements. Nevertheless, the utilization-focused approach is tied for second in the ranking of the 22 approaches considered in this paper.

The utilization-focused approach to evaluation concludes this paper’s discussion of the social agenda/advocacy approaches to evaluation. These four approaches concentrate on making evaluation an instrument of social justice and on modesty and candor in presenting findings that often are ambiguous and contradictory. Tables 13 through 18 summarize the similarities and differences among these approaches in relationship to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses.
Table 13: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Most Common ADVANCE ORGANIZERS
(U marks the approaches characterized by each advance organizer; 19 = Client-Centered/Responsive, 20 = Constructivist, 21 = Deliberative Democratic, 22 = Utilization-Focused)

Advance Organizers                                         19   20   21   22
Evaluation users                                                           U
Evaluation uses                                                            U
Stakeholders’ concerns & issues in the program itself       U    U    U
Rationale for the program                                   U
Background of the program                                   U
Transactions/operations in the program                      U
Outcomes                                                    U
Standards                                                   U
Judgments                                                   U
Collaborative, unfolding nature of the inquiry              U    U         U
Constructivist perspective                                       U
Rejection of positivism                                          U
Democratic participation                                    U    U    U    U
Dialogue with stakeholders to validate their inputs                   U
Table 14: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Primary EVALUATION PURPOSES
(19 = Client-Centered/Responsive, 20 = Constructivist, 21 = Deliberative Democratic, 22 = Utilization-Focused)

Evaluation Purposes                                                      19   20   21   22
Inform stakeholders about a program’s full countenance                    U
Conduct a continuous search for key questions & provide stakeholders
  with useful information as it becomes available                         U    U         U
Learn how various groups see a program’s problems, strengths,
  and weaknesses                                                          U    U
Learn how stakeholders judge a program                                    U    U
Learn how experts judge a program                                         U
Determine & make sense of a variety of constructions about a program
  that exist among stakeholders                                                U
Employ democratic participation in arriving at a defensible assessment
  of a program                                                                      U
Provide users the information they need to fulfill their objectives      U    U    U    U
Table 15: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Characteristic EVALUATION QUESTIONS
(19 = Client-Centered/Responsive, 20 = Constructivist, 21 = Deliberative Democratic, 22 = Utilization-Focused)

Characteristic Evaluation Questions                 19   20   21   22
Were questions negotiated with stakeholders?             U    U    U
What was achieved?                                   U              U
What were the impacts?                                              U
How did the program operate?                         U              U
How do various stakeholders judge the program?       U    U    U
How do experts judge the program?                    U
What is the program’s rationale?                     U    U
What were the costs?                                                U
What were the cost-benefits?                                        U
Table 16: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Main EVALUATION METHODS
(19 = Client-Centered/Responsive, 20 = Constructivist, 21 = Deliberative Democratic, 22 = Utilization-Focused)

Characteristic Methods                                     19   20   21   22
Case study                                                  U    U
Expressive objectives                                       U
Purposive sampling                                          U    U
Observation                                                 U    U
Adversary reports                                           U
Story telling to convey complexity                          U
Sociodrama to focus on issues                               U
Redundant data collection procedures                        U
Collection & analysis of stakeholders’ judgments            U
Hermeneutics to identify alternative constructions              U
Dialectical exchange                                            U
Consensus development                                           U
Discussions with stakeholders                                        U    U
Surveys                                                              U    U
Debates                                                              U
All relevant quantitative & qualitative, formative &
  summative, & naturalistic & experimental methods                        U
Table 17: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Prevalent STRENGTHS
(19 = Client-Centered/Responsive, 20 = Constructivist, 21 = Deliberative Democratic, 22 = Utilization-Focused)

Strengths                                                              19   20   21   22
Helps stakeholders to conduct their own evaluations                     U
Engages stakeholders to determine the evaluation’s purposes
  & procedures                                                               U         U
Stresses values clarification                                                          U
Looks deeply into stakeholders’ own interests                                     U
Searches broadly for relevant information                               U
Examines rationale, background, process, & outcomes                     U
Attends closely to contextual dynamics                                  U    U         U
Identifies both side effects & main effects                             U    U
Balances descriptive & judgmental information                           U
Meaningfully engages the full range of stakeholders                     U    U    U
Engages a representative group of stakeholders who are likely
  to apply the findings                                                                U
Empowers all stakeholders to influence & use the evaluation
  for their purposes                                                         U
Collects & processes judgments from all interested stakeholders         U    U    U
Fully discloses the evaluation process & findings                            U
Educates all participants                                                    U
Both divergent & convergent in searching for conclusions                U    U
Selectively employs all relevant evaluation methods                     U              U
Effectively uses qualitative methods                                    U    U         U
Employs participants as evaluation instruments                               U
Triangulates findings from different sources                            U    U    U    U
Focuses on the questions of interest to the stakeholders                U    U    U    U
Directly works to make evaluations just                                 U    U    U
Grounded in principles of democracy                                               U
Assures democratic participation of stakeholders in all stages
  of the evaluation                                                               U
Uses dialogue to examine & authenticate stakeholders’ inputs                      U
Rules out incorrect or unethical inputs from stakeholders                         U
Evaluator renders a final judgment, assuring closure                              U
Geared to maximize evaluation impacts                                                  U
Promotes use of findings through stakeholder involvement                U    U    U    U
Stresses effective communication of findings                            U              U
Stresses need to meet all relevant standards for evaluations                           U
Table 18: Comparison of the Four Social Agenda/Advocacy Evaluation Approaches on Prevalent WEAKNESSES
(19 = Client-Centered/Responsive, 20 = Constructivist, 21 = Deliberative Democratic, 22 = Utilization-Focused)

Weaknesses                                                             19   20   21   22
May empower stakeholders to bias the evaluation                         U
Evaluators may lose independence through advocacy                       U    U         U
Divergent qualities may generate confusion & controversy                U
May bog down in an unproductive quest for multiple inputs
  & interpretations                                                          U    U
Time consuming to work through divergent & convergent stages            U    U
Low feasibility of involving & sustaining meaningful participation
  of all stakeholders                                                   U    U    U    U
May place too much credence in abilities of stakeholders to be
  credible informants                                                   U    U
Thwarts individual accountability                                            U
May be unacceptable to clients who are looking for firm conclusions     U    U
Turnover of involved users may destroy the evaluation’s effectiveness                  U
Empowered stakeholders may inappropriately limit the evaluation
  to only some of the important questions                               U              U
Utopian, not yet developed for effective, efficient application                   U
Open to possible bad influences on the evaluation via stakeholders’
  conflicts of interest                                                 U    U         U
VI. BEST APPROACHES FOR 21ST CENTURY EVALUATIONS

As shown in the preceding parts, a variety of evaluation approaches emerged during the 20th century. Nine of these approaches appear to be the strongest and most promising for continued use and development beyond the year 2000. As shown in the preceding analyses, the other 13 approaches also have varying degrees of merit, but I chose in this section to focus attention on the most promising approaches. The ratings of these 9 approaches appear in Table 19. They are listed in order of merit, within the categories of Improvement/Accountability, Social Mission/Advocacy, and Questions/Methods evaluation approaches. The ratings are in relationship to the Joint Committee Program Evaluation Standards and were derived by the author using a special checklist keyed to the Standards.3

All nine of the rated approaches earned overall ratings of Very Good, except Accreditation, which was judged Good overall. The Utilization-Focused and Client-Centered approaches received Excellent ratings in the standards areas of Utility and Feasibility, while the Decision/Accountability approach was judged Excellent in provisions for Accuracy. The rating of Good in the Accuracy area for the Outcomes Monitoring/Value-Added approach was due not to low merit of this approach’s techniques, but to the narrowness of the questions addressed and information used; in its narrow sphere of application the Outcomes Monitoring/Value-Added approach provides technically sound information. The comparatively lower ratings given to the Accreditation approach result from its being a labor-intensive, expensive approach; its susceptibility to conflict of interest; its overreliance on self-reports and brief site visits; and its insular resistance to independent metaevaluations. Nevertheless, the distinctly American and pervasive accreditation approach is entrenched. All who will use it are advised to strengthen it in the areas of weakness identified in this paper. The Consumer-Oriented approach also deserves its special place, with its emphasis on independent assessment of developed products and services. While this consumer protection approach is not especially applicable to internal evaluations for improvement, it complements such approaches with the outsider, expert view that becomes important when products and services are put up for dissemination.

The Case Study approach scored surprisingly well, considering that it is focused on use of a particular technique. An added bonus of this approach is that it can be employed as a component of any of the other approaches, or it can be used by itself. As mentioned previously in this paper, the Democratic Deliberative approach is new and appears to be promising for testing and further development. Finally, the Constructivist approach is a well-founded, mainly qualitative approach to evaluation that systematically engages interested parties to help conduct both the divergent and convergent stages of evaluation. All in all, the nine approaches summarized in Table 19 bode well for the future application and further development of alternative program evaluation approaches.

3. The checklist used to evaluate each approach against the Joint Committee Program Evaluation Standards appears in this paper’s appendix.
Table 19: Ratings of the Strongest Program Evaluation Approaches
(Within types, listed in order of compliance with The Program Evaluation Standards; ratings: P = Poor, F = Fair, G = Good, VG = Very Good, E = Excellent)

Evaluation Approach                      Overall    Utility   Feasibility  Propriety  Accuracy

IMPROVEMENT/ACCOUNTABILITY
Decision/Accountability                  92 (VG)    90 (VG)   92 (VG)      88 (VG)    98 (E)
Consumer-Oriented                        81 (VG)    81 (VG)   75 (VG)      91 (VG)    81 (VG)
Accreditation                            60 (G)     71 (VG)   58 (G)       59 (G)     50 (G)

SOCIAL MISSION/ADVOCACY
Utilization-Focused                      87 (VG)    96 (E)    92 (E)       81 (VG)    79 (VG)
Client-Centered                          87 (VG)    93 (E)    92 (E)       75 (VG)    88 (VG)
Democratic Deliberative                  83 (VG)    96 (E)    92 (VG)      75 (VG)    69 (VG)
Constructivist                           80 (VG)    82 (VG)   67 (VG)      88 (VG)    83 (VG)

QUESTIONS/METHODS
Case Study                               80 (VG)    68 (VG)   83 (VG)      78 (VG)    92 (VG)
Outcomes Monitoring/Value-Added          72 (VG)    71 (VG)   92 (VG)      69 (VG)    56 (G)

The tests behind the ratings: The author rated each evaluation approach on each of the 30 Joint Committee program evaluation standards by judging whether the approach endorses each of 10 key features of the standard. He judged the approach’s adequacy on each standard as follows: 9-10 Excellent, 7-8 Very Good, 5-6 Good, 3-4 Fair, 0-2 Poor. The score for the approach on each of the 4 categories of standards (Utility, Feasibility, Propriety, Accuracy) was then determined by summing the following products: 4 x number of Excellent ratings, 3 x number of Very Good ratings, 2 x number of Good ratings, 1 x number of Fair ratings. Judgments of the approach’s strength in satisfying each category of standards were then determined according to percentages of the possible quality points for the category of standards as follows: 93%-100% Excellent, 68%-92% Very Good, 50%-67% Good, 25%-49% Fair, 0%-24% Poor. This was done by converting each category score to a percentage of the maximum score for the category. The 4 equalized scores were then summed, divided by 4, and compared to the total maximum value, 100. The approach’s overall merit was then judged as follows: 93-100 Excellent, 68-92 Very Good, 50-67 Good, 25-49 Fair, 0-24 Poor. Regardless of the approach’s total score and overall rating, a notation of unacceptable would have been attached to any approach receiving a poor rating on the vital standards of P1 Service Orientation, A5 Valid Information, A10 Justified Conclusions, and A11 Impartial Reporting. The author’s ratings were based on his knowledge of the Joint Committee Program Evaluation Standards, his many years of studying the various evaluation models and approaches, and his experience in seeing and assessing how some of these models and approaches worked in practice. He chaired the Joint Committee on Standards for Educational Evaluation during its first 13 years and led the development of the first editions of both the program and personnel evaluation standards. Nevertheless, his ratings should be viewed as only his personal set of judgments of these models and approaches. Also, his conflict of interest is acknowledged, since he was one of the developers of the Decision/Accountability approach.
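Because the scoring procedure in the note above involves several mappings, the following sketch (in Python) restates its arithmetic. It is a minimal illustration, not the author’s actual instrument: the feature counts supplied for the four categories are hypothetical placeholders, and only the rules (feature counts to standard ratings, ratings to quality points, category percentages, and the equally weighted overall score) follow the note’s description. The 7/3/8/12 grouping of the 30 standards reflects the 1994 edition of the Standards.

    # A minimal sketch of the rating arithmetic described in the note to Table 19.
    # The feature counts below are hypothetical placeholders, not the author's
    # actual checklist data; only the scoring rules follow his description.

    QUALITY_POINTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

    def rating_from_features(n):
        """Map the number of endorsed key features (0-10) to a standard's rating."""
        if n >= 9: return "Excellent"
        if n >= 7: return "Very Good"
        if n >= 5: return "Good"
        if n >= 3: return "Fair"
        return "Poor"

    def band(percent):
        """Map a 0-100 score to the category/overall bands given in the note."""
        if percent >= 93: return "Excellent"
        if percent >= 68: return "Very Good"
        if percent >= 50: return "Good"
        if percent >= 25: return "Fair"
        return "Poor"

    def category_percent(feature_counts):
        """Score one category of standards as a percent of its maximum quality points."""
        points = sum(QUALITY_POINTS[rating_from_features(n)] for n in feature_counts)
        return 100.0 * points / (4 * len(feature_counts))  # 4 points per standard max

    # Hypothetical feature counts for the 30 standards, grouped into the four
    # categories of the 1994 edition (7 Utility, 3 Feasibility, 8 Propriety,
    # 12 Accuracy standards).
    categories = {
        "Utility":     [9, 8, 7, 10, 8, 7, 9],
        "Feasibility": [8, 9, 7],
        "Propriety":   [7, 8, 6, 9, 7, 8, 7, 8],
        "Accuracy":    [8, 7, 9, 8, 7, 8, 9, 7, 8, 7, 8, 9],
    }

    percents = {name: category_percent(counts) for name, counts in categories.items()}
    overall = sum(percents.values()) / len(percents)  # equal weight per category
    for name, pct in percents.items():
        print(f"{name}: {pct:.0f} ({band(pct)})")
    print(f"Overall: {overall:.0f} ({band(overall)})")

Dividing the equalized category percentages by 4 rather than summing raw points is what keeps the small Feasibility category (3 standards) from being swamped by the large Accuracy category (12 standards).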
Conclusions

This completes the paper’s review of the 22 approaches used to evaluate programs. As stated at the paper’s beginning, a critical analysis of these approaches has important implications for the practitioner of evaluation, the theoretician who is concerned with devising better concepts and methods, and those engaged in professionalizing program evaluation.

A main point for the practitioner is that evaluators may encounter considerable difficulties if their perceptions of the study being undertaken differ from those of their clients and audiences. Often, clients want a politically advantageous study performed, while the evaluators want to conduct questions/methods-oriented studies that allow them to exploit the methodologies in which they were trained. Moreover, audiences usually want values-oriented studies that will help them determine the relative merits and worths of competing programs, or advocacy evaluations that will give them voice in the issues that affect them. If evaluators are ignorant of the likely conflicts in purposes, the program evaluation is probably doomed to failure from the start. The moral is that, at the outset of the study, evaluators must be keenly sensitive to their own agendas for an evaluation study as well as those that are held by the client and the other right-to-know audiences. Further, the evaluator should advise involved parties of possible conflicts in the evaluation’s purposes and should, at the beginning, negotiate a common understanding of the evaluation’s purpose and the appropriate approach.

The alternatives presented legitimately could be either a questions/methods (quasi-evaluation) study directed at assessing particular questions, an improvement/accountability-oriented study, or a social agenda/advocacy study. It is not believed, however, that politically inspired and controlled studies serve appropriate purposes in evaluating programs. Granted, they may be necessary in administration and public relations, but they should not be confused with, or substituted for, sound evaluation. Moreover, it is imperative to remember that no one type of study consistently is the best in evaluating programs. In the write-ups of the approaches, different ones are seen to work differentially well depending on circumstances.

For the theoretician, a main point to be gleaned from the review of the 22 types of studies is that they have inherent strengths and weaknesses. In general, the weaknesses of the politically oriented studies are that they are prone to manipulation by unscrupulous persons and may help such people mislead an audience into developing an unfounded, perhaps erroneous judgment of a program’s merit and worth. The main problem with the questions/methods-oriented studies is that they often address questions that are more narrow in scope than the questions needing to be addressed in a true assessment of merit and worth. However, it is also noteworthy that these types of studies compete favorably with improvement/accountability-oriented evaluation studies and social agenda/advocacy studies in the efficiency of methodology and technical adequacy of information employed. Also, the improvement/accountability-oriented studies, with their concentration on merit and worth, undertake a very ambitious task, for it is virtually impossible to fully and unequivocally assess any program’s ultimate worth. Such an achievement would require omniscience, infallibility, an unchanging environment, and an unquestioned, singular value base. Nevertheless, the continuing attempt to consider questions of merit and worth certainly is essential for the advancement of societal programs. Finally, the social mission/advocacy studies are to be
applauded for their quest for equity as well as excellence in the programs being studied. They model their mission by attempting to make evaluation a participatory, democratic enterprise. Unfortunately, many pitfalls attend such utopian approaches to evaluation. Especially, these include susceptibility to bias and political subversion of the study and practical constraints on involving, informing, and empowering all the stakeholders.

For the evaluation profession itself, the review of program evaluation models underscores the importance of evaluation standards and metaevaluations. Professional standards are needed to obtain a consistently high level of integrity in uses of the various program evaluation approaches. All legitimate approaches are enhanced when keyed to and assessed against professional standards for evaluations. In addition, benefits from evaluations are enhanced when they are subjected to independent review through metaevaluations.

As evidenced in this paper, the last half of the 20th century saw considerable development of program evaluation approaches. Many of the approaches introduced in the 1960s and 1970s have been extensively refined and applied. The category of social agenda/advocacy models has emerged as a new and important part of the program evaluation cornucopia. There is among the approaches an increasingly balanced quest for rigor, relevance, and justice. Clearly, the approaches are showing a strong orientation to stakeholder involvement and use of multiple methods.

Recommendations

In spite of the progress described above, there is clearly a need for continuing efforts to develop and implement better approaches to program evaluation. This is illustrated by some of the authors’ hesitancy to accord the status of a model to their contributions or inclination to label them as utopian. As also seen in the paper, there are some approaches that in the main seem to be a waste of time or even counterproductive.

Theoreticians should diagnose strengths and weaknesses of existing approaches, and they should do so in more depth than demonstrated here. They should use these diagnoses to evolve better, more defensible approaches and to help expunge the use of hopelessly flawed approaches; they should work with practitioners to operationalize and test the new approaches; and, of course, both groups should collaborate in developing still better approaches. Such an ongoing process of critical review and development is essential if the field of program evaluation is not to stagnate, but instead is to provide vital support for advancing programs and services.

Therefore, it is necessary, indeed essential, that evaluators develop a repertoire of different program evaluation approaches so they can selectively apply them individually or in combination to best advantage. Going out on the proverbial limb, but also based on the preceding analysis, the best approaches seem to be decision/accountability, utilization-focused, client-centered, consumer-oriented, case study, democratic deliberative, constructivist, accreditation, and outcomes monitoring. The worst bets, in my judgment, are the politically controlled, public relations, accountability (especially payment by results), clarification hearings, and program theory-based approaches. The rest fall somewhere in the middle. While House and Howe’s (1998) democratic deliberative approach is new and in their view utopian, it has many elements of a sound, effective evaluation approach and merits study, further development, and trial.
Evaluation training programs should effectively address the ferment over, and development of, new program evaluation approaches. Evaluation trainers should directly teach their students about the expanding and increasingly sophisticated program evaluation approaches. These approaches will serve well when evaluators can discern which approaches are worth using and which are not, when they clearly understand the worthy approaches, and when they know when and how to apply them. The most likely scenario is that present approaches will be extended and refined rather than completely new approaches being developed. Therefore, a knowledge of these approaches is very important.

In addition, evaluators should regularly train the participants in their evaluations in the selected approach’s logic, rationale, process, and pitfalls. This will enhance the stakeholders’ cooperation and constructive use of findings.

Finally, evaluators are advised to adopt and regularly apply professional standards for sound program evaluations. They should use the standards to guide development of better evaluation approaches. They should apply them in choosing and tailoring approaches. They should engage external evaluators to apply the standards in assessing evaluations through the process called metaevaluation. They should also contribute to improvements in the professional standards. In accordance with The Program Evaluation Standards (Joint Committee, 1994), program evaluators should develop and selectively apply evaluation approaches that in the particular contexts will meet the conditions of utility, feasibility, propriety, and accuracy.

Notes

1. Stake, R. E. Nine approaches to evaluation. Unpublished chart. Urbana, Illinois: Center for Instructional Research and Curriculum Evaluation, 1974.
2. Hastings, T. A portrayal of the changing evaluation scene. Keynote speech at the annual meeting of the Evaluation Network, St. Louis, Missouri, 1976.
3. Guba, E. G. Alternative perspectives on evaluation. Keynote speech at the annual meeting of the Evaluation Network, St. Louis, Missouri, 1976.
4. Presentation by Robert W. Travers in a seminar at the Western Michigan University Evaluation Center, Kalamazoo, Michigan, October 24, 1977.
5. Stenner, A. J., and Webster, W. J. (Eds.) Technical auditing procedures. Educational product audit handbook, 38-103. Arlington, Virginia: Institute for the Development of Educational Auditing, 1971.
6. Eisner, E. W. The perceptive eye: Toward the reformation of evaluation. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC, March 1975.
7. Webster, W. J. The organization and functions of research and evaluation in large urban school districts. Paper presented at the annual meeting of the American Educational Research Association, Washington, DC, March 1975.
8. Glass, G. V. Design of evaluation studies. Paper presented at the Council for Exceptional Children Special Conference on Early Childhood Education, New Orleans, Louisiana, 1969.
Bibliography Bloom, B. S., Englehart, M. D., Furst, E.


J., Hill, W. H., & Krathwohl, D. R. (1956).
Aguaro, R. (1990). R. Deming: The Taxonomy of educational objectives:
American who taught the Japanese about Handbook I: Cognitive domain. New York:
quality. New York: Fireside. David McKay.

Alkin, M. C. (1969). Evaluation theory Boruch, R. F. (1994). The future of


development. Evaluation Comment, 2, 2-7. controlled randomized experiments: A
briefing. Evaluation Practice, 15(3), 265-274.
Alkin, M. C. (1995, November). Lessons
learned about evaluation use. Panel Bryk, A. S. (Ed.) (1983). Stakeholder-
presentation at the International Evaluation based evaluation. San Francisco: Jossey-Bass.
Conference, American Evaluation
Association, Vancouver, British Columbia. Campbell, D. T. (1975). Degrees of
freedom and the case study. Comparative
Baker, E. L, O’Neil, H. R., & Linn, R. L. Political Studies, 8, 178-193.
(1993). Policy and validity prospects for
performance-based assessment. American Campbell, D. T., & Stanley, J. C. (1963).
Psychologist, 48, 1210-1218. Experimental and quasi-experimental designs
for research on teaching. In N. L. Gage (Ed.),
Bandura, A. (1977). Social learning Handbook of research on training. Chicago:
theory. Englewood Cliffs, NJ: Prentice-Hall. Rand McNally.

Bayless, D., & Massaro, G. (1992). Campbell, D. T., & Stanley, J. C. (1966).
Quality improvement in education today and Experimental and quasi-experimental designs
the future: Adapting W. Edwards Deming’s for research. Boston, MA: Houghton Mifflin.
quality improvement principles and methods
to education. Kalamazoo, MI: Center for Chen, H. (1990). Theory driven
Research on Educational Accountability and evaluations. Newbury Park, CA: Sage.
Teacher Evaluation.
Coffey, A., & Atkinson, P. (1996).
Becker, M. H. (Ed.) (1974). The health M a ki n g se n s e o f q ua l i t a t i v e da t a :
belief model and personal health behavior Complementary research strategies.
[Entire issue]. Health Education Monographs, Thousand Oaks, CA: Sage.
2, 324-473.
Cook, D. L. (1966). Program evaluation
Bhola, H. S. (1998). Program evaluation and review techniques, applications in
for program renewal: A study of the national education. U.S. Office of Education
literacy program in Namibia (NLPN). Studies Cooperative Monograph, 17 (OE-12024).
in Educational Evaluation, 24(4), 303-330.
Cronbach, L. J. (1963). Course
Bickman, L. (1990). Using program improvement through evaluation. Teachers
theory to describe and measure program quality. College Record, 64, 672-83.
In L. Bickman (Ed.), Advances in
Program Theory. New Directions in Program
Evaluation. San Francisco: Jossey-Bass.
Best Approaches for 21st Century Evaluation 77

Cronbach, L. J. (1982). Designing American Educational Research Association,


evaluations of educational and social Washington, DC.
programs. San Francisco: Jossey-Bass.
Eisner, E. W. (1983). Educational
Cronbach, L. J., & Associates. (1980). connoisseurship and criticism: Their form and
Toward reform of program evaluation. San functions in educational evaluation. In G. F.
Francisco: Jossey-Bass. Madaus, M. Scriven, & D. L. Stufflebeam
(Eds.), Evaluation models. Boston: Kluwer-
Cronbach, L. J., & Snow, R. E. (1969). Nijhoff.
Individual differences in learning ability as a
function of instructional variables. Stanford, Ferguson, R.(1999, June). Ideological
CA: Stanford University Press. marketing. The Education Industry Report.

Davis, H. R., & Salasin, S. E. (1975). The Fetterman, D. (1989). Ethnography: Step
utilization of evaluation. In E. L. Struening & by step. Applied Social Research Methods
M. Guttentag (Eds.), Handbook of evaluation Series, 17. Newbury Park, CA: Sage.
research, Vol. 1. Beverly Hills, CA: Sage.
Fetterman, D. (1994, February).
Debus, M. (1995). Methodological Empowerment evaluation. Evaluation
review: A handbook for excellence in focus Practice, 15(1).
group research. Washington, DC: Academy
for Educational Development. Fetterman, D., Shakeh, J. K., &
Wandersman, (Eds.). (1996). Empowerment
Deming, W. E. (1986). Out of the crisis. evaluation: Knowledge and tools for self-
Cambridge, MA: Center for Advanced assessment & accountability. Thousand Oaks,
Engineering Study, Massachusetts Institute of CA: Sage.
Technology.
Fisher, R .A. (1951). The design of
Denny, T. (1978, November). Story experiments (6th ed.) New York: Hafner.
telling and educational understanding.
Occasional Paper No. 12. Kalamazoo, MI: Flanagan, J. C. (1939). General
Evaluation Center, Western Michigan considerations in the selection of test items
University. and a short method of estimating the product-
moment coefficient from data at the tails of
Denzin, N. K., & Lincoln, Y. S. (Eds.). the distribution. Journal of Educational
(1994). Handbook of qualitative research. Psychology, 30, 674-80.
Thousand Oaks, CA: Sage.
Flexner, A. (1910). Medical education in
Ebel, R. L. (1965). Measuring the United States and Canada. Bethesda, MD:
educational achievement. Englewood Cliffs, Science and Health Publications.
NJ: Prentice-Hall.
Flinders, D. J., & Eisner, E. W. (1994,
Eisner, E. W. (1975, March). The December). Educational criticism as a form of
perceptive eye: Toward a reformation of qualitative inquiry. Research in the Teaching
educational evaluation. Invited address, of English, 28(4), 341-356.
Division B, Curriculum and Objectives,
78 Stufflebeam

Glaser, B. G., & Strauss, A. L. Hambleton, R. K., & Swaminathan, H.


(1967).The discovery of grounded theory. (1985). Item response theory. Boston:
Chicago: Aldine. Kluwer-Nijhoff.

Glass, G. V. (1975). A paradox about Hammond, R. L. (1972). Evaluation at


excellence of schools and the people in them. the local level. (mimeograph). Tucson, AZ:
Educational Researcher, 4, 9-13. EPIC Evaluation Center.

Glass, G. V, & Maguire, T. O. (1968). Herman, J. L., Gearhart, M. G., & Baker,
Analysis of time-series quasi-experiments. E. L. (1993). Assessing writing portfolios:
(U.S. Office of Education Report No. 6- Issues in the validity and meaning of scores.
8329.) Boulder: Laboratory of Educational Educational Assessment, 1, 201-224.
Research, University of Colorado.
House, E. R. (1980). Evaluating with
Green, L. W., & Kreuter, M. W. (1991). validity. Beverly Hills, CA: Sage.
In He al t h pr om oti on planning: An
educational and environmental approach, 2nd House, E. R. (1983). Assumptions
Edition (pp. 22-30). Mountain View, CA: underlying evaluation models. In G. F.
Mayfield Publishing. Madaus, M. Scriven, & D. L. Stufflebeam
(Eds.), Evaluation models. Boston: Kluwer-
Greenbaum, T. L. (1993). The handbook Nijhoff.
of focus group research. New York:
Lexington Books. House, E. R. (1993). Professional
evaluation–Social impact and political
Guba, E. G. (1969). The failure of consequences. Newbury Park, CA: Sage.
e d u c a t i o n a l eva lu a t i o n . E d u c a t i o n a l
Technology, 9, 29-38. House, E. R., & Howe, K. R. (1998).
Deliberative democratic evaluation in
Guba, E. G. (1978). Toward a practice. Boulder: University of Colorado.
methodology of naturalistic inquiry in
evaluation. CSE Monograph Series in Janz, N. K., & Becker, M. H.. (1984).
Evaluation. Los Angeles: Center for the Study The health belief model: A decade later.
of Evaluation. Health Education Quarterly, 11, 1-47.

Guba, E. G., & Lincoln, Y. S. (1981). Joint Committee on Standards for


Effective evaluation. San Francisco: Jossey- Educational Evaluation. (1981). Standards for
Bass. evaluations of educational programs,
projects, and materials. New York: McGraw-
Guba, E. G., & Lincoln, Y. S. (1989). Hill.
Fourth generation evaluation. Newbury Park,
CA: Sage. Joint Committee on Standards for
Educational Evaluation. (1994). The program
Hart, D. (1994). Authentic assessment: A evaluation standards: How to assess
handbook for educators. Menlo Park, CA: evaluations of educational programs.
Addison-Wesley. Thousand Oaks, CA: Sage.
Best Approaches for 21st Century Evaluation 79

Kaplan, A. (1964). The conduct of Koretz, D. (1996). Using student


inquiry. San Francisco: Chandler. assessments for educational accountability. In
R . Ha n u s he k ( E d. ) , I m provin g t h e
Karlsson, O. (1998). Socratic dialogue in performance of America’s schools, pp. 171-
the Swedish political context. In T. A. 196. Washington, DC: National Academy
Schwandt (Ed.), Scandinavian perspectives on Press.
the evaluator’s role in informing social
policy. New Directions for Evaluation, 77, Koretz, D. M., & Barron, S. I. (1998).
21-38. The validity of gains in scores on the
Kentucky Instructional Results Information
Kaufman, R. A. (1969, May). Toward System (KIRIS). Santa Monica, CA: Rand
educational system planning: Alice in Education.
educationland. Audiovisual Instructor, 14,
47-48. Kvale, S. (1995). The social construction
of validity. Qualitative Inquiry, 1, 19-40.
Kee, J. E. (1995). Benefit-cost analysis in
program evaluation. In J. S. Wholey, H. P. Lessinger, L. M. (1970). Every kid a
Hatry, & K. E. Newcomer, Handbook of winner: Accountability in education. New
practical program evaluation, pp. 456-488. York: Simon and Schuster.
San Francisco: Jossey-Bass.
Levin, H. M. (1983). Cost-effectiveness:
Kentucky Department of Education. A primer. New Perspectives in Evaluation, 4.
(1993). Kentucky results information system, Newbury Park, CA: Sage.
1991-92 technical report. Frankfort, KY:
Author. Levine, M. (1974, September). Scientific
method and the adversary model. American
Kidder, L., & Fine, M. (1987). Psychologist, 666-677.
Qualitative and quantitative methods: When
stories converge. Multiple methods in Lincoln, Y. S., & Guba, E. G. (1985).
program evaluation. New Directions for Naturalistic inquiry. Beverly Hills, CA: Sage.
Program Evaluation, 35. San Francisco:
Jossey-Bass. Lindquist, E. F. (Ed.) (1951). Educational
measurement. Washington, DC: American
Kirst, M. W. (July, 1990). Accountability: Council on Education.
Implications for state and local policymakers.
In Policy Perspectives Series. Washington, Lindquist, E. F. (1953). Design and analysis
DC: Information Services, Office of o f e x pe r i m en t s i n p s y c h o l o g y a nd
Educational Research and Improvement, U.S. education. Boston: Houghton-Mifflin.
Department of Education.
Linn, R. L., Baker, E. L., & Dunbar, S. B.
Koretz, D. (1986). The validity of gains in ( 1 9 91). Complex, p e r f or ma n c e -b a s e d
scores on the Kentucky Instructional Results assessment: Expectations and validation
Information System (KIRIS). Santa Monica, criteria. Educational Researcher, 20(8), 15-
CA: Rand Education. 21.
80 Stufflebeam

Lofland, J., & Lofland, L. H. (1995). Analyzing social settings: A guide to qualitative observation and analysis, 3rd Ed. Belmont, CA: Wadsworth.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

MacDonald, B. (1975). Evaluation and the control of education. In D. Tawney (Ed.), Evaluation: The state of the art. London: Schools Council.

McLean, R. A., Sanders, W. L., & Stroup, W. W. (1991). A unified approach to mixed linear models. The American Statistician, 45, 54-64.

Madaus, G. F., & Stufflebeam, D. L. (1988). Educational evaluation: The classical writings of Ralph W. Tyler. Boston: Kluwer.

Mehrens, W. A. (1972). Using performance assessment for accountability purposes. Educational Measurement: Issues and Practice, 11(1), 3-10.

Merton, R. K., Fiske, M., & Kendall, P. L. (1990). The focused interview: A manual of problems and procedures, 2nd Ed. New York: The Free Press.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(3), 13-23.

Metfessel, N. S., & Michael, W. B. (1967). A paradigm involving multiple criterion measures for the evaluation of the effectiveness of school programs. Educational and Psychological Measurement, 27, 931-43.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Thousand Oaks, CA: Sage.

Miron, G. (1998). Chapter in Lene Buchert (Ed.), Education reform in the south in the 1990s. Paris: UNESCO.

Mullen, P. D., Hersey, J., & Iverson, D. C. (1987). Health behavior models compared. Social Science and Medicine, 24, 973-981.

National Science Foundation. (1993). User-friendly handbook for project evaluation: Science, mathematics, engineering and technology education. NSF 93-152. Arlington, VA: Author.

National Science Foundation. (1997). User-friendly handbook for mixed method evaluations. NSF 97-153. Arlington, VA: Author.

Nave, B., Misch, E. J., & Mosteller, F. (In press). A rare design: The role of field trials in evaluating school practices. In G. Madaus, D. L. Stufflebeam, & T. Kellaghan (Eds.), Evaluation models. Boston: Kluwer Academic Publishers.

Nevo, D. (1993). The evaluation-minded school: An application of perceptions from program evaluation. Evaluation Practice, 14(1), 39-47.

Owens, T. (1973). Educational evaluation by adversary proceeding. In E. House (Ed.), School evaluation: The politics and process. Berkeley, CA: McCutchan.

Parlett, M., & Hamilton, D. (1972). Evaluation as illumination: A new approach to the study of innovatory programs. Edinburgh: Centre for Research in the Educational Sciences, University of Edinburgh, Occasional Paper No. 9.

Patton, M. Q. (1980). Qualitative evaluation methods. Beverly Hills, CA: Sage.

Patton, M. Q. (1982). Practical evaluation. Beverly Hills, CA: Sage.
Patton, M. Q. (1990). Qualitative evaluation and research methods, 2nd Ed. Newbury Park, CA: Sage.

Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15(3), 311-319.

Patton, M. Q. (1997). Utilization-focused evaluation: The new century text (3rd Ed.). Newbury Park, CA: Sage.

Peters, T. J., & Waterman, R. H. (1982). In search of excellence. New York: Warner Books.

Platt, J. (1992). Case study in American methodological thought. Current Sociology, 40(1), 17-48.

Popham, W. J. (1969). Objectives and instruction. In R. Stake (Ed.), Instructional objectives. AERA Monograph Series on Curriculum Evaluation (Vol. 3). Chicago: Rand McNally.

Popham, W. J., & Carlson, D. (1983). Deep dark deficits of the adversary evaluation model. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models. Boston: Kluwer-Nijhoff.

Prochaska, J. O., & DiClemente, C. C. (1992). Stages of change in the modification of problem behaviors. In M. Hersen, R. M. Eisler, & P. M. Miller (Eds.), Progress in behavior modification, 28. Sycamore, IL: Sycamore Publishing Company.

Provus, M. N. (1969). Discrepancy evaluation model. Pittsburgh: Pittsburgh Public Schools.

Provus, M. N. (1971). Discrepancy evaluation. Berkeley, CA: McCutchan.

Rippey, R. M. (Ed.). (1973). Studies in transactional evaluation. Berkeley, CA: McCutchan.

Rogers, P. R. (In press). Program theory: Not whether programs work but how they work. In G. Madaus, D. L. Stufflebeam, & T. Kellaghan (Eds.), Evaluation models. Boston: Kluwer Academic Publishers.

Rossi, P. H., & Freeman, H. E. (1993). Evaluation: A systematic approach (5th ed.). Newbury Park, CA: Sage.

Sanders, W. L. (1989). Using customized standardized tests. (Contract No. R-88-062003). Washington, DC: Office of Educational Research and Improvement, U.S. Department of Education. (ERIC Digest No. ED 314429)

Sanders, W. L., & Horn, S. P. (1994). The Tennessee value-added assessment system (TVAAS): Mixed model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8(3), 299-311.

Schatzman, L., & Strauss, A. L. (1973). Field research. Englewood Cliffs, NJ: Prentice-Hall.

Schwandt, T. A. (1984). An examination of alternative models for socio-behavioral inquiry. Unpublished Ph.D. dissertation, Indiana University.

Scriven, M. S. (1967). The methodology of evaluation. In R. E. Stake (Ed.), Curriculum evaluation. AERA Monograph Series on Curriculum Evaluation (Vol. 1). Chicago: Rand McNally.

Scriven, M. (1974). Evaluation perspectives and procedures. In W. J. Popham (Ed.), Evaluation in education: Current applications. Berkeley, CA: McCutchan.
Scriven, M. (1991). Evaluation thesaurus. Newbury Park, CA: Sage.

Scriven, M. (1993, Summer). Hard-won lessons in program evaluation. New Directions for Program Evaluation. San Francisco: Jossey-Bass.

Scriven, M. (1994a). Evaluation as a discipline. Studies in Educational Evaluation, 20(1), 147-166.

Scriven, M. (1994b). The final synthesis. Evaluation Practice, 15(3), 367-382.

Scriven, M. (1994c). Product evaluation: The state of the art. Evaluation Practice, 15(1), 45-62.

Seidman, I. E. (1991). Interviewing as qualitative research: A guide for researchers in education and social sciences. New York: Teachers College Press.

Shadish, W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation. Newbury Park, CA: Sage.

Smith, M. F. (1989). Evaluability assessment: A practical approach. Boston: Kluwer Academic Publishers.

Smith, N. L. (1987). Toward the justification of claims in evaluation research. Evaluation and Program Planning, 10(4), 309-314.

Smith, L. M., & Pohland, P. A. (1974). Educational technology and the rural highlands. In L. M. Smith (Ed.), Four examples: Economic, anthropological, narrative, and portrayal (AERA Monograph on Curriculum Evaluation). Chicago: Rand McNally.

Stake, R. E. (1967). The countenance of educational evaluation. Teachers College Record, 68, 523-540.

Stake, R. E. (1970). Objectives, priorities, and other judgment data. Review of Educational Research, 40, 181-212.

Stake, R. E. (1971). Measuring what learners learn (mimeograph). Urbana, IL: Center for Instructional Research and Curriculum Evaluation.

Stake, R. E. (1975a). Evaluating the arts in education: A responsive approach. Columbus, OH: Merrill.

Stake, R. E. (1975b, November). Program evaluation: Particularly responsive evaluation. Kalamazoo: Western Michigan University Evaluation Center, Occasional Paper No. 5.

Stake, R. E. (1976). A theoretical statement of responsive evaluation. Studies in Educational Evaluation, 2, 19-22.

Stake, R. E. (1978). The case-study method in social inquiry. Educational Researcher, 7, 5-8.

Stake, R. E. (1979). Should educational evaluation be more objective or more subjective? Educational Evaluation and Policy Analysis.

Stake, R. E. (1983). Program evaluation, particularly responsive evaluation. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models, pp. 287-310. Boston: Kluwer-Nijhoff.

Stake, R. E. (1988). Seeking sweet water. In R. M. Jaeger (Ed.), Complementary methods for research in education, pp. 253-300. Washington, DC: American Educational Research Association.
Stake, R. E. (1994). Case studies. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research, pp. 236-247. Thousand Oaks, CA: Sage.

Stake, R. E. (1995). The art of case study research. Thousand Oaks, CA: Sage.

Stake, R. E., & Easley, J. A., Jr. (Eds.). (1978). Case studies in science education, 1(2). NSF Project 5E-78-74. Urbana, IL: CIRCE, University of Illinois College of Education.

Stake, R. E., & Gjerde, C. (1971). An evaluation of TCITY: The Twin City Institute for Talented Youth. Kalamazoo, MI: Western Michigan University Evaluation Center, Occasional Paper Series No. 1.

Steinmetz, A. (1983). The discrepancy evaluation model. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models, pp. 79-100. Boston: Kluwer-Nijhoff.

Stillman, P. L., Haley, H. A., Regan, M. B., Philbin, M. M., Smith, S. R., O’Donnell, J., & Pohl, H. (1991). Positive effects of a clinical performance assessment program. Academic Medicine, 66, 481-483.

Stufflebeam, D. L. (1966, June). A depth study of the evaluation requirement. Theory Into Practice, 5, 121-34.

Stufflebeam, D. L. (1967, June). The use and abuse of evaluation in Title III. Theory Into Practice, 6, 126-33.

Stufflebeam, D. L. (1997). A standards-based perspective on evaluation. In R. E. Stake (Ed.), Advances in program evaluation, 3, pp. 61-88.

Stufflebeam, D. L., Foley, W. J., Gephart, W. J., Guba, E. G., Hammond, R. L., Merriman, H. O., & Provus, M. M. (1971). Educational evaluation and decision making. Itasca, IL: Peacock.

Stufflebeam, D. L., & Shinkfield, A. J. (1985). Systematic evaluation. Boston: Kluwer-Nijhoff.

Suchman, E. A. (1967). Evaluative research. New York: Russell Sage Foundation.

Swanson, D. B., Norman, R. N., & Linn, R. L. (1995, June/July). Performance-based assessment: Lessons from the health professions. Educational Researcher, 24(5), 5-11.

Tennessee Board of Education. (1992). The master plan for Tennessee schools 1993. Nashville: Author.

Thorndike, R. L. (1971). Educational measurement (2nd ed.). Washington, DC: American Council on Education.

Torrance, H. (1993). Combining measurement-driven instruction with authentic assessment: Some initial observations of national assessment in England and Wales. Educational Evaluation and Policy Analysis, 15, 81-90.

Tsang, M. C. (1997, Winter). Cost analysis for improved educational policymaking and evaluation. Educational Evaluation and Policy Analysis, 19(4), 318-324.
Tyler, R. W., et al. (1932). Service studies in higher education. Columbus, OH: The Bureau of Educational Research, The Ohio State University.

Tyler, R. W. (1942). General statement on evaluation. Journal of Educational Research, 35, 492-501.

Tyler, R. W. (1950). Basic principles of curriculum and instruction. Chicago: University of Chicago Press.

Tyler, R. W. (1966). The objectives and plans for a national assessment of educational progress. Journal of Educational Measurement, 3, 1-10.

Tymms, P. (1995). Setting up a national “value-added” system for primary education in England: Problems and possibilities. Paper presented at the National Evaluation Institute, Kalamazoo, MI.

Vallance, E. (1973). Aesthetic criticism and curriculum description. Ph.D. dissertation, Stanford University.

Webster, W. J. (1995). The connection between personnel evaluation and school evaluation. Studies in Educational Evaluation, 21, 227-254.

Webster, W. J., Mendro, R. L., & Almaguer, T. O. (1994). Effectiveness indices: A “value-added” approach to measuring school effect. Studies in Educational Evaluation, 20, 113-145.

Weiss, C. H. (1972). Evaluation. Englewood Cliffs, NJ: Prentice Hall.

Weiss, C. H. (1995). Nothing as practical as good theory: Exploring theory-based evaluation for comprehensive community initiatives for children and families. In J. Connell, A. Kubisch, L. B. Schorr, & C. H. Weiss (Eds.), New approaches to evaluating community initiatives. New York: Aspen Institute.

Weitzman, E. A., & Miles, M. B. (1995). A software sourcebook: Computer programs for qualitative data analysis. Thousand Oaks, CA: Sage.

Wholey, J. S. (1995). Assessing the feasibility and likely usefulness of evaluation. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer, Handbook of practical program evaluation, pp. 15-39. San Francisco: Jossey-Bass.

Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70, 703-713.

Wiley, D. E., & Bock, R. D. (1967, Winter). Quasi-experimentation in educational settings: Comment. The School Review, 353-66.

Wolcott, H. F. (1994). Transforming qualitative data: Description, analysis and interpretation. Thousand Oaks, CA: Sage.

Wolf, R. L. (1975, November). Trial by jury: A new evaluation method. Phi Delta Kappan, 57(3), 185-87.

Worthen, B. R., & Sanders, J. R. (1987). Educational evaluation: Alternative approaches and practical guidelines. White Plains, NY: Longman.

Worthen, B. R., Sanders, J. R., & Fitzpatrick, J. L. (1997). Program evaluation, 2nd ed. New York: Longman.
Yin, R. K. (1989). Case study research: Design and methods. Newbury Park, CA: Sage.

Yin, R. K. (1992). The case study as a tool for doing evaluation. Current Sociology, 40(1), 121-137.
APPENDIX

Checklist for Rating Evaluation Approaches in Relationship to
The Joint Committee Program Evaluation Standards
METAEVALUATION CHECKLIST:
for Evaluating Evaluation Models against The Program Evaluation Standards

To meet the requirements for UTILITY, evaluations using the evaluation model should:

U1 Stakeholder Identification
Clearly identify the evaluation client
Engage leadership figures to identify other stakeholders
Consult potential stakeholders to identify their information needs
Use stakeholders to identify other stakeholders
With the client, rank stakeholders for relative importance
Arrange to involve stakeholders throughout the evaluation
Keep the evaluation open to serve newly identified stakeholders
Address stakeholders’ evaluation needs
Serve an appropriate range of individual stakeholders
Serve an appropriate range of stakeholder organizations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

U2 Evaluator Credibility
Engage competent evaluators
Engage evaluators whom the stakeholders trust
Engage evaluators who can address stakeholders’ concerns
Engage evaluators who are appropriately responsive to issues of gender, socioeconomic status, race, & language & cultural differences
Assure that the evaluation plan responds to key stakeholders’ concerns
Help stakeholders understand the evaluation plan
Give stakeholders information on the evaluation plan’s technical quality and practicality
Attend appropriately to stakeholders’ criticisms & suggestions
Stay abreast of social & political forces
Keep interested parties informed about the evaluation’s progress

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

U3 Information Scope and Selection
Understand the client’s most important evaluation requirements
Interview stakeholders to determine their different perspectives
Assure that evaluator & client negotiate pertinent audiences, questions, & required information
Assign priority to the most important stakeholders
Assign priority to the most important questions
Allow flexibility for adding questions during the evaluation
Obtain sufficient information to address the stakeholders’ most important evaluation questions
Obtain sufficient information to assess the program’s merit
Obtain sufficient information to assess the program’s worth
Allocate the evaluation effort in accordance with the priorities assigned to the needed information

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

U4 Values Identification
Consider alternative sources of values for interpreting evaluation findings
Provide a clear, defensible basis for value judgments
Determine the appropriate party(s) to make the valuational interpretations
Identify pertinent societal needs
Identify pertinent customer needs
Reference pertinent laws
Reference, as appropriate, the relevant institutional mission
Reference the program’s goals
Take into account the stakeholders’ values
As appropriate, present alternative interpretations based on conflicting but credible value bases

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor
U5 Report Clarity
Clearly report the essential information
Issue brief, simple, & direct reports
Focus reports on contracted questions
Describe the program & its context
Describe the evaluation’s purposes, procedures, & findings
Support conclusions & recommendations
Avoid reporting technical jargon
Report in the language(s) of stakeholders
Provide an executive summary
Provide a technical report

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

U6 Report Timeliness and Dissemination
Make timely interim reports to intended users
Deliver the final report when it is needed
Have timely exchanges with the program’s policy board
Have timely exchanges with the program’s staff
Have timely exchanges with the program’s customers
Have timely exchanges with the public media
Have timely exchanges with the full range of right-to-know audiences
Employ effective media for reaching & informing the different audiences
Keep the presentations appropriately brief
Use examples to help audiences relate the findings to practical situations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

U7 Evaluation Impact
Maintain contact with audiences
Involve stakeholders throughout the evaluation
Encourage and support stakeholders’ use of the findings
Show stakeholders how they might use the findings in their work
Forecast and address potential uses of findings
Provide interim reports
Make sure that reports are open, frank, & concrete
Supplement written reports with ongoing oral communication
Conduct feedback workshops to go over & apply findings
Make arrangements to provide follow-up assistance in interpreting & applying the findings

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

Scoring the Evaluation Model for UTILITY

Add the following:
No. of Excellent ratings (0-7) x 4 = ____
No. of Very Good (0-7) x 3 = ____
No. of Good (0-7) x 2 = ____
No. of Fair (0-7) x 1 = ____
Total score: ____

Strength of the Model’s Provisions for UTILITY

26 (93%) to 28: Excellent
19 (68%) to 25: Very Good
14 (50%) to 18: Good
7 (25%) to 13: Fair
0 (0%) to 6: Poor

(Total score) ÷ 28 = ____ x 100 = ____
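Each of the four scoring panels in this checklist uses the same arithmetic: band each 0-10 standard rating, weight the bands 4 (Excellent), 3 (Very Good), 2 (Good), and 1 (Fair), sum the weights, and compare the total against cutoffs that appear to be the stated percentages (93, 68, 50, 25) of the maximum possible score, rounded to the nearest whole number. For readers who wish to automate the tabulation when profiling many evaluation approaches, the following Python fragment is a minimal sketch of that computation; the names rating_band and category_score are illustrative inventions, not part of the published checklist.

# Illustrative sketch only; the checklist itself is a paper-and-pencil
# instrument, and these function names are invented for this example.

def rating_band(rating):
    """Map a 0-10 rating on one standard to its verbal band."""
    if rating >= 9:
        return "Excellent"
    if rating >= 7:
        return "Very Good"
    if rating >= 5:
        return "Good"
    if rating >= 3:
        return "Fair"
    return "Poor"

# Weights applied when summing banded ratings; Poor contributes nothing.
WEIGHTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

def category_score(ratings):
    """Return (total, percent, strength band) for one group of standards."""
    total = sum(WEIGHTS[rating_band(r)] for r in ratings)
    max_score = 4 * len(ratings)  # e.g., 4 x 7 = 28 for the seven UTILITY standards
    percent = 100.0 * total / max_score
    # Cutoffs: the stated percentages of the maximum, rounded to the
    # nearest whole score; for a maximum of 28 this gives 26, 19, 14, and 7.
    for pct, band in [(93, "Excellent"), (68, "Very Good"), (50, "Good"), (25, "Fair")]:
        if total >= round(pct / 100 * max_score):
            return total, percent, band
    return total, percent, "Poor"

# Hypothetical 0-10 ratings for the seven UTILITY standards U1-U7:
print(category_score([9, 8, 7, 10, 6, 8, 9]))  # (23, 82.1..., 'Very Good')

Note that under this rule the Poor band for UTILITY runs from 0 through 6, which is why the table above ends at 6 rather than at the Fair cutoff of 7.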
To meet the requirements for FEASIBILITY, evaluations using the evaluation model should:

F1 Practical Procedures
Tailor methods & instruments to information requirements
Minimize disruption
Minimize the data burden
Appoint competent staff
Train staff
Choose procedures that the staff are qualified to carry out
Choose procedures in light of known constraints
Make a realistic schedule
Engage locals to help conduct the evaluation
As appropriate, make evaluation procedures a part of routine events

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

F2 Political Viability
Anticipate different positions of different interest groups
Avert or counteract attempts to bias or misapply the findings
Foster cooperation
Involve stakeholders throughout the evaluation
Agree on editorial & dissemination authority
Issue interim reports
Report divergent views
Report to right-to-know audiences
Employ a firm public contract
Terminate any corrupted evaluation

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

F3 Cost Effectiveness
Be efficient
Make use of in-kind services
Produce information worth the investment
Inform decisions
Foster program improvement
Provide accountability information
Generate new insights
Help spread effective practices
Minimize disruptions
Minimize time demands on program personnel

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

Scoring the Evaluation Model for FEASIBILITY

Add the following:
No. of Excellent ratings (0-3) x 4 = ____
No. of Very Good (0-3) x 3 = ____
No. of Good (0-3) x 2 = ____
No. of Fair (0-3) x 1 = ____
Total score: ____

Strength of the Model’s Provisions for FEASIBILITY

11 (93%) to 12: Excellent
8 (68%) to 10: Very Good
6 (50%) to 7: Good
3 (25%) to 5: Fair
0 (0%) to 2: Poor

(Total score) ÷ 12 = ____ x 100 = ____
To meet the requirements for PROPRIETY, evaluations using the evaluation model should:

P1 Service Orientation
Assess needs of the program’s customers
Assess program outcomes against targeted customers’ assessed needs
Help assure that the full range of rightful program beneficiaries are served
Promote excellent service
Make clear to stakeholders the evaluation’s service orientation
Identify program strengths to build on
Identify program weaknesses to correct
Give interim feedback for program improvement
Expose harmful practices
Inform all right-to-know audiences of the program’s positive & negative outcomes

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P2 Formal Agreements–Reach advance written agreements on:
Evaluation purpose & questions
Audiences
Evaluation reports
Editing
Release of reports
Evaluation procedures & schedule
Confidentiality/anonymity of data
Evaluation staff
Metaevaluation
Evaluation resources

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P3 Rights of Human Subjects
Make clear to stakeholders that the evaluation will respect & protect the rights of human subjects
Clarify intended uses of the evaluation
Keep stakeholders informed
Follow due process
Uphold civil rights
Understand participant values
Respect diversity
Follow protocol
Honor confidentiality/anonymity agreements
Do no harm

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P4 Human Interactions
Consistently relate to all stakeholders in a professional manner
Maintain effective communication with stakeholders
Follow the institution’s protocol
Minimize disruption
Honor participants’ privacy rights
Honor time commitments
Be alert to & address participants’ concerns about the evaluation
Be sensitive to participants’ diversity of values & cultural differences
Be even-handed in addressing different stakeholders
Do not ignore or help cover up any participant’s incompetence, unethical behavior, fraud, waste, or abuse

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P5 Complete and Fair Assessment
Assess & report the program’s strengths
Assess & report the program’s weaknesses
Report on intended outcomes
Report on unintended outcomes
Give a thorough account of the evaluation’s process
As appropriate, show how the program’s strengths could be used to overcome its weaknesses
Have the draft report reviewed
Appropriately address criticisms of the draft report
Acknowledge the final report’s limitations
Estimate & report the effects of the evaluation’s limitations on the overall judgment of the program

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor
P6 Disclosure of Findings
Define the right-to-know audiences
Establish a contractual basis for complying with right-to-know requirements
Inform the audiences of the evaluation’s purposes & projected reports
Report all findings in writing
Report relevant points of view of both supporters & critics of the program
Report balanced, informed conclusions & recommendations
Show the basis for the conclusions & recommendations
Disclose the evaluation’s limitations
In reporting, adhere strictly to a code of directness, openness, & completeness
Assure the reports reach their audiences

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P7 Conflict of Interest
Identify potential conflicts of interest early in the evaluation
Provide written, contractual safeguards against identified conflicts of interest
Engage multiple evaluators
Maintain evaluation records for independent review
As appropriate, engage independent parties to assess the evaluation for its susceptibility to, or corruption by, conflicts of interest
When appropriate, release evaluation procedures, data, & reports for public review
Contract with the funding authority rather than the funded program
Have internal evaluators report directly to the chief executive officer
Report equitably to all right-to-know audiences
Engage uniquely qualified persons to participate in the evaluation, even if they have a potential conflict of interest, but take steps to counteract the conflict

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P8 Fiscal Responsibility
Specify & budget for expense items in advance
Keep the budget sufficiently flexible to permit appropriate reallocations to strengthen the evaluation
Obtain appropriate approval for needed budgetary modifications
Assign responsibility for managing the evaluation finances
Maintain accurate records of sources of funding & expenditures
Maintain adequate personnel records concerning job allocations & time spent on the job
Employ comparison shopping for evaluation materials
Employ comparison contract bidding
Be frugal in expending evaluation resources
As appropriate, include an expenditure summary as part of the public evaluation report

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

Scoring the Evaluation Model for PROPRIETY

Add the following:
No. of Excellent ratings (0-8) x 4 = ____
No. of Very Good (0-8) x 3 = ____
No. of Good (0-8) x 2 = ____
No. of Fair (0-8) x 1 = ____
Total score: ____

Strength of the Model’s Provisions for PROPRIETY

30 (93%) to 32: Excellent
22 (68%) to 29: Very Good
16 (50%) to 21: Good
8 (25%) to 15: Fair
0 (0%) to 7: Poor

(Total score) ÷ 32 = ____ x 100 = ____
To meet the requirements for ACCURACY, evaluations using the evaluation model should:

A1 Program Documentation
Collect descriptions of the intended program from various written sources
Collect descriptions of the intended program from the client & various stakeholders
Describe how the program was intended to function
Maintain records from various sources of how the program operated
As feasible, engage independent observers to describe the program’s actual operations
Describe how the program actually functioned
Analyze discrepancies between the various descriptions of how the program was intended to function
Analyze discrepancies between how the program was intended to operate & how it actually operated
Ask the client & various stakeholders to assess the accuracy of recorded descriptions of both the intended and the actual program
Produce a technical report that documents the program’s operations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A2 Context Analysis
Use multiple sources of information to describe the program’s context
Describe the context’s technical, social, political, organizational, & economic features
Maintain a log of unusual circumstances
Record instances in which individuals or groups intentionally or otherwise interfered with the program
Record instances in which individuals or groups intentionally or otherwise gave special assistance to the program
Analyze how the program’s context is similar to or different from contexts where the program might be adopted
Report those contextual influences that appeared to significantly influence the program & that might be of interest to potential adopters
Estimate effects of context on program outcomes
Identify & describe any critical competitors to this program that functioned at the same time & in the program’s environment
Describe how people in the program’s general area perceived the program’s existence, importance, and quality

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A3 Described Purposes and Procedures
At the evaluation’s outset, record the client’s purposes for the evaluation
Monitor & describe stakeholders’ intended uses of evaluation findings
Monitor & describe how the evaluation’s purposes stay the same or change over time
Identify & assess points of agreement & disagreement among stakeholders regarding the evaluation’s purposes
As appropriate, update evaluation procedures to accommodate changes in the evaluation’s purposes
Record the actual evaluation procedures, as implemented
When interpreting findings, take into account the different stakeholders’ intended uses of the evaluation
When interpreting findings, take into account the extent to which the intended procedures were effectively executed
Describe the evaluation’s purposes and procedures in the summary & full-length evaluation reports
As feasible, engage independent evaluators to monitor & evaluate the evaluation’s purposes & procedures

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A4 Defensible Information Sources
Obtain information from a variety of sources
Use pertinent, previously collected information once validated
As appropriate, employ a variety of data collection methods
Document & report information sources
Document, justify, & report the criteria & methods used to select information sources
For each source, define the population
For each population, as appropriate, define any employed sample
Document, justify, & report the means used to obtain information from each source
Include data collection instruments in a technical appendix to the evaluation report
Document & report any biasing features in the obtained information

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor
A5 Valid Information
Focus the evaluation on key questions
As appropriate, employ multiple measures to address each question
Provide a detailed description of the constructs & behaviors about which information will be acquired
Assess & report what type of information each employed procedure acquires
Train & calibrate the data collectors
Document & report the data collection conditions & process
Document how information from each procedure was scored, analyzed, & interpreted
Report & justify inferences singly & in combination
Assess & report the comprehensiveness of the information provided by the procedures as a set in relation to the information needed to answer the set of evaluation questions
Establish meaningful categories of information by identifying regular & recurrent themes in information collected using qualitative assessment procedures

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A6 Reliable Information
Identify and justify the type(s) & extent of reliability claimed
For each employed data collection device, specify the unit of analysis
As feasible, choose measuring devices that in the past have shown acceptable levels of reliability for their intended uses
In reporting reliability of an instrument, assess & report the factors that influenced the reliability, including the characteristics of the examinees, the data collection conditions, & the evaluator’s biases
Check & report the consistency of scoring, categorization, & coding
Train & calibrate scorers & analysts to produce consistent results
Pilot test new instruments in order to identify and control sources of error
As appropriate, engage & check the consistency between multiple observers
Acknowledge reliability problems in the final report
Estimate & report the effects of unreliability in the data on the overall judgment of the program

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A7 Systematic Information
Establish protocols for quality control of the evaluation information
Train the evaluation staff to adhere to the data protocols
Systematically check the accuracy of scoring & coding
When feasible, use multiple evaluators & check the consistency of their work
Verify data entry
Proofread & verify data tables generated from computer output or other means
Systematize & control storage of the evaluation information
Define who will have access to the evaluation information
Strictly control access to the evaluation information according to established protocols
Have data providers verify the data they submitted

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A8 Analysis of Quantitative Information
Begin by conducting preliminary exploratory analyses to assure the data’s correctness & to gain a greater understanding of the data
Choose procedures appropriate for the evaluation questions and nature of the data
For each procedure specify how its key assumptions are being met
Report limitations of each analytic procedure, including failure to meet assumptions
Employ multiple analytic procedures to check on consistency & replicability of findings
Examine variability as well as central tendencies
Identify & examine outliers & verify their correctness
Identify & analyze statistical interactions
Assess statistical significance & practical significance
Use visual displays to clarify the presentation & interpretation of statistical results

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor
A9 Analysis of Qualitative Information
Focus on key questions
Define the boundaries of information to be used
Obtain information keyed to the important evaluation questions
Verify the accuracy of findings by obtaining confirmatory evidence from multiple sources, including stakeholders
Choose analytic procedures & methods of summarization that are appropriate to the evaluation questions & employed qualitative information
Derive a set of categories that is sufficient to document, illuminate, & respond to the evaluation questions
Test the derived categories for reliability & validity
Classify the obtained information into the validated analysis categories
Derive conclusions & recommendations & demonstrate their meaningfulness
Report limitations of the referenced information, analyses, & inferences

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A10 Justified Conclusions
Focus conclusions directly on the evaluation questions
Accurately reflect the evaluation procedures & findings
Limit conclusions to the applicable time periods, contexts, purposes, & activities
Cite the information that supports each conclusion
Identify & report the program’s side effects
Report plausible alternative explanations of the findings
Explain why rival explanations were rejected
Warn against making common misinterpretations
Obtain & address the results of a prerelease review of the draft evaluation report
Report the evaluation’s limitations

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A11 Impartial Reporting
Engage the client to determine steps to ensure fair, impartial reports
Establish appropriate editorial authority
Determine right-to-know audiences
Establish & follow appropriate plans for releasing findings to all right-to-know audiences
Safeguard reports from deliberate or inadvertent distortions
Report perspectives of all stakeholder groups
Report alternative plausible conclusions
Obtain outside audits of reports
Describe steps taken to control bias
Participate in public presentations of the findings to help guard against & correct distortions by other interested parties

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

A12 Metaevaluation
Designate or define the standards to be used in judging the evaluation
Assign someone responsibility for documenting & assessing the evaluation process & products
Employ both formative & summative metaevaluation
Budget appropriately & sufficiently for conducting the metaevaluation
Record the full range of information needed to judge the evaluation against the stipulated standards
As feasible, contract for an independent metaevaluation
Determine & record which audiences will receive the metaevaluation report
Evaluate the instrumentation, data collection, data handling, coding, & analysis against the relevant standards
Evaluate the evaluation’s involvement of and communication of findings to stakeholders against the relevant standards
Maintain a record of all metaevaluation steps, information, & analyses

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

Scoring the Evaluation Model for ACCURACY

Add the following:
No. of Excellent ratings (0-12) x 4 = ____
No. of Very Good (0-12) x 3 = ____
No. of Good (0-12) x 2 = ____
No. of Fair (0-12) x 1 = ____
Total score: ____

Strength of the Model’s Provisions for ACCURACY

45 (93%) to 48: Excellent
33 (68%) to 44: Very Good
24 (50%) to 32: Good
12 (25%) to 23: Fair
0 (0%) to 11: Poor

(Total score) ÷ 48 = ____ x 100 = ____
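Applied to all four panels, the same computation yields an overall metaevaluation profile for a candidate model. The short continuation below, again only a sketch reusing the hypothetical category_score helper introduced after the UTILITY panel, shows the form such a profile takes; every rating is invented purely for illustration.

# Invented ratings: one 0-10 rating per standard in each group.
profile = {
    "UTILITY":     [9, 8, 7, 10, 6, 8, 9],                # U1-U7
    "FEASIBILITY": [8, 7, 9],                             # F1-F3
    "PROPRIETY":   [9, 9, 8, 7, 8, 9, 6, 8],              # P1-P8
    "ACCURACY":    [7, 8, 6, 9, 8, 7, 9, 8, 6, 7, 8, 9],  # A1-A12
}
for group, ratings in profile.items():
    total, percent, band = category_score(ratings)
    print(f"{group}: {total}/{4 * len(ratings)} ({percent:.0f}%) {band}")

On these invented ratings the model would be judged Very Good on all four dimensions. Note that the checklist reports the four strengths separately; it does not combine them into a single overall grade, leaving the relative weighting of utility, feasibility, propriety, and accuracy to the metaevaluator.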
