Program Evaluation
by
Daniel L. Stufflebeam
The Evaluation Center
Western Michigan University
December 1, 1999
Foundational Models for 21st Century
Program Evaluation
In moving to a new millennium, it is an opportune time for evaluators to critically appraise their
program evaluation approaches and decide which ones are most worthy of continued application
and further development. It is equally important to decide which approaches are best abandoned.
In this spirit, this paper identifies and assesses 22 approaches often employed to evaluate
programs. These approaches, in varying degrees, are unique and comprise most program
evaluation efforts. Two of the approaches, reflecting the political realities of evaluation, are often
used illegitimately to falsely characterize a program’s value and are labeled pseudoevaluations.
The remaining 20 approaches are typically used legitimately to judge programs and are divided
into questions/methods-oriented approaches, improvement/accountability approaches, and social
agenda/advocacy approaches. The best program evaluation approaches appear to be Outcomes
Monitoring/Value-Added Assessment, Case Study, Decision/Accountability, Consumer-Oriented,
Client-Centered, Constructivist, and Utilization-Focused, with the new Democratic Deliberative
approach showing promise. The worst bets seem to be Politically Controlled, Public Relations,
Accountability (especially payment by results), Clarification Hearings, and Program Theory-
Based. The rest fall somewhere in the middle. All legitimate approaches are enhanced when
keyed to and assessed against professional standards for evaluations.
1. This paper was prepared for The Evaluation Center’s Occasional Papers Series. It is based on a presentation in the State of the Evaluation Art and Future Directions in Educational Program Evaluation Invited Symposium at the annual meeting of the American Educational Research Association; Montreal, Quebec, Canada; April 20, 1999.

2. Appreciation is extended to colleagues who critiqued prior drafts of this paper, especially Sharon Barbour, Jerry Horn, Tom Kellaghan, Gary Miron, Craig Russon, James Sanders, Sally Veeder, Bill Wiersma, and Lori Wingate. While their valuable assistance is acknowledged, the author is responsible for the paper’s contents and especially any flaws.
Table of Contents
I INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Overview of the Paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Evaluation Models and Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
The Nature of Program Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Need to Study Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Classifications of Alternative Evaluation Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Program Evaluation Defined . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Pseudoevaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Questions/Methods-Oriented Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Improvement/Accountability-Oriented Evaluations . . . . . . . . . . . . . . . . . . . . . . . 4
Social Agenda-Directed (Advocacy) Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Caveats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
II PSEUDOEVALUATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Approach 1: Public Relations-Inspired Studies . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Approach 2: Politically Controlled Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Approach 21: Deliberative Democratic Evaluation . . . . . . . . . . . . . . . . . . . . . . 58
Approach 22: Utilization-Focused Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Checklist for Rating Evaluation Approaches in Relationship to
The Joint Committee Program Evaluation Standards . . . . . . . . . . . . . . . . . . . . 89
Editor’s Note:
The Occasional Paper Series is published by The Evaluation Center on the campus of
Western Michigan University. Its purpose is to advance the theory and practice of
evaluation by reporting on new developments in the profession. Authors who contribute
to the series retain copyright to their work. This allows them to publish early drafts of a
paper, obtain feedback from readers, make necessary modifications, and go on to publish
in other venues.
In this volume of the Series, published on the eve of a new millennium, Daniel
Stufflebeam reviews the evaluation models that have emerged and identifies the models
that offer the greatest prospects for future success. Few in the profession are better able to
do this than Stufflebeam. During his career, which spans nearly four decades, he has
developed nearly 100 standardized tests, authored the CIPP evaluation model, served as
the first Chair of the Joint Committee on Standards for Educational Evaluation, and
pioneered the concept of metaevaluation.
The reader is invited to join the ranks of authors who have published in the Occasional
Paper Series including Donald Campbell, Gene Glass, Arnold Love, James Sanders,
Michael Scriven, Lori Shephard, Robert Stake, and Daniel Stufflebeam. Manuscripts
should be 50-100 pages in length and significant to the field of evaluation. All
submissions are reviewed for acceptability by the editorial team made up of the staff of
The Evaluation Center.
I. INTRODUCTION

Overview of the Paper

Evaluators today have at their disposal many more evaluation approaches than in 1960. As evaluators prepare to surmount the Y2K challenges and cross into the next century, it is an opportune time to consider what 20th century evaluation developments are best to take along and which ones would best be left behind. I have, in this paper, attempted to sort 22 alternative evaluation approaches into what fishermen sometimes call the “keepers” and the “throwbacks.” More importantly, I have attempted to characterize each approach; identify its strengths and weaknesses; and consider whether, when, and how each approach is best applied. The reviewed approaches emerged mainly in the U.S. between 1960 and 1999.

Following a period of relative inactivity in the 1950s, a succession of international and national forces stimulated the development of evaluation theory and practice. Main influences were the efforts to vastly strengthen the U.S. defense system spawned by the Soviet Union’s 1957 launching of Sputnik I; the new U.S. laws in the 1960s to equitably serve persons with disabilities and minorities; the federal evaluation requirements of the Great Society programs initiated in 1965; the U.S. movement begun in the 1970s to hold educational and social organizations accountable for both prudent use of resources and achievement of objectives; the stress on excellence in the 1980s as a means of increasing U.S. international competitiveness; and the trend in the 1990s for various organizations, both inside and outside the U.S., to employ evaluation to assure quality, competitiveness, and equity in delivering services. Education has consistently been at the heart of societal reforms in the U.S., and U.S. society has repeatedly pressed educators to show through evaluation whether or not improvement efforts were succeeding.

The development of program evaluation as a field of professional practice was also spurred by a number of seminal writings. These included, in chronological order, publications by Tyler (1942, 1950), Campbell and Stanley (1963), Cronbach (1963), Stufflebeam (1966), Tyler (1966), Scriven (1967), Stake (1967), Stufflebeam (1967), Suchman (1967), Alkin (1969), Guba (1969), Provus (1969), Stufflebeam et al. (1971), Parlett and Hamilton (1972), Eisner (1975), Glass (1975), Cronbach and Associates (1980), House (1980), and Patton (1980). These and other authors/scholars began to project alternative approaches to program evaluation. In the ensuing years a rich literature on a wide variety of alternative program evaluation approaches developed [see, for example, Cronbach (1982); Guba and Lincoln (1981, 1989); Nave, Misch, and Mosteller (1999); Nevo (1993); Patton (1982, 1990, 1994, 1997); Rossi and Freeman (1993); Schwandt (1984); Scriven (1991, 1993, 1994a, 1994b, 1994c); Shadish, Cook, and Leviton (1991); Smith, M. F. (1989); Smith, N. L. (1987); Stake (1975b, 1988, 1995); Stufflebeam (1997); Stufflebeam and Shinkfield (1985); Wholey, Hatry, and Newcomer (1995); Worthen and Sanders (1987, 1997)].

Evaluation Models and Approaches

This paper uses the term evaluation approach rather than evaluation model because, for one reason, the former is broad enough to cover illicit as well as laudatory practices. Also, beyond covering both creditable and
2 Stufflebeam
Caveats

Because this paper is focused on describing and assessing the state of the art in evaluation, it is necessary to discuss bad and questionable practices, as well as the best efforts. Evaluations can be viewed as threatening or approached in opportunistic ways. In such cases, evaluators and their clients are sometimes tempted to shade, selectively release, or even falsify findings. While such efforts may look like sound evaluations, they are judged in this analysis to be pseudoevaluations if they do not forthrightly attempt to produce and report to all right-to-know audiences valid assessments of merit and worth. The first type of pseudoevaluation considered—the Public Relations approach—may meet the standard for addressing all right-to-know audiences but fails as a legitimate evaluation approach, because typically it presents a program’s strengths (or an exaggerated view of them) but not its weaknesses. The second pseudoevaluation approach—Politically Controlled evaluation—may be quite strong in obtaining valid information but fail as a sound evaluation by either withholding information from right-to-know audiences or releasing only those parts that are advantageous to the client.

Approach 1: Public Relations-Inspired Studies

The public relations approach begins with an intention to use data to convince constituents that a program is sound and effective. Other names for the approach are “ideological marketing” (see Ferguson, June 1999), advertising, and infomercial.

The advance organizer is the propagandist’s information needs. The study’s purpose is to help the program director/public relations official project a convincing, positive public image for a program, project, process, organization, leadership, etc. The guiding questions are derived from the public relations specialists’ and administrators’ conceptions of which questions would be most popular with their constituents. In general, the public relations study seeks information that would most help an organization confirm its claims of excellence and secure public support. From the start, this type of study seeks not a valid assessment of merit and worth but information needed to help the program “put its best foot forward.” Such studies avoid gathering or releasing negative findings.

Typical methods used in public relations studies are biased surveys, inappropriate use of norms tables, biased selection of testimonials and anecdotes, “massaging” of obtained information, selective release of only the positive findings, cover-up of embarrassing incidents, and the use of “expert,” advocate consultants. In contrast to the “critical friends” employed in Australian evaluations, public relations studies use “friendly critics.” A pervasive characteristic of the public relations evaluator’s use of dubious methods is a biased attempt to nurture a good picture for the program being evaluated. The fatal flaw of built-in bias to report only good things offsets any virtues of this approach. If an organization substitutes biased reporting of only positive findings for balanced evaluations of strengths and weaknesses, it soon will demoralize evaluators who are trying to conduct and report valid evaluations and may discredit its overall practice of evaluation.

By disseminating only positive information on a program’s performance while withholding
under legal contractual agreements, can plan, conduct, and report an evaluation for private purposes, while not disclosing the findings to any outside party. The key to keeping client-controlled studies in legitimate territory is to reach appropriate, legally defensible, advance, written agreements and to adhere to the contractual provisions concerning release of the study’s findings. Such studies also have to conform to applicable laws on release of information.

The advance organizers for a politically controlled study include implicit or explicit threats faced by the client for a program evaluation and/or objectives for winning political contests. The client’s purpose in commissioning such a study is to secure assistance in acquiring, maintaining, or increasing influence, power, and/or money. The questions addressed are those of interest to the client and special groups that share the client’s interests and aims. The main questions of interest to the client are, What is the truth, as best can be determined, surrounding the particular dispute or political situation? What information would be advantageous in a potential conflict situation? What data might be used advantageously in a confrontation? Typical methods of conducting the politically controlled study include covert investigations, simulation studies, private polls, private information files, and selective release of findings. Generally, the client wants obtained information to be as technically sound as possible. However, he or she may also want to withhold findings that do not support his or her position. The approach’s strength is that it stresses the need for accurate information. However, because the client might release information selectively to create or sustain an erroneous picture of a program’s merit and worth, might distort or misrepresent the findings, might violate a prior agreement to fully release the findings, or might violate a “public’s right to know” law, this type of study can degenerate into a pseudoevaluation.

For obvious reasons, persons have not been nominated to receive credit as pioneers or developers of the illicit, politically controlled study. To avoid the inference that this type of study is imaginary, consider the following examples.

A superintendent of one of the nation’s largest public school districts once confided that he possessed an extensive notebook of detailed information about each school building in his district. The information included student achievement, teacher qualifications, racial mix of teachers and students, average per-pupil expenditure, socioeconomic characteristics of the student body, teachers’ average length of tenure in the system, and so forth. The aforementioned data revealed a highly segregated district with uneven distribution of resources and markedly different achievement levels across schools. When asked why all the notebook’s entries were in pencil, the superintendent replied it was absolutely essential that he be kept informed about the current situation in each school; but he said it was also imperative that the community-at-large, the board, and special interest groups in the community, in particular, not have access to the information, for any of these groups might point to the district’s inequities as a basis for protest and even removing the superintendent. Hence, one special assistant kept the document up-to-date; only one copy existed, and the superintendent kept that locked in his desk. The point of this example is not to negatively judge the superintendent’s behavior. Instead, the superintendent’s ongoing covert investigation and selective release of information was decidedly not a case of true evaluation, for what he disclosed to the right-to-know audiences did not fully and honestly inform them about the observed situation in the district. This example may appropriately be termed a pseudoevaluation
because it both underinformed and misinformed the school district’s stakeholders.

Cases like this undoubtedly led to the federal and state sunshine laws in the United States. Under current U.S. and state freedom of information provisions, most information obtained through the use of public funds must be made available to interested and potentially affected citizens. Thus, there exist legal deterrents to and remedies for illicit, politically controlled evaluations that use public funds.

While it would be unrealistic to recommend that administrators and other evaluation users not obtain and selectively employ information for political gain, they should not misrepresent their politically controlled information-gathering and reporting activities as sound evaluation. Evaluators should not lend their names and endorsements to evaluations presented by their clients that misrepresent the full set of relevant findings, that present falsified reports aimed at winning political contests, or that violate applicable laws and/or prior formal agreements on release of findings. Evaluators must be alert to these kinds of potential conflicts. Otherwise, they will be unwitting accomplices in efforts to mislead through evaluation.

Such instances of misleading constituents through purposely biased reports or cover-up of findings, to which the public has a right, underscore the importance of having professional standards for evaluation work, faithfully applying them, and periodically engaging outside evaluators to assess one’s evaluation work. It is also prudent to develop advance contracts and memoranda of agreements to ensure that the sponsor and evaluator agree on procedures and safeguards to assure that the evaluation will comply with canons of sound evaluation and pertinent legal requirements. Despite these warnings, it can be legitimate for evaluators to give private evaluative feedback to clients, provided that applicable laws, statutes, and policies are met and sound contractual agreements on release of findings are reached and honored.
studies lead to terminal information that is of little use in improving a program or other enterprise; that this information often is far too narrow in scope to constitute a sufficient basis for judging the object’s merit and worth; relatedly, that they do not uncover positive and negative side effects; and that they may credit unworthy objectives.

Approach 4: Accountability, Particularly Payment By Results Studies

The accountability study became prominent in the early 1970s. Its emergence seems to have been connected to widespread disenchantment with the persistent stream of evaluation reports indicating that almost none of the massive state and federal investments in educational and social programs were making any positive, statistically discernible difference. One proposed solution posited that accountability systems could be initiated to ensure both that service providers would carry out their responsibilities to improve services and that evaluators would do a thorough job of identifying the effects of improvement programs and determining which persons and groups were succeeding and which were not.

The advance organizers for the accountability study are the persons and groups responsible for producing results, the service providers’ work responsibilities, and the expected outcomes. The study’s purposes are to provide constituents with an accurate accounting of results, to ensure that the results are primarily positive, and to pinpoint responsibility for good and bad outcomes. Sometimes accountability programs administer both sanctions and rewards to the responsible service providers, depending on the extent and quality of their services and achievement.

The questions addressed in accountability studies come from the program’s constituents and controllers, such as taxpayers; parent groups; school boards; and local, state, and national funding organizations. The main question that the groups want answered concerns whether each involved service provider and organization charged with responsibility for delivering and improving services is carrying out its assignments and achieving all it should, given the investments of resources to support the work.

A wide variety of methods have been used to ensure and assess accountability. These include performance contracting; Program Planning and Budgeting System (PPBS); Management By Objectives (MBO); Zero Based Budgeting; mandated “program drivers” and indicators; program input, process, output databases; independent goal achievement auditors; procedural compliance audits; peer review; merit pay for individuals and/or organizations; collective bargaining agreements; mandated testing programs; institutional report cards; self-studies; site visits by expert panels; and procedures for auditing the design, process, and results of self-studies. Also included are mandated goals and standards, decentralization and careful definition of responsibility and authority, payment by results, awards and recognition, sanctions, takeover/intervention authority by oversight bodies, and competitive bidding.

Lessinger (1970) is generally acknowledged as a pioneer in the area of accountability. Some of the people who have extended Lessinger’s work are Stenner and Webster, in their development of a handbook for conducting auditing activities,5 and Kearney, in providing leadership to the Michigan Department of Education in developing the first statewide educational accountability system. A recent major attempt at accountability, involving sanctions and rewards, was the ill-fated, heavily funded Kentucky Instructional Results Information System (Koretz & Barron, 1998). The failure of this program was clearly associated with fast
Questions/Methods-Oriented Evaluation Approaches 13
conveniently available, many educators have tried to use the results to evaluate the quality of special projects and specific school programs by inferring that high scores reflect successful efforts and that low scores reflect poor efforts. Such inferences can be erroneous if the tests were not targeted on particular project or program objectives or the needs of particular target groups of students and if the students’ background characteristics were not taken into account.

Advance organizers for standardized educational tests include areas of the school curriculum and specified norm groups. The main purposes of testing programs are to compare the test performance of individual students and groups of students to those of selected norm groups and/or to diagnose shortfalls related to particular objectives. Additionally, standardized test results are often used to compare the performance of different programs, schools, etc., and to examine achievement trends across years. Metrics used to make the comparisons typically are standardized individual and mean scores for the total test and subtests.

The sources of questions addressed by testing programs are usually test publishers and test development/selection committees. The typical question addressed by these tests concerns whether the test performance of individual students is at or above the average performance of local, state, and national norm groups. Other questions may concern the percentages of students who surpassed one or more cut-score standards, where the group of students ranks in comparison with other groups, or whether the current year’s achievement is better than in prior years. The main process involved in using testing programs is to select, administer, score, interpret, and report the tests.

Lindquist (1951), a major pioneer in this area, was instrumental in developing the Iowa testing programs, the American College Testing Program, the National Merit Scholarship Testing Program, and the General Educational Development Testing Program, as well as the Measurement Research Center at the University of Iowa. Many people have contributed substantially to the development of educational testing in America, including Ebel (1965), Flanagan (1939), Lord and Novick (1968), and Thorndike (1971). In the 1990s a number of persons innovated in such areas of testing as item response theory (Hambleton & Swaminathan, 1985) and value-added measurement (Sanders & Horn, 1994; Webster, 1995).

Virtually all public schools in the U.S. engage in one or more forms of standardized, objective achievement testing. If the school’s personnel carefully select such tests and use them appropriately to assess and improve student learning and report to the public, the involved expense and effort is highly justified. However, they should be careful not to rely on these results for evaluating specially targeted projects and programs. Student outcome measures for judging specific projects and programs must be validated in terms of the particular objectives and the characteristics and needs of the students being served by the program.

The main advantages of standardized-testing programs are that they are efficient in producing valid and reliable information on student performance in many areas of the school curriculum and that they are a familiar strategy at every level of the school program in virtually all school districts in the United States. The main limitations are that they provide data only about student outcomes; they reinforce students’ multiple-choice test-taking behavior rather than their writing and speaking behaviors; they tend to address only lower-order learning objectives; and, in many cases, they are perhaps a better indicator of the socioeconomic levels of the students in a given program, school, or school
district than of the quality of the implicated teaching and learning. Stake (1971) and others have argued effectively that standardized tests often are poor approximations of what teachers actually teach. Moreover, as has been patently clear in evaluations of programs for both disadvantaged students and gifted students, norm-referenced tests often do not measure achievements well for the low and high scoring students. Unfortunately, program evaluators often have made uncritical uses of standardized test results to judge a program’s outcomes, just because the results are conveniently available and have face validity to the public. Many times the contents of such tests do not match the program’s objectives. Also, they may measure well the differences between students in the middle of the achievement distribution but poorly for the slow learners often targeted by special education programs and high achievers.

Approach 6: Outcomes Monitoring/Value-Added Assessments

Recurrent outcomes/value-added assessment is a special case of the use of standardized testing to evaluate the effects of programs and policies. The emphasis here is on annual testing in order to assess trends and partial out effects of the different levels and components of an educational system. Characteristic of this approach is the cyclical collection of outcome measures based on standardized indicators, analysis of results in relation to policy questions, and reporting of overall results plus specific policy-relevant analyses. The main interest is in aggregate, not individual, performance. A state education department may regularly collect achievement data from all students (at selected grade levels), as is the case in the Tennessee Value-Added Assessment System. The evaluator may analyze the data to look at contrasting results related to particular objectives for schools using and not using particular programs. These results may be further broken out to make comparisons between classes, curricular areas, grade levels, teachers, schools, different size and resource classifications of schools, districts, and different areas of a state. This approach differs from the typical standardized achievement testing program in its emphasis on uncovering and analyzing policy issues rather than only reporting on students’ progress. Otherwise, the two approaches have much in common.

The advance organizers in monitoring outcomes and employing value-added analysis are the indicators of expected and possible outcomes and the scheme for classifying results to examine policy issues and/or program effects. The purposes of Outcomes Monitoring/Value-Added Assessment systems are direction for policymaking, accountability to constituents, and feedback for improving programs and services. This approach also ensures standardization of data for assessment and improvement throughout a system. The questions to be addressed by such monitoring systems originate from funding organizations, policymakers, the system’s professionals, and constituents.

Illustrative questions addressed by Outcomes Monitoring/Value-Added Assessment systems are To what extent are particular programs adding value to students’ achievement? What are the cross-year trends in outcomes? In what sectors of the system is the program working best and poorest? What are key, pervasive shortfalls in particular program objectives that require further study and attention? To what extent are program successes and failures associated with the system’s different organizational levels?

Developers of the Outcomes Monitoring/Value-Added Assessment approach include especially William Sanders and Sandra Horn (1994); William Webster (1995); Webster, Mendro, and Almaguer (1994); and Peter Tymms (1995). These developers have used census data on
student achievement trends to diagnose areas for improvement and look for effects of programs and policies. What distinguishes the Outcomes Monitoring/Value-Added Assessment approach from the traditional standardized testing program is sophisticated analysis of data to partial out effects of programs and policies and to identify areas where new policies and programs are needed. In contrast to these applications, the typical standardized testing program is focused more on providing feedback on the performance of individual students and groups of students, without the attendant policy-oriented analysis. Probably the Outcomes Monitoring/Value-Added Assessment approach is mainly feasible for well-endowed state education departments and large school districts where there is strong support from policy groups, administrators, and service providers to make the approach work. It requires systemwide buy-in; politically effective leaders to continually explain and sell the program; a smoothly operating, dynamic, computerized baseline of relevant input and output information; highly skilled technicians to make it run efficiently and accurately; complicated statistical analysis; and high-level commitment to use the results for purposes of policy development, accountability, program evaluation, and improvement at all levels of the system.

The central advantage of Outcomes Monitoring/Value-Added Assessment is in the systematization and institutionalization of a database of outcomes that can be used over time and in a standardized way to study and find means to improve outcomes. Also, Outcomes Monitoring/Value-Added Assessment is conducive to using a standard of continuous progress across years for every student as opposed to employing static cut scores. The latter, while prevalent in accountability programs, basically fail to take into account meaningful gains by low or high achieving students, since these gains usually are far removed from the static, cut score standards. Also, Sanders and Horn (1994) have shown that use of static cut scores may produce a “shed pattern,” in which students who began below the cut score make the greatest gains while those who started above the cut score standard make little progress. Like the sloping roof of a tool shed, the gains are greatest for previously low scoring students and progressively lower for the higher achievers. This suggests that teachers are concentrating mainly on getting students to the cut score standard but not beyond it and thus “holding back the high achievers.” This approach makes efficient use of standardized tests; is amenable to analysis of trends at state, district, school, and classroom levels; uses students as their own controls; and emphasizes service to every student.

A major disadvantage of this approach is that it is politically volatile, since it is used to identify responsibility for successes and failures down to the levels of schools and teachers. Also, it is constrained mainly to use quantitative information such as that coming from standardized, multiple choice achievement tests. Consequently, the complex and powerful analyses are based on a limited scope of outcome variables. Nevertheless, Sanders (1989) has argued that a strong body of evidence supports the use of well-constructed, standardized, multiple choice achievement tests. Beyond the issue of outcome measures, the approach does not provide in-depth documentation of program inputs and processes and makes little if any use of qualitative methods. Despite the advancements in objective measurement and the employment of hierarchical mixed models to defensibly partial out effects of a system’s organizational components and individual staff members, critics of the approach argue that causal factors are so complex that no measurement and analysis system can fairly fix responsibility to the level of teachers for the academic progress of individual and collections of students.
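The contrast between a static cut-score standard and a continuous-progress (gain) standard can be sketched in a few lines. This is a hypothetical illustration only, not any state’s actual value-added system (which relies on far more elaborate hierarchical mixed models); the cut score and all test scores below are invented.

```python
# Hypothetical sketch: static cut-score pass rates vs. gain scores
# that use each student as his or her own control.

CUT_SCORE = 70  # invented static proficiency standard

def pass_rate(scores, cut=CUT_SCORE):
    """Share of students at or above the static cut score."""
    return sum(s >= cut for s in scores) / len(scores)

def mean_gain(pre, post):
    """Average pretest-to-posttest gain across all students."""
    return sum(b - a for a, b in zip(pre, post)) / len(pre)

def shed_pattern(pre, post, cut=CUT_SCORE):
    """Mean gains for students who started below vs. above the cut;
    a much larger gain below the cut suggests the 'shed pattern'
    Sanders and Horn describe."""
    below = [b - a for a, b in zip(pre, post) if a < cut]
    above = [b - a for a, b in zip(pre, post) if a >= cut]
    return (sum(below) / len(below), sum(above) / len(above))

# Invented pretest/posttest scores for one classroom
pre = [55, 60, 65, 68, 72, 80, 88, 95]
post = [68, 72, 74, 73, 74, 81, 89, 95]

low_gain, high_gain = shed_pattern(pre, post)
print(f"pass rate moved {pass_rate(pre):.2f} -> {pass_rate(post):.2f}")
print(f"overall mean gain: {mean_gain(pre, post):.2f}")
print(f"mean gain below cut: {low_gain:.2f}; above cut: {high_gain:.2f}")
```

In this invented data the pass rate improves sharply while students who started above the cut score gain almost nothing, which is exactly the pattern a cut-score accountability report would conceal and a gain analysis would expose.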
Questions/Methods-Oriented Evaluation Approaches 17
Major disadvantages of the approach are heavy time requirements for administration; high costs of scoring; difficulty in achieving reliable scores; narrow scope of skills that can feasibly be assessed; and lack of norms for comparisons, especially at the national level. In general, performance tests are inefficient, costly, and often of dubious reliability. Moreover, compared with multiple choice tests, performance tests, in the same amount of testing time, can cover only a much narrower range of questions.

Approach 8: Experimental Studies

In using controlled experiments, program evaluators randomly assign subjects or groups of subjects to experimental and control groups and then contrast the outcomes when the experimental group receives a particular intervention and the control group receives no special treatment or some different treatment. This type of study was quite prominent in program evaluation during the late 1960s and early 1970s, when there was a federal requirement to assess the effectiveness of federally funded innovations. However, experimental program evaluations subsequently fell into disfavor and disuse. (In the 1990s, controlled experiments in education have been rare [Nave, Misch, & Mosteller, 1999].) Apparent reasons for this decline are that evaluators rarely can meet the required experimental conditions and assumptions and the prevalent finding has been “no statistically significant result.”

This approach is labeled as a questions-oriented or quasi-evaluation strategy because it starts with questions and methodology that may address only a narrow set of the questions needed to assess a program’s merit and worth. In the 1960s, Campbell and Stanley (1963) and others hailed the true experiment as the only sound means of evaluating interventions. This piece of evaluation history reminds one of Kaplan’s (1964) famous warning against the so-called “law of the instrument,” whereby a given method is equated to a field of inquiry. In such a case, the field of inquiry is restricted to the questions that are answerable by the given method. Fisher (1951) specifically warned against equating his experimental methods with science. Similarly, experimental design is a method that can contribute importantly to program evaluation, as Nave, Misch, and Mosteller (1999) have demonstrated, but by itself it is often insufficient to address a client’s full range of evaluation questions.

The advance organizers in experimental studies are problem statements, competing treatments, hypotheses, investigatory questions, and randomized treatment and comparison groups. The usual purpose of the controlled experiment is to determine causal relationships between specified independent and dependent variables, such as a given instructional method and student standardized-test performance. It is particularly noteworthy that the sources of questions investigated in the experimental study are researchers, program developers, and policy figures, and not usually a program’s constituents and practitioners.

The frequent question in the experimental study is, What are the effects of a given intervention on specified outcome variables? Typical methods used are experimental and quasi-experimental designs. Pioneers in using experimentation to evaluate programs are Campbell and Stanley (1963), Cronbach and Snow (1969), and Lindquist (1953). Other persons who have developed the methodology of experimentation substantially for program evaluation are Boruch (1994); Glass and Maguire (1968); Nave, Misch, and Mosteller (1999); Suchman (1967); and Wiley and Bock (1967).

Evaluators should consider conducting a controlled experiment only when its required
conditions and assumptions can be met. Often this requires substantial political influence, substantial funding, and widespread agreement (e.g., among the targeted educators, parents, and teachers) to submit to the requirements of the experiment. Such requirements typically include, among others, a stabilized program that will not have to be studied and modified during the evaluation; the ability to establish and sustain comparable program and control groups; the ability to keep the program and control conditions separate and uncontaminated; and the ability to obtain the needed criterion measures from all or at least a representative group of the members of the program and comparison groups. Evaluability assessment was developed as a particular methodology for determining the feasibility of moving ahead with an experiment (Smith, 1989; Wholey, 1995).

Controlled experiments have a number of advantages. They focus on results and not just intentions or judgments. They provide strong methods for establishing relatively unequivocal causal relationships between treatment and outcome variables; this ability can be especially significant when program effects are small but important. Moreover, because of the prevalent use and success of experiments in such fields as medicine and agriculture, the approach has widespread credibility.

The above advantages are offset by serious objections to experimenting on school students and other subjects. It is often considered unethical or even illegal to deprive the control group of the benefits of special funds for improving services. Likewise, many parents don’t want schools to experiment on their children by applying unproven interventions. Typically, schools find it impractical and unreasonable to randomly assign students to treatments and to hold treatments constant throughout the study period. Also, experimental studies provide a much narrower range of information than schools or other organizations often need to assess and strengthen their programs. On this point, experimental studies tend to provide terminal information that is not useful for guiding the development and improvement of programs and in fact need to thwart ongoing modifications of the treatments.

Approach 9: Management Information Systems

The management information system is like the politically controlled approaches, except that it supplies managers with the information they need to conduct and report on their programs, as opposed to supplying them with the information they need to win a political advantage. The management information approach is also like the decision/accountability-oriented approach, which will be discussed later, except that the decision/accountability-oriented approach provides information needed to both develop and defend a program’s merit and worth, which goes beyond providing information that managers need to implement and report on their management responsibilities.

The advance organizers in most management information systems include program objectives, specified activities, and projected program milestones or events. A management information system’s purpose, as already implied, is to continuously supply managers with the information they need to plan, direct, control, and report on their programs or spheres of responsibility.

The sources of questions addressed are the management personnel and their superiors. The main questions they typically want answered are, Are program activities being implemented according to schedule, according to budget, and with the expected results? To provide ready access to information for addressing such questions, these systems regularly store and make accessible up-to-date information on the program’s goals, planned operations, actual
of procedures: (1) cost analysis of program inputs, (2) cost-effectiveness analysis, and (3) benefit-cost analysis. These may be looked at as a hierarchy. The first type, cost analysis of program inputs, may be done by itself. Such analyses entail an ongoing accumulation of a program’s financial history. These analyses are of use in controlling program delivery and expenditures. The program’s financial history can be used to compare the program’s actual costs to the projected costs in the original budget and to the costs of similar programs. Also, cost analyses can be extremely valuable to outsiders who might be interested in replicating the program.

Cost-effectiveness analysis necessarily includes cost analysis of program inputs to determine the cost associated with the progress toward achieving each objective. Such analyses might compare two or more programs’ costs and successes in achieving the same objectives. A program could be judged superior on cost-effectiveness grounds if it had the same costs as similar programs but superior outcomes. Or the program could still be judged superior on cost-effectiveness grounds if it achieved the same objectives as more expensive programs. Cost-effectiveness analyses do not require conversion of outcomes to monetary terms but must be keyed to clear, measurable program objectives.

Benefit-cost analyses typically build on a cost analysis of program inputs and a cost-effectiveness analysis. But the benefit-cost analysis goes further. It seeks to identify a broader range of outcomes than just those associated with program objectives. It examines the relationship between the investment in a program and the extent of positive and negative impacts on the program’s environment. In doing so, it ascertains and places a monetary value on program inputs and each identified outcome. It identifies a program’s benefit-cost ratios and compares these to similar ratios for competing programs. Ultimately, benefit-cost studies seek conclusions about the comparative benefits and costs of the examined programs.

Advance organizers for the overall benefit-cost approach are associated with cost breakdowns for both program inputs and program outputs. Program input costs may be delineated by line items (e.g., personnel, travel, materials, equipment, communications, facilities, contracted services, overhead, etc.), by program components, by year, etc. In cost-effectiveness analysis, a program’s costs are examined in relation to each program objective, and these must be clearly defined and assessed. The more ambitious benefit-cost analyses look at costs associated with main effects and side effects, tangible and intangible outcomes, positive and negative outcomes, and short-term and long-term outcomes—both inside and outside the program. Frequently, they also may break down costs by individuals and groups of beneficiaries. One may also estimate the costs of foregone opportunities and, sometimes, political costs. Even then, the real value of benefits associated with human creativity or self-actualization is nearly impossible to estimate. Consequently, the benefit-cost equation rests on dubious assumptions and uncertain realities.

The purposes of these three levels of benefit-cost analysis are to gain clear knowledge of what resources were invested, how they were invested, and with what effect. In popular vernacular, cost-effectiveness and benefit-cost analyses seek to determine the program’s “bang for the buck.” There is great interest in answering this type of question. Policy boards, program planners, and taxpayers are especially interested to know whether program investments are paying off in positive results that exceed or are at least as good as those produced by similar programs.
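The three levels just described can be illustrated with a small sketch. All line items, objective counts, and monetized benefits below are hypothetical; real analyses involve discounting, intangibles, and many more cost categories.

```python
# Hedged sketch of the three levels of cost study described above,
# using entirely invented figures for two competing programs.

def total_cost(line_items):
    """Level 1: cost analysis of program inputs -- a simple roll-up
    of line-item expenditures (personnel, travel, materials, ...)."""
    return sum(line_items.values())

def cost_per_objective(line_items, objectives_met):
    """Level 2: cost-effectiveness -- cost per objective achieved,
    with no need to monetize the outcomes themselves."""
    return total_cost(line_items) / objectives_met

def benefit_cost_ratio(line_items, monetized_benefits):
    """Level 3: benefit-cost analysis -- requires placing a monetary
    value on each identified outcome, the step the text warns rests
    on dubious assumptions."""
    return monetized_benefits / total_cost(line_items)

# Invented line-item budgets for two competing programs
program_a = {"personnel": 80_000, "travel": 5_000, "materials": 15_000}
program_b = {"personnel": 90_000, "travel": 5_000, "materials": 25_000}

print(cost_per_objective(program_a, 4))        # A met 4 objectives
print(cost_per_objective(program_b, 5))        # B met 5 objectives
print(benefit_cost_ratio(program_a, 150_000))  # A's monetized benefits
```

Here the more expensive Program B turns out cheaper per objective achieved, which is the kind of comparison cost-effectiveness analysis supports without ever monetizing outcomes; only the final ratio requires that further, more dubious, step.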
Based on the past uses of this approach, it can be judged as only marginally relevant to program evaluation. By its adversarial nature, the approach prods the evaluators to present biased arguments in order to win their cases. The approach subordinates truth seeking to winning. Accuracy suffers in this process. The most effective debaters are likely to convince the jury of their position even when it is poorly founded. Also, the approach is politically problematic, since it generates considerable acrimony. Despite the attractiveness of using the law as a metaphor for program evaluation, with the law’s attendant rules of evidence, the promise of this application has not been fulfilled. There are few occasions in which it makes practical sense for evaluators to apply this approach.

Approach 12: Case Study Evaluations

A case-study-based program evaluation is a focused, in-depth description, analysis, and synthesis of a particular program or other object. The investigators do not control the program in any way. Instead, they look at it as it is occurring or as it occurred in the past. The study looks at the program in its geographic, cultural, organizational, and historical contexts. It closely examines the program’s internal operations and how it uses inputs and processes to produce outcomes. It examines a wide range of intended and unexpected outcomes. It looks at the program’s multiple levels and also holistically at the overall program. It characterizes both central, dominant themes and variations and aberrations. It defines and describes the program’s intended and actual beneficiaries. It examines beneficiaries’ needs and to what extent the program effectively addressed the needs. It employs multiple methods to obtain and integrate multiple sources of information. While it breaks apart and analyzes a program along various dimensions, it also provides an overall characterization of the program.

The main thrust of the case study approach is to delineate and illuminate a program, not necessarily to guide its development and to assess and judge its merit and worth. Hence, this paper characterizes the case study approach as a questions/methods-oriented approach rather than an improvement/accountability approach.

The advance organizers in case studies include the definition of the program, characterization of its geographic and organizational environment, the historical period in which it is to be examined, the program’s beneficiaries and their assessed needs, the program’s underlying logic of operation and productivity, and the key roles involved in the program. A case study program evaluation’s main purpose is to provide stakeholders and their audiences with an authoritative, in-depth, well-documented explication of the program.

The case study should be keyed to the questions of most interest to the evaluation’s main audiences. The evaluator must therefore identify and interact with the program’s stakeholders. Along the way stakeholders will be engaged in helping to plan the study and interpret findings. Ideally, the audiences include the program’s oversight body, administrators, staff, financial sponsors, beneficiaries, and potential adopters of the program.

Typical questions posed by some or all of the above audiences are, What is the program in concept and practice? How has it evolved over time? How does it actually operate to produce outcomes? What has it produced? What are the shortfalls and negative side effects? What are the positive side effects? In what ways and to what degrees do various stakeholders value the program? To what extent did the program effectively meet beneficiaries’ needs? What were the most important reasons for the program’s successes and failures? What are the program’s most important unresolved issues?
How much has it cost? What are the costs per beneficiary, per year, etc.? What parts of the program have been successfully transported to other sites? How does this program compare with what might be called critical competitors? The above questions only illustrate the range of questions that a case study might address, since each case study will be tempered by the interests of the client and other audiences for the study and the evaluator’s interests.

To conduct effective case studies, evaluators need to employ a wide range of qualitative and quantitative methods. These may include analysis of archives; collection of artifacts, such as work samples; content analysis of program documents; both independent and participant observations; interviews; logical analysis of operations; focus groups; tests; questionnaires; rating scales; hearings; forums; and maintenance of a program database. Reports may incorporate in-depth descriptions and accounts of key historical trends; focus on critical incidents, photographs, maps, testimony, relevant news clippings, logic models, and cross-break tables; and summarize main conclusions. The case study report may include papers on key dimensions of the case, as determined with the audience, as well as an overall holistic presentation and assessment. Case study reports may involve audio and visual media as well as printed documents.

Case study methods have existed for many years and have been applied in such areas as clinical psychology, law, the medical profession, and social work. Pioneers in applying the method to program evaluation include Campbell (1975), Lincoln and Guba (1985), Platt (1992), Stake (1995), and Yin (1992).

The case study approach is highly conducive to program evaluation. It requires no controls of treatments and subjects and looks at programs as they naturally occur and evolve. It addresses accuracy issues by employing and triangulating multiple perspectives, methods, and information sources. It employs all relevant methods and information sources. It looks at programs within relevant contexts and describes contextual influences on the program. It looks at programs holistically and in depth. It examines the program’s internal workings and how it produces outcomes. It includes clear procedures for analyzing qualitative information. It can be tailored to focus on the audience’s most important questions. It can be done retrospectively or in real time. It can be reported to meet given deadlines and subsequently updated based on further developments.

The main limitation of the approach is that some evaluators may mistake its openness and lack of controls as an excuse for approaching it haphazardly and bypassing steps to assure that findings and interpretations possess rigor as well as relevance. Also, because of a preoccupation with descriptive information, the case study evaluator may not collect sufficient judgmental information to permit a broad-based assessment of a program’s merit and worth. Users of this approach might slight quantitative analysis in favor of qualitative analysis. By trying to produce a comprehensive description of a program, the case study evaluator may not produce timely feedback needed to help in program development. To overcome these potential pitfalls, evaluators using the case study approach should fully address the principles of sound evaluation as related to accuracy, utility, feasibility, and propriety.

Approach 13: Criticism and Connoisseurship

The connoisseur-based approach was developed pursuant to the methods of art criticism and literary criticism. This approach assumes that certain experts in a given substantive area are capable of in-depth analysis and evaluation that could not be done in other ways. While a
national survey of wine drinkers could produce information concerning their overall preferences for types of wines and particular vineyards, it would not provide the detailed, creditable judgments of the qualities of particular wines that might be derived from a single connoisseur who has devoted a professional lifetime to the study and grading of wines and whose judgments are highly and widely respected.

The advance organizer for the connoisseur-based study is the evaluator’s special expertise and sensitivities. The study’s purpose is to describe, critically appraise, and illuminate a particular program’s merits. The evaluation questions addressed by the connoisseur-based evaluation are determined by expert evaluators—the critics and authorities who have undertaken the evaluation. Among the major questions they can be expected to ask are, What are the program’s essence and salient characteristics? What merits and demerits distinguish the particular program from others of the same general kind?

The methodology of connoisseurship includes the critics’ systematic use of their perceptual sensitivities, past experiences, refined insights, and abilities to communicate their assessments. The evaluator’s judgments are conveyed in vivid terms to help the audience appreciate and understand all of the program’s nuances.

Eisner (1975, 1983) has pioneered this strategy in education. A dozen or more of Eisner’s students have conducted research and development on the connoisseurship approach, e.g., Vallance (1973) and Flinders and Eisner (1994).

This approach obviously depends on the qualifications of the particular expert chosen to do the program evaluation. The approach also requires an audience that has confidence in and is willing to accept and use the connoisseur’s report. The author of this paper would willingly accept and use any evaluation that Dr. Elliott Eisner agreed to present, but there are not many Eisners out there.

The main advantage of the connoisseur-based study is that it exploits the particular expertise and finely developed insights of persons who have devoted much time and effort to the study of a precise area. They can provide an array of detailed information that the audience can then use to form a more insightful analysis than otherwise might be possible. The approach’s disadvantage is that it is dependent on the expertise and qualifications of the particular expert doing the program evaluation, leaving room for much subjectivity.

Approach 14: Program Theory-Based Evaluation

Program evaluations based on program theory begin with either (1) a well-developed and validated theory of how programs of a certain type within similar settings operate to produce outcomes or (2) an initial stage to approximate such a theory within the context of a particular program evaluation. The former of these conditions is much more reflective of the implicit promises in a theory-based program evaluation, since the existence of a sound theory means that a substantial body of theoretical development has produced and tested a coherent set of conceptual, hypothetical, and pragmatic principles, plus associated instruments to guide inquiry in the particular area. Then, the theory can aid a program evaluator to decide what questions, indicators, and assumed linkages between and among program elements should be used to evaluate a program covered by the theory.

Some well-developed theories for use in evaluations exist, which gives this approach some measure of viability. For example, health education/behavior change programs are sometimes founded on validated theoretical frameworks, such as the Health
Belief Model (Becker, 1974; Mullen, Hersey, & Iverson, 1987; Janz & Becker, 1984). Other examples are the PRECEDE-PROCEED Model for health promotion planning and evaluation (Green & Kreuter, 1991), Bandura’s (1977) Social Cognitive Theory, the Stages of Change Theory by Prochaska and DiClemente (1992), and Peters and Waterman’s (1982) theory of successful organizations. When such frameworks exist, their use probably can enhance a program’s effectiveness and provide a structure for validly evaluating the program’s functioning. Unfortunately, however, few program areas are buttressed by well-articulated and tested theories.

Thus, most theory-based evaluations begin by setting out to develop a theory that appropriately could be used to guide the particular program evaluation. As will be discussed later in this characterization, such ad hoc theory development efforts and their linkage to program evaluations are problematic. In any case, let us look at what the theory-based evaluator attempts to achieve.

The point of the theory development or selection effort is to identify advance organizers to guide the evaluation. Essentially, these are the mechanisms by which program activities are understood to produce or contribute to program outcomes, along with the appropriate description of context, specification of independent and dependent variables, and portrayal of key linkages. The main purposes of the theory-based program evaluation are to determine the extent to which the program of interest is theoretically sound, to understand why it is succeeding or failing, and to provide direction for program improvement.

Questions for the program evaluation are derived from the guiding theory. Example questions include, Is the program grounded in an appropriate, well-articulated, and validated theory? Is the employed theory up to date and reflective of recent research? Are the program’s targeted beneficiaries, design, operation, and intended outcomes consistent with the guiding theory? How well does the program address and serve the full range of pertinent needs of the targeted beneficiaries? If the program is consistent with the guiding theory, are the expected results being achieved? Are program inputs and operations producing outcomes in the ways the theory predicts? What changes in the program’s design or implementation might produce better outcomes? What elements of the program are essential for successful replication? Overall, was the program theoretically sound, did it operate in accordance with an appropriate theory, did it produce the expected outcomes, were the hypothesized causal linkages confirmed, is the program worthy of continuation and/or dissemination, and what program features are essential for successful replication?

The nature of these questions suggests that the success of the theory-based approach is dependent on a foundation of sound theory development and validation. This, of course, entails sound conceptualization of at least a context-dependent theory, formulation and rigorous testing of hypotheses derived from the theory, development of guidelines for practical implementation of the theory based on extensive field trials, and independent assessment of the theory. Unfortunately, not many program areas in education and the social sciences are grounded in sound theories. Moreover, evaluators wanting to employ a theory-based evaluation often find it infeasible to conduct the full range of theory development and validation steps and still to get the evaluation done on time. Thus, in claiming to conduct a theory-based
Pioneers in applying theory development procedures to program evaluation include Glaser and Strauss (1967) and Weiss (1972, 1995). Other developers of the approach are Bickman (1990), Chen (1990), and Rogers (in press).

In any program evaluation assignment, it is reasonable for the evaluator to examine the extent to which program plans and operations are grounded in an appropriate theory or model. Also, it can be useful to engage in a modicum of effort to network the program and thereby seek out key variables and linkages. As noted previously, in the enviable but rare situation where a relevant, validated theory exists, the evaluator can beneficially apply it in structuring the evaluation and analyzing findings.

Overall, there really isn’t much to recommend theory-based program evaluation, since doing it right is usually not feasible and since failed or misrepresented attempts can be highly counterproductive. Nevertheless, modest attempts to model programs—labeled as such—can be useful for identifying measurement variables, so long as the evaluator doesn’t spend too much time on this and so long as the model is not considered as fixed or as a validated theory. Also, in the rare case where an appropriate theory already exists, the evaluator can make beneficial use of the theory to help structure and guide the evaluation and interpret the findings.
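A modest program model of the kind endorsed above can be as simple as a list of hypothesized linkages from which measurement variables are read off. The program and variable names below are invented, and the model serves only as a scaffold for measurement, not as a fixed or validated theory.

```python
# Minimal sketch of "modest program modeling": a set of hypothesized
# input -> activity -> outcome linkages (all names invented), used
# only to enumerate the variables worth measuring.

linkages = [
    ("tutor training hours", "tutoring session quality"),
    ("tutoring session quality", "student engagement"),
    ("student engagement", "reading achievement gain"),
]

def measurement_variables(linkages):
    """Collect every variable appearing in a hypothesized linkage,
    preserving first-appearance order, so the evaluation knows what
    to instrument."""
    seen = []
    for cause, effect in linkages:
        for var in (cause, effect):
            if var not in seen:
                seen.append(var)
    return seen

print(measurement_variables(linkages))
```

The payoff is the variable list itself; whether the hypothesized causal arrows hold is exactly what the text warns cannot be settled by such an ad hoc model.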
Approach 15: Mixed Methods Studies

In an attempt to resolve the longstanding debate about whether program evaluations should employ quantitative or qualitative methods, some authors have proposed that evaluators should regularly combine these methods in given program evaluations (for example, see the National Science Foundation’s 1997 User-Friendly Handbook for Mixed Method Evaluations). Such recommendations, along with practical guidelines and illustrations, are no doubt useful to many program staff members and to evaluators. But in the main, the recommendation for a mixed method approach only highlights a large body of longstanding practice of mixed-methods program evaluation rather than proposing a new approach. All seven approaches discussed in the remainder of this section of the paper employ both qualitative and quantitative methods. What sets them apart from the mixed method approach is that their first considerations are not the methods to be employed but either the assessment of value or the social mission to be served. The mixed methods approach is included in this section on questions/methods approaches, because it is preoccupied with using multiple methods rather than using whatever methods are needed to comprehensively assess a program’s merit and worth. As with the other approaches in this section, the mixed methods approach may or may not fully assess a program’s value; thus, it is classified as a quasi-evaluation approach.

The advance organizers of the mixed methods approach are formative and summative evaluations, qualitative and quantitative methods, and intra-case or cross-case analysis. Formative evaluations are employed to examine a program’s development and assist in improving its structure and implementation. Summative evaluations basically look at whether objectives were achieved, but may look for a broader array of outcomes. Qualitative and quantitative methods are employed in combination to assure depth, scope, and dependability of findings. This approach also applies to carefully selected single programs or to comparisons of alternative programs.

The basic purposes of the mixed method approach are to provide direction for improving programs as they are evolving and to assess their effectiveness after they have had time to produce results. Use of both quantitative and qualitative methods is intended to assure dependable feedback on a wide range of questions; depth of understanding of particular programs; a holistic perspective; and enhancement of the validity, reliability, and usefulness of the full set of findings. Investigators look to quantitative methods for standardized, replicable findings on large data sets. They look to qualitative methods for elucidation of the program’s cultural context, dynamics, meaningful patterns and themes, deviant cases, diverse impacts on individuals as well as groups, etc. Qualitative reporting methods are applied to bring the findings to life, making them clear, persuasive, and interesting. By using both quantitative and qualitative methods, the evaluator secures cross-checks on different subsets of findings and thereby instills greater stakeholder confidence in the overall findings.

The sources of evaluation questions are the program’s goals, plans, and stakeholders. The stakeholders often include skeptical as well as supportive audiences. Among the important stakeholders are program administrators and staff, policy boards, financial sponsors, beneficiaries, taxpayers, and program area experts.
Questions/Methods-Oriented Evaluation Approaches 29
The approach may pursue a wide range of questions. Examples of formative evaluation questions are

• To what extent do program activities follow the program plan, time line, and budget?
• To what extent is the program achieving its goals?
• What problems in design or implementation need to be addressed?

Examples of summative evaluation questions are

• To what extent did the program achieve its goals?
• Was the program appropriately effective for all beneficiaries?
• What interesting stories emerged?
• What are program stakeholders’ judgments of program operations, processes, and outcomes?
• What were the important side effects?
• Is the program sustainable and transportable?

The approach employs a wide range of methods. Among the quantitative methods employed are surveys using representative samples, both cohort and cross-sectional samples, norm-referenced tests, rating scales, quasi experiments, significance tests for main effects, and a posteriori statistical tests. The qualitative methods may include ethnography, document analysis, narrative analysis, purposive samples, single cases, participant observers, independent observers, key informants, advisory committees, structured and unstructured interviews, focus groups, case studies, study of outliers, diaries, logic models, grounded theory development, flow charts, decision trees, matrices, and performance assessments. Reports may include abstracts, executive summaries, full reports, oral briefings, conference presentations, and workshops. They should include a balance of narrative and numerical information.

Considering his book on service studies in higher education, Ralph Tyler (Tyler et al., 1932) was certainly a pioneer in the mixed method approach to program evaluation. Other authors who have written cogently on the mixed methods approach are Guba and Lincoln (1981), Kidder and Fine (1987), Lincoln and Guba (1985), Miron (1998), Patton (1990), and Schatzman and Strauss (1973).

Basically, it is almost always appropriate to consider using a mixed methods approach. Certainly, the evaluator should take advantage of opportunities to obtain any and all potentially available information that is relevant to assessing a program’s merit and worth. Sometimes a study can be mainly or only qualitative or quantitative, but usually such studies would be strengthened by including both types of information. The key point is to choose methods because they can effectively address the study’s questions, not because they are either qualitative or quantitative.

Key advantages of using both qualitative and quantitative methods are that they complement each other in ways that are important to the evaluation’s audiences. Information from quantitative methods tends to be standardized, efficient, amenable to standard tests of reliability, easily summarized and analyzed, and accepted as “hard” data. Information from qualitative approaches adds depth; can be delivered in interesting, story-like presentations; and provides a means to explore and understand the more superficial quantitative findings. Using both types of methods affords important cross-checks on findings.
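To make the cross-checking idea concrete, here is a minimal, hypothetical sketch (all data, group names, and theme codes are invented for illustration, not taken from the paper): a quantitative strand compares outcome scores for a program group and a comparison group with a Welch t statistic, while a qualitative strand tallies coded interview themes on the same question; convergence between the two strands is the cross-check that builds stakeholder confidence.

```python
# Illustrative mixed-methods cross-check (hypothetical data throughout).
from statistics import mean, stdev
from collections import Counter

# Quantitative strand: outcome scores for two groups (invented numbers).
program_scores = [78, 85, 82, 90, 74, 88, 81, 79]
comparison_scores = [70, 75, 72, 68, 77, 71, 74, 69]

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

# Qualitative strand: themes coded from stakeholder interviews (invented codes).
interview_codes = ["improved skills", "improved skills", "scheduling problems",
                   "improved skills", "improved morale", "scheduling problems"]
theme_counts = Counter(interview_codes)

t = welch_t(program_scores, comparison_scores)
print(f"Quantitative strand: t = {t:.2f} "
      f"(means {mean(program_scores):.1f} vs. {mean(comparison_scores):.1f})")
print("Qualitative strand, theme tallies:", dict(theme_counts))
# When the two strands agree (higher scores AND predominance of positive
# themes), the combined findings warrant greater confidence; when they
# conflict, the discrepancy itself becomes an evaluation question.
```

The sketch only illustrates the triangulation logic; a real mixed methods evaluation would, as the text notes, draw on many more quantitative and qualitative sources.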
Program content/definition U U
Program rationale U
Context U
Treatments U
Time period U
Beneficiaries U
Comparison groups U
Norm groups U
Assessed needs U
Problem statements U
Objectives U U U U
Independent/dependent U U
Indicators/criteria U U
Life skills U
Performance tasks U
Questions/hypotheses/causal factors U U
Policy issues U
Tests in use U U
Program activities/milestones U
Costs U
Intra-case/cross-case analysis U
Diagnose program shortcomings U U U U U
Compare performance of competing programs U U U U U
Inform policymaking U U U U
Ensure standardization of outcome measures U U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management
information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-
based, 15. Mixed methods.
Operational objectives U U
Criterion-referenced test U U U U
Performance contracting U
Management by objectives U U U
Peer review U
Trial proceedings U
Mandated testing U U
Self-studies U
Program audits U
Standardized testing U U U U
Performance measures U U U
Policy analysis U
Study of outliers U U U
System analysis U
Analysis of archives U U
Collection of artifacts U U
Log diaries U
Content analysis U U
Key informants U U
Advisory committees U
Interviews U U
Operational analysis U
Focus group U U
Questionnaires U U
Rating scales U U
In-depth descriptions U
Photographs U
Critical incidents U
Testimony U U U
Flow charts U
Decision trees U
Logic models U U U
Grounded theory U U
News clippings analysis U U
Cross-break tables U U U U
Expert critics U U U U
* 3. Objectives-based, 4. Accountability, 5. Objective testing, 6. Outcomes monitoring, 7. Performance testing, 8. Experiments, 9. Management
information systems, 10. Benefit-cost analysis, 11. Clarification hearing, 12. Case study, 13. Criticism & connoisseurship, 14. Program theory-
based, 15. Mixed methods.
Eliminates guessing U
Focuses on outcomes U U U U U U U
Can be done retrospectively or in real time U U U U
Stresses complementarity of qualitative & quantitative methods U U
May inappropriately & counterproductively mix positivistic & postmodern paradigms U
High cost U
Low feasibility U U U
This paper turns next to a set of approaches that stress the need to fully assess a program’s merit and worth, whatever the required questions and methods. These are the improvement/accountability-oriented evaluation approaches, labeled Decisions/Accountability, Consumer-Orientation, and Accreditation. Respectively, these three approaches emphasize improvement through serving program decisions, providing consumers with assessments of optional programs and services, and helping consumers to gain assurances that given programs are professionally sound.

Approach 16: Decision/Accountability-Oriented Studies

The decision/accountability-oriented approach emphasizes that program evaluation should be used proactively to help improve a program as well as retroactively to judge its merit and worth. As mentioned previously, the decision/accountability-oriented approach should be distinguished from management information systems and from politically controlled studies because of the emphasis in decision/accountability-oriented studies on questions of merit and worth. The approach’s philosophical underpinnings include an objectivist orientation to finding best answers to context-limited questions and subscription to the principles of a well-functioning democratic society, especially human rights, equity, excellence, conservation, and accountability. Practically, the approach is oriented to engaging stakeholders in focusing the evaluation; addressing their most important questions; providing timely, relevant information to assist decision making; and producing an accountability record.

Decision makers, decision situations, and program accountability requirements provide useful advance organizers for decision/accountability-oriented studies. The approach emphasizes that decision makers include not just top managers but stakeholders at all organizational levels of a program. From the bottom up, such stakeholders may include beneficiaries, parents and guardians, service providers, administrators, support personnel, policy boards, funding authorities, taxpayers, etc. The generic decision situations to be served may include formulation of goals and priorities, identification and assessment of competing approaches, planning and budgeting program operations, staffing programs, carrying out planned activities, judging outcomes, determining how best to use programs, recycling program operations, etc. Key classes of needed evaluative information are assessments of needs, problems, and opportunities; identification and assessment of competing program approaches; assessment of program plans; assessment of staff qualifications and performance; assessment of program facilities and materials; monitoring and assessment of program implementation; assessment of intended and unintended and short-range and long-range outcomes; and assessment of cost-effectiveness.

Basically, the purpose of decision/accountability studies is to provide a knowledge and value base for making and being accountable for decisions that result in developing, delivering, and making informed use of cost-effective services. Serving this purpose requires that evaluators interact with representative members of their audiences and supply them with relevant, timely, efficient, and accurate evaluative feedback. A theme of this approach
is that the most important purpose of evaluation is not to prove but to improve.

The sources of questions addressed by the decision/accountability-oriented approach are the concerned and involved stakeholders. These may include all persons and groups who must make choices related to initiating, planning, implementing, and using a program’s services. Main questions addressed are, What beneficiary needs should be addressed? What are the available alternatives for addressing these needs, and what are their comparative merits? What plan of services should be operationalized and delivered? What facilities, materials, and equipment are needed? Who should conduct the program? What roles should the different participants carry out? Is the program working and should it be revised in any way? Is the program effectively reaching all the targeted beneficiaries and meeting their needs? Were the program staff members responsible and effective in carrying out their responsibilities to implement the program and meet the beneficiaries’ needs? Is the program better than competing alternatives? Is it sustainable? Is it transportable? Is the program worth the required initial investment? Answers to these and related questions are to be based on the underlying standard of good programs, i.e., they must effectively reach and serve the beneficiaries’ targeted needs at a reasonable cost and do so as well or better than reasonably available alternatives.

Many methods may be used in decision/accountability-oriented program evaluations. Among others, these include surveys, needs assessments, case studies, advocate teams, observations, interviews, resident evaluators, and quasi-experimental and experimental designs. The point needs to be underscored that this approach involves the evaluator and a representative body of stakeholders in regular exchanges about the evaluation. Typically, the evaluator should establish and regularly interact with an evaluation advisory or review panel in order to help define evaluation questions, shape evaluation plans, review draft reports, and help disseminate findings. This panel should include representatives of all stakeholder groups. The evaluator’s exchanges with this group involve conveyance of evaluation feedback that may be of use in program improvement and use, also planning what future evaluation activities and reports would be most helpful to program personnel and other stakeholders. Interim reports may also assist beneficiaries, program staff, and others to obtain feedback on the program’s merits and worth. By maintaining a dynamic baseline of evaluation information and ways that the information was applied, the evaluator can use this information to develop a comprehensive summative evaluation report, to present periodic feedback to the broad group of stakeholders, and to supply program personnel with information they need to make their own accountability reports.

Involvement of stakeholders, as a key feature of this approach, is consistent with a key principle of the change process. An enterprise—read evaluation here—can best help bring about change in a target group’s behavior if that group was involved in planning, monitoring, and assessing outcomes of the enterprise. By involving stakeholders throughout the evaluation process, decision-oriented evaluators lay the groundwork for bringing stakeholders to understand and value the evaluation process and apply the findings.

Cronbach (1963) first introduced educators to the idea that evaluation should be reoriented from its objectives-based history to a concern for helping program personnel make better decisions about how to deliver effective services. While he did not use the terms formative and summative evaluation, he essentially defined the underlying concepts. In discussing the distinctions between the constructive, proactive orientation on the one hand and the retrospective, judgmental orientation on the other, he argued for placing
Improvement/Accountability-Oriented Evaluation Approaches 43
more emphasis on the former—in contrast to the evaluation tradition of stressing retrospective outcomes evaluation. Later, I (Stufflebeam, 1966, 1967) introduced a conceptualization of evaluation that was based on the idea that evaluation should help program personnel make and defend decisions that are in the best interest of meeting beneficiaries’ needs. While I argued for an improvement orientation to evaluation, I also emphasized that evaluators must both inform decisions and provide an informational basis for accountability. I also emphasized that the approach should interact with and serve the full range of stakeholders who need to make judgments and choices about a program. Other persons who have contributed to the development of a decision/accountability orientation to evaluation are Alkin (1969) and Webster.7

The decision/accountability-oriented approach is applicable in cases where program staffs and other stakeholders want and need both formative and summative evaluation. It can provide the evaluation framework for both internal evaluation and external evaluation. When used for internal evaluation, usually it is important to commission an independent metaevaluation of the inside evaluator’s work. In addition to application to program evaluations, this approach has proved useful in evaluating personnel, students, projects, facilities, and products.

A main advantage of the decision/accountability-oriented approach is that it encourages program personnel to use evaluation continuously and systematically in their efforts to plan and implement programs that meet beneficiaries’ targeted needs. It aids decision making at all levels of a system and stresses program improvement. It also presents a rationale and framework of information for helping program personnel to be accountable for their decisions and actions in implementing a program. It is heavily geared to involving the full range of stakeholders in the evaluation process to assure that their evaluation needs are well addressed and to encourage and support them to make effective use of evaluation findings. It is comprehensive in attending to context, inputs, process, and outcomes. It balances the use of quantitative and qualitative methods. It is keyed to professional standards for evaluations. Finally, the approach emphasizes that evaluations must be grounded in the democratic principles of a free society.

A main limitation is that the collaboration required between an evaluator and stakeholders introduces opportunities for impeding the evaluation and/or biasing its results, especially when the evaluative situation is politically charged. Also, when evaluators are actively influencing the course of a program, they may identify so closely with it that they lose some of the independent, detached perspective needed to provide objective, forthright reports. Moreover, the approach may overemphasize formative evaluation and give too little attention to summative evaluation. External metaevaluation has been employed to counteract opportunities for bias and to assure the proper balance of formative and summative evaluation. Though the charge is erroneous, this approach carries the connotation that only top decision makers are served.

Approach 17: Consumer-Oriented Studies

In the consumer-oriented approach, the evaluator is the “enlightened surrogate consumer.” He or she must draw direct evaluative conclusions about the program being evaluated. Evaluation is viewed as the process of determining something’s merit and worth, with evaluations being the products of that process. The approach regards a consumer’s welfare as a program’s primary justification and accords that welfare the same primacy in program evaluation. Grounded in a deeply reasoned view of ethics and the common good plus skills in obtaining and synthesizing pertinent, valid, and reliable information, the
evaluator should help developers produce and deliver products and services that are of excellent quality and of great use to consumers (e.g., students, their parents, teachers, and taxpayers). More importantly, the evaluator should help consumers identify and assess the merit and worth of competing programs, services, and products.

Advance organizers include societal values, consumers’ needs, costs, and criteria of goodness in the particular evaluation domain. The purpose of a consumer-oriented program evaluation is to judge the relative merits and worths of the products and services of alternative programs and, thereby, to help taxpayers, practitioners, and potential beneficiaries make wise choices. This approach is objectivist in assuming an underlying reality and positing that it is possible although often extremely difficult to find best answers. This approach looks at a program comprehensively in terms of its quality and costs, functionally regarding the assessed needs of the intended beneficiaries, and comparatively considering reasonably available alternative programs. Evaluators are expected to subject their program evaluations to evaluations, what Scriven termed metaevaluation.

The approach employs a wide range of assessment topics. These include program description, background and context, client, consumers, resources, function, delivery system, values, standards, process, outcomes, costs, critical competitors, generalizability, statistical significance, assessed needs, bottom-line assessment, practical significance, recommendations, reports, and metaevaluation. The evaluation process begins with consideration of a broad range of such topics, continuously compiles information on all of them, and ultimately culminates in a super-compressed judgment of the program’s merit and worth.

Questions for the consumer-oriented study are derived from society, from program constituents, and especially from the evaluator’s frame of reference. The general question addressed is, Which of several alternative programs is the best choice, given their differential costs, the needs of the consumer group, the values of society at large, and evidence of both positive and negative outcomes?

Methods include checklists, needs assessments, goal-free evaluation, experimental and quasi-experimental designs, modus operandi analysis, applying codes of ethical conduct, and cost analysis (Scriven, 1974). A preferred method is for an external, independent consumer advocate to conduct and report findings of studies of publicly supported programs. The approach is keyed to employing a sound checklist of the program’s key aspects. Scriven (1991) developed a generic “Key Evaluation Checklist” for this purpose. The main evaluative acts in this approach are grading, scoring, ranking, apportioning, and producing the final synthesis (Scriven, 1994a).

Scriven (1967) was a pioneer in applying the consumer-oriented approach to program evaluation, and his work parallels the concurrent work of Ralph Nader and the Consumers Union in the general field of consumerism. Glass has supported and developed Scriven’s approach.8 Scriven coined the terms formative and summative evaluation. He allowed that evaluations can be divergent in early quests for critical competitors and explorations related to clarifying goals and making programs function well. However, he also maintained that ultimately evaluations must converge on summative judgments about a program’s merit and worth. While accepting the importance of formative evaluation, he also argued against Cronbach’s (1963) position that formative evaluation should be given the major emphasis. According to Scriven, the bottom-line aim of a sound evaluation is to judge the
program’s merit, comparative value, and overall worth. Scriven (1991, 1994a) sees evaluation as a transdiscipline encompassing all evaluations of various entities across all applied areas and disciplines and comprised of a common logic, methodology, and theory that transcends specific evaluation domains, which also have their unique characteristics.

The consumer-oriented study requires a highly credible and competent expert plus either sufficient resources to allow the expert to conduct a thorough study or other means to obtain the needed information. Often, a consumer-oriented evaluator is engaged to evaluate a program after its formative stages are over. In these situations, the external consumer-oriented evaluator is often dependent on being able to access a substantial base of information that the program staff had accumulated. If no such base of information exists, the consumer-oriented evaluator will have great difficulty in obtaining enough information to produce a thorough, defensible summative program evaluation.

One of this approach’s main advantages is that it is a hard-hitting, independent assessment intended to protect consumers from shoddy programs, services, and products and instead to guide them to support and use those contributions that best and most cost-effectively address their needs. Also, the approach’s stress on independent/objective assessment yields high credibility with consumer groups. The approach directly attempts to achieve a comprehensive assessment of merit and worth. This is aided by Michael Scriven’s (1991) Key Evaluation Checklist and his Evaluation Thesaurus (in which he presents and explains the checklist). The approach provides for a summative evaluation to yield a bottom-line judgment of merit and worth, preceded by a formative evaluation to assist developers to help assure that their programs will succeed.

One disadvantage is that the approach can be so independent from practitioners that it may not assist them to do a better job of serving consumers. If summative evaluation is applied too early, it can intimidate developers and stifle their creativity. However, if summative evaluation is applied only near a program’s end, the evaluator may have great difficulty in obtaining sufficient evidence to confidently and credibly judge the program’s basic value. This often iconoclastic approach is also heavily dependent on a highly competent, independent, and “bulletproof” evaluator.

Approach 18: Accreditation/Certification Approach

Most school districts and universities and many professional organizations have periodically been the subject of an accreditation study, and many professionals, at one time or another, have had to meet certification requirements for a given position. Such studies of institutions and personnel are in the realm of accountability-oriented evaluations, and they have an improvement element as well. Institutions, institutional programs, and personnel are studied to prove whether they are fit to serve designated functions in society; typically, the feedback reports include areas for improvement.

The advance organizers used in the accreditation/certification study usually are guidelines and criteria that some accrediting or certifying body has adopted. As previously suggested, the evaluation’s purpose is to determine whether institutions, institutional programs, and/or personnel should be approved to perform specified functions.

The source of questions for accreditation or certification studies is the accrediting or certifying body. Basically, they address this question: Are institutions and their programs and personnel meeting minimum standards, and how can their performance be improved?
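The minimum-standards question can be made concrete with a minimal, hypothetical sketch (the criteria, cut scores, and ratings below are invented for illustration; real accrediting guidelines are far more elaborate): compare an institution’s self-study ratings against an accrediting body’s cut scores, yielding an approval decision plus the areas for improvement that typically accompany the feedback report.

```python
# Hypothetical accreditation decision logic: minimum standards on a 1-5 scale.
MINIMUM_STANDARDS = {
    "faculty qualifications": 3,
    "curriculum": 3,
    "facilities": 2,
    "student outcomes": 3,
}

def review(self_study):
    """Return (approved, areas_for_improvement) for a self-study's ratings."""
    shortfalls = [criterion for criterion, cut in MINIMUM_STANDARDS.items()
                  if self_study.get(criterion, 0) < cut]
    return (not shortfalls, shortfalls)

# Invented self-study ratings: meets every standard except student outcomes.
approved, improve = review({"faculty qualifications": 4, "curriculum": 3,
                            "facilities": 2, "student outcomes": 2})
print("Approved:", approved)
print("Areas for improvement:", improve)
```

The sketch captures only the pass/fail-plus-feedback structure; actual accreditation relies on expert panel judgment, not mechanical cut scores.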
Typical methods used in the accreditation/certification approach are self-study and self-reporting by the individual or institution. In the case of institutions, panels of experts are assigned to visit the institution, verify a self-report, and gather additional information. The basis for the self-studies and the visits by expert panels are usually guidelines and criteria that have been specified by the accrediting agency.

Accreditation of education was pioneered by the College Entrance Examination Board around 1901. Since then, the accreditation function has been implemented and expanded, especially by the Cooperative Study of Secondary School Standards, dating from around 1933. Subsequently, the accreditation approach has been developed, further expanded, and administered by the North Central Association of Secondary Schools and Colleges, along with their associated regional accrediting agencies across the United States, and by many other accrediting and certifying bodies. Similar accreditation practices are found in medicine, law, architecture, and many other professions.

many opportunities for corruption and inept performance. As has been said for a number of the evaluation approaches described above, it is prudent to subject accreditation and certification processes themselves to independent metaevaluations.

The three improvement/accountability-oriented approaches emphasize the assessment of merit and worth, which is the thrust of the definition of evaluation used to classify the 22 approaches considered in this paper. Tables 7 through 12 summarize the similarities and differences between the models in relationship to advance organizers, purposes, characteristic questions, methods, strengths, and weaknesses. The paper turns next to the fourth and final set of program evaluation approaches—those concerned with using evaluation to further some social agenda.
Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation

Advance Organizers
Decision makers/stakeholders U
Decision situations U
Program operations U U
Program outcomes U U U
Cost-effectiveness U U
Assessed needs U U
Societal values U
Table 8: Comparison of the Primary PURPOSES of the Three Improvement/Accountability Evaluation Approaches
Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation
Provide a knowledge and value base for decisions U
Judge alternatives U
Approve/recommend professional services U
Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation
Surveys U
Needs assessments U U
Case studies U
Advocate teams U
Observations U U
Interviews U U U
Resident evaluators U
Quasi experiments U U
Experiments U U
Checklists U U
Goal-free evaluations U
Cost analysis U
Self-study U
Table 11: Comparison of the Prevalent STRENGTHS of the Three Improvement/Accountability Evaluation Approaches

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation
Keyed to professional standards U U
Stresses program improvement U
Emphasizes democratic principles U U
Stresses an independent perspective U U
Produces a comprehensive assessment of merit & worth U U U
Emphasizes cost-effectiveness U
Table 12: Comparison of the Prevalent WEAKNESSES of the Three Improvement/Accountability Evaluation Approaches

Evaluation Approaches: 16. Decision/Accountability, 17. Consumer Orientation, 18. Accreditation
May underemphasize outcome information U
The Social Agenda/Advocacy approaches are heavily directed to making a difference in society through program evaluation. These approaches especially are employed to ensure that all segments of society have equal access to educational and social opportunities and services. The approaches even have an affirmative action bent toward giving preferential treatment through program evaluation to the disadvantaged. If—as many persons have stated—information is power, then this set of approaches could be said to be oriented toward employing program evaluation, sometimes in a biased way, to empower the disenfranchised. By giving stakeholders the authority for key evaluation decisions, related especially to interpretation and release of findings, evaluators empower these persons to use evaluation to their best advantage; but they also may make the evaluation vulnerable to bias and other misuse. Nevertheless, there is much to recommend these approaches, since they are strongly oriented to democratic principles of equity and fairness and employ practical procedures for involving the full range of stakeholders.

Approach 19: Client-Centered Studies (or Responsive Evaluation)

The classic approach in this set is the client-centered study or what Robert Stake (1983) has termed the responsive evaluation. The label client-centered evaluation is used here, because one pervasive theme is that the evaluator must work with and for the support of a diverse client group including, for example, teachers, administrators, developers, taxpayers, legislators, and financial sponsors. They are the clients in the sense that they support, develop, administer, or directly operate the programs under study and seek or need evaluators’ counsel and advice in understanding, judging, and improving the programs. The approach charges evaluators to interact continuously with and respond to the evaluative needs of the various clients.

This approach contrasts sharply with Scriven’s consumer-oriented approach. Stake’s evaluators are not the independent, objective assessors as seen in Scriven’s approach. The client-centered study embraces local autonomy and helps people who are involved in a program to evaluate it and use the evaluation for program improvement. The evaluator in a sense is the client’s handmaiden as they strive to make the evaluation serve their needs. Moreover, the client-centered approach rejects objectivist evaluation and instead subscribes to the postmodernist view, wherein there are no best answers or clearly preferable values and wherein subjective information is preferred. In this approach, the program evaluation may culminate in conflicting findings and conclusions, leaving interpretation to the eyes of the beholders. Client-centered evaluation is perhaps the leading entry in the “relativistic school of evaluation,” which calls for a pluralistic, flexible, interactive, holistic, subjective, constructivist, and service-oriented approach. The approach is relativistic because it seeks no final authoritative conclusion, but instead interprets findings against stakeholders’ different and often conflicting values. The approach seeks to examine a program’s full countenance and prizes the collection and reporting of multiple, often conflicting perspectives on the value of a program’s format, operations, and achievements. Side effects and
54 Stufflebeam
incidental gains as well as intended outcomes are to be identified and examined.

The advance organizers in client-centered evaluations are stakeholders’ concerns and issues in the program itself, as well as the program’s rationale, background, transactions, outcomes, standards, and judgments. The client-centered program evaluation may serve a wide range of purposes. Some of these are helping people in a local setting gain a perspective on the program’s full countenance; understanding the ways that various groups see the program’s problems, strengths, and weaknesses; and learning the ways affected people value the program plus the ways program experts judge it. The evaluator’s process goal is to carry on a continuous search for key questions and to provide the clients with useful information as it becomes available.

The client-centered/responsive approach has a strong philosophical base: evaluators should promote equity and fairness, help those with little power, thwart the misuse of power, expose the huckster, unnerve the assured, reassure the insecure, and always help people see things from alternative viewpoints. The approach subscribes to moral relativity and posits that for any given set of findings there are potentially multiple, conflicting interpretations that are equally plausible.

Community, practitioner, and beneficiary groups in the local environment plus external program area experts provide the questions addressed by the client-centered study. In general, the groups usually want to know what the program has achieved, how it operated, and the ways in which it is judged by involved persons and experts in the program area. The more specific evaluation questions emerge as the study unfolds, based on the evaluator’s continuing interactions with the stakeholders and their collaborative assessment of the developing evaluative information.

This approach reflects a formalization of the longstanding practice of informal, intuitive evaluation. It requires a relaxed and continuous exchange between the evaluator and clients. The approach is more divergent than convergent. Basically, the approach calls for continuing communication between evaluator and audience for the purposes of discovering, investigating, and addressing a program’s issues. Designs for client-centered program evaluations are relatively open-ended and emergent, building to narrative description rather than aggregating measurements across cases. The evaluator attempts to issue timely responses to clients’ concerns and questions by collecting and reporting useful information, even if the needed information hadn’t been anticipated at the study’s beginning.

Concomitant with the ongoing conversation with the clients, the evaluator attempts to obtain and present a rich set of information on the program. This includes its philosophical foundation and purposes, history, transactions, and outcomes. Special attention is given to side effects, the standards that various persons hold for the program, and their judgments of the program.

Depending on the evaluation’s purpose, the evaluator may legitimately employ a range of different methods. Some of the preferred methods are the case study, expressive objectives, purposive sampling, observation, adversary reports, storytelling to convey complexity, sociodrama, and narrative reports. Client-centered evaluators are charged to check for the existence of stable and consistent findings by employing redundancy in their data-collecting activities and replicating their case studies. Evaluators are not expected to act as a program’s sole or final judges, but should collect, process, and report the opinions and judgments of the full range of the program’s stakeholders plus pertinent experts. In the end, the evaluator makes a comprehensive statement of what the program is observed to be and …
may bog down in an unproductive quest for multiple inputs and interpretations.

Approach 20: Constructivist Evaluation

The constructivist approach to program evaluation is heavily philosophical, service-oriented, and paradigm-driven. The constructivist paradigm rejects the existence of any ultimate reality and employs a subjectivist epistemology. It sees knowledge gained as one or more human constructions, uncertifiable and constantly problematic and changing. It places the evaluators and program stakeholders at the center of the inquiry process, employing all of them as the evaluation’s “human instruments.” The approach insists that evaluators be totally ethical in respecting and advocating for all the participants, especially the disenfranchised. Evaluators are authorized, even expected, to maneuver the evaluation to emancipate and empower involved or affected disenfranchised people. Evaluators do this by raising stakeholders’ consciousness so that they are energized, informed, and assisted to transform their world. The evaluator must respect the participants’ free will in all aspects of the inquiry and should empower them to help shape and control the evaluation activities in their preferred ways. The inquiry process must be consistent with effective ways of changing and improving society. Thus, stakeholders must play a key role in determining the evaluation questions and variables. Throughout the study, the evaluator regularly and continuously informs and consults the stakeholders in all aspects of the study. The approach rescinds any special privilege of scientific evaluators to work in secret and control/manipulate human subjects. In guiding the program evaluation, the evaluator balances verification with a quest for discovery, balances rigor with relevance, and balances the use of quantitative and qualitative methods. The evaluator also provides rich and deep description in preference to precise measurements and statistics. He or she employs a relativist perspective to obtain and analyze findings, stressing locality and specificity over generalizability. The evaluator posits that there can be no ultimately correct conclusions. He or she exalts openness and the continuing search for more informed and illuminating constructions.

This approach is as much recognizable for what it rejects as for what it proposes. In general, it strongly opposes positivism as a basis for evaluation, with its realist ontology, objectivist epistemology, and experimental method. It rejects any absolutist search for correct answers. It directly opposes the notion of value-free evaluation and attendant efforts to expunge human bias. It rejects positivism’s deterministic and reductionist structure and its belief in the possibility of fully explaining studied programs.

The constructivist approach’s advance organizers are basically the philosophical constraints placed on the study, as seen above, including the requirement of collaborative, unfolding inquiry. A constructivist approach’s main purpose is to determine and make sense of the variety of constructions that exist among stakeholders. The approach keeps the inquiry open to ongoing communication and to the gathering, analysis, and synthesis of further constructions. One construction is not considered more true than others, but some may be judged as more informed and sophisticated than others. All evaluation conclusions are viewed as indeterminate, with the continuing possibility of finding better answers. All constructions are also context dependent. In this respect, the evaluator does define boundaries on what is being investigated.

The questions addressed in constructivist studies cannot be determined apart from the participants’ interactions. Together, the evaluator and stakeholders identify the questions to be addressed. These questions …
consistent with the principle of effective change processes that people are more likely to value and use something (read evaluation here) if they are consulted and involved in its development. It also seeks to directly involve the full range of stakeholders who might be harmed or helped by the evaluation as important, empowered partners in the evaluation enterprise. It is said to be educative for all the participants, whether or not a consensus is reached. It also lowers expectations for what clients can learn about any causes and effects. While it doesn’t promise final answers, it does move from a divergent stage, in which it searches widely for insights and judgments, to a convergent stage in which some unified answers are sought. In addition, it uses participants as instruments in the evaluation, thus taking advantage of their relevant experiences, knowledge, and value perspectives; this greatly reduces the burden of developing, field-testing, and validating information collection instruments before using them. The approach makes effective use of qualitative methods and triangulates findings from different sources.

However, the approach is limited in its applicability and has some disadvantages. Because of the need for full involvement and ongoing interaction through both the divergent and convergent stages, it is often difficult to produce the timely reports that funding agencies and decision makers demand. Also, to work well the approach requires the attention and responsible participation of a wide range of stakeholders. The approach seems to be unrealistically utopian in this regard. Widespread, grass-roots interest and participation are often hard to obtain and sustain throughout a program evaluation. This can be exacerbated by a continuing turnover of stakeholders. While the process emphasizes and promises openness and full disclosure, some participants don’t want to tell their private thoughts and judgments to the world. Moreover, stakeholders sometimes are poorly informed about the issues being addressed in an evaluation and thus are poor data sources. It can be unrealistic to expect that the evaluation can and will take the needed time to inform and then meaningfully involve those who begin as basically ignorant of the program being assessed. Also, constructivist evaluations can be greatly burdened by itinerant evaluation stakeholders who come and go and who expect to reopen questions previously addressed and consensus previously reached. In addition, some evaluation clients don’t take kindly to evaluators who are prone to report competing, perspectivist answers without taking a stand regarding the program’s merit and worth. Also, many clients aren’t necessarily attuned to the constructivist philosophy. Instead, they may value reports that mainly include hard data on outcomes and assessments of statistical significance. Often, they also expect that reports should be based on relatively independent perspectives that are free of program participants’ conflicts of interest. In addition, the constructivist approach is a countermeasure to assigning responsibility for successes and failures in a program to certain individuals or groups; many policy boards, administrators, and financial sponsors might see this rejection of individual and group accountability as unworkable and unacceptable. It is easy to say that all persons in a program should share the glory or the disgrace; but try to tell this to an exceptionally hardworking and effective teacher in a school program where virtually no one else tries or succeeds.

Approach 21: Deliberative Democratic Evaluation

Perhaps the newest entry in the program evaluation models enterprise is the deliberative democratic approach advanced by House and Howe (1998). The approach functions within an explicit democratic framework and charges evaluators to uphold democratic principles in reaching defensible evaluative conclusions. The …
willing to give up sufficient power to allow inputs from a wide range of stakeholders, early disclosure of preliminary findings to all interested parties, and opportunities for the stakeholders to play an influential role in reaching the final conclusions. Also, a representative group of stakeholders must be willing to engage in open and meaningful dialogue and deliberation in all stages of the study.

This approach has many advantages associated with any democratic process. It is a direct attempt to make evaluations just. It assures democratic participation of stakeholders in all stages of the evaluation. It strives to incorporate the views of all interested parties, including insiders and outsiders, disenfranchised persons and groups, those who control the purse strings, etc. Meaningful democratic involvement should direct the evaluation to the issues that people care about and incline them to respect and use the evaluation findings. It employs dialogue to examine and authenticate stakeholders’ inputs. A key advantage over some other advocacy approaches is that the democratic deliberative evaluator reserves the right to rule out inputs that are considered incorrect or unethical. The evaluator is open to all stakeholders’ views, carefully considers them, but then renders as defensible a judgment of the program as possible. He or she does not leave the responsibility for reaching a defensible final assessment to a majority vote of stakeholders—some of whom are sure to have conflicts of interest and be uninformed. In rendering a final judgment, the evaluator ensures closure.

As House and Howe have acknowledged, the democratic dialogic approach is, at this time, unrealistic and often cannot be fully applied. This approach—in offering and expecting full democratic participation in order to make an evaluation work—reminds me of a colleague who used to despair of ever changing or improving higher education. He would say that changing any aspect of our university would require getting every professor to withhold her or his veto. In view of the very ambitious demands of the democratic dialogic approach, House and Howe have proposed it as an ideal to be kept in mind even though evaluators will seldom, if ever, be able to achieve this ideal.

Approach 22: Utilization-Focused Evaluation

The utilization-focused approach is explicitly geared to assure that program evaluations make impacts. It is a process for making choices about an evaluation study in collaboration with a targeted group of priority users, selected from a broader set of stakeholders, in order to focus effectively on their intended uses of an evaluation. All aspects of a utilization-focused program evaluation are chosen and applied to help the targeted users obtain and apply evaluation findings to their intended uses and to maximize the possibility they will do so. Such studies are judged more for the difference they make in improving programs and influencing decisions and actions than for their elegance and technical excellence. No matter how good an evaluation report is, if it only sits on the shelf gathering dust, then it contributed little if anything to the evaluation’s success.

The advance organizers of utilization-focused program evaluations are, in the abstract, the possible users and uses to be served. Working from this initial conception, the evaluator moves as directly as possible to identify in concrete terms the actual users to be served. Through careful and thorough analysis of stakeholders, the evaluator identifies the multiple and varied perspectives and interests that should be represented in the study. He or she then selects a group that is willing to pay the price of substantial involvement and that appropriately represents the program’s stakeholders. The evaluator then engages this client group to clarify why they need the evaluation, how they …
In deliberating with the intended users, the evaluator emphasizes that the program evaluation’s purpose must be to give them the information they need to fulfill their objectives. Such objectives include socially valuable aims such as combating problems of illiteracy, crime, hunger, homelessness, unemployment, child abuse, spouse abuse, substance abuse, illness, alienation, discrimination, malnourishment, pollution, bureaucratic waste, etc. However, it is the targeted users who determine the program to be evaluated, what information is required, how and when it must be reported, and how it will be used.

In this approach, the evaluator is no iconoclast, but instead is the intended users’ servant and a facilitator. The evaluation should meet the full range of professional standards for program evaluations, not just utility. The evaluator must therefore be an effective negotiator, standing on principles of sound evaluation, but working hard to gear a defensible program evaluation to the targeted users’ evolving needs. The utilization-focused evaluation is considered situational and dynamic. Depending on the circumstances, the evaluator may play any of a variety of roles—trainer, measurement expert, internal …

All evaluation methods are fair game in the utilization-focused program evaluation. The evaluator will creatively employ whatever methods are relevant, e.g., quantitative and qualitative, formative and summative, naturalistic and experimental. As much as possible, the utilization-focused evaluator puts the client group in “the driver’s seat” in determining evaluation methods, so that they will make sure the evaluator addresses their most important questions; collects the right information; applies the relevant values; addresses the key action-oriented questions; uses techniques they respect; interprets the findings against a pertinent theory; reports the information in a form and at a time when it can best be used; convinces stakeholders of the evaluation’s integrity and accuracy; and facilitates the users’ study, application, and—as appropriate—dissemination of the findings. The bases for interpreting evaluation findings are the users’ values, with the evaluator engaging in much values clarification to ensure that evaluative information and interpretations serve the users’ purposes. The users are actively involved in interpreting findings. Throughout the evaluation process, the evaluator balances
the concern for utility with provisions for validity and cost-effectiveness.

…for both conducting the program evaluation and the needed follow-through.
[Summary table comparing the social agenda/advocacy approaches; the checkmark columns indicating which approach carries each feature were lost in extraction. Advance organizers listed: evaluation users; evaluation uses; stakeholders’ concerns and issues in the program itself; background of the program; transactions/operations in the program; outcomes; standards; judgments; collaborative, unfolding nature of the inquiry; constructivist perspective; rejection of positivism; democratic participation; dialogue with stakeholders to validate their inputs; employment of democratic participation in arriving at a defensible assessment of a program. Evaluation questions listed: Were questions negotiated with stakeholders? How do various stakeholders judge the program? Methods listed: case study; expressive objectives; purposive sampling; observation; adversary reports; hermeneutics to identify alternative constructions; dialectical exchange; consensus development; surveys; debates.]
As shown in the preceding parts, a variety of evaluation approaches emerged during the 20th century. Nine of these approaches appear to be the strongest and most promising for continued use and development beyond the year 2000. As shown in the preceding analyses, the other 13 approaches also have varying degrees of merit, but I chose in this section to concentrate attention on the most promising approaches. The ratings of these nine approaches appear in Table 19. They are listed in order of merit, within the categories of Improvement/Accountability, Social Mission/Advocacy, and Questions/Methods evaluation approaches. The ratings are in relationship to the Joint Committee Program Evaluation Standards and were derived by the author using a special checklist keyed to the Standards.3

All nine of the rated approaches earned overall ratings of Very Good, except Accreditation, which was judged Good overall. The Utilization-Focused and Client-Centered approaches received Excellent ratings in the standards areas of Utility and Feasibility, while the Decision/Accountability approach was judged Excellent in provisions for Accuracy. The rating of Good in the Accuracy area for the Outcomes Monitoring/Value-Added approach was due not to low merit of the approach’s techniques, but to the narrowness of questions addressed and information used; in its narrow sphere of application the Outcomes Monitoring/Value-Added approach provides technically sound information. The comparatively lower ratings given to the Accreditation approach result from its being a labor-intensive, expensive approach; its susceptibility to conflict of interest; its overreliance on self-reports and brief site visits; and its insular resistance to independent metaevaluations. Nevertheless, the distinctly American and pervasive accreditation approach is entrenched. All who will use it are advised to strengthen it in the areas of weakness identified in this paper. The Consumer-Oriented approach also deserves its special place, with its emphasis on independent assessment of developed products and services. While this consumer protection approach is not especially applicable to internal evaluations for improvement, it complements such approaches with the outsider, expert view that becomes important when products and services are put up for dissemination.

The Case Study approach scored surprisingly well, considering that it is focused on use of a particular technique. An added bonus of this approach is that it can be employed as a component of any of the other approaches, or it can be used by itself. As mentioned previously in this paper, the Democratic Deliberative approach is new and appears to be promising for testing and further development. Finally, the Constructivist approach is a well-founded, mainly qualitative approach to evaluation that systematically engages interested parties to help conduct both the divergent and convergent stages of evaluation. All in all, the nine approaches summarized in Table 19 bode well for the future application and further development of alternative program evaluation approaches.

3 The checklist used to evaluate each approach against the Joint Committee Program Evaluation Standards appears in this paper’s appendix.
Table 19: Ratings of the Strongest Program Evaluation Approaches
Within types, listed in order of compliance with The Program Evaluation Standards. (Only two data rows of the table survived extraction; the bar graphs of overall merit and the remaining rows are omitted.)

Evaluation Approach     Overall Score & Rating   Utility   Feasibility   Propriety   Accuracy

IMPROVEMENT/ACCOUNTABILITY
  (rows not recovered)

SOCIAL MISSION/ADVOCACY
  Utilization-Based     87 (VG)                  96 (E)    92 (E)        81 (VG)     79 (VG)
  Constructivist        80 (VG)                  82 (VG)   67 (VG)       88 (VG)     83 (VG)

QUESTIONS/METHODS
  (rows not recovered)

The tests behind the ratings: The author rated each evaluation approach on each of the 30 Joint Committee program evaluation standards by judging whether the approach endorses each of 10 key features of the standard. He judged the approach’s adequacy on each standard as follows: 9-10 Excellent, 7-8 Very Good, 5-6 Good, 3-4 Fair, 0-2 Poor. The score for the approach on each of the 4 categories of standards (Utility, Feasibility, Propriety, Accuracy) was then determined by summing the following products: 4 x number of Excellent ratings, 3 x number of Very Good ratings, 2 x number of Good ratings, 1 x number of Fair ratings. Judgments of the approach’s strength in satisfying each category of standards were then determined from the percentage of the possible quality points earned for that category: 93%-100% Excellent, 68%-92% Very Good, 50%-67% Good, 25%-49% Fair, 0%-24% Poor. Each category score was converted to a percent of the maximum score for the category; the 4 equalized scores were then summed, divided by 4, and compared to the maximum value of 100. The approach’s overall merit was then judged as follows: 93-100 Excellent, 68-92 Very Good, 50-67 Good, 25-49 Fair, 0-24 Poor. Regardless of the approach’s total score and overall rating, a notation of unacceptable would have been attached to any approach receiving a Poor rating on the vital standards of P1 Service Orientation, A5 Valid Information, A10 Justified Conclusions, or A11 Impartial Reporting. The author’s ratings were based on his knowledge of the Joint Committee Program Evaluation Standards, his many years of studying the various evaluation models and approaches, and his experience in seeing and assessing how some of these models and approaches worked in practice. He chaired the Joint Committee on Standards for Educational Evaluation during its first 13 years and led the development of the first editions of both the program and personnel evaluation standards. Nevertheless, his ratings should be viewed as only his personal set of judgments of these models and approaches. Also, his conflict of interest is acknowledged, since he was one of the developers of the Decision/Accountability approach. Rating abbreviations: P = Poor, F = Fair, G = Good, VG = Very Good, E = Excellent.
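The scoring procedure described in the note above is simple arithmetic and can be sketched in code. The sketch below is mine, not part of the paper; the function names and any sample 0-10 checklist scores are hypothetical, while the point weights and rating bands are taken directly from the note.

```python
# Sketch of the checklist-scoring procedure behind Table 19.
# Point weights (4/3/2/1/0) and rating bands come from the note above;
# the function names and sample inputs are illustrative only.

def standard_rating(score):
    """Map a 0-10 score on one standard (one point per endorsed key
    feature, 10 features per standard) to a rating label."""
    if score >= 9: return "Excellent"
    if score >= 7: return "Very Good"
    if score >= 5: return "Good"
    if score >= 3: return "Fair"
    return "Poor"

# Quality points awarded per standard rating.
POINTS = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}

def category_percent(scores):
    """Percent of the possible quality points earned in one category of
    standards (e.g., the 7 Utility standards of the 1994 edition)."""
    earned = sum(POINTS[standard_rating(s)] for s in scores)
    return 100.0 * earned / (4 * len(scores))

def merit_label(percent):
    """Merit bands used for both category and overall judgments."""
    if percent >= 93: return "Excellent"
    if percent >= 68: return "Very Good"
    if percent >= 50: return "Good"
    if percent >= 25: return "Fair"
    return "Poor"

def overall_merit(utility, feasibility, propriety, accuracy):
    """Equal-weight average of the four category percents."""
    percents = [category_percent(c)
                for c in (utility, feasibility, propriety, accuracy)]
    score = sum(percents) / 4
    return score, merit_label(score)
```

For example, category percents of 96, 92, 81, and 79 average to 87, which falls in the 68-92 band and is therefore rated Very Good overall, matching the Utilization-Based row of Table 19.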
applauded for their quest for equity as well as excellence in the programs being studied. They model their mission by attempting to make evaluation a participatory, democratic enterprise. Unfortunately, many pitfalls attend such utopian approaches to evaluation. These especially include susceptibility to bias and political subversion of the study, and practical constraints on involving, informing, and empowering all the stakeholders.

For the evaluation profession itself, the review of program evaluation models underscores the importance of evaluation standards and metaevaluations. Professional standards are needed to obtain a consistently high level of integrity in uses of the various program evaluation approaches. All legitimate approaches are enhanced when keyed to and assessed against professional standards for evaluations. In addition, benefits from evaluations are enhanced when they are subjected to independent review through metaevaluations.

As evidenced in this paper, the last half of the 20th century saw considerable development of program evaluation approaches. Many of the approaches introduced in the 1960s and 1970s have been extensively refined and applied. The category of social agenda/advocacy models has emerged as a new and important part of the program evaluation cornucopia. There is among the approaches an increasingly balanced quest for rigor, relevance, and justice. Clearly, the approaches are showing a strong orientation to stakeholder involvement and use of multiple methods.

Recommendations

In spite of the progress described above, there is clearly a need for continuing efforts to develop and implement better approaches to program evaluation. This is illustrated by some of the authors’ hesitancy to accord the status of a model to their contributions, or their inclination to label them as utopian. As also seen in the paper, there are some approaches that in the main seem to be a waste of time or even counterproductive.

Theoreticians should diagnose strengths and weaknesses of existing approaches, and they should do so in more depth than demonstrated here. They should use these diagnoses to evolve better, more defensible approaches and to help expunge the use of hopelessly flawed approaches; they should work with practitioners to operationalize and test the new approaches; and, of course, both groups should collaborate in developing still better approaches. Such an ongoing process of critical review and development is essential if the field of program evaluation is not to stagnate, but instead is to provide vital support for advancing programs and services.

Therefore, it is necessary, indeed essential, that evaluators develop a repertoire of different program evaluation approaches so they can selectively apply them individually or in combination to best advantage. Going out on the proverbial limb, but also based on the preceding analysis, the best approaches seem to be decision/accountability, utilization-based, client-centered, consumer-oriented, case study, democratic deliberative, constructivist, accreditation, and outcomes monitoring. The worst bets, in my judgment, are the politically controlled, public relations, accountability (especially payment by results), clarification hearings, and program theory-based approaches. The rest fall somewhere in the middle. While House and Howe’s (1998) democratic deliberative approach is new and in their view utopian, it has many elements of a sound, effective evaluation approach and merits study, further development, and trial.
References

Bayless, D., & Massaro, G. (1992). Quality improvement in education today and the future: Adapting W. Edwards Deming’s quality improvement principles and methods to education. Kalamazoo, MI: Center for Research on Educational Accountability and Teacher Evaluation.

Becker, M. H. (Ed.). (1974). The health belief model and personal health behavior [Entire issue]. Health Education Monographs, 2, 324-473.

Bhola, H. S. (1998). Program evaluation for program renewal: A study of the national literacy program in Namibia (NLPN). Studies in Educational Evaluation, 24(4), 303-330.

Bickman, L. (1990). Using program theory to describe and measure program quality. In L. Bickman (Ed.), Advances in program theory. New Directions for Program Evaluation. San Francisco: Jossey-Bass.

Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Boston, MA: Houghton Mifflin.

Chen, H. (1990). Theory driven evaluations. Newbury Park, CA: Sage.

Coffey, A., & Atkinson, P. (1996). Making sense of qualitative data: Complementary research strategies. Thousand Oaks, CA: Sage.

Cook, D. L. (1966). Program evaluation and review techniques, applications in education. U.S. Office of Education Cooperative Monograph, 17 (OE-12024).

Cronbach, L. J. (1963). Course improvement through evaluation. Teachers College Record, 64, 672-683.
Davis, H. R., & Salasin, S. E. (1975). The utilization of evaluation. In E. L. Struening & M. Guttentag (Eds.), Handbook of evaluation research, Vol. 1. Beverly Hills, CA: Sage.

Debus, M. (1995). Methodological review: A handbook for excellence in focus group research. Washington, DC: Academy for Educational Development.

Deming, W. E. (1986). Out of the crisis. Cambridge, MA: Center for Advanced Engineering Study, Massachusetts Institute of Technology.

Denny, T. (1978, November). Story telling and educational understanding. Occasional Paper No. 12. Kalamazoo, MI: Evaluation Center, Western Michigan University.

Denzin, N. K., & Lincoln, Y. S. (Eds.). (1994). Handbook of qualitative research. Thousand Oaks, CA: Sage.

Ebel, R. L. (1965). Measuring educational achievement. Englewood Cliffs, NJ: Prentice-Hall.

Eisner, E. W. (1975, March). The perceptive eye: Toward a reformation of educational evaluation. Invited address, Division B, Curriculum and Objectives,

Fetterman, D. (1989). Ethnography: Step by step. Applied Social Research Methods Series, 17. Newbury Park, CA: Sage.

Fetterman, D. (1994, February). Empowerment evaluation. Evaluation Practice, 15(1).

Fetterman, D., Shakeh, J. K., & Wandersman (Eds.). (1996). Empowerment evaluation: Knowledge and tools for self-assessment & accountability. Thousand Oaks, CA: Sage.

Fisher, R. A. (1951). The design of experiments (6th ed.). New York: Hafner.

Flanagan, J. C. (1939). General considerations in the selection of test items and a short method of estimating the product-moment coefficient from data at the tails of the distribution. Journal of Educational Psychology, 30, 674-80.

Flexner, A. (1910). Medical education in the United States and Canada. Bethesda, MD: Science and Health Publications.

Flinders, D. J., & Eisner, E. W. (1994, December). Educational criticism as a form of qualitative inquiry. Research in the Teaching of English, 28(4), 341-356.
Glass, G. V, & Maguire, T. O. (1968). Analysis of time-series quasi-experiments (U.S. Office of Education Report No. 6-8329). Boulder: Laboratory of Educational Research, University of Colorado.

Green, L. W., & Kreuter, M. W. (1991). Health promotion planning: An educational and environmental approach (2nd ed., pp. 22-30). Mountain View, CA: Mayfield Publishing.

Greenbaum, T. L. (1993). The handbook of focus group research. New York: Lexington Books.

Guba, E. G. (1969). The failure of educational evaluation. Educational Technology, 9, 29-38.

Guba, E. G. (1978). Toward a methodology of naturalistic inquiry in evaluation. CSE Monograph Series in Evaluation. Los Angeles: Center for the Study of Evaluation.

Herman, J. L., Gearhart, M. G., & Baker, E. L. (1993). Assessing writing portfolios: Issues in the validity and meaning of scores. Educational Assessment, 1, 201-224.

House, E. R. (1980). Evaluating with validity. Beverly Hills, CA: Sage.

House, E. R. (1983). Assumptions underlying evaluation models. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models. Boston: Kluwer-Nijhoff.

House, E. R. (1993). Professional evaluation–Social impact and political consequences. Newbury Park, CA: Sage.

House, E. R., & Howe, K. R. (1998). Deliberative democratic evaluation in practice. Boulder: University of Colorado.

Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11, 1-47.
Stake, R. E. (1995). The art of case study research. Thousand Oaks, CA: Sage.

Stake, R. E., & Easley, J. A., Jr. (Eds.). (1978). Case studies in science education, 1(2). NSF Project 5E-78-74. Urbana, IL: CIRCE, University of Illinois College of Education.

Stake, R. E., & Gjerde, C. (1971). An evaluation of TCITY: The Twin City Institute for Talented Youth. Kalamazoo, MI: Western Michigan University Evaluation Center, Occasional Paper Series No. 1.

Steinmetz, A. (1983). The discrepancy evaluation model. In G. F. Madaus, M. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models (pp. 79-100). Boston: Kluwer-Nijhoff.

Stillman, P. L., Haley, H. A., Regan, M. B., Philbin, M. M., Smith, S. R., O’Donnell, J., & Pohl, H. (1991). Positive effects of a clinical performance assessment program. Academic Medicine, 66, 481-483.

Stufflebeam, D. L. (1966, June). A depth study of the evaluation requirement. Theory Into Practice, 5, 121-34.

Stufflebeam, D. L. (1967, June). The use and abuse of evaluation in Title III. Theory Into Practice, 6, 126-33.

Stufflebeam, D. L., Foley, W. J., Gephart, W. J., Guba, E. G., Hammond, R. L., Merriman, H. O., & Provus, M. M. (1971). Educational evaluation and decision making. Itasca, IL: Peacock.

Stufflebeam, D. L., & Shinkfield, A. J. (1985). Systematic evaluation. Boston: Kluwer-Nijhoff.

Suchman, E. A. (1967). Evaluative research. New York: Russell Sage Foundation.

Swanson, D. B., Norman, R. N., & Linn, R. L. (1995, June/July). Performance-based assessment: Lessons from the health professions. Educational Researcher, 24(5), 5-11.

Tennessee Board of Education. (1992). The master plan for Tennessee schools 1993. Nashville: Author.

Thorndike, R. L. (1971). Educational measurement (2nd ed.). Washington, DC: American Council on Education.

Torrance, H. (1993). Combining measurement-driven instruction with authentic assessment: Some initial observations of national assessment in England and Wales. Educational Evaluation and Policy Analysis, 15, 81-90.
METAEVALUATION CHECKLIST
for Evaluating Evaluation Models against The Program Evaluation Standards

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor
U7 Evaluation Impact
Maintain contact with audiences
Involve stakeholders throughout the evaluation
Encourage and support stakeholders’ use of the findings
Show stakeholders how they might use the findings in their work
Forecast and address potential uses of findings
Provide interim reports
Make sure that reports are open, frank, & concrete
Supplement written reports with ongoing oral communication
Conduct feedback workshops to go over & apply findings
P2 Formal Agreements–Reach advance written agreements on:
Evaluation purpose & questions
Audiences
Evaluation reports
Editing
Release of reports
Evaluation procedures & schedule
Confidentiality/anonymity of data
Evaluation staff
Metaevaluation
Evaluation resources

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P5 Complete and Fair Assessment
Assess & report the program’s strengths
Assess & report the program’s weaknesses
Report on intended outcomes
Report on unintended outcomes
Give a thorough account of the evaluation’s process
As appropriate, show how the program’s strengths could be used to overcome its weaknesses
Have the draft report reviewed
Appropriately address criticisms of the draft report
Acknowledge the final report’s limitations
Estimate & report the effects of the evaluation’s limitations on the overall judgment of the program

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

P3 Rights of Human Subjects

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor
P7 Conflict of Interest
Identify potential conflicts of interest early in the evaluation
Provide written, contractual safeguards against identified conflicts of interest
Engage multiple evaluators
Maintain evaluation records for independent review
As appropriate, engage independent parties to assess the evaluation for its susceptibility to or corruption by conflicts of interest
When appropriate, release evaluation procedures, data, & reports for public review
Contract with the funding authority rather than the funded program
Have internal evaluators report directly to the chief executive officer
Report equitably to all right-to-know audiences
Engage uniquely qualified persons to participate in the evaluation, even if they have a potential conflict of interest; but take steps to counteract the conflict

9-10: Excellent; 7-8: Very Good; 5-6: Good; 3-4: Fair; 0-2: Poor

Scoring the Evaluation Model for PROPRIETY

Add the following:
No. of Excellent ratings (0-8) x 4 = ___
No. of Very Good ratings (0-8) x 3 = ___
No. of Good ratings (0-8) x 2 = ___
No. of Fair ratings (0-8) x 1 = ___
Total score = ___
(Total score) ÷ 32 x 100 = ___%

Strength of the Model’s Provisions for PROPRIETY:
30 (93%) to 32: Excellent
22 (68%) to 29: Very Good
16 (50%) to 21: Good
8 (25%) to 15: Fair
0 (0%) to 7: Poor
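The scoring procedure above (band each of the eight Propriety ratings, weight the bands, total the points, and convert the 32-point maximum to a percentage and strength label) can be sketched in a few lines of Python. This is an illustrative sketch, not part of the checklist itself; the function names are hypothetical, and only the weights and cut scores come from the text.

```python
def rating_category(score):
    """Map a 0-10 checkpoint score to its rating band per the checklist."""
    if score >= 9:
        return "Excellent"
    if score >= 7:
        return "Very Good"
    if score >= 5:
        return "Good"
    if score >= 3:
        return "Fair"
    return "Poor"


def propriety_strength(scores):
    """Weight the eight Propriety ratings (Excellent=4 ... Poor=0),
    total them (max 32), and map the total to the strength bands."""
    weights = {"Excellent": 4, "Very Good": 3, "Good": 2, "Fair": 1, "Poor": 0}
    total = sum(weights[rating_category(s)] for s in scores)
    percent = 100.0 * total / 32
    if total >= 30:
        label = "Excellent"
    elif total >= 22:
        label = "Very Good"
    elif total >= 16:
        label = "Good"
    elif total >= 8:
        label = "Fair"
    else:
        label = "Poor"
    return total, percent, label


# Eight standards all rated 9 or above -> 8 x 4 = 32 points.
print(propriety_strength([9, 10, 9, 9, 10, 9, 9, 10]))  # (32, 100.0, 'Excellent')
```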