www.annualreviews.org/aronline
Annu. Rev. Psychol. 1994. 45:545-80
Copyright © 1994 by Annual Reviews Inc. All rights reserved

SOCIAL EXPERIMENTS
Thomas D. Cook
William R. Shadish
Department of Psychology, Memphis State University, Memphis, Tennessee 38152
KEYWORDS: causation, randomization, quasi-experiments, causal generalization, selection modeling
CONTENTS
INTRODUCTION 546
CHANGES IN THE DOMINANT THEORY OF SOCIAL EXPERIMENTATION 547
  Theories of Causation 547
  Theories of Categorization 548
  Relabeling Internal Validity 550
  Relabeling External Validity 552
  Debates about Priority among Validity Types 553
  Raising the Salience of Treatment Implementation Issues 554
  The Epistemology of Ruling Out Threats 555
RANDOMIZED EXPERIMENTS IN FIELD SETTINGS 556
  Mounting a Randomized Experiment 557
  Maintaining a Randomized Experiment 560
QUASI-EXPERIMENTATION 561
  Designs Emphasizing Point-Specific Causal Hypotheses 562
  Designs Emphasizing Multivariate-Complex Causal Predictions 565
  Statistical Analysis of Data from the Basic Nonequivalent Control Group Design 566
GENERALIZING CAUSAL RELATIONSHIPS 571
  Random Sampling 571
  Proximal Similarity 572
  Probes for Robust Replication over Multiple Experiments 572
  Causal Explanation 573
CONCLUSIONS 575
0066-4308/94/0201-0545$05.00
INTRODUCTION
Theories of Causation
Experimentation is predicated on the manipulability or activity theory of causation.
Theories of Categorization
The descriptive causal questions that experimenters seek to answer usually specify particular cause and effect constructs or categories to which generalization is sought from the manipulations or measures actually used in a study.
the same class if they shared certain attributes that facilitate procreation. But classification now depends more on similarity in DNA structure, so that plants or animals with seemingly incompatible reproductive features but with similar DNA are placed in the same category. This historical change suggests that seeking attributes that definitively determine category membership is a chimera. The attributes used for classification evolve with advances elsewhere in a scholarly field, as the move from a Linnaean to a more microbiological classification system illustrates in biology.

Categorization is not just a logical process; it is also a complex psychological process (or set of processes) that we do not yet fully understand. Ideally, social experiments require researchers to explicate a cause or an effect construct, to identify its more prototypical attributes, and then to construct an intervention or outcome that seems to be an exemplar of that construct. Explicit here is a pattern-matching methodology (Campbell 1966) based on proximal similarity (Campbell 1986). That is, the operation looks like what it is meant to represent. But the meanings attached to similar events are often context-dependent. Thus, if a study's independent variable is a family income guarantee of $20,000 per year, this would probably mean something quite different if the guarantee were for three years versus twenty. In the first case individuals would have to think much more about leaving a disliked job because they do not have the same financial cushion as someone promised the guarantee for 20 years. Also, classifying two things together because they share many similar attributes does not imply that they are theoretically similar. Dolphins and sharks have much in common. They live exclusively in water, swim, have fins, are streamlined, etc. But dolphins breathe air out of water and give direct birth to their young--both features considered prototypical of mammals. Hence, dolphins are currently classified as mammals rather than fish. But why should breathing out of water and giving birth to one's young be considered prototypical attributes of mammals? Why should it not be the fit between form and function that unites dolphins and fish? Another example of this complexity comes from social psychology. Receiving very little money to do something one doesn't want to do seems different on the surface from being
Gadenne 1976). Thus, labeling the cause and the effect were smuggled in as components of internal validity. Cronbach (1982) went even further, adding that internal validity concerns whether a causal relationship occurs in particular identifiable classes of settings and with particular identifiable populations of people--each an element of Campbell's external validity. To discuss validity matters exactly, Cronbach invented a notational system. He used utos to refer to the instances or samples of units u, treatments t, observations o, and settings s achieved in a research project probing a causal hypothesis. He used UTOS¹ to refer to the categories, populations, or universes that these instances
¹ We ignore here Cronbach's discussion of the difficulty of drawing inferences about populations or settings.
whether these standards are as relevant for other groups as they are for individuals with a stake in the results of a social experiment. His own experience has convinced him that many members of the policy-shaping community are more prepared than scholars to tolerate uncertainty about whether a relationship is causal, and they are willing to put greater emphasis than scholars do on identifying the range of application of relationships that are manifestly still provisional. Cronbach believes that Campbell's preference for internal over external validity is parochial. Chelimsky (1987) has offered an empirical refutation of Cronbach's characterization of knowledge preferences at the federal level. But research on preferences at the state or local levels is not yet available, and Cronbach's characterization may be more appropriate there. Time will tell.
Dunn (1982) has criticized the validity typology of Campbell and his colleagues because neither internal nor external validity refers to the importance of the research questions being addressed in an experiment. He claims that substantive importance can be explicated just as systematically as the more technical topics Campbell has addressed, and he has proposed some specific threats that limit the importance of research questions. Dunn is correct in assuming that there is no limit to the number of possible validity types, as Cook & Campbell partially illustrated when they divided Campbell's earlier internal validity into statistical conclusion validity (are the cause and effect related?) and internal validity (is the relationship causal?), and when they divided his external validity into construct validity (generalizing from the research operations to cause and effect constructs) and external validity (generalizing to populations of persons and settings). Cook & Campbell also proposed new threats to validity under each of these headings, and their validity typology is widely used (e.g. Wortman 1993). Thus, Campbell and his colleagues agree with Dunn, only criticizing him on some particulars about threats to the importance of research questions (see especially Campbell 1982).
Though his higher-order validity types came under attack, Campbell's specific list of threats to internal and external validity did not. Internal validity
tors are not willing to tolerate focused inequities, and they use whatever discretionary resources they control to equalize what each group receives. This also threatens to obscure true effects of a treatment. Finally, treatment diffusion occurs when treatment providers or recipients learn what other treatment groups are doing and, impressed by the new practices, copy them. Once again, this obscures planned treatment contrasts. These four threats have all been observed in social experiments, and each threatens to bias treatment effect estimates when compared to situations where the experimental units cannot communicate about the different treatments. Statisticians now subsume them
been ruled out on the basis of statistical adjustments of the data that them-
find it more difficult to justify their plans than even a decade ago. There are still cases where a randomly created control group is impossible (e.g. when studying the effects of a natural disaster), but the last decade has witnessed a powerful shift in scientific opinion toward randomized field experiments and away from quasi-experiments or non-experiments. Indeed, Campbell & Boruch (1975) regret the influence Campbell's earlier work had in justifying quasi-experiments where randomized experiments might have been possible. In labor economics, many critics now advocate randomized experiments over quasi-experiments (Ashenfelter & Card 1985, Betsey et al 1985, LaLonde 1986).
ous monitoring of the selection process, and providing professionals with the chance to discuss special cases they think should be excluded from the assignment process.
Random assignment can be complicated, especially when social programs are being tested. In such cases, the organization providing benefits could lose income if all the places in the program are not filled. This means a waiting list must be kept; however, it is unrealistic to expect that all those on the waiting list will suspend self-help activities in the hope they will eventually enter the program. Some will join other programs with similar objectives or will have their needs met more informally. Thus, even a randomly formed waiting list changes over time, often in different ways from the changes occurring in a treatment group as it experiences attrition. Though no bias occurs if persons from the waiting list are randomly assigned to treatment and control status when program vacancies occur, it can be difficult to describe the population that results when people from the ever-changing waiting list are added to the original treatment group. Bias only occurs with more complex stratification designs where program needs call for making the next treatment available to a certain category of respondent (e.g. black males under 25). If there are fewer of these in the pool than there are treatment conditions in the research design, then program needs dictate that such a person enters the program even if no similar person can be assigned to the control status. If researchers belatedly recruit someone with these characteristics into a comparison group, this vitiates random assignment and introduces a possible selection bias. But if they exclude these persons from the data analysis because there are no comparable controls, this reduces external validity.
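The waiting-list procedure described above, randomly assigning listed persons to treatment or control as vacancies open, can be sketched in a few lines. This is our own hypothetical illustration; the function name and list structure are invented, not taken from the experiments reviewed here.

```python
import random

def fill_vacancies(waiting_list, n_vacancies, seed=0):
    """Hypothetical sketch: when n_vacancies program slots open, draw twice
    that many people from the waiting list at random and split them into
    treatment (program entry) and control. Everyone else stays listed."""
    rng = random.Random(seed)
    drawn = rng.sample(waiting_list, 2 * n_vacancies)
    treated, control = drawn[:n_vacancies], drawn[n_vacancies:]
    remaining = [p for p in waiting_list if p not in drawn]
    return treated, control, remaining

treated, control, remaining = fill_vacancies(list(range(100)), 5)
```

Note the descriptive problem the text raises: after several such draws the treated group pools people randomized at different times, from a list whose composition has meanwhile changed.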
Outside the laboratory, random assignment often involves aggregates larger than individuals (e.g. schools, neighborhoods, cities, or cohorts of trainees). This used to present problems of data analysis, since analysis at the individual level fails to account for the effects of grouping, while analysis at the higher level entails smaller sample sizes and reduced statistical power. The advent of hierarchical linear modeling (Bryk & Raudenbush 1992) should help reduce this problem, particularly when more user-friendly computer programs are available. But such modeling cannot deal with the nonequivalence that results
QUASI-EXPERIMENTATION
In quasi-experimentation, the slogan is that it is better to rule out validity threats by design than by statistical adjustment. The basic design features for doing this--pretests, pretest time series, nonequivalent control groups, matching, etc.--were outlined by Campbell & Stanley (1963) and added to by Cook & Campbell (1979). Thinking about quasi-experiments has since evolved along three lines: 1. toward better understanding of designs that make point-specific predictions; 2. toward predictions about the multiple implications of a
increased use of this design in the medical context from which their examples come.
quite close to the correct answer. But few social researchers who study the matter closely now believe they can identify the completely specified model, and the 1980s and early 1990s were characterized by growing disillusionment about developing even adequate approximations. Nearly any observed effect might be either nullified or even reversed in the population, depending, for example, on the variables omitted from the model (whose importance one could never know for certain) or on the pattern of unreliability of measurement across the variables in the analysis. As a result, it proved difficult to anticipate with certainty the direction of bias resulting from the simpler types of nonequivalent control group studies. But there has been much work on data
SUGGESTIONS FROM LABOR ECONOMISTS At the same time, some labor economists developed an alternative tradition that sought to create statistical models that do not require full model specification and that would apply to all nonexperimental research where respondents are divided into levels on some independent variable. The best developed of this class of analyses is the instrumental variable approach associated with Heckman (Heckman 1980, Heckman et al 1987, Heckman & Hotz 1989). The probability of group membership is first modeled in a selection equation, with the predictors being a subset of variables that are presumably related to selection into the different treatment conditions.
correct selection model has been identified. The results of this model are then used as an instrumental variable in the main analysis to estimate treatment effects by adjusting for the effects of selection. Conceptually and analytically, this approach closely resembles the regression discontinuity design because it is based on full knowledge and perfect measurement of the selection model.
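The point can be made concrete with a stdlib-only simulation (our own construction, not Heckman's actual estimator): when selection into treatment depends entirely on a perfectly measured variable, conditioning on that variable removes all selection bias; the difficulties arise when the selection model must be estimated from fallible proxies.

```python
import random

def ols(X, y):
    """Minimal OLS via normal equations (Gaussian elimination, no libraries)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                      # forward elimination
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [A[r][c] - f * A[col][c] for c in range(k)]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for col in reversed(range(k)):            # back-substitution
        beta[col] = (b[col] - sum(A[col][c] * beta[c]
                                  for c in range(col + 1, k))) / A[col][col]
    return beta

random.seed(0)
n = 5000
ability = [random.gauss(0, 1) for _ in range(n)]
enter = [1.0 if a > 0 else 0.0 for a in ability]   # selection on ability alone
# True treatment effect is zero; outcomes reflect ability, not the program.
y = [a + random.gauss(0, 1) for a in ability]

naive = ols([[1.0, d] for d in enter], y)[1]
adjusted = ols([[1.0, d, a] for d, a in zip(enter, ability)], y)[1]
```

Here `naive` comes out around 1.6 (pure selection bias) while `adjusted` is near the true zero. In practice the selection variable is neither fully known nor perfectly measured, which is exactly the limitation pressed below.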
However, in approaches like Heckman's, the selection model is not fully known. It is estimated on the basis of fallible observed measures that serve to locate more general constructs. This limitation may explain why empirical probes of Heckman's theory have not produced confirmatory results. The most compelling critiques come from studies where treatment-effect estimates derived from Heckman's selection modeling procedures are directly compared to those from randomized experiments--presumed to be the gold standard against which the success of statistical adjustment procedures should be assessed. In a reanalysis of annual earnings data from a randomized experiment on job training, both LaLonde (1986, LaLonde & Maynard 1987) and Fraker & Maynard (1987) showed that (a) econometric adjustments from Heckman's work provide many different estimates of the training program's effects; and (b) none of these estimates closely coincides with the estimate from randomized controls. These reanalyses have led some labor economists (e.g. Ashenfelter & Card 1985, Betsey et al 1985) to suggest that unbiased estimates can only be justified from randomized experiments, thereby undermining Heckman's decade-long work on adjustments for selection bias. Other labor economists do not go quite so far and prefer what they call natural experiments--what Campbell calls quasi-experiments--over the non-experimental data labor economists previously used so readily (e.g. Card 1990, Katz & Krueger 1992, Meyer & Katz 1990; and see especially Meyer 1990 for an example that reinvents regression-discontinuity). In their rejoinder to these critics, Heckman & Hotz (1989; Heckman et al 1987) used the same job training data as LaLonde to argue that a particular selection model--based on two separate pretreatment measures of annual income--met certain specification tests and generated an average causal effect no different from the estimate provided by the randomized experiment. Unfortunately, their demonstration is not very convincing (Cook 1991a, Coyle et al 1991).
The selection model that Heckman & Hotz found to be better for their one population had been widely advocated for at least a decade as a superior quasi-experimental design because the two pretest waves allow estimation of the preintervention selection-maturation differences between groups (Cook & Campbell 1979). Moffitt's (1991) rediscovery of the double-pretest design should be seen in this historical light; it exemplifies the quasi-experimental slogan: better to control through design than through measurement and statistical adjustment. To conclude, empirical results indicate that selection models usually produce different effect estimates than do randomized experiments, and if researchers using such models were close to the mark in a particular project, they would not know this unless there were also a randomized experiment on the same topic or some better quasi-experiment. But then the need for less interpretable selection modeling would not exist!
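The logic of the double-pretest design is easy to simulate (an invented illustration, essentially a difference-in-trends estimator of our own construction): with two pretest waves, each group's maturation trend can be estimated and projected forward, so a selection-maturation difference no longer masquerades as a treatment effect.

```python
import random
import statistics

random.seed(2)

def cohort(n, start, slope, effect=0.0):
    """Simulate one group at two pretests (waves 0, 1) and a posttest (wave 2)."""
    rows = []
    for _ in range(n):
        base = start + random.gauss(0, 1)
        rows.append((base, base + slope, base + 2 * slope + effect))
    return rows

treat = cohort(500, start=0.0, slope=0.5, effect=1.0)   # true effect = 1.0
ctrl = cohort(500, start=1.0, slope=0.2)                # different maturation rate

def wave_mean(group, w):
    return statistics.mean(row[w] for row in group)

def trend_adjusted_gain(group):
    # Two pretests identify the maturation slope; project it to the posttest.
    slope = wave_mean(group, 1) - wave_mean(group, 0)
    return wave_mean(group, 2) - (wave_mean(group, 1) + slope)

effect = trend_adjusted_gain(treat) - trend_adjusted_gain(ctrl)

# A single-pretest change score confounds differential maturation with treatment:
change_score = (wave_mean(treat, 2) - wave_mean(treat, 1)) - \
               (wave_mean(ctrl, 2) - wave_mean(ctrl, 1))
```

In this noiseless-trend setup `effect` recovers 1.0 while `change_score` is biased to 1.3 by the 0.3 gap in maturation rates. Real data add noise and possibly nonlinear maturation, which is why the design helps but does not guarantee.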
of units and settings and to target cause and effect constructs; and (b) can be extrapolated to times, persons, settings, causes, and effects that are manifestly different from those sampled in the existing research. Campbell's (1957) work on external validity describes a similar but less articulated aspiration. But such generalization is not easy. For compelling logistical reasons, nearly all experiments are conducted with local samples of convenience in places of convenience and involve only one or two operationalizations of a treatment, though there may be more operationalizations of an outcome. How are Cronbach's two types of generalization to be promoted if most social experiments are so contextually specific?
Random Sampling
Proximal Similarity
A collection of experiments on the same or related topics can allow even better probes of the causal contingencies influencing treatment effectiveness. This is
Causal Explanation
CONCLUSIONS
and history are often relevant, as are Cook & Campbell's (1979) threats of treatment diffusion, resentful demoralization, compensatory rivalry, and compensatory equalization. Statistical research on selection modeling has been dominant in the past, and rightly so. But it should not crowd out research into these other relevant threats to valid causal inference.

Work on the generalization of causal relationships has historically been less prevalent than work on establishing causal relationships. There is ongoing work on the generalization of causal inferences in two modes. One is meta-analytic and emphasizes the empirical robustness of a particular causal connection across constructs. The second is more causal-explanatory, and it emphasizes identifying the micro-mediating processes that causally connect a treatment to an outcome, usually through a process of theoretical specification, measurement, and data analysis rather than through the sampling strategy that characterizes meta-analysis. Causal generalization is now more of an issue in social experiments than it was fifteen years ago (Cook 1991a, 1993; Friedlander & Gueron 1992).
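The meta-analytic mode reduces, at its simplest, to inverse-variance pooling of per-study effect estimates; the sketch below uses invented numbers purely for illustration.

```python
# Fixed-effect (inverse-variance) pooling across experiments. Each study
# contributes an effect estimate and its sampling variance; values invented.
effects = [0.30, 0.45, 0.25, 0.38]
variances = [0.010, 0.020, 0.015, 0.008]

weights = [1.0 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = (1.0 / sum(weights)) ** 0.5
```

Robustness probes then ask whether the per-study estimates are homogeneous enough for a single pooled value to be meaningful; the causal-explanatory mode instead asks what mediates the effect.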
Literature Cited
Aiken LS, West SG. 1991. Multiple Regression: Testing and Interpreting Interactions. Newbury Park, CA: Sage
Ashenfelter O, Card D. 1985. Using the longitudinal structure of earnings to estimate the effect of training programs. Rev. Econ. Stat. 67:648-60
Barlow DH, Hersen M, eds. 1984. Single Case Experimental Designs: Strategies for Studying Behavior Change. New York: Pergamon. 2nd ed.
Berman JS, Norton NC. 1985. Does professional training make a therapist more effective? Psychol. Bull. 98:401-7
Betsey CL, Hollister RE, Papageorgiou MR, eds. 1985. Youth Employment and Training Programs: The YEDPA Years. Washington, DC: Natl. Acad. Press
Bickman L. 1987. Functions of program theory. In Using Program Theory in Evaluation: New Directions for Program Evaluation, ed. L Bickman, 33:5-18. San Francisco: Jossey-Bass
Bickman L, ed. 1990. Advances in Program Theory: New Directions for Program Evaluation, Vol. 47. San Francisco: Jossey-Bass
Boruch RF. 1975. On common contentions about randomized experiments. In Experimental Tests of Public Policy, ed. RF Boruch, HW Riecken, pp. 107-12. Boulder, CO: Westview
Boruch RF, Gomez H. 1977. Sensitivity, bias and theory in impact evaluations. Prof. Psychol. Res. Pract. 8:411-34
Boruch RF, McSweeney AJ, Soderstrom EJ. 1978. Randomized field experiments for program planning, development and evaluation. Eval. Q. 2:655-95
Boruch RF, Wothke W, eds. 1985. Randomization and Field Experimentation: New Directions for Program Evaluation, Vol. 28. San Francisco: Jossey-Bass
Box GEP, Jenkins GM. 1970. Time-series Analysis: Forecasting and Control. San Francisco: Holden-Day
Brown R. 1986. Social Psychology: The Second Edition. New York: Free Press
Brunswik E. 1955. Perception and the Representative Design of Psychological Experiments. Berkeley: Univ. Calif. Press. 2nd ed.
Bryk AS, Raudenbush SW. 1992. Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park, CA: Sage
Cain GG. 1975. Regression and selection models to improve nonexperimental comparisons. In Evaluation and Experiment: Some Critical Issues in Assessing Social Programs, ed. CA Bennett, AA Lumsdaine, pp. 297-317. New York: Academic
Campbell DT. 1957. Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54:297-312
Campbell DT. 1966. Pattern matching as an essential in distal knowing. In The Psychology of Egon Brunswik, ed. KR Hammond. New York: Holt, Rinehart & Winston
Campbell DT. 1986. Relabeling internal and external validity for applied social scientists. See Trochim 1986, pp. 67-77
Campbell DT. 1993. Quasi-experimental research designs in compensatory education. In Evaluating Intervention Strategies for Children and Youth at Risk, ed. EM Scott. Washington, DC: US Govt. Print. Office. In press
Campbell DT, Boruch RF. 1975. Making the case for randomized assignment to treatments by considering the alternatives: six ways in which quasi-experimental evaluations tend to underestimate effects. In Evaluation and Experiment: Some Critical Issues in Assessing Social Programs, ed. CA Bennett, AA Lumsdaine, pp. 195-296. New York: Academic
Campbell DT, Stanley JC. 1963. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand-McNally
Card D. 1990. The impact of the Mariel boatlift on the Miami labor market. Ind. Labor Relat. 43:245-57
Chelimsky E. 1987. The politics of program evaluation. Soc. Sci. Mod. Soc. 25(1):24-32
Chen HT, Rossi PH. 1983. Evaluating with sense: the theory-driven approach. Eval. Rev. 7:283-302
Cohen P, Cohen J, Teresi J, Marchi M, Velez NC. 1990. Problems in the measurement of latent variables in structural equations causal models. Appl. Psychol. Meas. 14(2):183-96
Collingwood RG. 1940. An Essay on Metaphysics. Oxford: Clarendon
Connell DB, Turner RT, Mason EF. 1985. Summary of findings of the school health education evaluation: health promotion effectiveness, implementation, and costs. J. School Health 85:316-17
Cook TD. 1985. Post-positivist critical multiplism. In Social Science and Social Policy, ed. RL Shotland, MM Mark, pp. 21-52. Beverly Hills, CA: Sage
Cook TD. 1991a. Clarifying the warrant for generalized causal inferences in quasi-experimentation. In Evaluation and Education: At Quarter Century, ed. MW McLaughlin, DC Phillips. Chicago: Natl. Soc. Study Educ.
Cook TD. 1993. A quasi-sampling theory of the generalization of causal relationships. In Understanding Causes and Generalizing About Them: New Directions for Program Evaluation, ed. L Sechrest, AG Scott, 57:39-82. San Francisco: Jossey-Bass
Cook TD, Anson A, Walchli S. 1993. From causal description to causal explanation: improving three already good evaluations of adolescent health programs. In Promoting the Health of Adolescents: New Directions for the Twenty-First Century, ed. SG Millstein, AC Petersen, EO Nightingale, pp. 339-74. New York: Oxford Univ. Press
Cook TD, Campbell DT. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand-McNally
Cook TD, Campbell DT. 1986. The causal assumptions of quasi-experimental practice. Synthese 68:141-80
Cook TD, Campbell DT, Peracchio L. 1990. Quasi-experimentation. In Handbook of Industrial and Organizational Psychology, ed. MD Dunnette, LM Hough, pp. 491-576. Palo Alto, CA: Consult. Psychol. Press. 2nd ed.
Cook TD, Cooper H, Cordray D, Hartmann H, Hedges L, Light R, Louis T, Mosteller F, eds. 1992. Meta-Analysis for Explanation: A Casebook. New York: Russell Sage Found.
Coyle SL, Boruch RF, Turner CF, eds. 1991. Evaluating AIDS Prevention Programs: Expanded Edition. Washington, DC: Natl. Acad. Press
Cronbach LJ. 1982. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass
Cronbach LJ, Snow RE. 1976. Aptitudes and Instructional Methods. New York: Irvington
Devine E. 1992. Effects of psychoeducational care with adult surgical patients: a theory-probing meta-analysis of intervention studies. See Cook et al 1992, pp. 35-84
Dunn WN. 1982. Reforms as arguments. Knowl. Creation Diffus. Util. 3(3):293-326
Fraker T, Maynard R. 1987. Evaluating the adequacy of comparison group designs for evaluation of employment-related programs. J. Hum. Res. 22:194-227
Freedman DA. 1987. A rejoinder on models, metaphors, and fables. J. Educ. Stat. 12:206-23
Friedlander D, Gueron JM. 1992. Are high-cost services more effective than low-cost services? In Evaluating Welfare and Training Programs, ed. CF Manski, I Garfinkel, pp. 143-98. Cambridge: Harvard Univ. Press
Gadenne V. 1976. Die Gültigkeit psychologischer Untersuchungen. Stuttgart
Holland PW. 1986. Statistics and causal inference (with discussion). J. Am. Stat. Assoc. 81:945-70
Holland PW, Rubin DB. 1988. Causal inference in retrospective studies. Eval. Rev. 12:203-31
Katz LF, Krueger AB. 1992. The effect of the minimum wage on the fast-food industry. Ind. Labor Relat. 46:6-21
Kershaw D, Fair J. 1976. The New Jersey Income-Maintenance Experiment. Vol. 1: Operations, Surveys and Administration. New York: Academic
Kish L. 1987. Statistical Design for Research. New York: Wiley
comer. San Francisco: Jossey-Bass. In press
Mark MM. 1986. Validity typologies and the logic and practice of quasi-experimentation. See Trochim 1986, pp. 47-66
Medin DL. 1989. Concepts and conceptual structure. Am. Psychol. 44:1469-81
Meehl PE. 1978. Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J. Consult. Clin. Psychol. 46:806-34
Meyer BD. 1990. Unemployment insurance and unemployment spells. Econometrica 58:757-82
Meyer BD, Katz LF. 1990. The impact of the potential duration of unemployment benefits
bias in observational studies for causal effects. J. Educ. Psychol. 66:688-701
Rubin DB. 1977. Assignment to treatment group on the basis of a covariate. J. Educ. Stat. 2:1-26
Rubin DB. 1986. Which ifs have causal answers? J. Am. Stat. Assoc. 81:961-62
Sechrest L, West SG, Phillips MA, Redner R, Yeaton W. 1979. Some neglected problems in evaluation research: strength and integrity of treatments. In Evaluation Studies Review Annual, ed. L Sechrest, SG West, MA Phillips, R Redner, W Yeaton, 4:15-35. Beverly Hills, CA: Sage
Shadish WR. 1989. Critical multiplism: a research strategy and its attendant tactics. In
Wortman PM. 1993. Judging research quality. In Handbook of Research Synthesis, ed. H Cooper, LV Hedges. New York: Russell Sage Foundation. In press
Zimmermann HJ, Zadeh LA, Gaines BR, eds. 1984. Fuzzy Sets and Decision Analysis. New York: Elsevier