You are on page 1of 13

514450

research-article2013
PPSXXX10.1177/1745691613514450Stroebe and StrackAlleged Crisis and Illusion of Exact Replication

Perspectives on Psychological Science

The Alleged Crisis and the Illusion of Exact 2014, Vol 9(1) 59­–71
© The Author(s) 2013
Reprints and permissions:
Replication sagepub.com/journalsPermissions.nav
DOI: 10.1177/1745691613514450
pps.sagepub.com

Wolfgang Stroebe1,2 and Fritz Strack3


1
Department of Psychology, Utrecht University, the Netherlands; 2Department of Social and
Organizational Psychology, University of Groningen, the Netherlands; and 3Department of
Psychology, University of Würzburg, Germany

Abstract
There has been increasing criticism of the way psychologists conduct and analyze studies. These critiques as well
as failures to replicate several high-profile studies have been used as justification to proclaim a “replication crisis” in
psychology. Psychologists are encouraged to conduct more “exact” replications of published studies to assess the
reproducibility of psychological research. This article argues that the alleged “crisis of replicability” is primarily due
to an epistemological misunderstanding that emphasizes the phenomenon instead of its underlying mechanisms. As a
consequence, a replicated phenomenon may not serve as a rigorous test of a theoretical hypothesis because identical
operationalizations of variables in studies conducted at different times and with different subject populations might
test different theoretical constructs. Therefore, we propose that for meaningful replications, attempts at reinstating the
original circumstances are not sufficient. Instead, replicators must ascertain that conditions are realized that reflect the
theoretical variable(s) manipulated (and/or measured) in the original study.

Keywords
replication, replicability crisis, null findings, scientific fraud, priming, epistemology, critical rationalism

At a time when social psychologists believed that they openscienceframework.org). In contrast to the prevalent
had every reason to be proud of their discipline came the sentiment, we will argue that the claim of a replicability
shattering news that Diederik Stapel, a prominent crisis is greatly exaggerated and that the hope that such
researcher in social psychology, had committed scientific a crisis (if it ever existed) could be solved by increasing
fraud on a major scale. Social psychologists had hardly the number of exact replications is misplaced.
recovered from this shock when two more colleagues
were accused of fraud and resigned from their positions.
Is the Claim of a Replicability Crisis
These events were particularly damaging, because they
coincided with the start of a discussion of trust in psycho- Exaggerated?
logical data (see Special Section on Replicability in There seems to have been two sets of events that fueled
Psychological Science: A Crisis of Confidence? Perspectives the crisis perception. First, there have been claims that
on Psychological Science, 2012). Even though this discus- some psychological researchers engage in “questionable
sion focused on methodological issues that were unre- research practices” that result in “false positive” findings
lated to fraud, this distinction was not always maintained (e.g., Bakker, van Dijk, & Wicherts, 2012; John, Loewenstein,
by the popular press. According to the introduction to & Prelec, 2012; LeBel et al., 2013; Simmons, Nelson, &
the special section, there is “currently a crisis of confi- Simonsohn, 2011; Vul, Harris, Winkielman, & Pashler,
dence in psychological science reflecting an unprece- 2009). Simmons et al. (2011) reported simulations that
dented level of doubt among practitioners about the showed that a number of research practices (e.g., stopping
reliability of research findings in the field” (Pashler & data collection on the basis of interim data analysis;
Wagenmakers, 2012, p. 528). Nosek and colleagues
started the “Reproducibility Project,” a large-scale, collab-
Corresponding Author:
orative effort to estimate the reproducibility of psycho- Wolfgang Stroebe, Department of Psychology, Utrecht University, P.O.
logical science, which involves replicating all studies Box 80140, 3508 TC, Utrecht, the Netherlands
published in three psychology journals in 2008 (http:// E-mail: w.stroebe@uu.nl

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


60 Stroebe and Strack

dropping experimental conditions from published reports) memory (e.g., Higgins, Rholes, & Jones, 1977; Meyer &
can result in an erroneous rejection of the null hypothesis. Schvaneveldt, 1971; Tulving & Schacter, 1990). In this tra-
To be sure, methodological discussions are important for dition, its impact on behavior was studied in a minor
any discipline, and both fraud and dubious research pro- subarea of research, whereas most social psychological
cedures are damaging to the image of any field and poten- priming studies investigated the impact of subliminal or
tially undermine confidence in the validity of social supraliminal primes on judgment. A meta-analysis of
psychological research findings. Thus far, however, no studies that investigated how trait primes influence
solid data exist on the prevalence of such research prac- impression formation identified 47 articles based on
tices in either social or any other area of psychology. 6,833 participants and found overall effects to be statisti-
In fact, the discipline still needs to reach an agreement cally highly significant (DeCoster & Claypool, 2004).
about the conditions under which these practices are
unacceptable.
Are Exact Replications the Answer?
Second, there have been a number of failures to repli-
cate high-profile experiments on social priming (Doyen, The claim of a replicability crisis in psychology is based
Klein, Pichon, & Cleeremans, 2012; Pashler, Coburn, & on a major misunderstanding. Particularly, the myopic
Harris, 2012; Shanks et al., 2013). For example, Doyen focus on “exact” replications neglects basic epistemologi-
et al. (2012) reported an exact replication of a study by cal principles. Exact replications are replications of an
Bargh, Chen, and Burrows (1996; Experiment 2a, 2b) that experiment that operationalize both the independent and
failed to reproduce the original results. Bargh et al. had the dependent variable in exactly the same way as the
repeatedly found that students who had been primed original study. (In contrast, conceptual replications try to
with words that triggered the stereotype of elderly walked operationalize the underlying theoretical variables using
more slowly down a corridor than students primed with different manipulations and/or different measures.)
words unrelated to the elderly stereotype. The Doyen In evaluating the usefulness of exact replications, one
et al. (2012) failure was soon followed by a report by has to distinguish between applied and basic research. A
Shanks et al. (2013) of a series of nine studies that failed scientist who wants to establish the efficiency of a spe-
to replicate the findings of another iconic experiment, the cific treatment or intervention is well advised to repeat-
“professor study” of Dijksterhuis and van Knippenberg edly apply exactly the same procedure. This is particularly
(1998). These authors had found that participants who relevant for clinical trials where a lack of reliability may
were primed with a category of persons who are consid- have fatal consequences. However, matters are different
ered highly intelligent (e.g., professors) performed better in basic research where empirical outcomes are mean-
on a task of trivial pursuit than did participants primed ingful only with respect to the theory being tested. In the
with a category of persons who are considered less intel- postbehaviorist era, psychological theories are based on
ligent (e.g., hooligans). These replication failures were internal mechanisms such that replications must be
the more astounding because of earlier publications of directed at the internal antecedents of such theories.
successful replications of both of these findings (summa- Although reproducibility of scientific findings is one of
rized in the Appendix). science’s defining features, the ultimate issue is the extent
Do these failures to replicate amount to a crisis of rep- to which a theory has undergone strict tests and has been
licability, as Pashler and Harris (2012) claimed in their supported by empirical findings. It would be a mistake to
contribution to the special section of Perspectives on assume that estimates of the reproducibility of empirical
Psychological Science on replications? We would argue findings are the same as estimates of the validity of a
that such a conclusion is premature. Failures to replicate specific theory. A finding may be eminently reproducible
are puzzling, but in social psychology, as in most sci- and yet constitute a poor test of a theory.
ences, empirical findings cannot always be replicated The fact that good experimental research is typically
(this was one of the reasons for the development of conducted with the aim to test theories throws a different
meta-analytic methods). It is therefore surprising that the light on the discussion of replicability. We will therefore
failures to replicate some social priming studies received briefly discuss the notion of theory and what researchers
such disproportionate attention. Furthermore, one must do when they test hypotheses derived from a theory.
wonder whether the response by Kahneman, who in a Theories consist of a set of abstract constructs and of
widely circulated letter to “priming researchers” warned hypotheses about the relationship between these con-
that there was a “train wreck looming” because of a structs. In conducting experiments to test such a hypoth-
“storm of doubt about the robustness of priming results,” esis, we develop empirical operationalizations to translate
was really justified given the state of the published litera- these constructs into variables that can be manipulated or
ture where priming is an entirely undisputed method that measured. Because most theoretical constructs are fairly
is widely used to test hypotheses about associative abstract and can be operationalized in multiple ways,

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


Alleged Crisis and Illusion of Exact Replication 61

researchers can never be sure whether they have chosen have failed to replicate the original finding? Let us address
a realistic or even optimal operationalization of a given each of these issues in the following sections.
construct. Because researchers can never be certain that
they properly operationalized the theoretical constructs A comparison of direct and
they are assessing and that they were successful in con-
trolling for all third variables that might have been
conceptual replications
responsible for their findings, a theory can never be The illusion of exact replication.  If one accepts that
proven to be “true” (Popper, 1959). However, as we will the true purpose of replications is a (repeated) test of a
discuss later, this same feature may also create problems theoretical hypothesis rather than an assessment of the
for falsifying a theory. reliability of a particular experimental procedure, a major
This reservation is less relevant for studies that are con- problem of exact replications becomes apparent: Repeat-
ducted with the aim of merely testing the efficacy of a ing a specific operationalization of a theoretical construct
specific treatment. For example, an exact replication of a at a different point in time and/or with a different popu-
study to demonstrate the efficacy of a drug or psychologi- lation of participants might not reflect the same theoreti-
cal intervention is informative, because with drugs or cal construct that the same procedure operationalized in
interventions (i.e., treatments), the main issue is that they the original study. This is less of a problem in studies
work and that they have no negative side effects. Although where both the independent and the dependent vari-
one might want to vary the subject population receiving ables are not culturally or socially mediated, for example,
the treatment to have a broader basis for one’s evaluation when the size or the brightness of stimuli is assessed by
of its efficacy, it would make no sense to use a different participants (who work in isolation) or when weight per-
drug or alter the intervention. After all, the question to be ception is related to the size of an object. Obviously, in
addressed is whether the original treatment was effective. such studies, one does not have to worry whether these
Exact replications are also important when studies variables have been properly realized, and exact replica-
produce findings that are unexpected and only loosely tions should yield the findings of previous studies. How-
connected to a theoretical framework. Thus, the fact that ever, in social psychological studies, the faithful
priming individuals with the stereotype of the elderly replication of an operationalization of a theoretical con-
resulted in a reduction of walking speed was a finding struct at a different point in time and with a different
that was unexpected. Furthermore, even though it was subject population may be dissociated from the theoreti-
consistent with existing theoretical knowledge, there was cal construct of the original study.
no consensus about the processes that mediate the Let us illustrate this point with some classic social psy-
impact of the prime on walking speed. It was therefore chological experiments. In their study of the effect of the
important that Bargh et al. (1996) published an exact rep- severity of initiation to a group on liking for that group,
lication of their experiment in the same paper. Similarly, Aronson and Mills (1959) operationalized the severe ini-
Dijksterhuis and van Knippenberg (1998) conducted four tiation by having female participants read aloud “12
studies in which they replicated the priming effects. obscene words, e.g., fuck, cock, and screw” as well “two
Three of these studies contained conditions that were vivid descriptions of sexual activity from contemporary
exact replications. In the fourth, they primed “intelligent” novels” (p. 178). If repeated with today’s female students,
directly with the trait rather than indirectly with the word this manipulation might trigger amusement rather than
“professor.” embarrassment. Similarly, it is likely that a researcher
Because it is standard practice in publications of new who tried to induce fear about toothbrushing in high
effects, especially of effects that are surprising, to publish school students by telling them that improper care of
one or two exact replications, it is clearly more condu- their teeth might result in “cancer, paralysis or other sec-
cive to the advancement of psychological knowledge to ondary diseases” ( Janis & Feshbach, 1953) might arouse
conduct conceptual replications rather than attempting disbelief rather than fear.
further duplications of the original study, unless plausible
conceptual replications have failed. Given that both Why outcomes of exact replications are often unin-
research time and money are scarce resources, the large- formative.  Let us now return to the failed replications
scale attempts at duplicating previous studies seem to us of behavior priming studies described earlier. To discuss
misguided (http://openscienceframework.org). these failures, readers need to be reminded of the theo-
The main criticism of conceptual replications is that retical rationale behind these studies. According to cur-
they are less informative than exact replications (e.g., rent theorizing (e.g., Loersch & Payne, 2011; Schröder &
Pashler & Harris, 2012). This raises two questions, namely Thagard, 2013; Strack & Deutsch, 2004), the priming acti-
(a) what information do we gain from a successful repli- vates concepts that spread activation to other concepts
cation of the original finding after faithfully repeating the that are episodically or semantically linked (e.g., “elderly”
original experiment, and (b) what do we learn when we → “walking slowly”; “professors” → “intelligent”). Then,

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


62 Stroebe and Strack

priming may affect behavior in a controlled fashion if a these findings are reported in most social psychology text-
behavioral decision is based on concepts whose activa- books and are therefore widely known among student
tion potential has been increased. Alternatively, these participants could have affected the results. Another likely
concepts may be directly linked with behavioral sche- reason for their failure could be their selection of knowl-
mata. Of course, such subtle influences depend on other edge items. If the effect of professor priming is motiva-
conditions, such as factors that facilitate or constrain the tional, then the knowledge items have to be selected in a
execution of such behaviors and the awareness of the way as to best reflect motivation effects. This is unlikely to
priming episode, which may cause an active correction be the case for questions that are so difficult that students
(see Strack & Hannover, 1996; Strack, Schwarz, Bless, are unlikely to know the answer or so easy that practically
Kübler, & Wänke, 1993). Thus, subtly priming or even everybody can answer them easily.
subliminal priming procedures may be more effective To be sure, these possibilities are speculative, but they
than blatantly directing people’s attention toward some illustrate that nonreplications are uninformative unless
content. one can demonstrate that the theoretically relevant con-
The theoretical variable “activation of concept X” is ditions were met. Faithfully replicating the original condi-
manipulated by exposing the person to some element of tions of an experiment does not guarantee that one
“X” or an element that is closely associated. Although the addresses the same theoretical construct as in the original
experimenter has control over the prime, this is not true study. In order to check if a given operationalization is
for the concept it activates. People differ in their beliefs successful in manipulating the intended theoretical con-
about the elderly. Furthermore, different beliefs are dif- struct, manipulation checks are necessary that are inde-
ferently accessible in different contexts (e.g., an old ath- pendent of the dependent variable. Therefore, instead of
lete vs. an old professor). Therefore, the same prime can conducting nine “exact” replications, Shanks et al. (2013)
activate different concepts in different people and/or would have been well advised to conduct at least some
under different conditions. empirical tests of whether their manipulation succeeded
It is crucial for replicating behavior priming studies in increasing the cognitive accessibility of the theoreti-
that the prime is successful in increasing the accessibility cally relevant concept (e.g., using a lexical decision task).
of the cognitive representation that is assumed to induce Such an approach would have provided the information
the behavior. In priming the stereotype of the elderly, the that exact replications are lacking.
theoretically targeted cognitive representation is that of
“walking slowly.” It is therefore possible that the priming The effective use of replications. Because experi-
procedure used in the Doyen et al. (2012) study failed in ments are typically conducted with the aim of testing a
this respect, even though Doyen et al. faithfully repli- theoretical hypothesis, the important question is not
cated the priming procedure of Bargh et al. (1996). First, whether the original finding can be duplicated but
the French translation of the words Bargh et al. used in whether it constituted a rigorous test of the postulated
their scrambled sentence test might have been associated mechanism. To take frustration–aggression theory as an
with different meanings in Belgium. Second, it is also example, any experimental test of that theory is based on
possible that the concept of “walking slowly” is not a the assumption that the experimental manipulation is a
central part of the stereotype of elderly in Belgium some valid operationalization of the theoretical construct “frus-
20 years later. With life expectancies increasing nearly tration” and that the measure of aggression is a valid
every year and people staying active to a much higher operationalization of the theoretical construct “aggres-
age (Stroebe, 2011), this construct might no longer form sion.” If we replicate the findings of an earlier frustration–
part of the elderly stereotype even in New York. Although aggression experiment by exactly repeating the procedure
Doyen et al. (2012) tested whether they succeeded in of that experiment, we have demonstrated that the study
priming the stereotype of elderly people, they failed to is reproducible, but we have only marginally increased
assess whether this prime increased the accessibility of our trust in the validity of the underlying theory. Con-
the cognitive representation of “walking slowly.”1 versely, if a conceptual replication using a different oper-
This criticism also applies to the failed replications of ationalization of both constructs had succeeded in
the Dijksterhuis and van Knippenberg (1998) “professor supporting the theoretical hypothesis, our trust in the
studies” reported by Shanks et al. (2013). Although it validity of the underlying theory would have been
seems likely that professors are considered more intelli- strengthened. “With every difference that is introduced
gent than soccer hooligans by the students who served as the confirmatory power of the replication increases,”
subjects in Shank’s research, it is still possible that even if because we have shown that the phenomenon does not
the participants considered professors as more intelligent hinge on a particular operationalization but “generalizes
than hooligans, the priming manipulation might have to a larger area of application” (Schmidt, 2009, p. 93).
failed to increase the cognitive representation of the con- An even more effective strategy to increase our trust
cept “intelligence.” It is even possible that the fact that in a theory is to test it using completely different

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


Alleged Crisis and Illusion of Exact Replication 63

manipulations. Let us illustrate this with an example from detrimental to research progress” (p. 1). Recognizing this
the social psychology of persuasion. According to dual problem, a group of social psychology graduate students
process theories of persuasion (i.e., the elaboration likeli- at the University of North Carolina at Chapel Hill in 1970
hood model of Petty and Cacioppo [1986] or the heuristic- had started the journal Representative Research in Social
systematic model of Chaiken [1980]), the impact of the Psychology, which gives special consideration to null
quality of the arguments contained in a communication is findings and replications (Chamberlin, 2000). Since 2002,
greater the more thoughtfully and deeply the communica- there has also been the free-access Journal of Articles in
tion is processed by a recipient. This prediction has been Support of the Null Hypothesis. Another free-access jour-
supported with very different manipulations of the pro- nal, PLoS One, also publishes articles reporting null
cessing depth, such as distraction (Petty, Wells, & Brock, results. Most recently, Pashler and colleagues created the
1976), personal relevance (Petty, Cacioppo, & Goldman, Web site www.psychfiledrawer.org, where short reports
1981), expectation to have to discuss the communication of null findings can be posted. Thus, there is no shortage
at a future meeting (Chaiken, 1980), and need for closure of outlets for the publication of negative findings.
(Klein & Webster, 2000). These models further predict that The problem is that these journals lack the prestige of
when recipients are unable or unmotivated to process a some of the high-impact journals in social psychology
communication, they will rely more on heuristic cues than that have typically rejected articles reporting null find-
on argument quality. This prediction was supported by ings. Therefore, strategies have been suggested to encour-
Petty et al. (1981) using communicator credibility as heu- age exact replications of published research findings
ristic cue and by Chaiken (1980) as well as by Klein and through changes in the incentive system of our discipline
Webster (2000) using the number of arguments. Thus, (e.g., Koole & Lakens, 2012). For example, the editor of
even though the various experiments were very different the Journal of Research in Personality now encourages
and used different experimental manipulations, different and publishes high-quality replications in the hope that
attitude issues, and different dependent measures, they all “JRP will be able to provide critical information about
tested the same underlying theory. If this theory had been which initial discoveries really hold up over time” (http://
less valid, further empirical studies guided by them would www.journals.elsevier.com/journal-of-research-in-
have been likely to fail, even if original studies had yielded personality/news/professor-richard-lucas-encouraging-
(false) positive results. replication-studies). Furthermore, the Open Science
Thus, one reason why exact replications are not very Collaboration—a group of more than 175 volunteer
interesting is that they contribute little to scientific knowl- researchers—initiated the Reproducibility Project to rep-
edge. If an exact replication reinstates the finding of the licate a sample of studies published in 2008 in three
original study, we have learned that the original outcome major psychology journals. The aim is to obtain an “esti-
was reproducible. We are not any wiser as to whether the mate of the reproducibility of current psychological sci-
original study was a good test of the theory to be tested ence” (Open Science Collaboration, 2012, p. 658). As one
because even though the experiment may have been scientist whose supportive comment is cited in that docu-
poorly designed, a faithful replication might result in the ment argued,
same finding. Conversely, if we succeed in supporting
the theoretical hypothesis with an experiment that opera- If [our] discoveries are true, other people in other
tionalized both the independent and the dependent vari- places should be able to rely on them, and build on
able differently and thus sampled different parts of the them, to push further, and build a cumulative body
same theoretical concept, we have gained additional of knowledge. These values are central to what it is,
information and increased our trust in the underlying for me, to be a scientist, and the Reproducibility
theory. Project expresses those values. . . . The project
exclusively attempts direct replications “repetition
Why null findings are not always that of an experimental procedure” in order to “verify a
piece of knowledge.” (Schmidt, 2009, pp. 92–93)
informative
Related to the drive for an increase in exact replications The illusion of theoretical verification. Statements
is the argument that we should publish more null find- such as this raise the question about what precisely is
ings. This idea has a long history. Already in 1975, meant by the “discovery” that needs to be “verified.” Is
Greenwald published an article on the “Consequences of the discovery of the Aronson and Mills (1959) study their
prejudice against the null hypothesis” in which he con- empirical finding that girls who had to read aloud four-
cluded “that research traditions and customs of discrimi- letter words evaluated a group discussion of the sex life
nation against accepting the null hypothesis may be very of lower animals more positively than girls who had to

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


64 Stroebe and Strack

read words that were related to sex but not obscene (e.g., If we accept that the ultimate aim of research is to test
prostitute, virgin, and petting), or is it the support of theories and if we further accept the epistemological
the hypothesis derived from dissonance theory that position of critical rationalism (Popper, 1959) that theo-
members who had gone through a severe initiation ries can only be falsified but never be proven true, then
should overestimate the attractiveness of their group to such negative findings should be influential, because
reduce dissonance? they help us to falsify theories. However, our earlier dis-
If it is the empirical finding, then other people in other cussion of the conflict surrounding the professor-priming
places should better not rely on the results, unless the studies also signals a major problem with Popper’s posi-
other people are other Americans. It is doubtful that the tion, namely, that it is difficult to decide when a theory
manipulation would have worked outside the United should be considered falsified. The theory underlying the
States or Northern Europe even in 1959 (Arnett, 2008; see professor studies is that priming people with the concept
also Stroebe & Nijstad, 2009).2 Furthermore, as we argued of professor will increase in their minds the cognitive
earlier, even Americans would be unlikely to reproduce accessibility of “clever” or “intelligent.” According to some
these findings with this particular experimental manipu- not yet specified process, the increase in the cognitive
lation today. If the discovery that needs to be “verified” is accessibility of the cognitive representations of these
the theoretical hypothesis, then other people in other concepts will improve their performance on a knowl-
places should not rely on it too much either, because the edge test.
bad news is that theories cannot be verified or even dem- The nonreplications published by Shanks and col-
onstrated as probable (Popper, 1959). And with regard to leagues (2013) cannot be taken as a falsification of that
the effect of the severity of initiation on the attractiveness theory, because their study does not explain why previ-
of a group, some revision of the original hypothesis ous research was successful in replicating the original
seems certainly necessary (e.g., Lodewijkx & Syroit, 1997; findings of Dijksterhuis and van Knippenberg (1998).
Lodewijkx, van Zomeren, & Syroit, 2005). Thus, although the Shanks et al. (2013) study signals a
potential problem with the stability of the professor-
What can we learn from null findings? If knowl- priming effect, it offers no explanation for the discrep-
edge about null findings is so important, why does our ancy between their findings and that of earlier studies.
discipline seem to be so uninterested? One reason is that Even multiple failures to replicate an established finding
not all null findings are interesting. For example, just would not result in a rejection of the original hypothesis,
before his downfall, Stapel published an article on how if there are also multiple studies that supported that
disordered contexts promote stereotyping and discrimi- hypothesis. Because failures of exact replications do not
nation. In this publication, Stapel and Lindenberg (2011) tell us why findings cannot be replicated, they are ulti-
reported findings showing that litter or a broken-up side- mately not very informative. The believers will keep on
walk and an abandoned bicycle can increase social dis- believing, pointing at the successful replications and der-
crimination. These findings, which were later retracted, ogating the unsuccessful ones, whereas the nonbelievers
were judged to be sufficiently important and interesting will maintain their belief system drawing on the failed
to be published in the highly prestigious journal Science. replications for support of their rejection of the original
Let us assume that Stapel had actually conducted the hypothesis. Thus, one reason why null findings are not
research described in this paper and failed to support his very interesting is because they tell us only that a finding
hypothesis. Such a null finding would have hardly mer- could not be replicated but not why this was the case.
ited publication in the Journal of Articles in Support of the This conflict can be resolved only if researchers develop
Null Hypothesis. It would have been uninteresting for the a theory that could explain the inconsistency in findings.
same reason that made the positive result interesting, Such a theory should allow one to identify the factor that
namely, that (a) nobody expected a relationship between determined whether the priming effect occurs. Thus,
disordered environments and prejudice and (b) there instead of testing a theory against the null hypothesis, a
was no previous empirical evidence for such a relation- more successful strategy is to test it against an alternative
ship. Similarly, if Bargh et al. (1996) had found that prim- hypothesis.
ing participants with the stereotype of the elderly did
not influence walking speed or if Dijksterhuis and Nonreplications as interaction effects.  In the ongo-
van Knippenberg (1998) had reported that priming par- ing discussion, “failures to replicate” are typically taken
ticipants with “professor” did not improve their perfor- as a threat to the existence of the phenomenon. Method-
mance on a task of trivial pursuit, nobody would have ologically, however, nonreplications must be understood
been interested in their findings. Thus, null findings are as interaction effects in that they suggest that the effect
interesting only if they contradict a central hypothesis of the crucial influence depends on the idiosyncratic
derived from an established theory and/or are discrepant conditions under which the original experiment was
with a series of earlier studies. conducted. Of course, interaction effects are highly

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


Alleged Crisis and Illusion of Exact Replication 65

informative if the additional variables are psychologically for a variety of research findings, Tetlock and Manstead
meaningful. However, the conditions of “exact replica- (1985) concluded that neither side emerged as clear win-
tions” are psychologically unspecified and include an ner. They suggested that a more profitable strategy was
infinite number of influences that may come from the “to abandon the search for crucial experiments and to
cultural circumstances, the experimental setting, charac- focus on clarifying the points of similarity and dissimilar-
teristics of the participants, and so forth. The resulting ity between rival intrapsychic and impression manage-
ambiguity prevents anything to be learned from such an ment theories” (p. 71). Both theories should be considered
interaction. As a consequence, the mere coexistence of as special cases of an integrative theoretical framework.
exact replications that are both successful and unsuccess- Similarly, dissonance theory (Festinger, 1957) and self-
ful is likely to leave researchers helpless about what to perception theory (Bem, 1965) were established as com-
conclude from such a pattern of outcomes. Conducting peting theories. And yet decades of research attempting
exact replications in a registered and coordinated fashion to demonstrate the superiority of one over the other
by different laboratories does not remove the described resulted in conflicting findings. To resolve this conflict,
shortcomings. This is also the case if exact replications it was suggested that both theories were valid but applied
are proposed as a means to estimate the “true size” of an only under certain conditions (e.g., Fazio, Zanna, &
effect. As the size of an experimental effect always Cooper, 1977; Stroebe & Diehl, 1988).
depends on the specific error variance that is generated If we look at the history of social psychology, theories
by the context, exact replications can assess only the effi- have rarely been abandoned because of failed replica-
ciency of an intervention in a given situation but not the tions. Theories are often abandoned because researchers
generalized strength of a causal influence. simply lose interest (Greenwald, 2012). The reason for
Instead of leaving the circumstances unspecified, the this loss has often been that they no longer matched the
interaction can be informative if the conditions of replica- prevailing theoretical paradigm (Kuhn, 1996; for exam-
tion attempts are psychologically defined. This can be ple, the rise of social cognition research resulted in a loss
illustrated with the classic example of dissonance aroused of interest in the motivations explanations offered by
by counterattitudinal advocacy. Festinger and Carlsmith consistency theories). As Hilton (2012) observed, “social
(1959) argued that a person will experience dissonance if psychology does seem to have a disquieting tendency to
the person believes “X” but has, as a result of pressure forget theories rather than disprove them” (p. 69). But
brought on the individual, publicly stated that he or she more frequently, competing theories are not abandoned
believes “not X.” They further proposed that the magni- but reduced in their generality. That is, what originally
tude of the dissonance would be maximal if the prom- seemed to be a simple main effect turned out to be a
ised rewards or threatened punishments were just barely more complex interaction.
enough to induce the person to say “not X.” Although
their theoretical analysis was valid, it took a decade
Replications as fraud detectors
before researchers were able to reliably replicate the
findings reported by Festinger and Carlsmith (1959). This leads us to a last argument in justification of exact
Replication became possible only once they realized that replications, namely, that they facilitated the detection of
for counterattitudinal behavior to arouse dissonance, the fraudulent research. For example, Crocker and Cooper
actor had to be made to feel responsible for telling a lie (2011, p. 1182) argued: “Despite the need for reproducible
(i.e., freedom of choice; Linder, Cooper, & Jones, 1967) results to drive progress, studies that replicate or fail to
and that that lie had negative consequences for the other replicate others’ findings are almost impossible to publish
person (Cooper & Worchel, 1970). This strategy was in in top scientific journals. This disincentive means fraud can
line with the argument of Greenwald, Pratkanis, Leippe, go undetected, which was the case with Stapel.” Similarly,
and Baumgardner (1986), that to reduce a needlessly Roediger (2012, p. 27) stated “that if others had tried to
overgeneralized theory, one needs to identify the condi- replicate his [Stapel’s] work soon after its publication, his
tions under which this theory applies. misdeeds might have been uncovered much more quickly.”
Along the same lines, Chambers and Sumner (2012) wrote:
Resolving conflicts between competing theories. 
The search for moderator variables that determine the Replication is our best friend because it keeps us
conditions under which theories apply is also the most honest. In science, false results have a short (albeit
effective strategy for resolving conflicts between compet- potentially damaging) lifespan because regardless
ing theories that offer explanations for the same psycho- of how they come about, other scientists won’t be
logical phenomena. After discussing the various empirical able to reproduce them. On the other hand, true
strategies that researchers used to distinguish between results will be replicated time and time again by
impression management and intrapsychic explanations different scientists. (para. 10)

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


66 Stroebe and Strack

Finally, Mummendey (2012, p. 7) went even further We had no a priori expectation that nationality
and suggested: would moderate priming effects; moreover, there is
little reason to believe that the underlying
Scientific journals could expand their already high psychological principles responsible for assimilation,
standards of the peer review system by adding the anchoring, or correction effects would differ cross-
requirement for a thorough external replication. culturally. We therefore suspect that these effects
Authors submit their manuscript together with their simply reflect idiosyncratic differences between
data. Once the publication has been approved by U.S./Canadian labs and those elsewhere. (p. 10)
a preliminary group of reviewers, the editors
invite suitable experts to attempt a replication of With hindsight, one striking difference is that Stapel
the results. After this has been accomplished, both was first author on 7 of the 10 assimilation studies, on 6
the original manuscript and the replication study of the 7 anchoring contrast studies, and on 2 of the 3 cor-
are published together. rection contrast studies that were responsible for this
nationality effect. Should one have suspected fraud?
Replications are poor fraud detectors. Existing evi- Probably not. It is quite possible that European (student)
dence, however, sheds doubt on the success of such strat- participants take experiments more seriously or differ in
egies. In a recent article, Stroebe, Postmes, and Spears other ways from American undergraduates, who serve as
(2012) analyzed 40 fraud cases to identify the way by subjects in most social psychological research.
which the fraud had been detected. Because this is rarely The findings of the meta-analysis of DeCoster and
reported in official reports, they had to rely on fraud cases Claypool (2004) indicate another problem with replica-
that were sufficiently significant to be discussed in local or tions as indicator of fraud, namely, that except for their
national papers, because unlike agencies such as the U.S. effect sizes, the findings of the priming research reported
Office of Research Integrity, journalists typically like to by Stapel and colleagues were quite consistent with
know how fraudsters were discovered. To their own sur- multiple studies conducted in the United States. It is
prise, Stroebe and colleagues found that replications typical for successful fraudsters that they avoid produc-
hardly played any role in the discovery of these fraud ing unexpected findings and that their results are in line
cases. As was the case with Stapel (Levelt, Noort, & Drenth, with the expectations of the field. Some of the Stapel
2012), most fraud cases are detected by close colleagues articles that had been included in this meta-analysis
or research or graduate student assistants of the fraudster, were identified as fraudulent in the final report of
who act as whistleblowers (Stroebe et al., 2012). Fraud- the three committees that had been appointed to inves-
sters are also frequently identified by outside colleagues, tigate the Stapel affair (Levelt et al., 2012). And yet
who work in the same field and find inconsistencies in even though Stapel may have been somewhat overen-
journal articles that are often editorial in nature (e.g., thusiastic in improving the data, his findings have been
the same figure for different outcomes). For example, the replicated by other authors. As Stapel wrote in his auto-
fraud of the Norwegian cancer researcher John Sudbo, like biography, he was always pleased when his invented
Stapel a young star player in his field, was discovered by findings were replicated: “What seemed logical and was
the head of the epidemiology division at the Norwegian fantasized became true” (Stapel, 2012). Thus, neither
Institute of Public Health, who realized that the cancer can failures to replicate a research finding be used as
patient database used in the study had not yet been avail- indicators of fraud, nor can successful replications be
able at the time of the study (http://en.wikipedia.org/wiki/ invoked as indication that the original study was hon-
Jon_Sudb%C3%B8). estly conducted.
In social psychology, as in many areas of medicine
and even physics, replication failures are due to incom-
Conclusion
plete description of the methodology that was used in a
study or because important moderators were not spelled Psychologists have no reason for complacency. The fact
out by a theory. Given that fraud (as far as we know) is that one of our colleagues committed one of the greatest
extremely rare, it is not surprising that failed replications research frauds in recent history should be a wake-up
are not diagnostic about fraud. call. Therefore, even though we are (or at least should
be) aware of the methodological standards of good
Meta-analyses are poor fraud detectors. That this research, it is helpful to be reminded that certain proce-
limitation even applies to meta-analyses can be illustrated dures can substantially distort the results of the statistical
with the findings of a meta-analysis of priming studies by tests (Simmons et al., 2011).
DeCoster and Claypool (2004). They reported that the As a field, we have reacted (or are at least planning to
sizes of priming effects were larger in studies conducted react) to these challenges by instituting measures that
in countries outside the United States and Canada. The will increase the risk of discovery for people who still
authors wrote: engage in these practices. Thus, if the rule will be
Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015
Alleged Crisis and Illusion of Exact Replication 67

instituted that for all published studies the materials used that contained either words that evoked the elderly ste-
as well as the data are stored at a publicly accessible Web reotype or words that were unrelated. After the experi-
site, many questionable practices will become publicly ment had apparently been finished, a second experimenter
visible. Although this regulation cannot prevent people measured how fast participants were walking down a
from submitting only a selection of their measures and corridor. Participants primed with the elderly stereotype
from “cleaning up” their data before submission, it should walked significantly slower than those primed with unre-
act as a deterrent. And it will certainly allow interested lated traits. Moreover, a replication of that study by the
parties to reanalyze data to see whether the original con- same authors using different participants resulted in
clusions are indeed supported. Unfortunately, there has nearly identical results.
been a great deal of resistance from major journal pub- Several successful conceptual replications have been
lishers (e.g., the American Psychological Association, the published (Aarts & Dijksterhuis, 2002; Cesario, Plaks, &
Association for Psychological Science) against instituting Higgins, 2006; Dijksterhuis, Spears, & Lepinasse, 2001;
such a rule, but most Dutch and a few U.S. universities Kawakami, Young, & Dovidio, 2002; Ku, Wang, &
have now done so at the university level. Galinsky, 2010; Macrae et al., 1998). For example, the
Another promising improvement is the Web site for participants of the study of Ku et al. (2010, Study 3)
reporting successful and unsuccessful replications (www were shown a photograph of an older individual and
.psychfiledrawer.org). But whereas it will certainly be were subsequently asked to write a narrative about a day
useful to be informed about studies that are difficult to in the life of that person. Half the participants were
replicate, we are less confident about whether the invest- instructed to take the perspective of that person in their
ment of time and effort of the volunteers of the Open story writing, while the other half were told to write as
Science Collaboration is well spent on replicating studies objectively as possible. In line with the original predic-
published in three psychology journals. The result will be tions, participants afterward walked more slowly if they
a reproducibility coefficient that will not be greatly infor- had been given the instructions to take the elderly per-
mative, because of justified doubts about whether the spective than to be objective. Dijksterhuis et al. (2001)
“exact” replications succeeded in replicating the theoreti- and Kawakami et al. (2002) showed that elderly primes
cal conditions realized in the original research. slowed down reactions to a lexical decision task. Macrae
As social psychologists, we are particularly concerned et al. (1998) found that participants read a word list faster
that one of the outcomes of this effort will be that results when primed with “Michael Schumacher” (at that time
from our field will be perceived to be less “reproducible” Formula One world champion) than with a neutral prime.
than research in other areas of psychology. This is to be Finally, Mussweiler (2006) demonstrated the reverse
expected because for the reasons discussed earlier, effect, namely, that moving more slowly led participants
attempts at “direct” replications of social psychological to ascribe more elderly-stereotypic characteristics to a
studies are less likely than exact replications of experi- target (Experiment 2).
ments in psychophysics to replicate the theoretical condi- Cesario et al. (2006, Experiment 2) conducted another
tions that were established in the original study. conceptual replication that differed from the original
Although psychologists should not be complacent, study only in the method of priming (i.e., subliminal pre-
there seem to be no reasons to panic the field into sentation of pictures of elderly [and young people] rather
another crisis. Crises in psychology are not caused by than words in scrambled sentences). Cesario and col-
methodological flaws but by the way people talk about leagues were only partially successful in replicating the
them (Kruglanski & Stroebe, 2012). Although the discus- findings of Bargh et al. (1996). Participants were found to
sion of research practices in social psychology is healthy walk more slowly and took more time to walk to the exit
and may ultimately lead to an improvement of the rules after having been subliminally primed with pictures of
that guide us on the way we analyze, report, and store older people than participants primed with pictures of
our data, magnifying the present problems into a “repli- young people, but walking speed in both conditions did
cability crisis” is likely to be counterproductive in the not differ significantly from an unprimed control group.
long run. Thus, although the study demonstrated that priming with
social categories can influence walking speed, it did not
replicate the finding of Bargh et al. (1996) that priming
Appendix: Failures to Replicate Social with the elderly stereotype reduces walking speed in
Priming Studies comparison to an unprimed control group. This differ-
Bargh and Colleagues’ priming ence could have been due to the different type of prim-
ing used.
walking speed The only exact replication of the study was conducted
Bargh, Chen, and Burrows (1996, Experiment 2a, 2b) by Doyen et al. (2012), who repeated the original
primed their participants with a scrambled-sentence test study, with the only difference that walking speed was

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


68 Stroebe and Strack

measured objectively by infrared sensor rather than with a impair stereotype-related performance. Some years later,
stopwatch. They did not find any difference between the Wheeler, Jarvis, and Petty (2001) showed this effect to
two priming conditions in their first experiment. In their operate not only with self-stereotyping but also with ste-
second experiment, they did observe an effect but only if reotyping of others. One such mechanism has recently
experimenters were explicitly instructed to expect the been examined in a functional MRI study in which par-
primed participants to walk more slowly. This raised the ticipants had to perform a memory task (n − 2 back task
possibility that the original findings had been caused by for letters) after having been primed with either the word
the experimenters’ expectations and that the second “clever” or “stupid” (Bengtsson, Dolan, & Passingham,
experimenter measuring walking speed with a stopwatch 2011). Findings showed that participants who had been
was somehow aware of the experimental condition. primed with “clever” spent more time making a response
However, in their method section, Bargh et al. (1996) following an error. At the same time, this resulted in
clearly stated that experimenters and observers were blind increased activation in the anterior paracingulate cortex
to experimental conditions. Furthermore, experimenter on error trials. This finding is consistent with a role of the
expectation affected walking speed even when it was anterior paracingulate that is considered to be involved
measured objectively in the Doyen et al. (2012) study. in monitoring task performance.
Thus, it remains unclear how experimenter expectations
could have influenced the participants’ speed of walking. Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with
respect to their authorship or the publication of this article.
The “professor studies” of Dijksterhuis
and van Knippenberg
Notes
Dijksterhuis and van Knippenberg (1998) primed partici- 1. It must be acknowledged that such manipulation checks
pants either with a category of persons who are usually are missing not only in replications but also in the original
considered highly intelligent (e.g., professor) or with a experiments.
category of persons who are considered less intelligent 2. This was one of the major arguments that Gergen (1973)
(e.g., hooligan). The priming manipulation consisted of used to argue “that social psychology is primarily an historical
participants having to think of a typical professor (or inquiry. Unlike the natural sciences, it deals with facts that are
hooligan) and then list lifestyle, behavior, and appear- largely nonrepeatable and which fluctuate markedly over time”
ance attributes of the typical member of the primed cat- (p. 310).
egory. In an apparently unrelated second experiment,
participants had to answer a number of general knowl- References
edge questions taken from trivial pursuit. Dijksterhuis Aarts, H., & Dijksterhuis, A. (2002). Category activation effects
and van Knippenberg found that thinking about a cate- in judgment and behaviour: The moderating role of per-
gory of intelligent persons resulted in better performance ceived comparability. British Journal of Social Psychology,
41, 123–128.
of the trivial pursuit task than thinking of a category of
Arnett, J. J. (2008). The neglected 95%: Why American psychol-
less intelligent persons and conducted four replications
ogy needs to become less American. American Psychologist,
of the finding. Conceptual replications have been 63, 602–614.
reported by several different laboratories from different Aronson, E., & Mills, J. (1959). The effect of severity of initia-
countries, with most articles containing multiple studies tion on liking for a group. Journal of Abnormal and Social
(e.g., Bry, Follenfant, & Meyer, 2008; Galinsky, Wang, & Psychology, 59, 177–181.
Ku, 2008; Haddock, Macrae, & Fleck, 2002; Hansen & Bakker, M., van Dijk, A., & Wicherts, J. M. (2012). The rules
Wänke, 2009; LeBoeuf & Estes, 2004; Lowery, Eisenberger, of the game called psychological science. Perspectives
Hardin, & Sinclair, 2007; Nussinson, Seibt, Häfner, & on Psychological Science, 7, 543–554. doi:10.1177/
Strack, 2010). However, recently Shanks and colleagues 1745691612459060
(2013) conducted nine exact replication studies. In none Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of
social behavior: Direct effects of trait construct and stereo-
of the studies did the manipulation yield a statistically
type-activation on action. Journal of Personality and Social
significant effect on performance.
Psychology, 71, 230–244.
The findings reported by Shanks et al. (2013) are not Bem, D. J. (1965). An experimental analysis of self-persuasion.
only inconsistent with the successful replications of the Journal of Experimental Social Psychology, 1, 199–218.
professor studies, they are also inconsistent with other Bengtsson, S. L., Dolan, R., & Passingham, R. E. (2011). Priming
findings showing that stereotypes can influence individ- for self-esteem influences monitoring of one’s own perfor-
ual performance on intellectual tasks. For example, Steele mance. Scan, 6, 417–425. doi:10.1093/scan/sq048
and Aronson (e.g., 1995) have repeatedly demonstrated Bry, C., Follenfant, A., & Meyer, T. (2008). Blonde like me:
that the mere salience of a stereotype is sufficient to When self-construals moderate stereotype priming effects

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


Alleged Crisis and Illusion of Exact Replication 69

on intellectual performance. Journal of Experimental Social Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., &
Psychology, 44, 751–757. Baumgardner, M. H. (1986). Under what conditions does
Cesario, J., Plaks, J. E., & Higgins, T. E. (2006). Automatic social theory obstruct research progress? Psychological Review,
behavior as motivated preparation to interact. Journal of 93, 216–229.
Personality and Social Psychology, 90, 893–910. Haddock, G., Macrae, C. N. & Fleck, S. (2002). Syrian
Chaiken, S. (1980). Heuristic versus systematic information pro- science and smart supermodels: On the when and how of
cessing and the use of source versus message cues in per- perception–behavior effects. Social Cognition, 20, 461–481.
suasion. Journal of Personality and Social Psychology, 39, Hansen, J., & Wänke, M. (2009). Think of capable others and
725–766. you can make it! Self-efficacy mediates the effect of stereo-
Chamberlin, J. (2000, May). A student publishing tradition. APA type activation on behavior. Social Cognition, 27, 76–88.
Monitor on Psychology, 31, 36. Retrieved from http://www Higgins, T. E., Rholes, W. S., & Jones, C. R. (1977). Category
.apa.org/monitor/may00/rrsp.aspx accessibility and impression formation. Journal of
Chambers, C., & Sumner, P. (2012, September 14). Replication Experimental Social Psychology, 13, 141–154.
is the only solution to scientific fraud. The Guardian. Hilton, D. (2012). The emergence of cognitive social psychol-
Retrieved from http://www.guardian.co.uk/commentis ogy: A historical analysis. In A. Kruglanski & W. Stroebe
free/2012/sep/14/solution-scientific-fraud-replication (Eds.), Handbook of the history of social psychology (pp.
Cooper, J., & Worchel, S. (1970). Role of undesirable con- 45–79). New York, NY: Psychology Press.
sequences in arousing cognitive dissonance. Journal of Janis, I. L., & Feshbach, S. (1953). Effects of fear-arousing com-
Personality and Social Psychology, 16, 199–206. munications. Journal of Abnormal and Social Psychology,
Crocker, J., & Cooper, L. (2011, December 2). Editorial: 48, 78–92.
Addressing scientific fraud. Science, 334, 1182. doi:10.1126/ John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the
science.1216775. prevalence of questionable research practices with incen-
DeCoster, J., & Claypool, H. M. (2004). A meta-analysis of priming tives for truth telling. Psychological Science, 23, 524–532.
effects on impression formation supporting a general model doi:10.1177/095679611430953
of informational bias. Personality and Social Psychology Kawakami, K. L., Young, H., & Dovidio, J. F. (2002). Automatic
Review, 8, 2–27. doi:10.1207/S15327957PSPR0801_1 stereotyping: Category, trait, and behavioral activations.
Dijksterhuis, A., Spears, R., & Lepinasse, V. (2001). Reflecting Personality and Social Psychology Bulletin, 28, 3–15.
and deflecting stereotypes: Assimilation and contrast in Klein, C. T. F., & Webster, D. M. (2000). Individual differences
impression formation and automatic behavior. Journal of in argument scrutiny as motivated by need for cognitive
Experimental Social Psychology, 37, 286–299. doi:10.1006/ closure. Basic and Applied Psychology, 22, 119–129.
jesp.2000.1449 Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure
Dijksterhuis, A., & van Knippenberg, A. (1998). The relation simple way to improve psychological science. Perspectives
between perception and behavior or how to win a game of on Psychological Science, 7, 608–614.
trivial pursuit. Journal of Personality and Social Psychology, Kruglanski, A. W., & Stroebe, W. (2012). The making of social
74, 865–877. psychology. In A. W. Kruglanski & W. Stroebe (Eds.),
Doyen, S., Klein, O., Pichon, C.-L., & Cleeremans, A. (2012). Handbook of the history of social psychology (pp. 3–18).
Behavioral priming: It’s all in the mind, but whose mind? New York, NY: Psychology Press.
PLoS ONE, 7, e29081. doi:10.137/journal.pone.0029081 Ku, G., Wang, C. S., & Galinsky, A. D. (2010). Perception
Fazio, R. H., Zanna, M. P., & Cooper, J. (1977). Dissonance through a perspective-taking lens: Differential effects on
and self-perception: An integrative view of each theory’s judgments and behavior. Journal of Experimental Social
proper domain of application. Journal of Experimental Psychology, 46, 792–798.
Social Psychology, 13, 464–479. Kuhn, T. S. (1996). The structure of scientific revolutions
Festinger, L. (1957). A theory of cognitive dissonance. Stanford, (3rd ed.). Chicago, IL: University of Chicago Press.
CA: Stanford University Press. LeBel, E. P., Borsboom, D., Giner-Sorolla, R., Hasselman,
Festinger, L., & Carlsmith, J. M. (1959). Cognitive consequences F., Peters, K. R., Ratliff, K. A., & Tucker Mith, C. (2013).
of forced compliance. Journal of Abnormal and Social PsychDisclosure.org: Grassroots support for reforming
Psychology, 58, 203–210. Retrieved from http://ori.hhs.gov/ standards in psychology. Perspectives on Psychological
sites/default/files/gallup_finalreport.pdf Science, 8, 424–432.
Galinsky, A. D., Wang, C. S., & Ku, G. (2008). Perspective- LeBoeuf, R. A., & Estes, Z. (2004). “Fortunately, I’m no Einstein”:
takers behave more stereotypically. Journal of Personality Comparison relevance as a determinant of behavioral
and Social Psychology, 95, 404–419. assimilation and contrast. Social Cognition, 22, 607–636.
Gergen, K. J. (1973). Social psychology as history. Journal of Levelt, P., Noort, E., & Drenth, P. (2012). Flawed science: The
Personality and Social Psychology, 26, 309–320. fraudulent research practices of social psychologist Diederik
Greenwald, A. G. (1975). Consequences of prejudice against Stapel. Retrieved from http://www.tilburguniversity.edu/
the null-hypothesis. Psychological Bulletin, 82, 1–20. upload/3ff904d7-547b-40ae-85fe-bea38e05a34a_Final%20
Greenwald, A. G. (2012). There is nothing so theoretical as report%20Flawed%20Science.pdf
a good method. Perspectives on Psychological Science, 7, Linder, D. E., Cooper, J., & Jones, E. E. (1967). Decision free-
99–108. dom as a determinant of the role of incentive magnitude in

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


70 Stroebe and Strack

attitude change. Journal of Personality and Social disruption versus effort justification. Journal of Personality
Psychology, 6, 245–254. and Social Psychology, 34, 874–884.
Lodewijkx, H. F. M., & Syroit, J. E. M. M. (1997). Severity of Popper, K. R. (1959). The logic of scientific discovery. London,
initiation revisited: Does severity of initiation increase England: Hutchinson.
attractiveness in real groups? European Journal of Social Roediger, H. L. (2012, February). Psychology’s woes and a
Psychology, 27, 275–300. partial cure: The value of replication. Observer, 25(2), 9,
Lodewijkx, H. F. M., van Zomeren, M., & Syroit, J. E. M. M. 27–29.
(2005). The anticipation of a severe initiation: Gender dif- Schmidt, S. (2009). Shall we really do it again? The powerful
ferences in effects on affiliation tendency and group attrac- concept of replication is neglected in the social sciences.
tion. Small Group Research, 36, 237–262. doi:10.1177/ Review of General Psychology, 13, 90–100. doi:10.1037/
1046496404272381 a0015108
Loersch, C., & Payne, B. K. (2011). The situated inference Schröder, T., & Thagard, P. (2013). The affective meanings of
model: An integrative account of the effects of primes automatic social behaviors: Three mechanisms that explain
on perception, behavior, and motivation. Perspectives on priming. Psychological Review, 120, 255–280.
Psychological Science, 6, 234–252. Shanks, D. R., Newell, B. R., Lee, E. H., Balakrishnan, D.,
Lowery, B. S., Eisenberger, N. I., Hardin, C. D., & Sinclair, S. Ekelund, L., Cenac, Z., . . . Moore, C. (2013). Priming intel-
(2007). Long-term effects of subliminal priming on aca- ligent behavior: An elusive phenomenon. PLoS ONE, 8,
demic performance. Basic and Applied Social Psychology, e56515. doi:10.1371/journal.pone.0056515
29, 151–157. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-
Macrae, C. N., Bodenhausen, G. V., Milne, A. B., Castelli, L., positive psychology: Undisclosed flexibility in data collec-
Schloerscheidt, A. M., & Greco, S. (1998). On activating tion and analysis allows presenting anything as significant.
exemplars. Journal of Experimental Social Psychology, 34, Psychological Science, 22, 1359–1366.
330–354. Stapel, D. (2012). Ontsporing [Derailment]. Amsterdam, the
Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in rec- Netherlands: Prometheus.
ognizing pairs of words: Evidence of dependence between Stapel, D. A., & Lindenberg, S. (2011, April 8). Coping with
retrieval operations. Journal of Experimental Psychology, chaos: How disordered contexts promote stereotyping and
90, 227–234. discrimination. Science, 332, 251–252.
Mummendey, A. (2012). Scientific misconduct in social psy- Steele, C. M., & Aronson, J. (1995). Stereotype threat and
chology—Towards a currency reform in science. European the intellectual test performance of African Americans.
Bulletin of Social Psychology, 24, 4–7. Journal of Personality and Social Psychology, 69, 797–
Mussweiler, T. (2006). Doing is for thinking! Stereotype activa- 811.
tion by stereotypic movement. Psychological Science, 17, Strack, F., & Deutsch, R. (2004). Reflective and impulsive
17–21. determinants of social behavior. Personality and Social
Nussinson, R., Seibt, B., Häfner, M., & Strack, F. (2010). Come Psychology Review, 8, 220–247.
a bit closer: Approach motor actions lead to feeling similar Strack, F., & Hannover, B. (1996). Awareness of influence as
and behavioral assimilation. Social Cognition, 28, 40–58. a precondition for implementing correctional goals. In
Open Science Collaboration. (2012). An open, large-scale, col- P. Gollwitzer & J. A. Bargh (Eds.), The psychology of
laborative effort to estimate the reproducibility of psycho- action: Linking cognition and motivation to behavior
logical science. Perspectives on Psychological Science, 7, (pp. 579–595). New York, NY: Guilford.
657–660. Strack, F., Schwarz, N., Bless, H., Kübler, A., & Wänke, M.
Pashler, H., Coburn, N., & Harris, C. R. (2012). Priming of (1993). Awareness of the influence as a determinant of
social distance? Failure to replicate effects on social and assimilation versus contrast. European Journal of Social
food judgments. PLoS ONE, 7, e42510. doi:10.1371/journal Psychology, 23, 53–62.
.pone.0042510 Stroebe, W. (2011). Social psychology and health. Maidenhead,
Pashler, H., & Harris, C. R. (2012). Is the replicability crisis England: Open University Press.
overblown? Three arguments examined. Perspectives on Stroebe, W., & Diehl, M. (1988). When social support fails:
Psychological Science, 7, 531–536. doi:10.1371/journal Supporter characteristics in compliance induced attitude
.pmed.0020124 change. Personality and Social Psychology Bulletin, 14,
Pashler, H., & Wagenmakers, E.-J. (2012). Editors’ introduction 136–144.
to the special section on replicability in psychological sci- Stroebe, W., & Nijstad, B. (2009). Do our psychological laws
ence: A crisis of confidence? Perspectives on Psychological apply only to Americans? American Psychologist, 64, 569.
Science, 7, 528–530. doi:10.1037/a0016090
Petty, R. E., & Cacioppo, J. T. (1986). Communication and per- Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific
suasion. New York, NY: Springer. misconduct and the myth of self-correction in sci-
Petty, R. E., Cacioppo, J. T., & Goldman, R. (1981). Personal ence. Perspectives on Psychological Science, 7, 670–688.
involvement as a determinant of argument-based persuasion. doi:10.1177/1745691612460687
Journal of Personality and Social Psychology, 41, 847–855. Tetlock, P. E., & Manstead, A. S. R. (1985). Impression manage-
Petty, R. E., Wells, G. L., & Brock, T. C. (1976). Distraction ment versus intrapsychic explanations in social psychology:
can enhance or reduce yielding to propaganda: Thought A useful dichotomy? Psychological Review, 92, 59–77.

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015


Alleged Crisis and Illusion of Exact Replication 71

Tulving, E., & Schacter, D. L. (1990, January 19). and social cognition. Perspectives on Psychological Science,
Prim­ i ng and human memory systems. Science, 247, 4, 274–288.
301–306. Wheeler, S. C., Jarvis, W. B. G. & Petty, R. E. (2001). Think onto
Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzling others: The self-destructive impact of negative racial stereo-
high correlations in fMRI studies of emotion, personality, types. Journal of Experimental Social Psychology, 37, 173–180.

Downloaded from pps.sagepub.com by Gerald Haeffel on January 8, 2015