Jan de Ruiter has posted a preprint of his response to the review of the replication debate by Rolf Zwaan, Alexander Etz, Richard Lucas, and Brent Donnellan, which will be published in Behavioral and Brain Sciences. I had been working on a blog post with my own comments, but De Ruiter has already made one of the points I wanted to make, and far better than I could have. I will just add a few bits and pieces to his response and then make my other point.
De Ruiter makes an important distinction between effects, “the statistical generalization of an observed difference”, and claims, “an effect believed to be generalizable to the population and context of interest” (Ruiter, 2018a, p. 1). A claim is a generalization: the effect that occurred in our lab will also occur in other, similar circumstances. When we replicate a claim, which is what we usually call replication, we check that generalization. As De Ruiter formulated it in a tweet, but unfortunately not in the response itself, “every replication is a conceptual replication” (Ruiter, 2018b). We try to replicate the meaning of the effect. The effect itself is irremediably local; hence there is no such thing as an exact replication, as everyone in the replication debate agrees.
Depending on what we take to be the claim, our replication will differ more or less from the original experiment. The more the claim goes beyond the effect, the more different the replication can be. Using De Ruiter’s example, suppose the original effect was that Computer Science undergrads at our university performed better on a math test when they had coffee than when they didn’t. If our claim refers to undergrads, coffee, and a specific math test, it is less general than if it speaks of people, stimulants, and puzzle solving. If we replicate the more general claim, we have more options (kinds of people, stimulants, and puzzles) than if we replicate the narrower claim.
Conversely, we should evaluate every replication by how it differs from the original, and by what sort of claim it could therefore be considered to be testing. In other words, rather than dividing replications into ‘direct’ and ‘conceptual’ – much too coarse a classification – we should look at each replication study individually, see how it differs from the original, and evaluate what conclusions we can draw from it, given its result and those differences.
This evaluation will always have theoretical aspects. The claim links the effect to a theory: it is the theory that gives meaning to what happens in the experiment. Whether the claim is narrow or more general, it is the theory that determines (or should determine) whether it matters if your participants are students or not, if you give them espresso or latte, or if you use a math test or Raven’s Progressive Matrices. You need concepts to deal with sameness and difference, and the theory provides those concepts.
None of this entails that replications that differ very little from the original (‘direct replications’) are not of fundamental importance. One could even argue that narrower claims – ‘when you follow our experimental procedure you should get a very similar effect’ – should have priority over more general ones when it comes to doing replications.
And now for my other point. Zwaan, Etz, Lucas, and Donnellan (ZELD from now on) extensively discuss the infamous contextual variability issue. Stroebe and Strack (2014) and Crandall and Sherman (2016), among others, have argued that in social psychology ‘direct’ replications are of limited use, because the phenomena being studied are so sensitive to the social, cultural, and historical context that we cannot expect them to reliably reoccur when we simply follow the same experimental procedure. The meaning of that procedure depends on the variable context. For example, an experimental manipulation intended to make participants slightly anxious – a picture of a zombie, say – may completely fail to do so twenty years later, because people have become habituated to such images after decades of zombie movies. And so on for all the other aspects of the experiment.
The contextual variability issue becomes especially problematic if the variability is fundamentally unpredictable, if it cannot be reduced to known regularities. ZELD argue that such “strong forms of the context sensitivity argument are scientifically problematic”, because they “prevent the accumulation of knowledge within a domain of study” (Zwaan, Etz, Lucas, & Donnellan, 2017, p. 20). They add that context sensitivity makes the application of knowledge difficult. You can never be sure that the intervention you developed will work, because the circumstances might be different in a way that matters. “(T)here is little reason to expect that findings that emerge from a noncumulative perspective will have practical relevance” (Zwaan et al., 2017, p. 20).
These are somewhat strange ways of formulating the problem. It would not be the context sensitivity argument or the noncumulative perspective that is to blame; it would be the putative sensitivity itself. It looks as if ZELD refuse to consider the possibility that strong context sensitivity might be real: that there really are phenomena that cannot be guaranteed to repeat because they depend on a variable context, the variations of which we cannot reduce to underlying regularities. In De Ruiter’s terms, these would be effects for which no claim can be formulated that will always replicate. Such phenomena would not be lawful.
I can imagine that ZELD are reticent about allowing the possibility of such radically contingent phenomena. They would probably rather stick to what Popper called a “well justified methodological rule – the scientist’s decision never to abandon his search for laws” (Popper, 2002, p. 245). But it would be good to realise that it is a decision, that the belief in laws is, in Popper’s terms, metaphysical. As replication studies in psychology keep producing results incompatible with the originals, it is becoming more and more urgent to look again at this belief and consider to what extent (when, where) it is useful in psychology.
Crandall, C. S., & Sherman, J. W. (2016). On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology, 66, 93–99. https://doi.org/10.1016/j.jesp.2015.10.002
Popper, K. R. (2002). The logic of scientific discovery (2nd ed.). London: Taylor & Francis.
Ruiter, J. P. de. (2018a). BBS commentary by De Ruiter on Zwaan et al. Open Science Framework. https://doi.org/10.17605/OSF.IO/Q3AGY
Ruiter, J. P. de. (2018b, January 6). Here’s a preprint of my commentary on @RolfZwaan et al.’s BBS paper, in which I argue that every replication is a conceptual replication, and why such replications are legitimate and necessary. https://osf.io/q3agy [Tweet]. Retrieved January 15, 2018, from https://twitter.com/JPdeRuiter
Stroebe, W., & Strack, F. (2014). The Alleged Crisis and the Illusion of Exact Replication. Perspectives on Psychological Science, 9(1), 59–71. https://doi.org/10.1177/1745691613514450
Zwaan, R., Etz, A., Lucas, R. E., & Donnellan, B. (2017). Making Replication Mainstream. PsyArXiv. https://doi.org/10.17605/OSF.IO/4TG9C