Anyone who has talked to me in the last year would have heard me complain about my 8-times-failure-to-replicate which nobody wants to publish. The preprint, raw data and analysis scripts are available here, so anyone can judge for themselves if they think the rejections to date are justified. In fact, if anyone can show me that my conclusions are wrong – that the data are either inconclusive, or that they actually support an opposite view – I will buy them a bottle of drink of their choice*. So far, this has not happened.
I promise to stop complaining about this after I publish this blog post. I think it is important to be aware of the current situation, but I am, by now, just getting tired of debates which go in circles (and I’m sure many others feel the same way). Therefore, I pledge that from now on I will stop writing whining blog posts, and I will only write happy ones – which have at least one constructive comment or suggestion about how we could improve things.
So, here goes my last ever complaining post. I should stress that the sentiments and opinions I describe here are entirely my own; although I’ve had lots of input from my wonderful co-authors in preparing the manuscript of my unfortunate paper, they would probably not agree with many of the things I am writing here.
Why is it important to publish failures to replicate?
People who haven’t been convinced by the arguments put forward to date will not be convinced by a puny little blogpost. In fact, they will probably not even read this. Therefore, I will not go into details about why it is important to publish failures to replicate. Suffice it to say that this is not my opinion – it’s a truism. If we combine a low average experimental power with selective publishing of positive results, we – to use Daniel Lakens’ words – get “a literature that is about as representative of real science as porn movies are representative of real sex”. We get over-inflated effect sizes across experiments, even if an effect is non-existent; or, in the words of Michael Inzlicht, “meta-analyses are fucked”.
The interested reader can look up further details of our study in the OSF folder I linked above (https://osf.io/myfk3/). The study is about the Psycholinguistic Grain Size Theory (Ziegler & Goswami, 2005)**. If you type the name of this theory into google – or some other popular search terms, such as “dyslexia theory”, “reading across languages”, or “reading development theory” – you will see this paper on the first page. It has 1650 citations, at the time of writing of this blogpost. In other words, this theory is huge. People rely on it to interpret their data, and to guide their experimental designs and theories in diverse topics of reading and dyslexia.
The evidence for the Psycholinguistic Grain Size Theory is summarised in the preprint linked above; the reader can decide for themselves if they find it convincing. During my PhD, I decided to do some follow-up experiments on the body-N effect (Ziegler & Perry, 1998; Ziegler et al., 2001; Ziegler et al., 2003). Why? Not because I wanted to build my career on the ruins of someone else’s work (which is apparently what some people think of replicators), but because I found the theory genuinely interesting, and I wanted to do further work to specify the locus of this effect. So I did study after study after study – blaming myself for the messy results – until I realised: I had conducted eight experiments, and the effect just isn’t there. So I conducted a meta-analysis on all of our data, plus an unpublished study by a colleague with whom I’d talked about this effect, wrote it up and submitted it.
Surely, in our day and age, journals should welcome null-results as much as positive results? And any rejections would be based on flaws in the study?
Well, here is what happened:
Submission 1: Relatively high-impact journal for cognitive psychology
Here is a section directly copied-and-pasted from a review:
“Although the paper is well-written and the analyses are quite substantial, I find the whole approach rather irritating for the following reasons:
1. Typically meta-analyses are done one [sic] published data that meet the standards for publishing in international peer-reviewed journals. In the present analyses, the only two published studies that reported significant effects of body-N and were published in Cognition and Psychological Science were excluded (because the trial-by-trial data were no longer available) and the authors focus on a bunch of unpublished studies from a dissertation and a colleague who is not even an author of the present paper. There is no way of knowing whether these unpublished experiments meet the standards to be published in high-quality journals.”
Of course, I picked the most extreme statement. Other reviewers had some cogent points – however, nothing that would compromise the conclusions. The paper was rejected because “the manuscript is probably too far from what we are looking for”.
Submission 2: Very high-impact psychology journal
As a very ambitious second plan, we submitted the paper to one of the top journals in psychology. It’s a journal which “publishes evaluative and integrative research reviews and interpretations of issues in scientific psychology. Both qualitative (narrative) and quantitative (meta-analytic) reviews will be considered, depending on the nature of the database under consideration for review” (from their website). They have even announced a special issue on Replicability and Reproducibility, because their “primary mission […] is to contribute a cohesive, authoritative, theory-based, and complete synthesis of scientific evidence in the field of psychology” (again, from their website). In fact, they published the original theoretical paper, so surely they would at least consider a paper which argues against this theory? As in, send it out for review? And reject it based on flaws, rather than the standard explanation of it being uninteresting to a broad audience? Given that they published the original theoretical article, and all? Right?
Wrong, on all points.
Submission 3: A well-respected, but not huge impact factor journal in cognitive psychology
I agreed to submit this paper to a non-open-access journal again, but only under the condition that at least one of my co-authors would have a bet with me: if it got rejected, I would get a bottle of good whiskey. Spoiler alert: I am now the proud owner of a 10-year aged bottle of Bushmills.
To be fair, this round of reviews brought some cogent and interesting comments. The first reviewer provided some insightful remarks, but their main concern was that “The main message here seems to be a negative one.” Furthermore, the reviewer “found the theoretical rationale [for the choice of paradigm] to be rather simplistic”. Your words, not mine! However, for a failure to replicate, this is irrelevant. As many researchers rely on what may or may not be a simplistic theoretical framework which is based on the original studies, we need to know whether the evidence put forward by the original studies is reliable.
I could not quite make sense of all of the second reviewer’s comment, but somehow they argued that the paper was “overkill”. (It is very long and dense, to be fair, but I do have a lot of data to analyse. I suspect most readers will skip from the introduction to the discussion, anyway – but anyone who wants the juicy details of the analyses should have easy access to them.)
Next step: Open-access journal
I like the idea of open-access journals. However, when I submitted previous versions of the manuscript I was somewhat swayed by the argument that going open access would decrease the visibility and credibility of the paper. This is probably true, but without any doubt, the next step will be to submit the paper to an open-access journal. Preferably one with open review. I would like to see a reviewer calling a paper “irritating” in a public forum.
At least in this case, traditional journals have shown – well, let’s just say that we still have a long way to go in improving replicability in psychological sciences. For now, I have uploaded a pre-print of the paper on OSF and on researchgate. On researchgate, the article has over 200 views, suggesting that there is some interest in this theory; the finding that the key study is not replicable seems relevant to researchers. Nevertheless, I wonder if the failure to provide support for this theory will ever gain as much visibility as the original study – how many researchers will put their trust into a theory that they might be more sceptical about if they knew the key study is not as robust as it may seem?
In the meantime, my offer of a bottle of beverage for anyone who can show that the analyses or data are fundamentally flawed, still stands.
* Beer, wine, whiskey, brandy: You name it. Limited only by my post-doc budget.
** The full references of all papers cited throughout the blogpost can be found in the preprint of our paper.
Edit 30/6: Thanks all for the comments so far, I'll have a closer look at how I can implement your helpful suggestions when I get the chance!
Please note that I will delete comments from spammers and trolls. If you feel the urge to threaten physical violence, please see your local counsellor or psychologist.