Monday, June 27, 2016

What happens when you try to publish a failure to replicate in 2015/2016

Anyone who has talked to me in the last year would have heard me complain about my 8-times-failure-to-replicate which nobody wants to publish. The preprint, raw data and analysis scripts are available here, so anyone can judge for themselves if they think the rejections to date are justified. In fact, if anyone can show me that my conclusions are wrong – that the data are either inconclusive, or that they actually support an opposite view – I will buy them a bottle of drink of their choice*. So far, this has not happened.

I promise to stop complaining about this after I publish this blog post. I think it is important to be aware of the current situation, but I am, by now, just getting tired of debates which go in circles (and I’m sure many others feel the same way). Therefore, I pledge that from now on I will stop writing whining blog posts, and I will only write happy ones – which have at least one constructive comment or suggestion about how we could improve things.

So, here goes my last ever complaining post. I should stress that the sentiments and opinions I describe here are entirely my own; although I’ve had lots of input from my wonderful co-authors in preparing the manuscript of my unfortunate paper, they would probably not agree with many of the things I am writing here.

Why is it important to publish failures to replicate?

People who haven’t been convinced by the arguments put forward to date will not be convinced by a puny little blogpost. In fact, they will probably not even read this. Therefore, I will not go into details about why it is important to publish failures to replicate. Suffice it to say that this is not my opinion – it’s a truism. If we combine a low average experimental power with selective publishing of positive results, we – to use Daniel Lakens’ words – get “a literature that is about as representative of real science as porn movies are representative of real sex”. We get over-inflated effect sizes across experiments, even if an effect is non-existent; or, in the words of Michael Inzlicht, “meta-analyses are fucked”.
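To make the argument concrete, here is a minimal simulation sketch. It is not taken from our study: the function names, the sample size of 20 per group, the number of studies and the "publish only significant positive results" rule are all assumptions chosen for illustration. It simulates a literature in which the true effect is zero and compares the average effect across all studies with the average across the "published" ones.

```python
import random
import statistics


def simulate_study(true_effect, n_per_group, rng):
    """Simulate one two-group study; return (observed effect, significant?)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    diff = statistics.fmean(treatment) - statistics.fmean(control)
    # The population SD is fixed at 1 in this toy setup, so a simple
    # z-test stands in for the t-test a real study would use.
    standard_error = (2.0 / n_per_group) ** 0.5
    z = diff / standard_error
    # Hypothetical publication rule: only significant positive effects count.
    return diff, z > 1.96


def run_literature(true_effect=0.0, n_per_group=20, n_studies=5000, seed=1):
    """Return (mean effect over all studies, mean over published, n published)."""
    rng = random.Random(seed)
    results = [simulate_study(true_effect, n_per_group, rng)
               for _ in range(n_studies)]
    all_effects = [d for d, _ in results]
    published = [d for d, significant in results if significant]
    return statistics.fmean(all_effects), statistics.fmean(published), len(published)


if __name__ == "__main__":
    mean_all, mean_published, n_published = run_literature()
    print(f"mean effect, all studies:       {mean_all:.3f}")
    print(f"mean effect, published studies: {mean_published:.3f}")
    print(f"number of published studies:    {n_published}")
```

Under these assumptions, the average over all studies sits close to zero, while every "published" effect exceeds the significance threshold by construction, so a meta-analysis of the published studies alone would report a sizeable effect that does not exist.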

Our study

The interested reader can look up further details of our study in the OSF folder I linked above. The study is about the Psycholinguistic Grain Size Theory (Ziegler & Goswami, 2005)**. If you type the name of this theory into Google – or some other popular search terms, such as “dyslexia theory”, “reading across languages”, or “reading development theory” – you will see this paper on the first page. It has 1650 citations, at the time of writing of this blogpost. In other words, this theory is huge. People rely on it to interpret their data, and to guide their experimental designs and theories in diverse topics of reading and dyslexia.

The evidence for the Psycholinguistic Grain Size Theory is summarised in the preprint linked above; the reader can decide for themselves if they find it convincing. During my PhD, I decided to do some follow-up experiments on the body-N effect (Ziegler & Perry, 1998; Ziegler et al., 2001; Ziegler et al., 2003). Why? Not because I wanted to build my career on the ruins of someone else’s work (which is apparently what some people think of replicators), but because I found the theory genuinely interesting, and I wanted to do further work to specify the locus of this effect. So I did study after study after study – blaming myself for the messy results – until I realised: I had conducted eight experiments, and the effect just isn’t there. So I conducted a meta-analysis on all of our data, plus an unpublished study by a colleague with whom I’d talked about this effect, wrote it up and submitted it.

Surely, in our day and age, journals should welcome null-results as much as positive results? And any rejections would be based on flaws in the study?

Well, here is what happened:

Submission 1: Relatively high-impact journal for cognitive psychology

Here is a section directly copied-and-pasted from a review:

“Although the paper is well-written and the analyses are quite substantial, I find the whole approach rather irritating for the following reasons:

1. Typically meta-analyses are done one [sic] published data that meet the standards for publishing in international peer-reviewed journals. In the present analyses, the only two published studies that reported significant effects of body-N and were published in Cognition and Psychological Science were excluded (because the trial-by-trial data were no longer available) and the authors focus on a bunch of unpublished studies from a dissertation and a colleague who is not even an author of the present paper. There is no way of knowing whether these unpublished experiments meet the standards to be published in high-quality journals.”

Of course, I picked the most extreme statement. Other reviewers had some cogent points – however, nothing that would compromise the conclusions. The paper was rejected because “the manuscript is probably too far from what we are looking for”.

Submission 2: Very high-impact psychology journal

As a very ambitious second plan, we submitted the paper to one of the top journals in psychology. It’s a journal which “publishes evaluative and integrative research reviews and interpretations of issues in scientific psychology. Both qualitative (narrative) and quantitative (meta-analytic) reviews will be considered, depending on the nature of the database under consideration for review” (from their website). They have even announced a special issue on Replicability and Reproducibility, because their “primary mission […] is to contribute a cohesive, authoritative, theory-based, and complete synthesis of scientific evidence in the field of psychology” (again, from their website). In fact, they published the original theoretical paper, so surely they would at least consider a paper which argues against this theory? As in, send it out for review? And reject it based on flaws, rather than the standard explanation of it being uninteresting to a broad audience? Given that they published the original theoretical article, and all? Right?

Wrong, on all points.

Submission 3: A well-respected, but not huge impact factor journal in cognitive psychology

I agreed to submit this paper to a non-open-access journal again, but only under the condition that at least one of my co-authors would have a bet with me: if it got rejected, I would get a bottle of good whiskey. Spoiler alert: I am now the proud owner of a 10-year aged bottle of Bushmills.

To be fair, this round of reviews brought some cogent and interesting comments. The first reviewer provided some insightful remarks, but their main concern was that “The main message here seems to be a negative one.” Furthermore, the reviewer “found the theoretical rationale [for the choice of paradigm] to be rather simplistic”. Your words, not mine! However, for a failure to replicate, this is irrelevant. As many researchers rely on what may or may not be a simplistic theoretical framework which is based on the original studies, we need to know whether the evidence put forward by the original studies is reliable.

I could not quite make sense of all of the second reviewer’s comments, but somehow they argued that the paper was “overkill”. (It is very long and dense, to be fair, but I do have a lot of data to analyse. I suspect most readers will skip from the introduction to the discussion, anyway – but anyone who wants the juicy details of the analyses should have easy access to them.)

Next step: Open-access journal

I like the idea of open-access journals. However, when I submitted previous versions of the manuscript I was somewhat swayed by the argument that going open access would decrease the visibility and credibility of the paper. This is probably true, but without any doubt, the next step will be to submit the paper to an open-access journal. Preferably one with open review. I would like to see a reviewer calling a paper “irritating” in a public forum.

At least in this case, traditional journals have shown – well, let’s just say that we still have a long way to go in improving replicability in psychological sciences. For now, I have uploaded a pre-print of the paper on OSF and on ResearchGate. On ResearchGate, the article has over 200 views, suggesting that there is some interest in this theory; the finding that the key study is not replicable seems relevant to researchers. Nevertheless, I wonder if the failure to provide support for this theory will ever gain as much visibility as the original study – how many researchers will put their trust in a theory that they might be more sceptical about if they knew the key study is not as robust as it may seem?

In the meantime, my offer of a bottle of beverage for anyone who can show that the analyses or data are fundamentally flawed still stands.


* Beer, wine, whiskey, brandy: You name it. Limited only by my post-doc budget.
** The full references of all papers cited throughout the blogpost can be found in the preprint of our paper.


Edit 30/6: Thanks all for the comments so far, I'll have a closer look at how I can implement your helpful suggestions when I get the chance!

Please note that I will delete comments from spammers and trolls. If you feel the urge to threaten physical violence, please see your local counsellor or psychologist.


  1. There could be other reasons your article's not being published. Can you find any other examples of a null-results study in the journals you're submitting them to? If so, those would be the ones I'd focus on. But there are still other factors to consider in the world of academic journal publications that may not be apparent.

  2. If you can truly see the flaws in present-day science, you should advocate open science and seek to publish in an open-access journal first (with as much data as possible). Don't you think?
    Both and welcome null and negative results, and I'm sure that there are others. Impact factor is not the holy grail anymore.
    Surely some dinosaur colleagues will raise an eyebrow at anything not published in one of the old world's top notch journals, but it's the fate of dinosaurs to become irrelevant, no matter how big and dominant they seem right now. I'm saying that with a lot of respect to the wise and experienced, but this respect can't cloud our judgement as we strive to make science better, more transparent, more public, and actually closer to its original intentions.
    An open-access + open-data publication is several times more visible than one behind a paywall. It might take a bit of extra work on your side to make it as popular, but in case you missed it: Science is a social activity, and this has ramifications on which and whose science is the most fashionable at the moment.
    Making the pre-print available is a step in the right direction, of course.

    It sounds like what you're looking for is a hybrid review model, where a few pre-publication reviewers stamp your work as valid and everyone else is welcome to publish criticism post-publication.

  3. Xenia, I must admit that I became immediately prejudiced against you as soon as I saw the Word file with the results :).

    It's 2016, we can do better than this. I suggest you switch to knitr + LaTeX for publicizing your R code. Also, I didn't like the fact that I have to download each file separately from the OSF website; there should be a zip archive with everything in it.

  4. As you mention, the results are a bit dense. Have you considered adding some graphical summaries to the results and removing the text descriptions of some of the stats? It might make it a bit more appealing and easier to read. Think Nature and Science style of reporting. If I was in your shoes, I'd try to shorten it, summarize the results more succinctly with pretty R ggplot2 style graphs, remove some of the statistical text descriptions in the Method section, and “sell” the paper a bit more. The best piece I’ve seen written on this last point is here: You may also want to try open-access journals like PLOS-ONE. Good luck!

  5. Not sure why anyone would care whether a document is published in Word, LaTeX or anything else.

    1. The purpose of the document is to spread knowledge. If I write it on the back of a postage stamp, in a language that went extinct two thousand years ago, that's clearly a problem. The document will be difficult for people to read, and difficult for them to extract information from (be it by hand, or by some electronic copy and paste, or for the purposes of reprinting or reformatting, or the extraction of subsets of data by some script or program).

      If I choose to write the document in some other way, that is much more accessible and easy to read and deal with, then there is a much larger chance that people will read it.

      Microsoft Word documents are a pain to deal with, and they place onto the reader the onus of having a programme that can handle them. Different versions of Word interact badly, and after a few iterations it's not unusual to have a document soup. This is a bad thing.

      So that's why it matters. Because the whole point is to have the information easily accessible to people, and easy for them to work with, and if you choose to present that information in a way that makes it harder, you're decreasing the likelihood that people will read it.

      I've used an explicit example of Microsoft Word here, but the point isn't the specifics. The point is the principle that I have just explained.

    2. In my experience, when someone lacks the imagination to comment on something substantial and interesting, they'll fall back to piffling trivia such as file format, or similar. Sad, really.

    3. I think the concept here is that people are in support of the work and are trying to suggest ways to circumvent the possible journal publication barrier.
      Sadder to see a troll here really.

  6. Actually you can reframe this work in a positive manner. I noticed (on a very brief scan and it isn't my field) that you have evidence for an "inhibitory N-body" effect. Did the previous work see that effect and mis-interpret it? You might actually be illuminating the mechanism in a deeper manner - and as the result of more careful experimentation.

    1. This is a very nice suggestion. You could reframe and tell the story from this perspective while still presenting the null results, etc. There clearly is a lot in your manuscript, but the "story" could be told a bit differently, and made to be a bit more visually appealing and easier to read.
