You are working on a
theoretical paper about the proposed relationship between X and Y. A
two-experiment study has previously shown that X and Y are correlated, and you
are trying to explain the cognitive mechanisms that drive this correlation.
This previous study draws its conclusions from partial correlations that
control for a moderator which was not postulated a priori; raw correlations
are not reported. The p-values for both partial correlations are < 0.05, but
> 0.04. In your theoretical paper, you stress that although a correlation
between these variables makes theoretical sense, we cannot be sure about this
link.
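As a quick aside: a small simulation shows why two p-values squeezed between .04 and .05 raise an eyebrow. This is only a sketch – the sample size (n = 50) and the candidate true correlations are my own illustrative assumptions, not values from the hypothetical study:

```python
# Minimal p-curve sketch: how often does a correlation test land in the
# narrow window .04 < p < .05? Sample size and true correlations are
# illustrative assumptions, not values from the study described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, n_sims = 50, 20_000

def simulate_p_values(true_r):
    """Two-tailed p-values for Pearson correlations of n simulated pairs."""
    ps = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.standard_normal(n)
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
        ps[i] = stats.pearsonr(x, y)[1]
    return ps

for true_r in (0.0, 0.4):  # no effect vs. a moderate genuine effect
    ps = simulate_p_values(true_r)
    p_window = np.mean((ps > 0.04) & (ps < 0.05))
    print(f"true r = {true_r}: P(.04 < p < .05) = {p_window:.4f}, "
          f"P(two independent p-values both in window) = {p_window**2:.6f}")
```

Whether there is no effect at all or a genuinely moderate one, p-values rarely land in that narrow window – and two of them doing so at once is rarer still, unless the analysis was selected to just clear the threshold.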
In a different
paradigm, several studies have found a group difference in a certain task. In
most studies, this group difference has a Cohen’s d of around 0.2. However,
three studies, all from the same lab, report Cohen’s ds ranging between 0.8
and 1.1. You calculate that it is very unlikely to obtain three effects this
large by chance alone (probability < 1%).
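For the curious, here is a back-of-the-envelope version of that calculation. The per-group sample size (n = 20) is an illustrative assumption, and d = 0.2 is treated as the true effect:

```python
# Back-of-the-envelope check: if the true effect is d = 0.2, how likely
# are three independent studies to all observe d >= 0.8? The per-group
# sample size (n = 20) is an illustrative assumption.
import math
from scipy import stats

true_d, cutoff, n = 0.2, 0.8, 20

# Approximate standard error of an observed Cohen's d with n per group:
se_d = math.sqrt(2 / n + true_d**2 / (4 * n))

# P(a single study observes d >= 0.8 | true d = 0.2), normal approximation:
p_one = stats.norm.sf(cutoff, loc=true_d, scale=se_d)

print(f"P(one study)     = {p_one:.4f}")     # roughly .03
print(f"P(three studies) = {p_one**3:.1e}")  # far below 1%
```

With these assumptions, each study has roughly a 3% chance of observing d ≥ 0.8, and three independent studies all doing so has a probability on the order of 10⁻⁵ – comfortably below the 1% mentioned above.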
For a different
project, you fail to find an effect reported in a previously published
experiment. The authors of this previous study published their raw data a few
years after the original paper came out. You take a close look at these raw
data and find some discrepancies with the means reported in the paper. When
you reanalyse the raw data, the effect disappears.
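Such a check can be done in a few lines. In this sketch, the file name, the column names, and the “reported” means are all hypothetical placeholders – the point is only the comparison itself:

```python
# Hypothetical sketch: recompute condition means from the authors' raw
# data and compare them with the means reported in the paper. The file
# name, column names, and reported values are all placeholders.
import pandas as pd

raw = pd.read_csv("published_raw_data.csv")        # hypothetical file
reported = {"control": 512.0, "treatment": 486.0}  # means from the paper

recomputed = raw.groupby("condition")["reaction_time"].mean()
for condition, reported_mean in reported.items():
    diff = recomputed[condition] - reported_mean
    print(f"{condition}: raw = {recomputed[condition]:.1f}, "
          f"reported = {reported_mean:.1f}, discrepancy = {diff:+.1f}")
```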
What would you do in each of the scenarios above? I would be
very happy to hear about it in the comments!
From each of these scenarios, I would draw two conclusions:
(1) The evidence reported by these studies is not strong, to say the least, and
(2) it is likely that the authors used what we now call questionable research
practices to obtain significant results. The question is what we can conclude
in our hypothetical paper, where the presence or absence of the effect
is critical. Throwing around accusations of p-hacking
can turn ugly. First, we cannot be absolutely sure that there is something
fishy. Even if you calculate that the likelihood of obtaining a certain
result is minimal, it is still greater than zero – you can never be completely
sure that something questionable is really going on. Second, criticising
someone else’s work is always a hairy issue. Feelings may get hurt, the
desire for revenge may arise, and careers can get destroyed. Especially as an
early-career researcher, one wants to steer clear of close-range combat.
Yet, if your work rests on these results, you need to make something of them. One could just ignore
them – not cite these papers, pretend they don’t exist. It is difficult to draw
conclusions from studies marred by questionable research practices, so they may as
well not be there. But ignoring relevant published work would be childish and
unscientific. Any reader of your paper who is interested in the topic will
notice this omission. Therefore, one needs to at least explain why one thinks
the results of these studies may not be reliable.
One can’t explain why one doesn’t trust a study without citing it – a general
phrase that names no source, such as “Previous work has shown this effect, but
future research is needed to confirm its stability”, will not do. Alternatively,
we could cite the study but keep our accusations vague: “Previous work has shown
this effect (Lemmon & Matthau, 2000), but future research is needed to confirm
its stability”. This, again, does not sound very convincing.
There are therefore two possibilities: either we drop the
topic altogether, or we write down exactly why the results of the published
studies would need to be replicated before we would trust them, much as I did
in the examples at the top of this post. This, of course, could be
misconstrued as a personal attack. Describing such studies in my own papers is
an exercise in very careful phrasing, with kind colleagues proofreading the
result for diplomacy. Unfortunately, this often leads to watered-down
arguments and tip-toeing around the real issue: the believability of a
specific result. And when we think about it, that is what we are criticising –
a specific result, not the original researchers. Knowledge about questionable
research practices is spreading gradually; many researchers are still in the
process of realising that such practices can really damage a research area.
Therefore, judging researchers for what they have done in the past would be
neither productive nor wise.
Should we judge a scientist for having used questionable research
practices? In general, I don’t think so. I am convinced that the majority of
researchers don’t intend to cheat, but rather believe that they have
legitimately maximised their chances of finding a very small and subtle effect. It is, of course, the critic’s responsibility to make clear that the problem lies with the study, not with the researcher who conducted it. But the researchers whose work is being criticised should also consider whether the criticism is fair, and respond accordingly. If they are prepared to correct any mistakes – by publishing file-drawer studies, releasing untrimmed data, conducting a replication, or, in more extreme cases, publishing a correction or even retracting a paper – it is unlikely that they will be judged negatively by the scientific community; quite the contrary.
But
there are a few hypothetical scenarios where my opinion of the researcher would
decrease: (1) if the questionable research practice was data fabrication
rather than something more benign, such as creative outlier removal; (2) if
the researchers use any means possible to suppress studies which criticise or
fail to replicate their work; or (3) if the researchers continue to engage in
questionable research practices even after they learn that such practices
inflate their false-positive rate. This last point bears further
consideration, because pleading ignorance is becoming less and less
defensible. By now, a researcher would need to live under a rock not to have
at least heard of the replication crisis. And a good, curious researcher
should follow up on such rumours, to check whether issues of replicability
might also apply to their own work.
In summary, criticising existing studies is essential for
scientific progress. Identifying potential issues with experiments will save
time, as researchers won’t go on a wild-goose chase after an effect that
doesn’t exist; it will help us home in on the studies that need to be
replicated before we consider them backed by solid evidence. The criticism of
a study, however, should not be conflated with criticism of the researcher –
either by the critic or by the person being criticised. A clear distinction
between criticising a study and criticising a researcher would foster a
climate where discussions about the reproducibility of specific studies lead
to scientific progress rather than to a battlefield.
Excellent post – it captures the problems with this sort of criticism nicely. Criticism is essential, but I agree people can take it personally (especially if it’s their idea, or the study that made them famous). We need to try to be objective (which is obviously difficult). Perhaps before publishing you could try post-publication peer review (if the journal the paper was published in allows it). Alternatively, you could contact the authors politely, outlining your concerns with the paper. Then you can publish your clear but not overly harsh criticisms, confident that you’ve engaged in a dialogue with them (or at least tried to). Someone (can’t remember who, will try and find out) argued we should have a “year zero” rule: all past instances of QRPs are forgiven and we start afresh. I like this idea, and it would get more people to be open about their past use of QRPs. I agree that (2) and (3) would also lower my estimation of a researcher. I would also add anyone who refuses to admit they used QRPs, even when there is evidence to the contrary. Of course there will be some who legitimately haven’t, but QRPs are so common and so easy to slip into that I’d be surprised.
As a very, very early-career researcher, I find a good part of the discussion about QRPs intimidating, and I greatly appreciate your point of view! I know it is a lot to ask, but if I were one of the authors in the examples above, I would probably appreciate being contacted as well. The fact that I published something (this has yet to happen in the field of my thesis, so it is purely hypothetical) doesn’t mean that I stop thinking about it. As it is likely that I would still be working on the same topic, I might have found room for improvement myself. A direct discussion of potential issues could thus be useful for your paper, or even for the research topic in general, while also giving me the chance to save face, if only “behind the scenes”.