There is a lot of bullshit out there. Every day, we are
faced with situations where we need to decide whether a given piece of
information is trustworthy or not. This is a difficult task: judging much of the
information that we encounter requires a great deal of expertise in a specific
field, and nobody can become an expert on all the issues that we encounter
on a day-to-day basis (just to name a few: politics, history, nutrition,
medicine, psychology, educational sciences, physics, law, artificial intelligence).
In the current blogpost, I will focus on educational
sciences. This is an area where it is very important – to everyone ranging from
parents and teachers through to education researchers – to be able to
distinguish bullshit from trustworthy information. Believing in bullshit can,
in the best case, lead to a waste of time and money (parents investing in
educational methods that don’t work; researchers building on studies which turn
out to be non-replicable). In the worst case, children will undergo educational
practices or interventions for developmental disorders which distract from more
effective, evidence-based methods or may even be harmful in the long run.
Many people are interested in science and the scientific
method. These people mostly know that the first question you ask if you
encounter something that sounds dodgy is: “But has this study been
peer-reviewed?” We know that peer-review is fallible: This can be shown simply
by taking the example of predatory journals, which will publish anything, under
the appearance of a peer-reviewed paper, for a fee. While it is often (but not
always) obvious to experts in a field that a given journal is predatory, this will
be a more difficult task for someone without scientific training. In this
blogpost, I will mainly focus on a thought-experiment: What if, instead, we
(researchers, as well as the general public), asked: “But has this study been
pre-registered?”
I will discuss the advantages and potential pitfalls of this
shift in mind-set. But first, because I’m still working on convincing some of my colleagues that pre-registration is important for educational sciences and developmental psychology, I will describe two examples that demonstrate how important it is to be
able to tell trustworthy research from untrustworthy research. These are
real-life examples that I encountered in the last few weeks, but I changed some
of the names: while the concepts described raise a lot of red flags associated with pseudoscience, I don’t have the time or resources to conduct a thorough investigation
to show that they are not trustworthy, and I don’t want to get into any fights
(or law-suits) about these particular issues.
Example 1: Assessment and treatment for healthy kids
The first example comes from someone I know who asked me for
advice. They had found a centre which assesses children on a range of tests, to
see if they have any hidden developmental problems or talents. After a thorough
assessment session (and, as I found out through a quick Google search, a $300
fee), the child received a report of about 20 pages. As the centre specialises
in children who have both a problem and a talent, it is not surprising that the
child was diagnosed with both a problem and a talent (although, interestingly,
a series of standardised IQ tests showed no problems). The non-standardised
assessments tested for disorders that, during 7 years of study and 4 years of
working as a post-doc in cognitive psychology, I had never heard of before. A
quick Google search revealed that there was some peer-reviewed literature on
these disorders. But the research on a given disorder always came from one and
the same person or “research” group,
mostly with the affiliation of an institute that made money by selling
treatments for this disorder.
The problem with the above assessment is: Most skills are
normally distributed, meaning that, on a given test, some children will be very
good, and some children will be very bad. If you take a single child and give
them a gazillion tests, you will always find a test on which they perform
particularly badly and one on which they perform particularly well. One child
might be particularly fast at peeling eggs, for example. A publication could
describe a study where 200 children were asked to peel as many eggs as possible
within 3 minutes, and there was a small number of children who were shockingly
bad at peeling eggs (“Egg Peeling Disorder”, or EPD for short). This does not
mean that this ability will have any influence whatsoever on their academic or
social development. But, in addition, we can collect data on a large number of
variables that are indicative of children’s abilities: five reading tests, five
mathematics tests, tests of fine motor skills, gross motor skills, vocabulary,
syntactic skills, physical strength, the frequency of social interactions – the
list goes on and on. Again, by the laws of probability, as we increase the
number of variables, we increase the probability that at least one of them will
be correlated with the ability to peel an egg, just by chance.
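To put rough numbers on this: if we assume, purely for illustration, that the variables are independent of one another and that each is tested at the conventional 5% significance level, the chance of at least one spurious "significant" correlation among k variables is 1 - (1 - 0.05)^k. The sketch below just evaluates that formula. The function name and the values of k are my own choices, and real test scores are usually correlated with each other, so treat the output as a ballpark rather than an exact figure.

```python
def chance_of_false_positive(n_variables: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests comes out
    'significant' when no real relationship exists anywhere."""
    return 1.0 - (1.0 - alpha) ** n_variables


# Illustrative numbers of variables; none of these come from a real study.
for k in (1, 5, 14, 50):
    print(f"{k:>2} variables: {chance_of_false_positive(k):.1%}")
```

Under these assumptions, the chance of a fluke passes one half at 14 variables and exceeds 90% at 50.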
Would it help to ask: “Has this study been
pre-registered?” Above, I described a way in which any stupid idea can be
turned into a paper showing that a given skill can be measured and correlates
with real-life outcomes. By maximising the number of variables, the laws of
probability give us a very good chance to find a significant result. In a
pre-registered report, the researchers would have to declare, before they
collect or look at the data, which tests they plan to use, and where they
expect to find significant correlations. This leaves less wiggle room for
significance-fishing, that is, combing the data for significant results which
likely just reflect random noise.
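The difference between fishing and a pre-registered analysis can be sketched with a small simulation. Everything here is hypothetical: 200 children (as in the egg-peeling example), 20 outcome measures that are pure random noise, and a cut-off of |r| > 0.139, which is my approximation of the correlation needed for p < .05 (two-sided) at that sample size. The "fishing" researcher reports a finding if any of the 20 correlations passes the cut-off; the "pre-registered" researcher committed in advance to the first outcome only.

```python
import math
import random

N_CHILDREN = 200     # sample size, as in the egg-peeling example
N_OUTCOMES = 20      # hypothetical number of unrelated outcome measures
N_STUDIES = 300      # number of simulated studies
R_CRITICAL = 0.139   # approximate |r| for p < .05 (two-sided) at n = 200


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(seed=1):
    """Return the share of studies with a 'finding' under each strategy."""
    rng = random.Random(seed)
    fished = preregistered = 0
    for _ in range(N_STUDIES):
        eggs = [rng.gauss(0, 1) for _ in range(N_CHILDREN)]
        hits = []
        for _ in range(N_OUTCOMES):
            # Each outcome is noise: it has no true relation to egg peeling.
            outcome = [rng.gauss(0, 1) for _ in range(N_CHILDREN)]
            hits.append(abs(pearson_r(eggs, outcome)) > R_CRITICAL)
        fished += any(hits)        # report whichever correlation "worked"
        preregistered += hits[0]   # only the outcome declared in advance
    return fished / N_STUDIES, preregistered / N_STUDIES


fished_rate, preregistered_rate = simulate()
print(f"studies with a 'finding' when fishing:    {fished_rate:.0%}")
print(f"studies with a 'finding' when registered: {preregistered_rate:.0%}")
```

Since no real effect exists in the simulated data, every "finding" is a false positive. The fishing strategy should produce one in well over half of the studies, while the pre-registered strategy stays near the nominal 5%.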
Example 2: Clinical implications of ghost studies
The second example is from the perspective of a researcher. A
recent paper I came across reviewed studies on a certain clinical population performing
tasks tapping a cognitive skill – let’s call it “stylistical turning”. The
review concluded that clinical groups perform, on average, worse than control
groups on stylistical turning tasks, and suggested stylistical turning training
to improve outcomes in this clinical population. Even disregarding the correlation-causation
confusion, the conclusion of this paper is problematic, because in this
particular case, I happen to know of two well-designed unpublished studies
which did not find that the clinical group performed worse than a control group
– in fact, both found that the stylistical turning task used by the original
study doesn’t even work! Yet, as far as I know, neither has been published
(even though I’d encouraged the researchers behind these studies to submit).
So, the presence of unpublished “ghost” studies, which cannot be found through
a literature search, has profound consequences for a question of clinical
importance.
Would it help in this case to demand that studies are
pre-registered? Yes, because pre-registration involves creating a record, prior
to the collection of data, that this study will be conducted. In the case of
our ghost studies, someone conducting a literature review would at least be
able to find the registration plan. Even if the data did not end up being
published, the person conducting the literature review could (and should)
contact the authors and ask what became of these studies.
Is pre-registration really the holy grail?
As with most complex issues, it would be overly simplistic to
conclude that pre-registration would fix everything. Pre-registration should
help combat questionable research practices (fishing for significance in a
large sea of data, as described in Example 1), and publication bias (selective
publication of positive results, leading to a distorted literature, as
described in Example 2). These are issues that standard peer-review cannot
address: When I review the manuscript of a non-pre-registered paper, I cannot
possibly know if the authors collected data on 50 other variables and report
only the ones that came out as significant. Similarly, I cannot possibly know
if, somewhere else in the world, a dozen other researchers conducted the exact
same experiment and did not find a significant result.
What would happen if we – researchers and the general public
alike – began to demand that studies be pre-registered if they are to be
used to inform practice? Luckily, medical research is ahead of educational
sciences on this front, so we have some idea of the answer: there, pre-registration
seems to decrease the number of significant findings, which probably
reflects a more realistic view of the world.
So, what could possibly go wrong with pre-registration?
First, if we start demanding pre-registered reports now, we can pretty much
throw away everything we’ve learned about educational sciences so far. There
is a lot of bullshit out there for sure, but there are also effective methods,
which show consistent benefits across many studies. But, as pre-registration
has not really kicked off yet in educational sciences, none of these studies
have been pre-registered. This raises important questions: Should we repeat all
existing studies in a pre-registered format, even when there is a general consensus
among researchers that a given method is effective? On the one hand, this would
be very time- and resource-consuming. On the other hand, some things that we
think we know turn out to be false. And besides, even when a method is, in
reality, effective, the selective publication of positive results makes it look
like the method is much more effective than it really is. In addition to knowing what works
and what doesn’t, we also need to make decisions about which method works
better: this requires a good understanding of the extent to which a method
helps.
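How much can selective publication inflate an effect that is real? A minimal sketch, with made-up numbers: suppose a method truly improves scores by 0.2 standard deviations, each study compares 30 treated and 30 control children, and only studies with a significant positive result get published. For simplicity I use a z-test that treats the standard deviation as known, which is an assumption of this sketch rather than how a real study would be analysed.

```python
import math
import random

TRUE_EFFECT = 0.2    # hypothetical true benefit, in standard deviations
N_PER_GROUP = 30     # hypothetical number of children per group
N_STUDIES = 2000     # simulated studies of the same method


def run_study(rng):
    """Return the observed group difference and whether it is 'significant'."""
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    diff = sum(treated) / N_PER_GROUP - sum(control) / N_PER_GROUP
    # Simplification: z-test with the standard deviation treated as known (1.0).
    standard_error = math.sqrt(2.0 / N_PER_GROUP)
    return diff, diff / standard_error > 1.96


def simulate(seed=1):
    """Return the mean effect over all studies and over 'published' ones."""
    rng = random.Random(seed)
    results = [run_study(rng) for _ in range(N_STUDIES)]
    all_effects = [d for d, _ in results]
    published = [d for d, significant in results if significant]
    return (sum(all_effects) / len(all_effects),
            sum(published) / len(published))


mean_all, mean_published = simulate()
print(f"average effect across all studies:      {mean_all:.2f}")
print(f"average effect across 'published' ones: {mean_published:.2f}")
```

With these settings, a study can only reach significance if its observed difference exceeds about 0.51, so the average of the "published" studies is necessarily more than double the true effect of 0.2. The average across all studies, including the ghost studies, stays close to the truth.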
It is also clear that we need to change the research
structure before we unconditionally demand pre-registered reports: at this
stage, it would be unfair to judge a paper as worthless if it has not been
pre-registered, because pre-registration is just not done in educational
sciences (yet). If more journals offered the registered
report format, and researchers were rewarded for publishing
pre-registered studies rather than for mass-producing significant results, this
would set the conditions for the general public to start demanding that practice
is informed by pre-registered studies only.
As a second issue, there is one thing we have learned from
peer-review: When there are strong incentives, people learn to play the system.
At this stage, peer-review is the major determining factor of which studies are
considered trustworthy by fellow researchers, policy-makers and stakeholders. This has resulted in
a market for predatory journals, which publish anything under the appearance of
a peer-reviewed paper, if you give them money. Would it be possible to play the
system of pre-registered reports?
It is worth noting that there are two ways to do a
pre-registration. One way is for a researcher to write the pre-registration
report, upload it by themselves as a time-stamped, non-modifiable document, and
then to go ahead with the data collection. In the final paper, they add a
link to the pre-registration report. Peer-review occurs at the stage when the
data has already been collected. Both the peer-reviewers and the readers can
download the pre-registration report and compare the authors’ plans with what
they actually did. It is possible to cheat with this format: Run the study,
look at the result, and write the pre-registered report retrospectively, based
on the results that have already come out as significant. The final paper can
then be submitted with a link to the fake pre-registered report, and with a bit
of luck, the study would appear as pre-registered in a peer-reviewed journal.
This would be outright cheating as opposed to being in a moral grey-zone,
which is the current status of questionable research practices. But it could be
a real concern when there are strong incentives involved.
The second way to do a pre-registration is the so-called
registered report (RR) format. Here, journals conduct peer-review of the
pre-registered report rather than the final paper. This means that the paper is
evaluated based on its methodological soundness and the strength of the
proposed analyses. After the reviewers approve of the pre-registered plan, the
authors get the thumbs-up to start data collection. Cheating by submitting a plan
for a study that has already been conducted becomes difficult in this format,
because reviewers are likely to propose some changes to the methodology: if the
data has already been collected, the cheating authors would be put in a
checkmate position because they would need to collect new data after making the
methodological changes.
For both formats, there are more subtle ways to maximise the
chances of supporting your theory (let’s say, if you have a financial interest
in the results coming out in a certain way). A bad pre-registration report
could be written in a way that is vague: As we saw in Example 1, this would
still give the authors room to wiggle with their analyses until they find a
significant result (e.g., “We will test mathematical skills”, but neglecting to
mention that 5 different tests will be used, and that all possible combinations of
these tests will be used to calculate an average score until one of them turns
out to be significant). This would be less likely to happen with the RR format
than with non-peer-reviewed pre-registration, because a good peer-reviewer
should be able to pick up on this vagueness, and demand that the authors
specify exactly which variables they will measure, how they will measure them,
and how they will analyse them. But the writer of the registered report could
hope for inattentive reviewers, or submit to many different journals until one
finally accepts the sloppily-written report. To circumvent this problem, then,
it is necessary to combine RRs with rigorous peer-review. From this
perspective, the most important task of the reviewer is to make sure that the
registered report is written in a clear and unambiguous manner, and that the
resulting paper closely follows what the authors said they would do in the
registered report.
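To get a feel for how much room the vague plan in the example above leaves, we can count the average scores that five tests allow. An average does not depend on the order of the tests, so what matters is which subset of tests goes into it. The test names below are placeholders of my own.

```python
from itertools import combinations

# Placeholder names for the five mathematics tests in the example.
tests = ["test_A", "test_B", "test_C", "test_D", "test_E"]

# Every non-empty subset of tests yields one possible "average score".
composites = [
    subset
    for size in range(1, len(tests) + 1)
    for subset in combinations(tests, size)
]

print(len(composites))  # 31
```

That is 31 candidate "mathematical skills" scores hiding behind a single sentence in the registration. A reviewer who insists on knowing exactly which tests enter the score reduces this to one.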
Conclusion
So, should we start demanding that educational practice is
based on pre-registered studies? In an ideal world: Yes. But for now, we need
top-down changes inside the academic system, which would encourage researchers
to conduct pre-registered studies.
Is it possible to cheat with pre-registered reports in such
a way that we don’t end up solving the problems I outlined in this blogpost?
Probably yes, although a combination of the RR format (where the pre-registered
report rather than the final paper is submitted to a journal) and rigorous
peer-review should minimise such issues.
What should we do in the meantime? My proposed course of
action is to focus on making it more common among education researchers to
pre-register their studies. One way to achieve this is to encourage
as many journals as possible to adopt the RR format. To have good
peer-review for RRs, we also need to spread awareness among researchers about
what to look out for when reviewing an RR. Some journals which publish RRs, such
as Cortex, have very detailed guidelines for reviewers. In addition, perhaps
workshops about how to review an RR could be useful.