Xenia Schmalz's blog: August 2018

There is a lot of bullshit out there. Every day, we are faced with situations where we need to decide whether a given piece of information is trustworthy or not. This is a difficult task: A lot of information that we encounter requires a great deal of expertise in a specific field, and nobody is able to become an expert on all issues which we encounter on a day-to-day basis (just to name a few: politics, history, nutrition, medicine, psychology, educational sciences, physics, law, artificial intelligence).

In the current blogpost, I will focus on educational sciences. This is an area where it is very important – to everyone ranging from parents and teachers through to education researchers – to be able to distinguish bullshit from trustworthy information. Believing in bullshit can, in the best case, lead to a waste of time and money (parents investing into educational methods that don’t work; researchers building on studies which turn out to be non-replicable). In the worst case, children will undergo educational practices or interventions for developmental disorders which distract from more effective, evidence-based methods or may even be harmful in the long run.

Many people are interested in science and the scientific method. These people mostly know that the first question you ask if you encounter something that sounds dodgy is: “But has this study been peer-reviewed?” We know that peer-review is fallible: This can be shown simply by taking the example of predatory journals, which will publish anything, under the appearance of a peer-reviewed paper, for a fee. While it is often (but not always) obvious to experts in a field that a given journal is predatory, this will be a more difficult task for someone without scientific training. In this blogpost, I will mainly focus on a thought-experiment: What if, instead, we (researchers, as well as the general public), asked: “But has this study been pre-registered?”

I will discuss the advantages and potential pitfalls of this shift in mind-set. But first, because I’m still working on convincing some of my colleagues that pre-registration is important for educational sciences and developmental psychology, I will describe two examples that demonstrate how important it is to be able to distinguish between trustworthy and untrustworthy research. These are real-life examples that I encountered in the last few weeks, but I changed some of the names: while the concepts described raise a lot of red flags associated with pseudoscience, I don’t have the time or resources to conduct a thorough investigation to show that they are not trustworthy, and I don’t want to get into any fights (or law-suits) about these particular issues.

Example 1: Assessment and treatment for healthy kids

The first example comes from someone I know who asked me for advice. They had found a centre which assesses children on a range of tests, to see if they have any hidden developmental problems or talents. After a thorough assessment session (and, as I found out through a quick Google search, a $300 fee), the child received a report of about 20 pages. As the centre specialises in children who have both a problem and a talent, it is not surprising that the child was diagnosed with both a problem and a talent (although, interestingly, a series of standardised IQ tests showed no problems). The non-standardised assessments tested for disorders that, during 7 years of study and 4 years of working as a post-doc in cognitive psychology, I had never heard of before. A quick Google search revealed that there was some peer-reviewed literature on these disorders. But the research on a given disorder always came from one and the same person or “research” group, mostly with the affiliation of an institute that made money by selling treatments for this disorder.

The problem with the above assessment is: Most skills are normally distributed, meaning that, on a given test, some children will be very good, and some children will be very bad. If you take a single child and give them a gazillion tests, you will always find a test on which they perform particularly badly and one on which they perform particularly well. One child might be particularly fast at peeling eggs, for example. A publication could describe a study where 200 children were asked to peel as many eggs as possible within 3 minutes, and there was a small number of children who were shockingly bad at peeling eggs (“Egg Peeling Disorder”, or EPD for short). This does not mean that this ability will have any influence whatsoever on their academic or social development. But, in addition, we can collect data on a large number of variables that are indicative of children’s abilities: five reading tests, five mathematics tests, tests of fine motor skills, gross motor skills, vocabulary, syntactic skills, physical strength, the frequency of social interactions – the list goes on and on. Again, by the laws of probability, as we increase the number of variables, we increase the probability that at least one of them will be correlated with the ability to peel an egg, just by chance.
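To put rough numbers on this, here is a minimal Python sketch. It assumes that every variable is independent of the others, that none of them is truly related to egg peeling, and that the conventional 5% significance threshold is used; the function name is made up for illustration. Real test batteries are correlated with each other, so the exact figures will differ, but the trend is the same.

```python
def chance_of_false_positive(n_variables: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_variables unrelated variables
    reaches p < alpha purely by chance (assumes independent tests)."""
    # Each test stays non-significant with probability (1 - alpha);
    # all of them do so together with probability (1 - alpha) ** n_variables.
    return 1.0 - (1.0 - alpha) ** n_variables


for n in (1, 5, 20, 50):
    print(f"{n:>2} variables: {chance_of_false_positive(n):.0%}")
```

Under these assumptions, 20 unrelated variables already give roughly a 64% chance of at least one spurious “significant” correlation.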

Would it help to ask: “Has this study been pre-registered?” Above, I described a way in which any stupid idea can be turned into a paper showing that a given skill can be measured and correlates with real-life outcomes. If we maximise the number of variables, the laws of probability give us a very good chance of finding a significant result. In a pre-registered report, the researchers would have to declare, before they collect or look at the data, which tests they plan to use, and where they expect to find significant correlations. This gives less wiggle-space for significance-fishing, or combing the data for significant results which likely just reflect random noise.

Example 2: Clinical implications of ghost studies

The second example is from the perspective of a researcher. A recent paper I came across reviewed studies on a certain clinical population performing tasks tapping a cognitive skill – let’s call it “stylistical turning”. The review concluded that clinical groups perform, on average, worse than control groups on stylistical turning tasks, and suggested stylistical turning training to improve outcomes in this clinical population. Even disregarding the correlation-causation confusion, the conclusion of this paper is problematic, because in this particular case, I happen to know of two well-designed unpublished studies which did not find that the clinical group performed worse than a control group – in fact, both found that the stylistical turning task used by the original study doesn’t even work! Yet, as far as I know, neither has been published (even though I’d encouraged the researchers behind these studies to submit). So, the presence of unpublished “ghost” studies, which cannot be found through a literature search, has profound consequences for a question of clinical importance.

Would it help in this case to demand that studies are pre-registered? Yes, because pre-registration involves creating a record, prior to the collection of data, that this study will be conducted. In the case of our ghost studies, someone conducting a literature review would at least be able to find the registration plan. Even if the data did not end up being published, the person conducting the literature review could (and should) contact the authors and ask what became of these studies.

Is pre-registration really the holy grail?

As for most complex issues, it would be overly simplistic to conclude that pre-registration would fix everything. Pre-registration should help combat questionable research practices (fishing for significance in a large sea of data, as described in Example 1), and publication bias (selective publication of positive results, leading to a distorted literature, as described in Example 2). These are issues that standard peer-review cannot address: When I review the manuscript of a non-pre-registered paper, I cannot possibly know if the authors collected data on 50 other variables and report only the ones that came out as significant. Similarly, I cannot possibly know if, somewhere else in the world, a dozen other researchers conducted the exact same experiment and did not find a significant result.

What would happen if we – researchers and the general public alike – began to demand that studies must be pre-registered if they are to be used to inform practice? Luckily, medical research is ahead of educational sciences on this front, so we have some idea: there, pre-registration seems to decrease the number of significant findings, which probably reflects a more realistic view of the world.

So, what could possibly go wrong with pre-registration? First, if we start demanding pre-registered reports now, we can pretty much throw away everything we’ve learned about educational sciences so far. There is a lot of bullshit out there for sure, but there are also effective methods, which show consistent benefits across many studies. But, as pre-registration has not really kicked off yet in educational sciences, none of these studies have been pre-registered. This raises important questions: Should we repeat all existing studies in a pre-registered format, even when there is a general consensus among researchers that a given method is effective? On the one hand, this would be very time- and resource-consuming. On the other hand, some things that we think we know turn out to be false. And besides, even when a method is, in reality, effective, the selective publication of positive results makes it look like the method is much more effective than it really is. In addition to knowing what works and what doesn’t, we also need to make decisions about which method works better: this requires a good understanding of the extent to which a method helps.
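The inflation caused by selective publication can be illustrated with a small simulation. This is a sketch under made-up assumptions, not an analysis of any real literature: the true effect (0.2 standard deviations), the group size (30 children per group), the number of studies, and the rule that only studies with z > 1.96 get published are all hypothetical choices.

```python
import random
import statistics


def simulate_literature(true_effect: float = 0.2, n_per_group: int = 30,
                        n_studies: int = 2000, seed: int = 1) -> tuple[float, float]:
    """Return (mean effect across all studies, mean effect across 'published' studies)."""
    rng = random.Random(seed)
    all_effects, published_effects = [], []
    for _ in range(n_studies):
        # Hypothetical intervention group and control group, in SD units.
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        std_err = ((statistics.variance(treated) + statistics.variance(control))
                   / n_per_group) ** 0.5
        all_effects.append(diff)
        # Crude stand-in for "significant in the expected direction".
        if diff / std_err > 1.96:
            published_effects.append(diff)
    return statistics.mean(all_effects), statistics.mean(published_effects)


mean_all, mean_published = simulate_literature()
print(f"mean effect, all studies:       {mean_all:.2f}")
print(f"mean effect, published studies: {mean_published:.2f}")
```

In this toy setup the method genuinely works, yet the average over the “published” studies comes out well above the true effect, because small samples only cross the significance threshold when they happen to overestimate it.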

It is also clear that we need to change the research structure before we unconditionally demand pre-registered reports: at this stage, it would be unfair to judge a paper as worthless if it has not been pre-registered, because pre-registration is just not done in educational sciences (yet). If more journals offered the registered report format, and researchers were incentivised for publishing pre-registered studies rather than mass-producing significant results, this would set the conditions for the general public to start demanding that practice is informed by pre-registered studies only.

As a second issue, there is one thing we have learned from peer-review: When there are strong incentives, people learn to play the system. At this stage, peer-review is the major determining factor of which studies are considered trustworthy by fellow researchers, policy-makers and stakeholders. This has resulted in a market for predatory journals, which publish anything under the appearance of a peer-reviewed paper, if you give them money. Would it be possible to play the system of pre-registered reports?

It is worth noting that there are two ways to do a pre-registration. One way is for a researcher to write the pre-registration report, upload it by themselves as a time-stamped, non-modifiable document, and then to go ahead with the data collection. In the final paper, they add a link to the pre-registration report. Peer-review occurs at the stage when the data has already been collected. Both the peer-reviewers and the readers can download the pre-registration report and compare the authors’ plans with what they actually did. It is possible to cheat with this format: Run the study, look at the result, and write the pre-registered report retrospectively, based on the results that have already come out as significant. The final paper can then be submitted with a link to the fake pre-registered report, and with a bit of luck, the study would appear as pre-registered in a peer-reviewed journal. This would be straight-out cheating as opposed to being in a moral grey-zone, which is the current status of questionable research practices. But it could be a real concern when there are strong incentives involved.

The second way to do a pre-registration is the so-called registered report (RR) format. Here, journals conduct peer-review of the pre-registered report rather than the final paper. This means that the paper is evaluated based on its methodological soundness and the strength of the proposed analyses. After the reviewers approve of the pre-registered plan, the authors get the thumbs-up to start data collection. Cheating by submitting a plan for a study that has already been conducted becomes difficult in this format, because reviewers are likely to propose some changes to the methodology: if the data has already been collected, the cheating authors would be put in a checkmate position because they would need to collect new data after making the methodological changes.

For both formats, there are more subtle ways to maximise the chances of supporting your theory (let’s say, if you have a financial interest in the results coming out in a certain way). A bad pre-registration report could be written in a way that is vague: As we saw in Example 1, this would still give the authors room to wiggle with their analyses until they find a significant result (e.g., “We will test mathematical skills”, but neglecting to mention that 5 different tests will be used, and all possible combinations of these tests will be used to calculate an average score until one of them turns out to be significant). This would be less likely to happen with the RR format than with non-peer-reviewed pre-registration, because a good peer-reviewer should be able to pick up on this vagueness, and demand that the authors specify exactly which variables they will measure, how they will measure them, and how they will analyse them. But the writer of the registered report could hope for inattentive reviewers, or submit to many different journals until one finally accepts the sloppily-written report. To circumvent this problem, then, it is necessary to combine RRs with rigorous peer-review. From this perspective, the most important task of the reviewer is to make sure that the registered report is written in a clear and unambiguous manner, and that the resulting paper closely follows what the authors said they would do in the registered report.
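To see how much room a vague sentence like “We will test mathematical skills” leaves, one can count the composite scores that five tests allow. The sketch below uses hypothetical test names and counts every non-empty subset of tests that could be averaged into a score (the order of the tests does not matter for an average).

```python
from itertools import combinations

# Hypothetical names for the five mathematics tests.
tests = ["math_1", "math_2", "math_3", "math_4", "math_5"]

# Every non-empty subset of tests is a candidate "average mathematics score".
composites = [subset
              for size in range(1, len(tests) + 1)
              for subset in combinations(tests, size)]

print(len(composites))  # 2**5 - 1 = 31
```

That is 31 candidate scores hidden behind one sentence of the plan. They overlap heavily, so they are not 31 independent chances, but each one is another opportunity for a spurious result, which is why a reviewer should ask for the exact score to be named in advance.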

Conclusion

So, should we start demanding that educational practice is based on pre-registered studies? In an ideal world: Yes. But for now, we need top-down changes inside the academic system, which would encourage researchers to conduct pre-registered studies.

Is it possible to cheat with pre-registered reports in such a way that we don’t end up solving the problems I outlined in this blogpost? Probably yes, although a combination of the RR format (where the pre-registered report rather than the final paper is submitted to a journal) and rigorous peer-review should minimise such issues.

What should we do in the meantime? My proposed course of action will be to focus on making it more common among education researchers to pre-register their studies. One way to achieve this is to encourage as many journals as possible to adopt the RR format. To have good peer-review for RRs, we also need to spread awareness among researchers about what to look out for when reviewing an RR. Some journals which publish RRs, such as Cortex, have very detailed guidelines for reviewers. In addition, perhaps workshops about how to review an RR could be useful.

Tuesday, August 21, 2018

“But has this study been pre-registered?” Can registered reports improve the credibility of science?
