Xenia Schmalz's blog: March 2019

Thursday, March 14, 2019

Why I plan to move away from statistical learning research

Statistical learning is a hot topic, with papers about a link between statistical learning ability and reading and/or dyslexia mushrooming all over the place. In this blog post, I am very sceptical about statistical learning, but before I continue, I should make it clear that it is, in principle, an interesting topic, and there are a lot of studies which I like very much.

I’ve published two papers on statistical learning and reading/dyslexia. My main research interest is in cross-linguistic differences in skilled reading, reading acquisition, and dyslexia, which was also the topic of my PhD. The reason why, during my first post-doc, I became interested in the statistical learning literature, was, in retrospect, exactly the reason why I should have stayed away from it: It seemed relevant to everything I was doing.

From the perspective of cross-linguistic reading research, statistical learning seemed to be integral to understanding cross-linguistic differences. This is because the statistical distributions underlying the print-to-speech correspondences differ across orthographies: in orthographies such as English, children need to extract statistical regularities such as the letter a often being pronounced as /ɔ/ when it follows a w (e.g., in “swan”). The degree to which these statistical regularities provide reliable cues differs across orthographies: for example, in Finnish, letter-phoneme correspondences are reliable, such that children don’t need to extract a large number of subtle regularities in order to be able to read accurately.

From a completely different perspective, I became interested in the role of letter bigram frequency during reading. One can count how often a given letter pair co-occurs in a given orthography. The question is whether the average (or summed) frequency of the bigrams within a word affects the speed with which this word is processed. This is relevant to psycholinguistic experiments from a methodological perspective: if letter bigram frequency affects reading efficiency, it’s a factor that needs to be controlled while selecting items for an experiment. Learning the frequency of letter combinations can be thought of as a sort of statistical learning task, because it involves the conditional probability of one letter given the other.
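To make the bigram measures described above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the five-word toy corpus and the function names are hypothetical, each word is counted once (type counts rather than counts weighted by word frequency), and a real analysis would use a large lexical database.

```python
from collections import Counter


def bigram_counts(words):
    """Count how often each adjacent letter pair occurs across a word list."""
    counts = Counter()
    for word in words:
        w = word.lower()
        for first, second in zip(w, w[1:]):
            counts[first + second] += 1
    return counts


def mean_bigram_frequency(word, counts):
    """Average corpus count of the bigrams within a single word."""
    w = word.lower()
    bigrams = [a + b for a, b in zip(w, w[1:])]
    if not bigrams:
        return 0.0  # one-letter words contain no bigrams
    return sum(counts[b] for b in bigrams) / len(bigrams)


def conditional_probability(first, second, counts):
    """Estimate P(next letter = second | current letter = first)."""
    total = sum(n for bigram, n in counts.items() if bigram[0] == first)
    if total == 0:
        return 0.0  # the first letter never starts a bigram in this corpus
    return counts[first + second] / total


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["swan", "want", "wasp", "cat", "hat"]
counts = bigram_counts(toy_corpus)

print(mean_bigram_frequency("swan", counts))
print(conditional_probability("w", "a", counts))
print(conditional_probability("a", "n", counts))
```

Replacing the average with a sum in `mean_bigram_frequency` gives the summed variant; which of the two (if either) predicts processing speed is exactly the open empirical question.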

The relevance of statistical learning to everything should have been a warning sign, because, as we know from Karl Popper, something that explains everything actually explains nothing. This becomes clearer when we ask the first question that a researcher should ask: What is statistical learning? I don’t want to claim that there is no answer to this question, nor do I want to provide an extensive literature review of the studies that do provide a precise definition. Suffice it to say: Some papers have definitions of statistical learning that are extremely broad, which is the reason why it is often used as a hand-wavy term denoting a mechanism that explains everything. This is an example of a one-word explanation, a term coined by Gerd Gigerenzer in his paper “Surrogates for theories” (one of my favourite papers). Other papers provide more specific definitions, for example, by defining statistical learning based on a specific task that is supposed to measure it. However, I have found no consensus among these definitions, and given that different researchers define the same term differently, the resulting theoretical and empirical work is (in my view) a huge mess.

In addition to these theoretical issues, there is also a big methodological mess when it comes to the literature on statistical learning and reading or dyslexia. I’ve written about this in more detail in our two papers (linked above), but here I will list the methodological issues in a more compact manner: First, when we’re looking at individual differences (for example, by correlating reading ability and statistical learning ability), the lack of a task with good psychometric properties becomes a huge problem. This issue has been discussed in a number of publications by Noam Siegelman and colleagues, who even developed a task with good psychometric properties for adults (e.g., here and here). However, as far as I’ve seen, there are still no published studies on reading ability or dyslexia using improved tasks. Furthermore, recent evidence suggests that a statistical learning task which works well with adults still has very poor psychometric properties when applied to children.

Second, the statistical learning and reading literature is a good illustration of all the issues that are associated with the replication crisis. Some of these are discussed in our systematic review about statistical learning and dyslexia (linked above). The publication bias in this area (selective publication of significant results) became even clearer to me when I presented our study on statistical learning and reading ability – where we obtained a null result – at the SSSR conference in Brighton (2018). There were several proponents of the statistical learning theory (if we can call it that) of reading and dyslexia, but none of them came to my poster to discuss this null result. Conversely, a number of people dropped by to let me know that they’ve conducted similar studies and also gotten null results.

Papers on statistical learning and reading/dyslexia continue to be published, and at some point, I was close to being convinced that maybe, visual statistical learning is related to learning to read in visually complex orthographies. But then, some major methodological or statistical issue always jumps out at me when I read a paper closely enough. The literature reviews of these papers tend to be biased, often listing studies with null results as evidence for the presence of an effect, or else picking out all the flaws of papers with null results, while treating the studies with positive results as a holy grail. I have stopped reading such papers, because it does not feel like a productive use of my time.

I have also stopped accepting invitations to review papers about statistical learning and reading/dyslexia, because I have started to doubt my ability to give an objective review. By now, I have a strong prior that there is no link between domain-general statistical learning ability and reading/dyslexia. I could be convinced otherwise, but would require very strong evidence (i.e., a number of large-scale pre-registered studies from independent labs with psychometrically well-established tasks). While I strongly believe that such evidence is required, I realise that it is unreasonable to expect such studies from most researchers who conduct this type of research, who are mainly early-career researchers who base their methodology on previous studies.

I also stopped doing or planning any studies on domain-general statistical learning. The amount of energy necessary to refute bullshit is an order of magnitude bigger than to produce it, as Alberto Brandolini famously tweeted. This is not to say that everything to do with statistical learning and reading/dyslexia is bullshit, but – well, some of it definitely is. I hope that good research will continue to be done in this area, and that the state of the literature will become clearer because of this. In the meantime, I have made the personal decision to move away from this line of research. I have received good advice from one of my PhD supervisors: not to get hung up on research that I think is bad, but to pick an area where I think there is good work and to build on that. Sticking to this advice definitely makes the research process more fun (for me). Statistical learning studies are likely to yield null results, which end up uninterpretable because of the psychometric issues with statistical learning tasks. Trying to publish this kind of work is not a pleasant experience.

Why did I write this blog post? Partly, just to vent. I wrote it as a blog post and not as a theoretical paper, because it lacks the objectivity and a systematic approach which would be required for a scientifically sound piece of writing. If I were to write a scientifically sound paper, I would need to break my resolution to stop doing research on statistical learning, so a blog post it is. Some of the issues above have been discussed in our systematic review about statistical learning and dyslexia, but I also thought it would be good to summarise these arguments in a more concise form. Perhaps some beginning PhD student who is thinking about doing their project on statistical learning and reading will come across this post. In this case, my advice would be: pick a different topic.

Sunday, March 10, 2019

What’s next for Registered Reports? Selective summary of a meeting (7.3.2019)

Last week, I attended a meeting about Registered Reports. It was a great opportunity, not only to discuss Registered Reports, but also to meet some people whom I had previously only known from Twitter, over a glass of whiskey close to London Bridge.

The meeting felt very productive, and I took away a lot of new information, about the Registered Report Format in general, and also some specific things that will be useful to me when I submit my next Registered Report. Here, I don’t aim to summarise everything that was discussed, but to focus on those aspects that could be of practical importance to individual researchers.

What’s stopping researchers from submitting Registered Reports?

We dedicated the entire morning to discussing how to increase the submission rate of Registered Reports. Before the meeting, I had done an informal survey among colleagues and on Twitter to see what reasons people had for not submitting Registered Reports. The response rate was pretty low, suggesting that a lack of interest may be a leading factor (due either to apathy or scepticism – from my informal survey, I can’t tell). Among the people who did respond, the main reason was time: younger researchers in particular are often on short-term contracts (1-3 years), and are pressured for various reasons to start data collection as soon as possible. Among such reasons, people mentioned grants: funders often expect strict adherence to a timeline. And, unfortunately, such timing pressures disproportionately affect early-career researchers, exactly the demographic which is most open to trying out a new way of conducting and publishing research.

Submitting a Registered Report may take a while – there is no point sugar-coating this. In contrast to standard studies, authors of Registered Reports need to spend more time planning the study, because writing the report involves planning in detail; there may be several rounds of review before in-principle acceptance, and addressing reviewers’ comments may involve collecting pilot data. Given my limited experience, I would estimate that about 6-9 months would need to be added to the study timeline before one can count on in-principle acceptance and data collection can be started.

Of course, the increase in time that you spend before conducting the experiment will substantially improve the quality of the paper. A Registered Report is very likely to cut a lot of time at the end of the research cycle: when realising how long it may take to get in-principle acceptance, you should always bear in mind the painstakingly slow and frustrating process of submitting a paper to one journal after the other, accumulating piles of reviews varying in constructiveness and politeness, being told about methodological flaws that now you can’t fix, about how your results should have been different, and eventually unceremoniously throwing the study which you started with such great enthusiasm into the file-drawer.

Long-term benefits aside, unfortunately the issue of time remains for researchers on short-term contracts and with grant pressures. We could not think of any quick fix to this problem. In the long term, solutions may involve planning in this time when you write your next grant application. One possibility could be to write that you plan to conduct a systematic review during the time that you wait for in-principle acceptance. In my recently approved grant from the Deutsche Forschungsgemeinschaft, I proposed two studies: for the first study, I optimistically included a period of three months for “pre-registration and set-up”, and for the second study a period of twelve months (because this would happen in parallel to data collection for the first study). This somewhat backfired, because, while the grant was approved, they cut 6 months from my proposed timeline because they considered 12 months to be way too long for “pre-registration and set-up”. So, the strategy of planning for registered reports in grant applications may work, but bear in mind that it’s not risk-free.

A new thing that I learned about during the meeting is Registered Report Research Grants: Here, journals pair up with funding agencies, and the review of the Registered Report happens in parallel to the review of the funding proposal. This way, once in-principle acceptance is in, the funding is released and data collection can start. This sounds like an amazingly efficient win-win-win solution, and I sincerely hope that funding agencies will routinely offer such grants.

How to encourage researchers to publish Registered Reports?

Here, I’ll list a few bits and pieces that were suggested as solutions. Some of these are aimed at the individual researcher, though many would require some top-down changes. The demographic most happy to try out a new publication system, as mentioned above, is likely to be early-career researchers, especially PhD students.

Members at the meeting reported positive experiences with department-driven working groups, such as the ReproducibiliTea initiative or Open Science Cafés. In some departments, such working groups have led to PhD students taking the initiative and proposing to their advisors that they would like to do their next study as a Registered Report. We discussed that encouraging PhD students to publish one study as a Registered Report could be a good recommendation. For departments which have formal requirements about the number of publications that are needed in order to graduate, a Registered Report could count more than a standard publication: let’s say, they either need to publish three standard papers, or one standard paper and a Registered Report (or two Registered Reports).

Deciding to publish a Registered Report is like jumping into cold water: the format requires some pretty big changes in the way that a study is conducted, and one is unsure if it will really impress practically important people (such as potential employers or grant reviewers) more than pumping out standard publications. Taking a step back, taking a deep breath and thinking about the pros and cons, I would say that, in many cases, the advantages outweigh the disadvantages. Yes, the planning stage may take longer, but you will cut time at the end, during the publication process, with a much higher chance that the study will be published. A fun fact I learned during the meeting: At the journal Cortex, once a Registered Report gets past the editorial desk (i.e., the editors established that the paper fits the scope of the journal), the rejection rate is only 10% (which is why we need more journals adopting the Registered Report format: this way, any paper, including those of interest to a specialised audience, will be able to find a good home). And, once you have in-principle acceptance, you can list the paper on your CV, which is (to many professors) much more impressive than a list of "in preparation"/"submitted" publications. If the Stage 1 review process takes unusually long and you're running out of time in your contract, you can withdraw the Registered Report, incorporate the comments to date, and conduct the experiment as a Preregistered Study.

Summary

Some of the suggestions listed above are aimed at individual researchers. The meeting was encouraging and helpful in terms of getting some suggestions that could be applied here and now. It also made it clear that top-down changes are required: the Registered Report format involves a different timeline compared to standard submissions, so university expectations (e.g., in terms of the required number of publications for PhD students, short-term post-doc contracts) and funding structures need to be changed.