Statistical learning is a hot
topic, with papers about a link between statistical learning ability and
reading and/or dyslexia mushrooming all over the place. In this blog post, I am
very sceptical about statistical learning, but before I continue, I should make
it clear that it is, in principle, an interesting topic, and there are a lot of
studies which I like very much.
I’ve published two papers on
statistical learning and reading/dyslexia. My main research interest is in
cross-linguistic differences in skilled reading, reading acquisition, and
dyslexia, which was also the topic of my PhD. The reason I became interested
in the statistical learning literature during my first post-doc was, in
retrospect, exactly the reason I should have stayed away from it: it seemed
relevant to everything I was doing.
From the perspective of
cross-linguistic reading research, statistical learning seemed to be integral
to understanding cross-linguistic differences. This is because the statistical distributions underlying the
print-to-speech correspondences differ across orthographies: in
orthographies such as English, children need to extract statistical
regularities such as the letter “a” often being pronounced as /ɔ/ when it
follows a “w” (e.g., in “swan”). The degree to which these statistical
regularities provide reliable cues differs across orthographies: for example,
in Finnish, letter-phoneme
correspondences are reliable, such that children don’t need to extract a large
number of subtle regularities in order to be able to read accurately.
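
To make the idea of a context-dependent regularity concrete, here is a minimal sketch in Python. It estimates how often the letter "a" gets each pronunciation depending on whether it follows a "w". The word list, the simplified vowel codes, and the function name are all made up for illustration; a real analysis would use a proper pronunciation dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: (spelling, pronunciation of the letter "a").
# The vowel codes are simplified and chosen for illustration only.
TOY_LEXICON = [
    ("swan", "ɔ"), ("want", "ɔ"), ("wasp", "ɔ"), ("wash", "ɔ"), ("swap", "ɔ"),
    ("wag", "æ"), ("wax", "æ"),
    ("cat", "æ"), ("hat", "æ"), ("man", "æ"), ("bad", "æ"), ("lamp", "æ"),
]


def pronunciation_probabilities(lexicon):
    """Estimate P(pronunciation of 'a' | whether it is preceded by 'w')."""
    counts = defaultdict(Counter)
    for spelling, vowel in lexicon:
        idx = spelling.index("a")  # position of the first "a" in the word
        preceded_by_w = idx > 0 and spelling[idx - 1] == "w"
        context = "after_w" if preceded_by_w else "other"
        counts[context][vowel] += 1
    # Turn the raw counts into proportions within each context.
    return {
        context: {v: n / sum(c.values()) for v, n in c.items()}
        for context, c in counts.items()
    }


probs = pronunciation_probabilities(TOY_LEXICON)
print(probs)
```

In this toy list, "a" after "w" is mostly but not always /ɔ/, which is the kind of probabilistic cue a reader of English would have to pick up. In an orthography like Finnish, the equivalent table would be close to one pronunciation per letter regardless of context.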
From a completely different
perspective, I became interested in the role of
letter bigram frequency during reading. One can count how often a given
letter pair co-occurs in a given orthography. The question is whether the
average (or summed) frequency of the bigrams within a word affects the speed
with which this word is processed. This is relevant to psycholinguistic
experiments from a methodological perspective: if letter bigram frequency
affects reading efficiency, it’s a factor that needs to be controlled while
selecting items for an experiment. Learning the frequency of letter
combinations can be thought of as a sort of statistical learning task, because
it involves the conditional probability of one letter given another.
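
As a rough sketch of what these quantities look like in practice, the Python snippet below counts letter bigrams in a word list, averages the bigram frequencies within a word, and estimates the conditional probability of a second letter given the first. The corpus and function names are hypothetical, and the counts are per word type; a real study would typically weight by word frequency and use a large corpus.

```python
from collections import Counter


def bigram_counts(words):
    """Count how often each adjacent letter pair occurs across the word list."""
    counts = Counter()
    for word in words:
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


def mean_bigram_frequency(word, counts):
    """Average frequency of the bigrams that make up `word`."""
    bigrams = [a + b for a, b in zip(word, word[1:])]
    if not bigrams:
        return 0.0  # single-letter words contain no bigrams
    return sum(counts[bg] for bg in bigrams) / len(bigrams)


def conditional_probability(first, second, counts):
    """Estimate P(second letter | first letter) from the bigram counts."""
    total = sum(n for bg, n in counts.items() if bg[0] == first)
    return counts[first + second] / total if total else 0.0


# Hypothetical toy corpus, for illustration only.
TOY_CORPUS = ["the", "then", "that", "hat", "tea"]

counts = bigram_counts(TOY_CORPUS)
print(mean_bigram_frequency("the", counts))      # mean of the counts for "th" and "he"
print(conditional_probability("t", "h", counts)) # share of "t" bigrams that are "th"
```

Whether a summed or an averaged measure is the better predictor is exactly the kind of empirical question mentioned above; the sketch only shows that both are simple functions of the same counts.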
The relevance of statistical
learning to everything should have been a warning sign, because, as we know
from Karl Popper, something that explains everything actually explains nothing.
This becomes clearer when we ask the first question that a researcher should
ask: What is statistical learning? I don’t want to claim that there is no
answer to this question, nor do I want to provide an extensive literature
review of the studies that do provide a precise definition. Suffice it to say:
Some papers have definitions of statistical learning that are extremely broad,
which is why the term is often used in a hand-wavy way to denote a
mechanism that explains everything. This is an example of a one-word
explanation, a term coined by Gerd Gigerenzer in his paper “Surrogates
for theories” (one of my favourite papers). Other papers provide more
specific definitions, for example, by defining statistical learning based on a
specific task that is supposed to measure it. However, I have found no
consensus among these definitions, and given that different researchers
define the same term differently, the resulting theoretical and
empirical work is (in my view) a huge mess.
In addition to these theoretical
issues, there is also a big methodological mess when it comes to the literature
on statistical learning and reading or dyslexia. I’ve written about this in more
detail in our two papers (mentioned above), but here I will list the
methodological issues in a more compact manner: First, when we’re looking at
individual differences (for example, by correlating reading ability and
statistical learning ability), the lack of a task with good psychometric
properties becomes a huge problem. This issue has been discussed in a number of
publications by Noam Siegelman and colleagues, who even developed a task with
good psychometric properties for adults.
However, as far as I’ve seen, there are still no published studies on reading
ability or dyslexia using improved tasks. Furthermore, recent
evidence suggests that a statistical learning task which works well with
adults still has very poor psychometric properties when applied to children.
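
One standard way to see why poor psychometric properties matter for correlational studies is Spearman's attenuation formula from classical test theory: the correlation you can expect to observe is the true correlation scaled down by the square root of the product of the two measures' reliabilities. The sketch below is only an illustration; the function name and the numbers are hypothetical and not taken from any of the studies mentioned here.

```python
import math


def expected_observed_correlation(true_r, reliability_x, reliability_y):
    """Spearman's attenuation formula: r_observed = r_true * sqrt(rel_x * rel_y)."""
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical values: a true correlation of 0.5, a statistical learning task
# with reliability 0.4, and a reading test with reliability 0.9.
print(expected_observed_correlation(0.5, 0.4, 0.9))
```

With an unreliable task, even a sizeable true relationship shrinks to something that a typical sample would struggle to detect, and a null result cannot distinguish "no relationship" from "a relationship hidden by measurement noise".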
Second, the statistical learning
and reading literature is a good illustration of all the issues that are
associated with the replication crisis. Some of these are discussed in our
systematic review about statistical learning and dyslexia (mentioned above). The
publication bias in this area (selective publication of significant results)
became even clearer to me when I presented our study on statistical learning
and reading ability – where we obtained a null result – at the SSSR conference
in Brighton (2018). There were several proponents of the statistical learning
theory (if we can call it that) of reading and dyslexia, but none of them came
to my poster to discuss this null result. Conversely, a number of people dropped
by to let me know that they’ve conducted similar studies and also gotten null results.
Papers on statistical learning and
reading/dyslexia continue to be published, and at some point, I was close to
being convinced that maybe visual statistical learning is related to learning
to read in visually complex orthographies. But then, some
major methodological or statistical issue always jumps out at me when I read a
paper closely enough. The literature reviews of these papers tend to be biased,
often listing studies with null results as evidence for the presence of an
effect, or else picking out all the flaws of papers with null results, while
treating the studies with positive results as a holy grail. I have stopped
reading such papers, because it does not feel like a productive use of my time.
I have also stopped accepting
invitations to review papers about statistical learning and reading/dyslexia,
because I have started to doubt my ability to give an objective review. By now,
I have a strong prior that there is no link between domain-general statistical
learning ability and reading/dyslexia. I could be convinced otherwise, but
this would require very strong evidence (i.e., a number of large-scale pre-registered
studies from independent labs with psychometrically well-established tasks). While
I strongly believe that such evidence is required, I realise that it is
unreasonable to expect such studies from most researchers who conduct this type
of research, who are mainly early-career researchers who base their methodology
on previous studies.
I also stopped doing or planning
any studies on domain-general statistical learning. The amount of energy
necessary to refute bullshit is an order of magnitude bigger than to produce
it, as Alberto Brandolini famously tweeted. This is not to say that everything
to do with statistical learning and reading/dyslexia is bullshit, but – well,
some of it definitely is. I hope that good research will continue to be done in
this area, and that the state of the literature will become clearer because of
this. In the meantime, I have made the personal decision to move away from this
line of research. I have received good advice from one of my PhD supervisors:
not to get hung up on research that I think is bad, but to pick an area where I
think there is good work and to build on that. Sticking to this advice
definitely makes the research process more fun (for me). Statistical learning
studies are likely to yield null results, which end up uninterpretable because
of the psychometric issues with statistical learning tasks. Trying to publish
this kind of work is not
a pleasant experience.
Why did I write this blog post?
Partly, just to vent. I wrote it as a blog post and not as a theoretical paper,
because it lacks the objectivity and systematic approach that would be
required for a scientifically sound piece of writing. If I were to write a scientifically
sound paper, I would need to break my resolution to stop doing research on
statistical learning, so a blog post it is. Some of the issues above have been
discussed in our systematic review about statistical learning and dyslexia, but
I also thought it would be good to summarise these arguments in a more concise
form. Perhaps some beginning PhD student who is thinking about doing their
project on statistical learning and reading will come across this post. In that
case, my advice would be: pick a different topic.