TL;DR: Probably not.
Imagine there were a fun and efficient way to improve
reading ability in children with dyslexia. For
parents of children with dyslexia this would be great: No more dragging your
child to therapists, spending endless hours in the evening trying to get the
child to practice their letter-sound rules or forcing them to sit down with a book. According to
several recent papers, a fun and quick treatment to improve reading ability
might be in sight, and every parent can apply this treatment in their own home:
Action video gaming.
Action video games differ from
other types of games because they involve situations where the player has to
quickly shift their attention from one visual stimulus to another. First-person
shooter games are a good example: one might focus on one part of the screen,
and then an “enemy” appears and one needs to direct one’s visual attention to him
and shoot him1.
The idea that action video gaming
could improve reading ability is not as random as it might seem at first sight.
Indeed, there is a large body of work, albeit very controversial, that suggests
that children or adults with dyslexia might have problems with shifting visual
attention. The idea that a visual deficit might underlie dyslexia originates
from the early 1980s (Badcock et al., Galaburda et al.; references are in the
articles linked below), thus it is not in any way novel or revolutionary. A
summary of this work would warrant a separate blog post or academic publication,
but for some (favourable) reviews, see Vidyasagar, T. R.,
& Pammer, K. (2010). Dyslexia: a deficit in visuo-spatial attention, not in
phonological processing. Trends in Cognitive Sciences, 14(2),
57-63 (downloadable here)
or Stein, J., & Walsh, V. (1997). To see but not to read; the magnocellular
theory of dyslexia. Trends in Neurosciences, 20(4), 147-152
(downloadable here),
or (for a more agnostic review) Boden, C., & Giaschi, D. (2007). M-stream
deficits and reading-related visual processes in developmental dyslexia. Psychological
Bulletin, 133(2), 346 (downloadable here).
It is worth noting that there is little consensus, amongst the proponents of
this broad class of visual-attentional deficit theories, about the exact
cognitive processes that are impaired and how they would lead to problems with
reading.
The way research
should proceed is clear: If there is theoretical groundwork, based on
experimental studies, to suggest that a certain type of treatment might work,
one does a randomised controlled trial (RCT). Patients are randomly
divided into two groups, one receives the treatment in question and the
other a control treatment, and we compare the improvement between pre- and
post-measurement in the two groups. To date, there are three such studies:
Franceschini, S.,
Gori, S., Ruffino, M., Viola, S., Molteni, M., & Facoetti, A. (2013).
Action video games make dyslexic children read better. Current Biology, 23(6),
462-466 (here)
Franceschini, S.,
Trevisan, P., Ronconi, L., Bertoni, S., Colmar, S., Double, K., ... & Gori,
S. (2017). Action video games improve reading abilities and visual-to-auditory
attentional shifting in English-speaking children with dyslexia. Scientific
Reports, 7(1), 5863 (here), and
Gori, S., Seitz, A.
R., Ronconi, L., Franceschini, S., & Facoetti, A. (2016). Multiple causal
links between magnocellular–dorsal pathway deficit and developmental dyslexia. Cerebral
Cortex, 26(11), 4356-4369 (here).
In writing the
current critique, I am assuming no issues with the papers in question, or with the
research skills or integrity of the researchers. Rather, I would like to show
that, even under these assumptions, the three studies may provide a highly
misleading picture of the effect of video gaming on reading ability. The
implications are clear and very important: Parents of children with dyslexia
have access to many different sources of information, some of which provide
only snake-oil treatments. From a quick Google search for “How to cure
dyslexia”, the first five links suggest modelling letters out of clay, early
assessment, multi-sensory instruction, more clay sculptures, and teaching
phonemic awareness. As reading researchers, we should not add to the confusion,
or divert resources from treatments that have actually been shown to work, by
adding yet another “cure” to the list.
So, what is my gripe
with these three papers? First, that there are only three such papers. As I
mentioned above, the idea that there is a deficit in visual-attentional
processing amongst people with dyslexia, and that this might be a cause of
their poor reading ability, has been floating around for over 30 years. We know
that the best way to establish causality is through a treatment study (RCT);
indeed, we have known this for well over thirty years2. So, why didn’t more
people conduct and publish RCTs on this topic?
The Mystery of Missing Data
Here is a hypothesis
which, admittedly, is difficult to test: RCTs have been conducted for 30 years,
but only three of them ever got published. This is a well-known phenomenon in
scientific publishing: in general, studies which report positive findings are
easier to publish. Studies which do not find a significant result tend to get
stored in file-drawer archives. This is called the File-Drawer Problem, and it
was discussed as early as 1979 (Rosenthal, R. (1979). The "File Drawer
Problem" and Tolerance for Null Results. Psychological Bulletin, 86(3),
638-641, here).
The reason this is a problem goes back to the very definition of the
statistic we generally use to establish significance: the p-value. p-values are considered “significant” if they are below 0.05, i.e.,
below 5%. The p-value is defined as
the probability of obtaining the data or more extreme observations, under the
assumption that the null hypothesis is true. The key is the second part. By
rephrasing the definition, we get the following: When the effect is not there,
the p-value tells us that it is there
5% of the time. This is a feature, not a bug, as it does exactly what the p-value was designed to do: It gives us
a long-run error rate and allows us to keep it constant at 5% across a set of
studies. But this desired property becomes invalidated in a world where we only
publish positive results. In a scenario where the effect is not there, 5 in 100 studies will give us a significant p-value,
on average. If only the five significant studies are published, we have a 100% rate of false
positives (significant p-values
in the absence of a true effect) in the literature. If we assume that the action video gaming effect is not there, then
we would expect, on average, three false
positives out of 60 studies3. Is it possible
that 30 years’ worth of studies which trained dyslexic
children’s visual-attentional skills and observed no improvement are sitting in file drawers?
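We can make this scenario concrete with a quick R simulation. This is a minimal sketch under the illustrative assumptions from above: 60 studies, 16 children per study, and no true effect at all.
set.seed(1)
# 60 hypothetical studies, each testing the mean improvement of 16 children
# against zero, when the true effect is exactly 0
p_values <- replicate(60, t.test(rnorm(16, mean = 0, sd = 1))$p.value)
sum(p_values < 0.05)  # on average ~3 "significant" studies, by chance alone
If only those chance hits make it into print, the published record looks like consistent evidence for an effect that does not exist.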
Magnitude Errors
The second issue in the currently
published literature relates to the previous point, and extends to the
possibility that there might be an effect of action video gaming on reading
ability. So, for now, let’s assume the effect is there. Perhaps it is even a
big effect, let’s say, it has a standardised effect size (Cohen’s d) of 0.3, which is considered to be a small-to-medium-size
effect. Realistically, the effect of action video gaming on reading ability is very unlikely to be bigger, since the
best-established treatment effects have shown effect sizes of around 0.3
(Galuschka et al., 2014; here).
We can simulate very easily (in R)
what will happen in this scenario. We pick a sample of 16 participants (the
number of dyslexic children assigned to the action video gaming group in
Franceschini et al., 2017). Then, we calculate the average improvement across
the 16 participants, in the standardised score:
# Draw standardised improvement scores for 16 children, assuming a true
# effect of d = 0.3 (with SD = 1)
x <- rnorm(16, mean = 0.3, sd = 1)
mean(x)  # observed mean improvement in this simulated sample
The first time I run this, I get a mean improvement of 0.24.
Not bad. Then I run the code again, and get a whopping 0.44! Next time, not so
lucky: 0.09. And then, I even get a negative effect, of -0.30.
This is just a brief illustration of the fact that, when you
sample from the population, your observed effect will jump around the true
population effect size due to random variation. This might seem trivial to
some, but, unfortunately, this fact is often forgotten even by well-established
researchers, who may go on to treat an observed effect size as a precise
estimate.
When we sample repeatedly from a population and plot a
histogram of all the observed means, we get a normal
distribution: a fair few observed means will be close to the true
population mean, but some will not be close at all.
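This is easy to see by extending the simulation from above; the 10,000 repetitions below are an arbitrary choice, made simply to get a smooth histogram.
# Repeat the 16-child experiment many times and plot the observed means
observed_means <- replicate(10000, mean(rnorm(16, mean = 0.3, sd = 1)))
hist(observed_means, breaks = 50,
     xlab = "Observed mean improvement (true effect = 0.3)", main = "")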
We’re closing in on the point I
want to make here: Just by chance, someone will eventually run an experiment
and obtain an effect size of 0.7, even if the true effect is 0.5, 0.2, or even
0. Bigger observed effects, when all else is equal, will yield significant results
while smaller observed effects will be non-significant. This means: If you run
a study, and by chance you observe an effect size that is bigger than the
population effect size, there will be a higher probability that it will be
significant and get published. If your identical twin sibling runs an
identical study but happens to obtain an effect size that is smaller than yours
– even if it corresponds to the true effect size! – it may not be significant,
and they will be forced to stow it in their file drawer.
Given that only the significant
effects are published (or even if there is a disproportionate number of positive compared to negative outcomes), we end up with a skewed literature. In the first
scenario, we considered the possibility that the effect might not be there at
all. In the second scenario, we assume that the effect is there, but even so,
the published studies, due to the presence of publication bias, may have
captured effect sizes that are larger than the actual treatment effect. This
has been called a “Type M (magnitude) error” by Gelman & Carlin (2014, here),
and has been described, with an illustration that I like
to use in talks, by Schmidt in 1992 (see Figure 2, here).
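A simulation makes the point vividly. In the sketch below (same illustrative assumptions as before: a true effect of d = 0.3 and 16 children per study), we record each simulated study’s observed effect size and p-value, and then average the observed effect sizes of the “publishable”, i.e. significant, studies only.
set.seed(1)
# Each column holds one simulated study: observed effect size (d) and p-value
studies <- replicate(10000, {
  x <- rnorm(16, mean = 0.3, sd = 1)
  c(d = mean(x) / sd(x), p = t.test(x)$p.value)
})
mean(studies["d", ])                       # all studies: close to the true 0.3
mean(studies["d", studies["p", ] < 0.05])  # significant studies only: inflated
In my runs, the significant studies show an average observed effect size of around 0.65: more than double the true effect.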
Getting back to action video gaming
and dyslexia: Maybe action video gaming does improve reading ability in dyslexia. We don’t know: Given
only three studies, it is difficult to adjudicate between two possible scenarios
(no effect + publication bias or small effect + publication bias).
So, let’s have a look at the
effects reported in the three published papers. I will ignore the 2013 paper4,
because it only provides the necessary descriptives in figures rather than
tables, and the journal format hides the methods section with vital information
about the number of participants god-knows-where. In the 2017 paper, Table 1 provides
the pre- and post-measurement values of the experimental and control group, for
word reading speed, word reading accuracy, phonological decoding (pseudoword
reading) speed, and phonological decoding accuracy. The paper even reports the
effect sizes: The action video game training had no effect on reading accuracy.
For speed, the effect sizes are d =
0.27 and d = 0.45 for word and
pseudoword reading, respectively. In the 2016 paper (Gori et al.), the effect size for the
increase in speed for word reading (second row of the table) is 0.34, and for
pseudoword reading, it is 0.58.
The effect sizes are thus
comparable across studies. Putting the effect sizes into context: The 2017
study found that reading times decreased from 88 to 76 seconds for a list of
words, and from 86 to 69 seconds for a list of pseudowords. For
words, this translates to a reduction in reading time of about 14%: In practical terms, if
it takes a child 100 hours to read a book before training, it would take the
same child only 86 hours to read the same book after training.
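For those who want to check the arithmetic on the reported times:
(88 - 76) / 88  # ≈ 0.14: about 14% less time for the word list
(86 - 69) / 86  # ≈ 0.20: about 20% less time for the pseudoword list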
In experimental terms, this is not
a huge effect, but it competes with the effect sizes for well-established
treatment methods such as phonics instruction (Hedges’ g = 0.32; Galuschka et al., 2014)5. Phonics instruction
focuses on a proximal cause of poor reading: A deficit in mapping speech sounds
onto print. We would expect a focus on proximal causes to have a stronger
effect than a focus on distal causes, where there are many intermediate steps
between a deficit and reading ability, as explained by McArthur and Castles
(2017) here. In
our case, the following things have to happen for a couple of weeks of action
video gaming to improve reading ability:
- Playing first-person shooter
games has to increase children’s ability to switch their attention rapidly,
- The type of attention switching
during reading has to be the same as the attention switching to a stimulus which
appears suddenly on the screen, and
- Improving children’s visual attention has to lead to an
increase in reading speed.
There are ifs and buts at each of
these steps. The link between action video gaming and visual-attentional
processing would be diluted by other things which train children’s visual-attentional
skills, such as how often they read, played tennis, sight-read sheet music, or
looked through “Where’s Wally” books during the training period.6 In
between visual-attentional processing and reading ability are other variables which
affect reading ability and further dilute this link: the amount of time children read at
home, motivation and tiredness at the first versus the second testing time
point, and many others. These other factors dilute the treatment effect by
adding variability to the experiment that is not due to the treatment. This
should lead to smaller effect sizes.
In short: There might be an effect
of action video gaming on reading ability. But I’m willing to bet that it will
be smaller than the effect reported in the published studies. I mean this
literally: I will buy a good bottle of a drink of your choice for anyone who can
convince me that the effect of 2 weeks of action video gaming on reading ability
is in the vicinity of d = 0.3.
How to provide a convincing case for an effect of action video gaming
on reading ability
The idea that something as simple
as action video gaming can improve children’s ability to do one of the most
complex tasks they learn at school is an incredible claim. Incredible claims
require very strong evidence. Especially if the claim has practical
implications.
To convince me, one would have to
conduct a study which is (1) well-powered, and (2) pre-registered. Let’s assume
that the effect is, indeed, d = 0.3.
With G*Power, we can easily
calculate how many participants we would need to recruit for 80% power. Setting
“Means: Difference between two dependent means (matched pairs)” as the “Statistical
test”, a one-tailed test (note that both of these decisions increase power,
i.e., decrease the number of required participants), an effect size of 0.3, an alpha
of 0.05 and power of 0.8, it shows that we need 71 children in a
within-children design to have adequate power to detect such an effect.
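The same number can be reproduced in R with the built-in power.t.test function; the sketch below makes the same assumptions (a paired design, a one-tailed test, and sd = 1 for the difference scores, so that delta corresponds to d = 0.3).
power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.8,
             type = "paired", alternative = "one.sided")
# n comes out just above 70 pairs, i.e. 71 children after rounding up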
A study should also be
pre-registered. This would remove the possibility of the authors tweaking the
data, analysis and variables until they get significant results. This is
important in reading research, because there are many different ways in which
reading ability can be calculated. For example, Gori and colleagues (Table 3)
present 6 different dependent variables that can be used as the outcome
measure. The greater the number of variables one can possibly analyse, the
greater the flexibility for conducting analyses until at least some contrast becomes
significant (Simmons et al., 2011, here).
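To get a feel for how quickly such flexibility inflates the error rate: if the 6 outcome measures were analysed separately and were independent (which reading measures are not, but it gives a ballpark figure), the chance of at least one significant result under the null is already about one in four.
1 - 0.95^6  # ≈ 0.26: chance of at least one p < .05 among 6 independent tests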
Furthermore, pre-registration will reduce the overall effect of publication bias, because
there will be a record of someone having started a given study.
In short: To make a convincing case
that there is an effect of the magnitude reported in the published literature,
we would need a pre-registered study with at least 71 participants in a
within-subject design.
Some final recommendations
For
researchers: I hope that I managed to illustrate how publication bias can
lead to magnitude errors: the illusion that an effect is much bigger than it
actually is (regardless of whether or not it exists). Your perfect study which
you pre-registered and published with a significant result and without p-hacking might be interpreted very
differently if we knew about all the unpublished studies that are hidden away.
This is a pretty terrifying thought: As long as publication bias exists, you
can be entirely wrong with the interpretation of your study, even if you do all
the right things. We are quickly running out of excuses: We need to move
towards pre-registration, especially for research questions such as the one I
discussed here, which has strong practical implications. So, PLEASE PLEASE
PLEASE, no more underpowered and non-registered studies of action video gaming
on reading ability.
For
funders: Unless a study on the effect of action video gaming on reading
ability is pre-registered and adequately powered, it will not give us
meaningful results. So, please don’t spend any more of the taxpayers’ money on
studies that cannot be used to address the question they set out to answer. In
case you have too much money and don’t know what to do with it: I am looking
for funding for a project on GPC learning and automatisation in reading
development and dyslexia.
For
parents and teachers who want to find out what’s best for their child or
student: I don’t know what to tell you. I hope we’ll sort out the publication
bias thing soon. In the meantime, it’s best to focus on proximal causes of
reading problems, as proposed by McArthur and Castles (2017) here.
-------------------------------------------------------
1 I know absolutely
nothing about shooter games, but from what I understand, the characters there
tend to be male.
2 More like 300 years,
Wikipedia informs me.
3 This assumes no
questionable research practices: With questionable research practices, the
false positive rate may inflate to 60%, meaning that we would need to assume
the presence of only 2 unpublished studies which did not find a significant
treatment effect (Simmons et al., 2011, here).
4 I can do this in a
blog post, right?
5 And this is probably
an over-estimation, given publication bias.
6 If playing action
video games increases visual-attentional processing ability, then so should,
surely, these other things?