TL;DR: Probably not.
Imagine there were a fun and efficient way to improve reading ability in children with dyslexia. For parents of children with dyslexia this would be great: No more dragging your child to therapists, spending endless hours in the evening trying to get the child to practice their letter-sound rules, or forcing them to sit down with a book. According to several recent papers, a fun and quick treatment to improve reading ability might be in sight, and every parent can apply it in their own home: Action video gaming.
Action video games differ from other types of games because they involve situations where the player has to quickly shift their attention from one visual stimulus to another. First-person shooter games are a good example: one might be focusing on one part of the screen when an “enemy” appears, and one needs to direct one's visual attention to him and shoot him1.
The idea that action video gaming could improve reading ability is not as random as it might seem at first sight. Indeed, there is a large body of work, albeit very controversial, that suggests that children or adults with dyslexia might have problems with shifting visual attention. The idea that a visual deficit might underlie dyslexia originates from the early 1980s (Badcock et al., Galaburda et al.; references are in the articles linked below), thus it is not in any way novel or revolutionary. A summary of this work would warrant a separate blog post or academic publication, but for some (favourable) reviews, see Vidyasagar, T. R., & Pammer, K. (2010). Dyslexia: a deficit in visuo-spatial attention, not in phonological processing. Trends in Cognitive Sciences, 14(2), 57-63 (downloadable here) or Stein, J., & Walsh, V. (1997). To see but not to read; the magnocellular theory of dyslexia. Trends in Neurosciences, 20(4), 147-152 (downloadable here), or (for a more agnostic review) Boden, C., & Giaschi, D. (2007). M-stream deficits and reading-related visual processes in developmental dyslexia. Psychological Bulletin, 133(2), 346 (downloadable here). It is worth noting that there is little consensus, amongst the proponents of this broad class of visual-attentional deficit theories, about the exact cognitive processes that are impaired and how they would lead to problems with reading.
The way research should proceed is clear: If there is theoretical groundwork, based on experimental studies, to suggest that a certain type of treatment might work, one conducts a randomised controlled trial (RCT): Patients are randomly divided into two groups, one is subjected to the treatment in question and the other to a control treatment, and we compare the improvement between pre- and post-measurement in the two groups. To date, there are three such studies:
Franceschini, S., Gori, S., Ruffino, M., Viola, S., Molteni, M., & Facoetti, A. (2013). Action video games make dyslexic children read better. Current Biology, 23(6), 462-466 (here)
Franceschini, S., Trevisan, P., Ronconi, L., Bertoni, S., Colmar, S., Double, K., ... & Gori, S. (2017). Action video games improve reading abilities and visual-to-auditory attentional shifting in English-speaking children with dyslexia. Scientific Reports, 7(1), 5863 (here), and
Gori, S., Seitz, A. R., Ronconi, L., Franceschini, S., & Facoetti, A. (2016). Multiple causal links between magnocellular–dorsal pathway deficit and developmental dyslexia. Cerebral Cortex, 26(11), 4356-4369 (here).
In writing the current critique, I am assuming no issues with the papers in question, or with the research skills or integrity of the researchers. Rather, I would like to show that, even under these assumptions, the three studies may provide a highly misleading picture of the effect of video gaming on reading ability. The implications are clear and very important: Parents of children with dyslexia have access to many different sources of information, some of which provide only snake-oil treatments. In a quick Google search for “How to cure dyslexia”, the first five links suggest modelling letters out of clay, early assessment, multi-sensory instruction, more clay sculptures, and teaching phonemic awareness. As reading researchers, we should not add to the confusion, or divert resources from treatments that have actually been shown to work, by adding yet another “cure” to the list.
So, what is my gripe with these three papers? First, that there are only three such papers. As I mentioned above, the idea that there is a deficit in visual-attentional processing amongst people with dyslexia, and that this might be a cause of their poor reading ability, has been floating around for over 30 years. The best way to establish causality is through a treatment study (RCT), and we have known this for well over thirty years2. So, why haven’t more people conducted and published RCTs on this topic?
The Mystery of Missing Data
Here is a hypothesis which, admittedly, is difficult to test: RCTs have been conducted for 30 years, but only three of them ever got published. This is a well-known phenomenon in scientific publishing: in general, studies which report positive findings are easier to publish, while studies which do not find a significant result tend to get stored away in file drawers. This is called the File-Drawer Problem, and was discussed as early as 1979 (Rosenthal, R. (1979). The "File Drawer Problem" and Tolerance for Null Results. Psychological Bulletin, 86(3), 638-641, here).
The reason this is a problem goes back to the very definition of the statistical test we generally use to establish significance: The p-value. p-values are considered “significant” if they are below 0.05, i.e., below 5%. The p-value is defined as the probability of obtaining the data, or more extreme observations, under the assumption that the null hypothesis is true. The key is the second part. Rephrasing the definition: When the effect is not there, the test will nevertheless return a significant p-value 5% of the time. This is a feature, not a bug, as it does exactly what the p-value was designed to do: It gives us a long-run error rate and allows us to keep it constant at 5% across a set of studies. But this desired property becomes invalidated in a world where we only publish positive results. In a scenario where the effect is not there, 5 in 100 studies will give us a significant p-value, on average. If only the five significant studies are published, we have a 100% rate of false positives (significant p-values in the absence of a true effect) in the literature. If we assume that the action video gaming effect is not there, then we would expect, on average, three false positives out of 60 studies3. Is it possible that, over 30 years, studies which trained dyslexic children’s visual-attentional skills and observed no improvement have quietly accumulated in file drawers?
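This long-run error rate is easy to verify with a quick simulation. The sketch below borrows the group size of 16 from the 2017 study; everything else (normally distributed improvement scores, a two-sample t-test) is an illustrative assumption, not a reconstruction of any of the published analyses:

```r
# Simulate 10,000 RCTs in which the training has no effect at all:
# each "study" compares the improvement of 16 treated and 16 control children.
set.seed(1)
p_values <- replicate(10000, {
  treatment <- rnorm(16, mean = 0, sd = 1)  # no true treatment effect
  control   <- rnorm(16, mean = 0, sd = 1)
  t.test(treatment, control)$p.value
})
mean(p_values < 0.05)  # proportion of false positives: close to 0.05
```

Roughly 5% of these null studies come out significant, exactly as the definition of the p-value promises.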
The second issue in the currently published literature relates to the previous point, and extends to the possibility that there might be an effect of action video gaming on reading ability. So, for now, let’s assume the effect is there. Perhaps it is even a big effect: let’s say it has a standardised effect size (Cohen’s d) of 0.3, which is considered a small-to-medium effect. Realistically, the effect of action video gaming on reading ability is very unlikely to be bigger, since the best-established treatment effects have shown effect sizes of around 0.3 (Galuschka et al., 2014; here).
We can simulate very easily (in R) what will happen in this scenario. We pick a sample of 16 participants (the number of dyslexic children assigned to the action video gaming group in Franceschini et al., 2017). Then, we calculate the average improvement across the 16 participants, in the standardised score:
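A minimal version of this simulation, assuming the standardised improvement scores come from a normal distribution with the true effect (0.3) as its mean and a standard deviation of 1, is a one-liner:

```r
# Draw standardised improvement scores for 16 children from a population
# in which the true training effect is d = 0.3, then average them.
mean(rnorm(n = 16, mean = 0.3, sd = 1))
```

Every run draws a fresh sample of 16 children, so every run returns a different observed mean.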
On the first run, I get a mean improvement of 0.24. Not bad. Then I run the code again, and get a whopping 0.44! Next time, not so lucky: 0.09. And then, we even get a negative effect of -0.30.
This is just a brief illustration of the fact that, when you sample from the population, your observed effect will jump around the true population effect size due to random variation. This might seem trivial to some, but, unfortunately, this fact is often forgotten even by well-established researchers, who may go on to treat an observed effect size as a precise estimate.
When we sample, repeatedly, from a population, and plot a histogram of all the observed means, we get a normal distribution: A fair few observed means will be close to the true population mean, but some will not be close at all.
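Under the same illustrative assumptions as before (true effect 0.3, SD 1, 16 children per sample), this sampling distribution can be made visible directly:

```r
# Repeat the 16-child experiment 10,000 times and inspect how the
# observed mean improvements spread around the true effect of 0.3.
set.seed(42)
observed_means <- replicate(10000, mean(rnorm(16, mean = 0.3, sd = 1)))
hist(observed_means, breaks = 50,
     main = "Observed mean improvements (n = 16, true effect = 0.3)")
abline(v = 0.3, lwd = 2)
round(c(mean = mean(observed_means), sd = sd(observed_means)), 2)
# The observed means scatter around 0.3 with SD 1/sqrt(16) = 0.25,
# so effects near zero or near 0.8 are entirely unremarkable.
```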
We’re closing in on the point I want to make here: Just by chance, someone will eventually run an experiment and obtain an effect size of 0.7, even if the true effect is 0.5, 0.2, or even 0. All else being equal, bigger observed effects will yield significant results while smaller observed effects will not. This means: If you run a study and, by chance, observe an effect size that is bigger than the population effect size, it is more likely to be significant and get published. If your identical twin sibling runs an identical study but happens to obtain an effect size that is smaller than yours – even if it corresponds to the true effect size! – it may not be significant, and they will be forced to stow it in their file drawer.
Given that only the significant effects are published (or even if there is just a disproportionate number of positive compared to negative outcomes), we end up with a skewed literature. In the first scenario, we considered the possibility that the effect might not be there at all. In the second scenario, we assume that the effect is there, but even so, due to publication bias, the published studies may have captured effect sizes that are larger than the actual treatment effect. Gelman & Carlin (2014, here) call this a magnitude (Type M) error, and it was described, with an illustration that I like to use in talks, by Schmidt in 1992 (see Figure 2, here).
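The magnitude error can be made concrete with one more simulation. Assuming a true effect of d = 0.3 and 16 children per group (both numbers carried over from the discussion above), we can compare the average effect size across all simulated studies with the average across only the "publishable", significant ones:

```r
# True effect d = 0.3; 16 children per group; two-sample t-test.
set.seed(7)
results <- replicate(10000, {
  treatment <- rnorm(16, mean = 0.3, sd = 1)
  control   <- rnorm(16, mean = 0.0, sd = 1)
  pooled_sd <- sqrt((var(treatment) + var(control)) / 2)
  c(d = (mean(treatment) - mean(control)) / pooled_sd,  # Cohen's d
    p = t.test(treatment, control)$p.value)
})
mean(results["d", ])                       # all studies: close to 0.3
mean(results["d", results["p", ] < 0.05])  # significant studies only: far bigger
```

With these numbers, the average effect size in the significant subset comes out well above the true 0.3, without anyone having done anything questionable: publication bias alone inflates the literature.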
Getting back to action video gaming and dyslexia: Maybe action video gaming improves dyslexia. We don’t know: Given only three studies, it is difficult to adjudicate between two possible scenarios (no effect + publication bias or small effect + publication bias).
So, let’s have a look at the effects reported in the three published papers. I will ignore the 2013 paper4, because it only provides the necessary descriptives in figures rather than tables, and the journal format hides the methods section with vital information about the number of participants god-knows-where. In the 2017 paper, Table 1 provides the pre- and post-measurement values of the experimental and control group, for word reading speed, word reading accuracy, phonological decoding (pseudoword reading) speed, and phonological decoding accuracy. The paper even reports the effect sizes: The action video game training had no effect on reading accuracy. For speed, the effect sizes are d = 0.27 and d = 0.45 for word and pseudoword reading, respectively. In the 2016 paper, the effect size for the increase in speed for word reading (second row of the table) is 0.34, and for pseudoword reading, it is 0.58.
The effect sizes are thus comparable across studies. Putting them into context: The 2017 study found an increase in speed from 88 seconds to 76 seconds to read a list of words, and from 86 to 69 seconds to read a list of pseudowords. For words, this translates to a reduction in reading time of about 14%: In practical terms, if it takes a child 100 hours to read a book before training, it would take the same child about 86 hours to read the same book after training.
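For anyone who wants to check the arithmetic (the 88- and 76-second word-reading times are taken from the 2017 paper; the 100-hour book is of course a made-up illustration):

```r
pre  <- 88  # seconds to read the word list before training (2017 paper)
post <- 76  # seconds to read the same list after training
reduction <- (pre - post) / pre  # about 0.14: reading time drops by ~14%
round(100 * (1 - reduction))     # a 100-hour book now takes about 86 hours
```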
In experimental terms, this is not a huge effect, but it competes with the effect sizes for well-established treatment methods such as phonics instruction (Hedges’ g = 0.32; Galuschka et al., 2014)5. Phonics instruction focuses on a proximal cause of poor reading: A deficit in mapping speech sounds onto print. We would expect a focus on proximal causes to have a stronger effect than a focus on distal causes, where there are many intermediate steps between a deficit and reading ability, as explained by McArthur and Castles (2017) here. In our case, the following things have to happen for a couple of weeks of action video gaming to improve reading ability:
- Playing first-person shooter games has to increase children’s ability to switch their attention rapidly,
- The type of attention switching involved in reading has to be the same as the attention switching to a stimulus which appears suddenly on the screen, and
- improving visual attention has to lead to an increase in reading speed.
There are ifs and buts at each of these steps. The link between action video gaming and visual-attentional processing would be diluted by other things which train children’s visual-attentional skills, such as how often they read, played tennis, sight-read sheet music, or looked through “Where’s Wally” books during the training period.6 In between visual-attentional processing and reading ability are other variables which affect reading ability and dilute this link: the amount of time the children read at home, motivation and tiredness at the first versus the second testing time point, and many others. These other factors dilute the treatment effect by adding variability to the experiment that is not due to the treatment, which should lead to smaller effect sizes.
In short: There might be an effect of action video gaming on reading ability. But I’m willing to bet that it will be smaller than the effect reported in the published studies. I mean this literally: I will buy a good bottle of a drink of your choice for anyone who can convince me that the effect of 2 weeks of action video gaming on reading ability is in the vicinity of d = 0.3.
How to provide a convincing case for an effect of action video gaming on reading ability
The idea that something as simple as action video gaming can improve children’s ability to do one of the most complex tasks they learn at school is an incredible claim. Incredible claims require very strong evidence. Especially if the claim has practical implications.
To convince me, one would have to conduct a study which is (1) well-powered, and (2) pre-registered. Let’s assume that the effect is, indeed, d = 0.3. With G*Power, we can easily calculate how many participants we would need for 80% power. Choosing “Means: Difference between two dependent means (matched pairs)” under “Statistical test” and a one-tailed test (note that both of these decisions increase power, i.e., decrease the number of required participants), with an effect size of 0.3, alpha of 0.05 and power of 0.8, it shows that we need 71 children in a within-subject design to have adequate power to detect such an effect.
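The same calculation can be run in base R, with no add-on packages; `power.t.test` with a paired design and a one-tailed test should reproduce the G*Power figure (in a paired design, `delta = 0.3` with `sd = 1` corresponds to an effect size dz of 0.3 on the difference scores):

```r
# Required sample size for a paired (within-subject) design:
# dz = 0.3, alpha = .05 one-tailed, power = .80.
res <- power.t.test(delta = 0.3, sd = 1, sig.level = 0.05, power = 0.80,
                    type = "paired", alternative = "one.sided")
ceiling(res$n)  # number of children needed; G*Power reports 71
```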
A study should also be pre-registered. This would remove the possibility of the authors tweaking the data, analysis and variables until they get significant results. This is important in reading research, because there are many different ways in which reading ability can be calculated. For example, Gori and colleagues (Table 3) present 6 different dependent variables that can be used as the outcome measure. The greater the number of variables one can possibly analyse, the greater the flexibility for conducting analyses until at least some contrast becomes significant (Simmons et al., 2011, here). Furthermore, pre-registration will reduce the overall effect of publication bias, because there will be a record of someone having started a given study.
In short: To make a convincing case that there is an effect of the magnitude reported in the published literature, we would need a pre-registered study with at least 70 participants in a within-subject design.
Some final recommendations
For researchers: I hope that I managed to illustrate how publication bias can lead to magnitude errors: the illusion that an effect is much bigger than it actually is (regardless of whether or not it exists). Your perfect study which you pre-registered and published with a significant result and without p-hacking might be interpreted very differently if we knew about all the unpublished studies that are hidden away. This is a pretty terrifying thought: As long as publication bias exists, you can be entirely wrong with the interpretation of your study, even if you do all the right things. We are quickly running out of excuses: We need to move towards pre-registration, especially for research questions such as the one I discussed here, which has strong practical implications. So, PLEASE PLEASE PLEASE, no more underpowered and non-registered studies of action video gaming on reading ability.
For funders: Unless a study on the effect of action video gaming on reading ability is pre-registered and adequately powered, it will not give us meaningful results. So, please don’t spend any more of the taxpayers’ money on studies that cannot be used to address the question they set out to answer. In case you have too much money and don’t know what to do with it: I am looking for funding for a project on GPC learning and automatisation in reading development and dyslexia.
For parents and teachers who want to find out what’s best for their child or student: I don’t know what to tell you. I hope we’ll sort out the publication bias thing soon. In the meantime, it’s best to focus on proximal causes of reading problems, as proposed by McArthur and Castles (2017) here.
1 I know absolutely nothing about shooter games, but from what I understand the characters tend to be male.
2 More like 300 years, Wikipedia informs me.
3 This assumes no questionable research practices: With questionable research practices, the false positive rate may inflate to 60%, meaning that we would need to assume the presence of only 2 unpublished studies which did not find a significant treatment effect (Simmons et al., 2011, here)
4 I can do this in a blog post, right?
5 And this is probably an over-estimation, given publication bias.
6 If playing action video games increases visual-attentional processing ability, then so should, surely, these other things?