In reading research, a question of enormous practical (and theoretical) importance is: why do some children with adequate language skills, intelligence, and educational opportunities lag behind their peers in reading ability? Or, more simply put: what causes reading problems? This question has been tackled for decades, and the answer has proved to be incredibly complex. (There is still no agreed-upon answer to date.)
Experimentally, establishing a causal influence on a behavioural outcome is somewhat tricky. Paraphrasing my undergraduate statistics textbook, there are three points you need to show before claiming causality:
(1) There is a correlation between the outcome measure (e.g., reading ability) and performance on the task that is proposed to cause the variability therein (say, phonological awareness).
(2) The proposed causal influence precedes the skill it is supposed to affect (e.g., phonological awareness at an earlier time point is associated with reading at a later time point).
(3) Experimentally manipulating the causal variable affects performance on the outcome measure (e.g., children become better readers if you train them on a phonological awareness task).
There are two statistical procedures that are commonly used in reading research to show a causal relationship, even though they completely ignore Point (3). An experimental manipulation is essential for making a causal claim: Points (1) and (2) are both susceptible to the alternative explanation that a third factor influences both measures. For example, phonological awareness, even if measured before the onset of reading instruction, may be linked to reading ability in Grade 3, but both may be caused by vocabulary knowledge, parental involvement in the child’s education, the child’s intelligence, or statistical learning ability, just to name a few possibilities.
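To make this concrete, here is a minimal simulation sketch in Python (all variable names and effect sizes are invented for illustration): a single third factor drives both phonological awareness at Time 1 and reading at Time 2, so Points (1) and (2) are satisfied even though, by construction, neither measure causally affects the other.

```python
# A toy simulation: a confound produces correlation plus temporal precedence
# without any causal link between the two measures. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A third factor (say, general verbal ability) drives both measures.
verbal_ability = rng.normal(size=n)

# Phonological awareness at Time 1: caused by verbal ability, not by reading.
pa_t1 = 0.7 * verbal_ability + rng.normal(scale=0.7, size=n)

# Reading at Time 2: also caused by verbal ability; PA has NO direct effect.
reading_t2 = 0.7 * verbal_ability + rng.normal(scale=0.7, size=n)

# PA precedes reading and correlates with it, yet causes nothing.
r = np.corrcoef(pa_t1, reading_t2)[0, 1]
print(f"r(PA at T1, reading at T2) = {r:.2f}")  # roughly .5
```

With these arbitrary numbers the two measures correlate at about .5, which a reader could easily mistake for evidence of a causal pathway.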
Most researchers know that correlation ≠ causation, but many seem to succumb to the temptation of inferring causation from structural equation models (SEMs). Paraphrasing my undergraduate statistics lecturer: SEM is a way of rearranging correlations in a way that makes it look like you can infer causality. Here, the outcome measure and the predictors are represented as boxes, and the unique variance explained by each link, obtained from a regression analysis, is written next to the arrow going from a predictor to the outcome measure. Even if a predictor is measured at an earlier time than the outcome measure (thus showing precedence, as per Point 2), this fails to show a causal relationship, as a third, unmeasured factor could be causing both.
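Continuing the hypothetical simulation above (and using a plain regression path as a stand-in for a single SEM path; statsmodels is assumed to be available), the “path” from Time-1 phonological awareness to Time-2 reading comes out sizeable and highly significant even though the true direct effect is exactly zero. The path only vanishes once the confound is controlled for, and in real data the confound is typically unmeasured:

```python
# Continuing the simulation above; np, pa_t1, reading_t2, verbal_ability
# are defined there. This is an illustration, not a full SEM.
import statsmodels.api as sm

# The "path" from Time-1 PA to Time-2 reading: large and highly significant,
# despite the true direct effect being zero by construction.
model = sm.OLS(reading_t2, sm.add_constant(pa_t1)).fit()
print(f"path coefficient: {model.params[1]:.2f}, p = {model.pvalues[1]:.1e}")

# Conditioning on the third factor makes the "causal path" vanish;
# but a model can only control for confounds that were actually measured.
X = sm.add_constant(np.column_stack([pa_t1, verbal_ability]))
print(sm.OLS(reading_t2, X).fit().params.round(2))  # PA coefficient near 0
```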
Having just returned from a selective summer school on literacy, I counted a total of four statements inferring a causal relationship from SEMs during the meeting, one of them by a prominent professor. They are in good company. Just to pick one example, a recent paper used SEMs to infer causation (Hulme, Bowyer-Crane, Carroll, Duff, & Snowling, 2012) [1].
While I’m at it, there is another method that has been used to infer causation even though it cannot support such an inference, namely the reading-age matched design. The logic is as follows: if you compare poor readers to good readers who are matched on age, on any task (say, phonological awareness), you can expect the poor readers to perform worse than the good readers. This could be because being skilled at this task facilitates learning to read, or performance on this task could be a result of the greater reading exposure of the good readers (because good readers tend to read in their free time, while poor readers don’t). In a reading-age matched design, one therefore compares a group of children who are poor readers for their age to a group of younger children who are average or good readers for their age, but whose absolute reading ability is equivalent to that of the poor readers. If the poor readers perform worse on phonological awareness tasks than their younger controls, this is taken to suggest that the deficit in phonological awareness is not a result of a lack of reading exposure.
There are theoretical problems with matching children on their absolute reading ability, because older poor readers and younger average-to-good readers are unlikely to show identical performance on different aspects of reading (see Jackson & Coltheart, 2001): the control group could vary widely in its age and cognitive skill profile, depending on whether the matching task measures nonword reading accuracy, word reading fluency, or text comprehension. But even if it were possible to match poor readers to younger controls in terms of their reading ability, the caveat from the SEMs still applies: it is possible that poor phonological awareness and poor reading skills are both caused by a third underlying factor. Although I know of no peer-reviewed paper that explicitly makes a causal claim based on the reading-age matched design, I have heard such claims in conference talks, and causality is often implied in published papers without the alternative explanation being acknowledged.
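The same toy simulation logic applies here (again in Python, with numbers invented for illustration): if a third factor contributes to both reading and phonological awareness, older poor readers will show a phonological “deficit” relative to their younger reading-matched controls even when phonological awareness plays no causal role in learning to read.

```python
# A hypothetical sketch: a third factor ("aptitude") contributes to both
# reading and phonological awareness; PA itself never causes reading.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

def simulate_group(age_years, aptitude_mean):
    aptitude = rng.normal(loc=aptitude_mean, size=n)
    reading = 1.0 * age_years + 2.0 * aptitude + rng.normal(size=n)
    pa = aptitude + rng.normal(size=n)  # PA reflects aptitude, not reading
    return reading, pa

# Older poor readers (low aptitude) vs. younger average readers: the age
# advantage and the aptitude deficit cancel out, so the two groups end up
# matched on absolute reading ability.
reading_poor, pa_poor = simulate_group(age_years=10, aptitude_mean=-1.0)
reading_ctrl, pa_ctrl = simulate_group(age_years=8, aptitude_mean=0.0)

print(f"mean reading: poor = {reading_poor.mean():.1f}, "
      f"controls = {reading_ctrl.mean():.1f}")  # matched, both around 8
print(f"mean PA:      poor = {pa_poor.mean():.1f}, "
      f"controls = {pa_ctrl.mean():.1f}")       # 'deficit' of about 1 SD
```

In this toy world the matched design duly detects a phonological awareness deficit in the poor readers, yet training phonological awareness would do nothing for their reading.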
The TL;DR summary of this post is very simple: It is never OK to infer causality from
correlations.
References
Hulme, C., Bowyer-Crane, C., Carroll, J. M., Duff, F. J., & Snowling, M. J. (2012). The causal role of phoneme awareness and letter-sound knowledge in learning to read: Combining intervention studies with mediation analyses. Psychological Science, 23(6), 572-577. doi:10.1177/0956797611435921
Jackson, N., & Coltheart, M. (2001). Routes to reading success and failure:
Toward an integrated cognitive psychology of atypical reading. New York,
NY: Psychology Press.
Footnote
[1] To be fair, this study also includes a training component. Whether the paper makes a convincing case for a causal relationship is a different question, but either way, someone who only skims the title and abstract may get the impression that SEMs are a tool for assessing causality.