Last year, I went to an R Ladies
event. This event took place at the Süddeutsche Zeitung, one of the biggest and
most serious newspapers in Germany. The workshop was presented by two R Ladies
from the data-driven journalism department of the newspaper. The event was
extremely interesting: as it turns out, the job of a data-driven journalist is
to collect or find data, and present it to the readers in an understandable
way. One project which was presented included an analysis
of the transcripts from the Bundestag meetings, presented in easy-to-digest
graphs. Another project contained new data on the very
socially relevant question of housing prices in Germany.
Throughout the event, I kept
thinking: They are much further along in terms of open communication than we are. As an essential part of their job, data-driven journalists need to present
often complex data in a way that any interested reader can interpret. At the
same time, the R Ladies at the event kept emphasising that the data and
R/RMarkdown scripts were publicly available, for anyone who doubted their
conclusions or wanted to try out things for themselves.
This brings me to the idea of what
the perfect article would look like. I guess you know where this is going, but
before I go there, to avoid disappointment, I will add that, in this blog post,
I will not give any advice on how to actually write such a perfect article, nor
how to achieve a research world where such articles will be the norm. I will
just provide a dreamer’s description of a utopian world, and finish off with some
questions that I have no answer for.
The perfect article would have a
pyramidal structure. At the top layer would be a description of the study,
written at a level that a high school student could understand. The data
could be presented in an interactive Shiny app, and there would be easy-to-read
explanations of the research question, its importance, how the data should be
interpreted to answer this research question, and any limitations that may
affect the interpretation of the data.
Undergraduate students in the field
of study (or very interested readers) would be directed to a second layer,
which describes the research methods in more detail.
Here, the statistical analyses and the theoretical relevance would need to be explained,
and a more thorough description of methodological limitations should be
provided.
The next level would be aimed at
researchers in the field of study. Here, the study would need to be placed in
relation to previous work on this topic, and a very thorough discussion of the
theoretical implications would be needed.
The final level would include all
the data, all the materials, and all the analysis scripts. This level would be
aimed at researchers who plan to build on this work. It would allow them to
double check that the results are robust and that there are no mistakes in the
data analysis. They would also be able to get the materials, allowing them to
build as closely as possible on previous work.
Even in an ideal world, this format
would not be suitable for all fields. For example, in theoretical mathematics,
it would probably be very difficult to come up with a project that could be
explained to a lay audience through a Shiny app. More applied mathematics
could, however, be presented as the deeper layers of a project where these
methods are applied.
Many practical concerns jump out of
my perfect article proposal. Most obviously, an article of this form would be
unsuitable for a paper format. It would, however, be relatively
straightforward to implement in online journals. Doing so, though,
would require expertise that not all academic authors have. (In fact, I would
guess that most academic authors don’t have it.) Even for those who
do have the skills, it would require much more time, and as we all know, time
is something that we don’t have, because we need to publish in large quantities
if we want to have a job. Another issue with this format is that many studies are
incremental, and they would not be at all interesting to a general audience. So
why spend time on creating the upper layers of the pyramid?
A solution to the last issue would
be to completely re-think the role that papers have in the academic process.
Instead of publishing papers, the mentality could switch to publishing
projects. Often, a researcher or lab is concerned with a broader research
question. Perhaps what would be, in our current system, ten separate
publications could be combined to make a more general point about such a broad
research question, which would be of
interest to the general public. Such a switch in mindset would also give
researchers a greater sense of purpose, as they would need to keep this broad
research question in the back of their minds while they conduct separate
studies.
Another question would fall out of
this proposal to publish projects rather than individual studies: What would
happen with authorship? If five different PhD students conducted the individual
studies, some of them would need to give up their first authorship if their
work were combined into a single project. Here, the solution would be to move
away from an authorship model, and instead list each researcher’s contribution
along with the project’s content. And, as part of the team, one could also find
a programmer (or data-driven journalist), who would be able to contribute to
the technical side of presenting the content, and to making sure that the upper
layers of the presentation are really understandable to the intended audience.
The problem would remain that PhD
students would go without first authorship. But, in an ideal world, this would
not matter, because their contributions to the project would be clearly
acknowledged, and potential employers could actually judge them based on the
quality, not the quantity, of their work. In an ideal world…