Last year, I went to an R Ladies event. This event took place at the Süddeutsche Zeitung, one of the biggest and most serious newspapers in Germany. The workshop was presented by two R Ladies from the data-driven journalism department of the newspaper. The event was extremely interesting: as it turns out, the job of a data-driven journalist is to collect or find data, and present it to the readers in an understandable way. One project which was presented included an analysis of the transcripts from the Bundestag meetings, presented in easy-to-digest graphs. Another project contained new data on the very socially relevant question of housing prices in Germany.
Throughout the event, I kept thinking: They are much further along in terms of open communication than we are. As an essential part of their job, data-driven journalists need to present often complex data in a way that any interested reader can interpret. At the same time, the R Ladies at the event kept emphasising that the data and R/RMarkdown scripts were publicly available, for anyone who doubted their conclusions or wanted to try out things for themselves.
This brings me to the idea of what the perfect article would look like. I guess you know where this is going, but before I go there, to avoid disappointment, I will add that, in this blog post, I will not give any advice on how to actually write such a perfect article, nor how to achieve a research world where such articles will be the norm. I will just provide a dreamer’s description of a utopian world, and finish off with some questions that I have no answer for.
The perfect article would have a pyramidal structure. At the top layer would be a description of the study, written at a level that a high school student could understand. The data could be presented in an interactive Shiny app, and there would be easy-to-read explanations of the research question, its importance, how the data should be interpreted to answer this research question, and any limitations that may affect the interpretation of the data.
Undergraduate students in the field of study (or very interested readers) would be directed to a second layer, which describes the research methods in more detail. Here, the statistical analyses and the theoretical relevance would need to be explained, and a more thorough description of methodological limitations should be provided.
The next level would be aimed at researchers in the field of study. Here, the study would need to be placed in relation to previous work on this topic, and a very thorough discussion of the theoretical implications would be needed.
The final level would include all the data, all the materials, and all the analysis scripts. This level would be aimed at researchers who plan to build on this work. It would allow them to double check that the results are robust and that there are no mistakes in the data analysis. They would also be able to get the materials, allowing them to build as closely as possible on previous work.
Even in an ideal world, this format would not be suitable for all fields. For example, in theoretical mathematics, it would probably be very difficult to come up with a project that could be explained to a lay audience through a Shiny app. More applied mathematics could, however, be presented in the deeper layers of a project where these methods are applied.
Many practical concerns jump out of my perfect article proposal. Most obviously, an article of this form would be unsuitable for a paper format. It would be relatively straightforward to implement in online journals, but doing so would require expertise that not all academic authors have. (In fact, I would guess: an expertise that most academic authors don’t have.) Even for those who do have the skills, it would require much more time, and as we all know, time is something that we don’t have, because we need to publish in large quantities if we want to have a job. Another issue with this format is that many studies are incremental, and they would not be at all interesting to a general audience. So why spend time on creating the upper layers of the pyramid?
A solution to the last issue would be to completely re-think the role that papers have in the academic process. Instead of publishing papers, the mentality could switch to publishing projects. Often, a researcher or lab is concerned with a broader research question. Perhaps what would be, in our current system, ten separate publications could be combined to make a more general point about such a broad research question, which would be of interest to the general public. Such a switch in mindset would also give researchers a greater sense of purpose, as they would need to keep this broad research question in the back of their minds while they conduct separate studies.
Another question would fall out of this proposal to publish projects rather than individual studies: What would happen with authorship? If five different PhD students conducted the individual studies, some of them would need to give up their first authorship if their work were combined into a single project. Here, the solution would be to move away from an authorship model, and instead list each researcher’s contribution alongside the project’s content. The team could also include a programmer (or data-driven journalist), who would contribute to the technical side of presenting the content and make sure that the upper layers of the presentation are really understandable to the intended audience.
The problem would remain that PhD students would go without first authorship. But, in an ideal world, this would not matter, because their contributions to the project would be clearly acknowledged, and potential employers could actually judge them based on the quality, not the quantity, of their work. In an ideal world…