Friday, May 24, 2019

The perfect article

Last year, I went to an R Ladies event. This event took place at the Süddeutsche Zeitung, one of the biggest and most serious newspapers in Germany. The workshop was presented by two R Ladies from the data-driven journalism department of the newspaper. The event was extremely interesting: as it turns out, the job of a data-driven journalist is to collect or find data, and present it to the readers in an understandable way. One project which was presented included an analysis of the transcripts from the Bundestag meetings, presented in easy-to-digest graphs. Another project contained new data on the very socially relevant question of housing prices in Germany.

Throughout the event, I kept thinking: They are much further along in terms of open communication than we are. As an essential part of their job, data-driven journalists need to present often complex data in a way that any interested reader can interpret. At the same time, the R Ladies at the event kept emphasising that the data and R/RMarkdown scripts were publicly available, for anyone who doubted their conclusions or wanted to try out things for themselves.

This brings me to the idea of what the perfect article would look like. I guess you know where this is going, but before I go there, to avoid disappointment, I will add that, in this blog post, I will not give any advice on how to actually write such a perfect article, nor how to achieve a research world where such articles will be the norm. I will just provide a dreamer’s description of a utopian world, and finish off with some questions that I have no answer for.

The perfect article would have a pyramidal structure. At the top layer would be a description of the study, written at a level that a high school student could understand. The data could be presented in an interactive shiny app, and there would be easy-to-read explanations of the research question, its importance, how the data should be interpreted to answer this research question, and any limitations that may affect the interpretation of the data.

Undergraduate students in the field of study (or very interested readers) would be directed to a second layer, which describes the research methods in more detail. Here, the statistical analyses and the theoretical relevance would need to be explained, and a more thorough description of methodological limitations should be provided.

The next level would be aimed at researchers in the field of study. Here, the study would need to be placed in relation to previous work on this topic, and a very thorough discussion of the theoretical implications would be needed.

The final level would include all the data, all the materials, and all the analysis scripts. This level would be aimed at researchers who plan to build on this work. It would allow them to double check that the results are robust and that there are no mistakes in the data analysis. They would also be able to get the materials, allowing them to build as closely as possible on previous work.

Even in an ideal world, this format would not be suitable for all fields. For example, in theoretical mathematics, it would probably be very difficult to come up with a project that could be explained to a lay audience through a shiny app. More applied mathematics could, however, be presented as the deeper layers of a project where these methods are applied.

Many practical concerns jump out of my perfect article proposal. Most obviously, an article of this form would be unsuitable for a paper format. It would be relatively straightforward to implement in online journals, but doing so would require expertise that not all academic authors have. (In fact, I would guess: an expertise that most academic authors don’t have.) Even for those that do have the skills, it would require much more time, and as we all know, time is something that we don’t have, because we need to publish in large quantities if we want to have a job. Another issue with this format is that many studies are incremental, and they would not be at all interesting to a general audience. So why spend time on creating the upper layers of the pyramid?

A solution to the last issue would be to completely re-think the role that papers have in the academic process. Instead of publishing papers, the mentality could switch to publishing projects. Often, a researcher or lab is concerned with a broader research question. Perhaps what would be, in our current system, ten separate publications could be combined to make a more general point about such a broad research question, which would be of interest to the general public. Such a switch in mindset would also give researchers a greater sense of purpose, as they would need to keep this broad research question in the back of their minds while they conduct separate studies.

Another question would fall out of this proposal to publish projects rather than individual studies: What would happen with authorship? If five different PhD students conducted the individual studies, some of them would need to give up their first authorship if their work is combined into a single project. Here, the solution would be to move away from an authorship model, and instead list each researcher’s contribution along with the project’s content. And, as part of the team, one could also find a programmer (or data-driven journalist), who would be able to contribute to the technical side of presenting the content, and to making sure that the upper layers of the presentation are really understandable to the intended audience.

The problem would remain that PhD students would go without first authorship. But, in an ideal world, this would not matter, because their contributions to the project would be clearly acknowledged, and potential employers could actually judge them based on the quality, not the quantity, of their work. In an ideal world…

Thursday, May 16, 2019

Why I stopped signing my reviews

At the beginning of this year, I stopped signing my peer reviews. I had systematically signed my reviews for a few years: I think I started this at the beginning of my first post-doc, back in 2015. My reasons for signing were the following: (1) Science should be about an open exchange of ideas. I have previously benefitted from signed reviews, because I could contact the reviewer with follow-up questions, which has resulted in very fruitful discussion. (2) Something ideological about open science (I don’t remember the details). (3) As an early career researcher, one is still very unknown. Signing reviews might help colleagues to associate your name with your work. As for the drawbacks, there is the often-cited concern that authors may want to take revenge if they receive a negative review, and even in the absence of any bad intentions, they may develop implicit biases against you. I weighed this disadvantage against the advantages listed above, and I decided that it was worth the risk.

So then, why did I stop? There was a specific review that made me change my mind, because I realised that by signing reviews, one might get into all kinds of unanticipated awkward situations. I will recount this particular experience, of course, removing all details to protect the authors’ identity (which, by the way, I don’t know, but perhaps others might be able to guess with sufficient detail).

A few months ago, I was asked to review a paper about an effect which I had not found in one of my previous studies. The paper under review reported a significant effect. I could not find anything wrong with the methods or analyses, but the introduction was rather biased, in the sense that it cited only studies that did show this effect, and did not cite my study. I asked the authors to cite my study. I also asked them to provide a scatterplot of their data.

The next version of this manuscript that I received included the scatterplot, as I’d asked, and a citation of my study. Except, my study was cited in the following context (of course, fully paraphrased): “The effect was found in a previous study (citation). Schmalz et al. did not find the effect, but their study sucks.” At the same time, I noticed something very strange about the scatterplot. After asking several stats-savvy colleagues to verify that this strange thing was, indeed, very strange, I wrote in my review that I don’t believe the results, because the authors must have made a coding error during data processing.

I really did not like sending this review, because I was afraid that it would look (both to the editor and to the authors) like I had picked out a reason to dismiss the study because they had criticised my paper. However, I had signed my previous review, and whether or not I would sign during this round, it would be clear to the authors that it was me.

In general, I still think that signing reviews has a lot of advantages. Whether the disadvantages outweigh the benefits depends on each reviewer’s preference. For myself, the additional drawback that there may be unexpected awkward situations that one really doesn’t want to get into as an early career researcher tipped the balance, but it’s still a close call.