Wednesday, September 10, 2025

Will peer review collapse in the coming years?

 

I generally roll my eyes whenever someone prophesies an upcoming apocalypse. I’ve heard enough promises of fire and brimstone that have not yet come to pass to remain optimistic about most things. Yet, when I was holding my annual open science workshop for a master’s programme at my local university, I found myself telling the students: “I predict that the peer review system will collapse in the coming five years.” Now, I’m not sure if I’m right about this: Although I’ve heard many apocalyptic predictions throughout my life, none of them has actually become reality, so chances are pretty high that I’m wrong.

 

I’ve been meaning to write a blogpost about this for a while, but I’m not sure there’s anything new to say. After all, complaining about peer review is one of the favourite pastimes of academics. The trigger for writing it was a provocative post on Bluesky by Samuel Mehr. He wrote: “my reactionary take is that peer review is fine, on balance, and most people who hate it feel this way for reasons that are tangential to a real assessment of peer review (eg, they're annoyed about how a specific case of peer review played out, probably one that didn't go their way)”. Although I responded, this seems like a difficult topic to discuss on social media without mixing up different issues. So, here is a blogpost attempting to summarise the issues and the reasons for my prediction. The problem is: I don’t have a solution for the issues that I see with peer review. But it seems to me that there are options that are at the very least not worse than what we currently have.

 

Issue 1: People hate peer review because they have bad experiences where their own papers were rejected

Here, I’ve paraphrased Samuel’s main point from his bluesky post, but I’d like to discuss it from a different perspective. I’m sure that he’s right that many people hate the peer review process partly (and perhaps subconsciously) because the reviewers fail to see the genius behind their amazing papers. Of course – let’s face it – our article is probably less ground-breaking and more flawed than we’d like to think. Especially as an early career researcher, it can be disheartening to submit a paper that consists of your blood, sweat and tears, and that you’re convinced will change the world, only to have it ripped to shreds by the reviewers. At least in my experience, it’s a learning process to realise that your work is nowhere near as important to other people as it is to you.

 

This is an important realisation, but the question is how best to arrive at it. At this stage, I have to second Samuel’s observation that most reviewers are thoughtful, helpful, and show a high level of competence. Learning how to argue with people who completely and utterly disagree with me has been a fun challenge over the last decade, and it involved learning to take my own work less seriously, on a personal level. Arguably, the peer review process, as it currently stands, is a good way to acquire this skill.

 

Conversely, one might argue that it’s not: if this many people hate the peer review process because it hurt their ego, we should at least consider whether there is a way to teach these skills that may be a little less cruel. In my experience, there may be, for every 20 thoughtful and kind reviewers, one with a broken caps lock key and questionable manners. It would be bad luck, but not exceptional, to encounter such a reviewer for the very first paper one submits. Should we protect authors from the caps lock guy? I think we should, because peer review would work much better if we moved it away from antagonistic feelings and towards fruitful discussions. I’m not sure about the best way to do this, though.

 

Publishing the reviews alongside an article could be an option: the caps lock guy may be more careful with their wording when they know the general public can actually read their review. Although one may also consider posting the reviewer’s name along with their review, I wouldn’t make this mandatory: the peer review process needs to be such that an early career researcher isn’t intimidated out of pointing out a fundamental flaw in a big shot’s paper. The empirical question is whether the caps lock guy would still dare to submit a caps lock review if he knew it would be published, even anonymously.

 

An alternative way to tame the caps lock guy would be on the level of journals and editors: they could include a statement that they discourage bad manners and ask the reviewers to provide the type of review that they would have liked to receive back in their days as an early career researcher. This may work as a gentle nudge, but it would be difficult to impose any rules. If editors get the power to decide, based on the review’s tone, whether they will forward a review to the authors and use it to inform their decision, this may introduce additional biases into the peer review system (e.g., “You were mean to my friend!” vs. “You were mean to a person with whom I have a long-standing professional rivalry!”). Furthermore, let’s say an editor does decide to reject a review based on its tone. Then they would need to go out and recruit an additional reviewer, which would delay the publication process and make the editor’s job even more difficult. To compound this issue, reviewers may be more reluctant to agree to review a paper in the first place if they are worried that the editor may reject their review because they don’t like the tone. Although we might think that we know a non-constructive and rude review when we see one, there are also cultural differences in how directly one expresses criticism. In other words, German and Dutch reviewers may be barred from reviewing any papers for American journals if editors can reject reviews based on their tone.

 

So, nicifying the peer review process would be good, but there’s no straightforward way to achieve this. At the same time, in the spirit of remaining sceptical about my own scepticism about peer review, I should note that this is not an issue that is directly and intrinsically related to the concept of peer review: given that peer review can, in principle, be an open and constructive dialogue, this first issue is not an argument against the whole concept of peer review.

 

Issue 2: Lack of transparency and quality control

The second issue has been debated a lot, so there’s not much to say about it. The peer review process happens behind closed doors. This means, firstly, that a lot of academics get little acknowledgement of their behind-the-scenes editorial and reviewing work. And secondly, there is a lack of transparency about how the decision to publish a paper – or reject it – has been reached. This is especially important for the latter scenario: it’s probably a good thing that not every paper that was ever written will see the light of day, but at the same time, we don’t know how many good papers have been thrown out because of a discouraging but unfair initial review.

 

For published papers, the solution is relatively simple: Publish the reviews. Some journals have started doing this, as was pointed out to me in the Bluesky thread I linked to above. Publishing the reviews will also allow any reader to get an idea of the quality of the reviews. Importantly, it will allow meta-scientists to compare the quality of reviews across different formats of peer review. (I was going to enrich this blog post with some analyses like this, but I decided that it would be too time-consuming to scrape and analyse the number of reviews that would be needed to reach any meaningful conclusions.)
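
Just to illustrate the kind of comparison I have in mind, here is a minimal sketch of such an analysis. Everything in it is hypothetical: the file name, the column names, and the crude proxies for review depth and tone are made up for illustration, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical input: one row per published review, scraped from journals
# that post their reviews openly. Column names are invented for this sketch.
reviews = pd.read_csv("reviews.csv")  # columns: journal, review_format, review_text

# Crude proxies: review length as a stand-in for depth, and the share of
# capital letters as a (tongue-in-cheek) stand-in for tone.
reviews["n_words"] = reviews["review_text"].str.split().str.len()
reviews["prop_caps"] = reviews["review_text"].str.count(r"[A-Z]") / reviews["review_text"].str.len()

# Compare across peer-review formats (e.g., signed vs. anonymous, open vs. closed).
print(reviews.groupby("review_format")[["n_words", "prop_caps"]].agg(["mean", "std"]))
```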

 

We have to be careful not to conflate issues, though. A lack of transparency is not an integral aspect of peer review. In fact, we can easily move towards open peer review. The problem would remain, however, that we would never see the reviews of papers that don’t get published.

 

Issue 3: Peer review as career gatekeeping

During and shortly after my PhD, I was sad about rejections because they hurt my ego. As I became more confident in myself and (rightly) less confident in the brilliance of my papers, I stopped being sad about this, but I gradually became more sad because every rejection means a smaller chance of grants and positions, which is strongly linked to the existential threat of having to leave academia. I consider myself lucky, because I have always worked in labs where my advisors and colleagues valued quality over quantity, and there was no formal pressure to publish. For example, I did not have any rule about the number of papers that I needed to publish in order to get a PhD. My current PhD students do have such a rule, imposed by the university: Two accepted articles in a high-impact-factor journal, at least one as a first author. Such rules are stupid and harmful (as an aside: I don’t think that, as a reviewer, I’d always pass a tone check by the editor).

 

First, the pressure to publish is stupid, because the number of publications is not necessarily an indicator of a researcher’s worth. This has been debated to no end already, so just to add my two cents: Maybe there is some positive correlation between quantity and quality of publications. However, in my view, the deviations from this regression line are large enough to make quantity useless as an overall indicator: for example, when people are pressured to salami-slice instead of producing an in-depth and coherent piece of work, or when bullshitters get tenure at the cost of more competent people leaving academia.
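
To make this point concrete, here is a toy simulation with entirely made-up numbers: even when publication count is positively correlated with a latent “quality” variable, picking the better of two researchers based on publication count alone goes wrong a large share of the time.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200  # hypothetical researchers

quality = rng.normal(size=n)                   # latent quality of work
quantity = 0.3 * quality + rng.normal(size=n)  # weakly correlated publication count (standardised)

r = np.corrcoef(quality, quantity)[0, 1]

# In how many random pairwise comparisons does the researcher with more
# publications also happen to be the one with higher quality?
i, j = rng.integers(0, n, size=(2, 10_000))
hit_rate = np.mean((quantity[i] > quantity[j]) == (quality[i] > quality[j]))

print(f"correlation between quantity and quality: {r:.2f}")
print(f"pairwise comparisons where quantity picks the higher-quality researcher: {hit_rate:.0%}")
```

With these made-up parameters, the hit rate comes out at roughly 60% – better than a coin flip, but hardly a basis for hiring decisions.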

 

Second, the pressure to publish is harmful, because it creates the wrong incentives. The number of papers is increasing exponentially. Nobody has the time to read all these papers, and nobody has the time to review all these papers. On Bluesky, Brad Wyble replied that the difficulty with getting reviewers is, empirically speaking, not as bad as is often assumed. However, in my experience, both as an editor and as the recipient of desperate-sounding invitations to review papers, I’d say it’s pretty bad. Needless to say, if editors are just happy to find anyone to review a paper, one can’t expect that all reviewers will have the in-depth knowledge and expertise that one might hope for.

 

It's not an encouraging thought for a researcher that they are producing papers that very few people actually have the time to read, and that will thus not even lead to incremental progress in our understanding of their phenomenon of interest. But a substantially more harmful effect of the pressure to publish is the output of paper mills. I won’t write too much about them, because other people who are much more knowledgeable about this topic, like Dorothy Bishop, have. But, in short, if you pressure people to write papers, they will write papers, leading to an inflation of all kinds of publications, including ones that are useless at best.

 

How should we evaluate researchers, if not by the number of publications or related quantitative measures? Frankly, I have no idea. In answering this question, therefore, I will cheat by saying that this is an orthogonal issue that is beyond the scope of this blogpost.

 

To be fair to the concept of peer review, the issue here is not peer review, but the incentive structures that have been built around it.

 

Issue 4: Incentivising reviewers and editors

I’m sure most of us have had the experience of reading a published paper and thinking: “How on earth did this get past peer review?” The answer to that question can often be obtained by looking at the names of the editor and author, and checking, at the next conference, whether the two of them hang out with each other. At other times, the reviews are just bad: maybe there is a conflict of interest, maybe the reviewers didn’t get the point of the study or didn’t have the knowledge to identify some fundamental flaw, and maybe the reviewers just didn’t care and waved through the paper so they could move on to the next task. Or they didn’t want to ruffle any feathers – even in anonymous peer review, one can often guess who the reviewer is, and, you know, I scratch your back, you scratch mine.

 

How do we get high-quality reviews? How do we counteract the inevitable biases that exist both on the side of the editors and on that of the reviewers? How do we incentivise the reviewers to provide a detailed and honest report? Perhaps these are empirical questions that could be answered if we had open peer review and could systematically investigate which factors affect the quality of the reviews. Again, this is not a killer argument against the whole concept of peer review. However, these are open questions, and the system, as it currently stands, is definitely far from an optimal solution.

 

Conclusion and ways forward

It’s easy to get sidetracked in the debate about peer review, so let’s get back to the basic question: Does peer review, as it currently is, do its job? In terms of serving as a quality indicator for articles, I’d say: Kind of. In terms of determining which researchers deserve to get jobs and grants, I’d say: Fuck no. And in terms of serving as a decision mechanism about which articles should be published and which ones should be stowed away in a file drawer forever, I’d say: no, not really, no.

 

Still, we should be careful not to conflate issues, and not to throw out the baby with the bathwater, as the saying goes. Ceteris paribus, a paper with positive peer reviews is likely to be better than a paper with negative peer reviews. Many of the issues are not intrinsic to the process of peer review per se and would not be impossible to change. Specifically, I refer to the lack of transparency and the incentive structure.

 

Transparency can easily be increased by publishing reviews with a paper. This leaves us with the problem, however, that we will never see the reviews of papers that get rejected. A way to fix this would be to publish preprints. Technically, it shouldn’t be difficult to allow anyone to write public reviews for any preprint, thereby breaking academic journals’ monopoly on providing peer review. Even a thumbs up / thumbs down button next to a preprint could provide some indicator of whether a paper should be taken seriously or not. The problems that remain are all kinds of biases: For example, someone who is new to a field and not well-connected in the community might find it difficult to get any kind of reaction to their preprint. This might lead to a cycle where their paper won’t be noticed by the community because it’s not reviewed, and won’t get reviewed because it has gone unnoticed. This is why I like the concept of Peer Community In …, as it is a way to attach peer reviews to a paper when uploading a preprint.

 

And as for the incentive structure, there’s very little to say, aside from arm waving and prophecies of an apocalypse. It ought to be changed. But nobody seems to know how.

 

So, will peer review collapse in the coming years? Maybe not. Despite what it sometimes feels like, things are changing. Open peer review, the widespread use of preprints, and platforms such as PubPeer and PCI have only appeared in the past decade or so. Not everyone is convinced by these novelties, but they somehow co-exist with the traditional closed peer review system run by for-profit journals. Perhaps the goal should not be to topple the existing system, but to further develop alternatives and experiment with what yields the best results. And hope for the best.

Friday, July 25, 2025

What can we learn from unsuccessful theories?

We still have a long way to go when it comes to designing good theories, at least in psychological science. On the topic of 'we have a long way to go', psychology is often juxtaposed with physics, where one might get the impression that physics is doing better than psychology, as a science, on all counts. Embarrassingly for psychologists, Popper famously compared Einstein's and Freud's theories, with Einstein's theory of relativity being an example of good science, as it made a prediction that had not yet been, but could be, empirically tested. Freud's theories, on the other hand, were unfalsifiable, as any observed phenomenon could be explained post-hoc.

There are already many blog posts that discuss how we should be, or why we shouldn't be, more like physicists. Of course, physicists have advantages that we don't have, such as precise measurements and centuries of numerical models to build on. Aspiring to improve is always a good thing, but maybe we can also look at the flip side of the coin: To not only compare ourselves to a gold standard and potentially discover that we just can't measure up, but also to see where others went wrong, so that we can avoid repeating their mistakes.

With these thoughts in mind (or something along these lines), I started digging a bit into theories that didn't work out. So far, I haven't found that much, so I welcome any recommendations for reading on this topic! 

The first unsuccessful theory that I started googling was alchemy. It's often described as what came before chemistry, except the people back then didn't really know what matter was made of: they did meticulous work and maybe even discovered some principles that are still relevant for modern chemistry, but mainly went on wild goose chases to achieve immortality or to turn base metals into gold. I came across a historical character who sounds pretty cool: Cleopatra the Alchemist (https://en.wikipedia.org/wiki/Cleopatra_the_Alchemist), not to be confused with the queen, who lived in the same country of Egypt but in a different century. Alas, it seems that back in the day, reproducible working was not a thing yet. Apparently, the writings of alchemists are difficult to decipher, because they often wrote in code (https://www.youtube.com/watch?v=gxiLuz9kHi0).

What can we learn from that? First, that we should work reproducibly, even when it comes to documenting our ideas and trains of thought. Second, there may be a broader message there, something about not letting our own personal interests dominate our research. Achieving eternal life or unlimited wealth may be an ultimate aim for some people, but perhaps it's important to concentrate on the little steps and the scientific achievements that we make on the way there.

The second unsuccessful theory that came to my mind is Lamarckian evolution. This is a theory that, in my undergraduate biology course, was juxtaposed with Darwin's theory of evolution by natural selection. Lamarck built on the obvious observation that children are similar to their parents, and suggested that parents can pass on acquired traits. The example from the textbook was the giraffe's neck: A giraffe stretches its neck to get to the leaves on the top of the tree, and because this makes its neck longer, its children also have longer necks. The example on Wikipedia is a blacksmith, who acquires muscles through his work, and whose children then become physically stronger, too.

Interestingly, the Wikipedia page on Lamarckism (https://en.wikipedia.org/wiki/Lamarckism) has a whole section on "Textbook Lamarckism", criticising exactly what I described above: presenting Lamarckian and Darwinian evolution as a simple contrast, one being bad and the other one being good. Apparently, Darwin believed in the passing on of acquired traits, just as Lamarck did. What we learned in biology class was that Darwin's theory of evolution by natural selection stood the test of time because later research, namely the advent of genetics, showed support for a mechanism that could account for transmission across generations and didn't involve the passing on of acquired traits. I think the lesson that we were supposed to learn from this juxtaposition was how important the specification of mechanisms is: undoubtedly, this is an important lesson for psychological scientists. What I personally found cool about Darwin's evolution by natural selection was its reliance on deductive reasoning: If there is variability between individuals of the same species, and this variability allows some individuals to survive with a higher probability than others, and the individual differences are passed on across generations, then those with the more successful variant will survive, leading to survival of the fittest and evolution by natural selection. For psychologists, achieving theorising based on deductive reasoning may be as utopian as achieving the measurement accuracy of physicists, who apparently throw out a tool if its test-retest reliability is less than 0.99. But it's nice to dream.
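
As a side note, that deductive chain is simple enough to be demonstrated in a few lines of code. The following toy simulation uses entirely made-up parameters: individuals vary in a heritable trait, higher trait values mean a higher chance of surviving to reproduce, and the population mean drifts upward across generations.

```python
import math
import random

random.seed(0)

# A population of 500 individuals, each with a heritable trait value.
population = [random.gauss(0, 1) for _ in range(500)]

for generation in range(20):
    # Differential survival: higher trait value -> higher probability of reproducing.
    survivors = [t for t in population if random.random() < 1 / (1 + math.exp(-t))]
    # Heritability with variation: offspring resemble a random surviving parent, plus noise.
    population = [random.gauss(random.choice(survivors), 0.3) for _ in range(500)]

mean_trait = sum(population) / len(population)
print(f"mean trait after 20 generations: {mean_trait:.2f}")  # drifts well above the initial 0
```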

Speaking of physicists, the third theory is cold fusion. I learned about it only when I met my husband, who is a physicist working with hot fusion. With hot fusion, a nuclear reaction happens when matter is heated up to hundreds of millions of degrees. Cold fusion was supposed to work at room temperature. The first mess underlying this theory was on the empirical level: after the initial study demonstrating fusion at room temperature was published, other labs failed to replicate it. The original study was apparently difficult to replicate in the first place because the experiment was not well described, and when other labs did manage to repeat it, they were not able to find the excess heat that was supposed to be a product of the reaction. There is even talk about fraud in the original experiment. So, cold fusion quickly went out of fashion, and is not taken seriously by the overwhelming majority of physicists.

The celebratory conclusion of this whole fiasco is that replication studies can identify false positives: empirical phenomena that are just not there.  The focus of this blogpost, however, is on theories, not on empirical replicability. So, what can we learn from cold fusion about theory building? Well, apparently there wasn't that much theory behind it in the first place. So yes, the implication is: Even physics has a story about how researchers published a sexy, unbelievable finding based on a wishy-washy theory, and led a whole research community down a rabbit hole trying to reproduce their results -- including a relatively recent replication failure published in Nature: https://www.nature.com/articles/s41586-019-1256-6. 

What is the overall conclusion? Admittedly, I don't think we learn much from these three case studies that we didn't already know. They might be important insights, but they are already part of the mainstream discussions -- otherwise, I probably wouldn't know about them in the first place. In theory building, as with most other complex things, it's easier to do things wrong than to do things right. This is because, for a theory building process to be right, all of the underlying steps have to be right. If one step is wrong, the theory is wrong. And there are many more ways to do things wrong than to do things right. Probabilistically, the odds are against us.

And yet it's probably worth considering exactly what has gone wrong when we have come up with wrong theories in the past. By eliminating possible mistakes, we can increase the ratio of right-to-wrong theory building processes. So, I'm looking forward to extending my collection of wrong theories! Please feel free to post any leads in the comments!

Monday, July 21, 2025

Redefining Reproducibility

A lot of good things start with “r”: Replicability, reproducibility, robustness, reading, running, relaxing, and if you include German words as well, then also “travelling” (reisen) and “cycling” (Rad fahren). Some of these r-words are more controversial than others. “Replicability” and “reproducibility” often occur together with the word “crisis”, suggesting a negative connotation. On a more basic level, some r-words are better defined than others. While “running” is relatively well defined, everyone seems to insist on defining replicability and reproducibility in different ways. Perhaps this is one of the reasons why there is little consensus about whether or not there is a crisis, and arguably little progress in resolving a potential crisis.

In this blogpost, I aim to take a step back and ask: What are replicability and reproducibility? Can we come up with a definition that is generalisable across fields and research steps? Perhaps, by re-thinking the terminology, we can get a tiny bit closer to narrowing down the issues and the reasons why these r-words are important for science.

 

How are the words “reproducibility” and “replicability” used? In my bubble, the most common way to use these words is as defined by The Turing Way (The Turing Way Community, 2022): Replicability refers to studies which apply the experimental methods of an already published article to newly collected data and assess whether the results are approximately in line with those in the published article. Reproducibility refers to analysing already existing data with identical methods. The implication is that reproducibility studies should obtain identical results to the original study, while we expect the results of replicability studies to vary due to sampling error.
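
To make this distinction concrete, here is a minimal sketch in Python, with made-up data and a trivial “analysis”: rerunning the same analysis on the same data must give an identical number (reproducibility), whereas rerunning the same analysis on newly collected data gives a similar but not identical number (replicability).

```python
import numpy as np

def analysis(data):
    """The published analysis: estimate the mean of the sample."""
    return float(np.mean(data))

rng = np.random.default_rng(2025)
original_data = rng.normal(loc=0.4, scale=1.0, size=100)  # made-up "published" dataset
published_result = analysis(original_data)

# Reproducibility: identical data, identical methods -> identical result.
reproduced_result = analysis(original_data)

# Replicability: identical methods, new data -> similar result, differing by sampling error.
new_data = rng.normal(loc=0.4, scale=1.0, size=100)
replicated_result = analysis(new_data)

print(f"published:  {published_result:.3f}")
print(f"reproduced: {reproduced_result:.3f}  (identical)")
print(f"replicated: {replicated_result:.3f}  (similar, not identical)")
```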

 

There are two issues with the use of this terminology. The first is a lack of consensus across and even within fields. Famously, the study that arguably started the whole replication crisis debate referred to “reproducibility”, even though it was empirically estimating replicability. In fact, the title of the article was “Estimating the reproducibility of psychological science” (Open Science Collaboration, 2015). This is inconsistent with the definitions that were later proposed by The Turing Way. To complicate matters, other fields use different terminology. For example, in computational neuroscience, McDougal, Bulanova, and Lytton (2016) defined a replicable simulation as one that “can be repeated exactly (e.g., by rerunning the source code on the same computer)”, and a reproducible simulation as one that “can be independently reconstructed based on a description of the model” (p. 2). They further specify that “a replication should give precisely identical results, while a reproduction will give results which are similar but often not identical.” This is in contrast to the use of these terms in psychological science, where a replication implies that the results should be approximately similar between an original study and a replication, and a reproduction should yield identical results.

 

This brings us to the second issue, which I will pose as an open question. Are there any useful features that can be used to distinguish between reproducibility and replicability on a more general level? For example, using The Turing Way definition, reproducibility implies an exact repetition of a previous study’s processes. In a replicability study, some puzzle pieces are missing, which the replicator needs to re-create; classically, this would be the collection of data using existing experimental methods and materials. However, this feature of exactness versus approximateness is neither clear-cut nor generalisable across fields or research processes. For example, even in a reproducibility study, important information is often missing, and the reproducer needs to fill in the gaps, thus deviating from the concept of exactness (Seibold et al., 2021).

 

The Turing Way definition also applies neatly to the process of data analysis, as this is the focus of this community. However, how do we distinguish between reproducibility and replicability if we want to describe fields that rely less on the collection of new data? For example, what do replicability and reproducibility mean when one considers a systematic review? We can probably all agree that, when someone does a systematic review, they should transparently document their decision steps and search procedure. But how do we map the concepts of reproducibility and replicability onto this research process? Is a reproduction possible or useful, given that, over time, newly published studies may need to be included in the output of the systematic search?

 

To resolve these issues, it may be worth re-thinking how the word “replicability” is used. While the focus of The Turing Way – and possibly that of the main chunk of the scientific community – is on the level of the data analysis, we could consider shifting the focus from the data analysis to the process that a replicability study really wants to reproduce: the data collection process. This gives us a narrower definition: A replicability study is one that aims to reproduce the data collection process. In this case, we are using the word “reproducibility” to define “replicability”. Does this lead us down a rabbit hole? Or, vice versa, does it help us to bring some clarity into what we actually mean when we use various r-words?

 

This may be one of those rare occasions when we can improve things by subtraction rather than addition (Winter, Fischer, Scheepers, & Myachykov, 2023). What if we remove the word “replication” from our vocabulary? This would leave us with “reproducibility”. If we want to refer to what we now call “replicability”, we would simply specify: “Reproducibility of the data collection process”. And if we want to talk about what we now call “reproducibility”, we would say: “Reproducibility of the data analysis process”, or, if we want to be even more specific, “Reproducibility of the analysis script” or “Reproducibility of the reported results”.

 

There would be some advantages to such a shift. First, my impression is that explaining the difference between reproducibility and replicability, say, in my Open Science workshops, is more complicated than it should be. The proposed change in terminology would simplify things. Second, we’d create a more general terminology that could be used across all fields in science and research. This should allow for more fruitful discussion across fields, allowing us to learn from each other’s mistakes and solutions. By using additional qualifiers and referring to the research steps that we have in mind when we talk about reproducibility, we wouldn’t lose any clarity or specificity. Third, we may shift the focus of the replication crisis debate away from the single step of data analysis and consider other research processes where reproducibility may be equally important.

 

Important for what, you may ask? A more generic definition calls for a more generic answer to the question of what it is good for. Reproducibility exists on two levels: First, the researchers doing the original work should work in such a way that they document all relevant information. Second, reproducers ought to verify the original work. The obvious purpose of this is error detection. As much as everyone dislikes the idea of other people finding errors in one’s work, we can probably still agree that we don’t want to build on a research topic where the main finding reflects a banal coding error. A less obvious purpose is to identify alternative paths: For example, it may be clear that Researcher A inferred Y from X; Researcher B may question the validity of this inference and propose and test an alternative explanation. A further purpose, perhaps less obvious to more experienced researchers, is the value of working reproducibly at all stages of the research process so that others can learn from one’s work.

 

In summary, reproducibility is a good thing; terminological messes are not. The distinction between reproducibility and replicability may make matters overly complicated, and simplifying things by referring to “reproducibility” plus a specification of the research process may be a step in the right direction.

 

References

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251).

The Turing Way Community. (2022). The Turing Way: A handbook for reproducible, ethical and collaborative research (1.0.2). Zenodo.

McDougal, R. A., Bulanova, A. S., & Lytton, W. W. (2016). Reproducibility in computational neuroscience models and simulations. IEEE Transactions on Biomedical Engineering, 63(10), 2021-2035.

Seibold, H., Czerny, S., Decke, S., Dieterle, R., Eder, T., Fohr, S., . . . Kopper, P. (2021). A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses. PloS One, 16(6), e0251194. 

Winter, B., Fischer, M. H., Scheepers, C., & Myachykov, A. (2023). More is better: English language statistics are biased toward addition. Cognitive Science, 47(4), e13254.

Wednesday, June 4, 2025

On ghost writing in academia

 

LinkedIn suggested a job as a perfect match for my skills: Being a ghost writer. Curious, I clicked on the job description, thinking: “This can’t possibly be what it sounds like, right?” It was. According to the website, the ghost writing agency is based in Berlin. The site features anonymised reviews by happy university students who’d received a high grade on an assignment or thesis thanks to work written by the ghost writers and submitted in the students’ names, as well as non-anonymised profiles of the ghost writers, along with their photos and their average customer ratings. Three thoughts came to my mind, in no particular order: “How is this legal?”, “Well, I do have the skills for that job!”, and “Can we expect students not to rely on the services of ghost writers if many academics do the same?”

 

This blog post is about the third thought. In the form of paper mills, ghost writing may be more common than you’d think, but it’s a topic I don’t know much about, and other people have written about it before (e.g., Anna Abalkina & Dorothy Bishop, https://osf.io/preprints/psyarxiv/2yf8z_v1, https://osf.io/preprints/psyarxiv/6mbgv_v1). Relying on the services of paper mills is clearly beyond any grey areas, but there are other forms of what I’d personally call ghost writing that seem to be perfectly acceptable in some circles in academia.

 

The specific case I’m thinking of is grant writing. It’s not uncommon that a grant is submitted in a professor’s name, but written by PhD students or postdocs. Few cases are black or white, as early career researchers contribute to a varying number of sections and to varying extents, ranging from brainstorming specific ideas to actually writing the whole proposal. As far as I know, there is no consensus about what is actually acceptable. Of course, there are many advantages to involving early career researchers in the grant writing process. They learn a lot: both about the scientific processes involved in getting from an initial idea to a concrete plan, and about the arguably less pleasant sides of the academic profession associated with the pressure of grant writing. Furthermore, the idea is often that, if the grant is successful, the student or postdoc will be hired to work on it, so it’s nice to give them the opportunity to contribute their own ideas.

 

In my own grant proposals, I have received very valuable input from my PhD students. Though I’m sure they would have been able and genuinely happy to contribute much more than I allowed or asked them to, I always had the idea in the back of my mind that I didn’t want to submit their work as my own. Depending on the grant, it’s sometimes possible to add PhD students as contributors, but at other times there are formal limitations, such as all (co-)applicants needing to have a PhD, or that there can be only one applicant (e.g., as is the case for some ERC grants). My own policy is to limit colleagues’ contributions to providing comments and discussions when I can’t add them as co-applicants, so I don’t end up submitting sentences or even whole sections as my own intellectual work when I haven’t actually written them. However, I’m sure that’s not the only defensible thing to do. Thus, I don’t propose that what I do is what everyone should do – rather, maybe we should start rethinking the whole process of determining intellectual contributions, acknowledging people’s work, and awarding research funding?

 

Regardless of where we draw the line: Is grant ghost writing really that bad? Maybe it’s just common knowledge that a grant submitted by a professor was likely not actually written by the professor. I honestly don’t know how common this is – only that it’s common enough that, at a webinar about how to write academic CVs, there was a whole discussion about how to take credit for successful grant proposals that you wrote but are not listed on. This brings us to problem number one: Early career researchers don’t get formal credit for their work, even though they need it most. The second problem that I see is in the quality of the project: One would think (hope?) that the professor is more experienced and thus better able to write a high-quality proposal. By involving early career researchers too much, the professor thus diminishes the chances of success. In my view, this puts the professor in a bad light: By offloading their work, they are decreasing the chance that they will get the grant and thus be able to support the early career researchers on their team by extending their contracts.

 

In a way, I think that this problem with grant ghost writing is yet another manifestation of the academic system not changing as fast as the world is. There are two changes that the grant writing system doesn’t seem to consider: First, the change from the lone-genius idea to team science. Although it’s nice to have grants that specifically promote a promising researcher, it’s utopian, in most fields, to assume that a large project can be conceived, let alone executed, without intellectual contributions from others. Second, the change from getting a job via the good old boys' club to fierce competition. One of the arguments for involving early career researchers in the grant writing process is that they would have a job if the proposal is successful. Maybe in the past, appeasing your professor by helping them write the grant proposal would raise your chances of getting a job. Maybe the chances of getting the grant were higher back then. And if the grant proposal was not successful, the professor was probably in a better position to get you a job, anyway – either through his own funding, or by calling his old buddy and asking if they didn’t have a suitable position in their lab. Maybe this still happens more often than we think. Still, from anecdotal observations, it is much more important for an early career researcher to stand on their own two feet, both subjectively and objectively. Subjectively, one doesn’t want to be known as “Professor such-and-such’s PhD student”, even years after graduation. Objectively, one doesn’t want to – and simply cannot – rely on a single person’s good will to get a job. Although connections, of course, help in getting a job, the competition is with colleagues who have worked on their own projects, gained funding as principal investigators, and published first-author papers on topics that are not spin-offs of the professor’s interests.

 

Is grant ghost writing morally better than students submitting ghost-written assignments and theses? “Real” academia, as opposed to the bachelor thesis of a student without academic ambitions, is a joint venture, where it may simply be understood that a single piece of writing is not the intellectual property of whoever wrote it, but of everyone who contributed, directly and indirectly. Still, it’s not easy to pinpoint exactly what makes grant ghost writing better than ghost writing by students in university assignments. In both cases, the person in whose name the work is submitted gets an advantage – either a good grade or a degree that they don’t deserve, or funding and a stronger CV. It’s difficult to say if more is at stake in the former or the latter case. And what about the ghost writers? Well, I could do some further research to see if postdocs and PhD students get a better salary, on average, than the ghost writers at the Berlin-based company. But somehow that feels like it would be beside the point…

Thursday, November 21, 2024

Conducting cross-linguistic research on reading: First lessons learned from my experience with recruiting international collaborators

 

Admittedly, I have yet to publish any large-scale cross-linguistic study. Actually, I have not even completed data collection for such a study yet. Cross-linguistic research on reading is hard. It is, however, very important, as has been argued in a couple of high-profile publications in recent years (Blasi et al., 2022; Huettig & Ferreira, 2022; Share, 2021; Siegelman et al., 2022; Vaid, 2022). So, despite not having anything to show in terms of a successfully completed study, I thought I would share my experiences with attempting to conduct cross-linguistic research, and specifically with recruiting collaborators across very different languages and cultures. Perhaps this will be useful for my fellow anglocentric or eurocentric researchers, or perhaps some of the readers of this blog post will have some ideas or insights about open questions or how I should do things better in the future.

 

Cognitive processing underlying reading across languages has been a focus of my research since my PhD. I’m afraid that I did not particularly contribute to overcoming the focus, in the published literature, on English and its close relatives, given that in my thesis, I compared reading in English and German. Afterwards, although I did some work on statistical learning and meta-science, I have found myself returning to the topic of reading across languages, as this topic has always fascinated me. A few years ago, I got a grant from the German Research Foundation to compare single-word reading aloud in a handful of European languages. In addition to working on this study, I am currently hoping to extend this work beyond Europe, and have by now reached out across a number of countries and continents to collect data in orthographies that, to date, we know relatively little about (relative to English, at any rate). For the purposes of this blog post, I would like to talk about some of the challenges that I have come across. I don’t want to provide a list of all of the languages and countries where I have (successfully or unsuccessfully) approached potential collaborators: I don’t, by any means, want to imply that the challenges reflect anything bad, but I still prefer not to publicly map any of the challenges that I have experienced onto any specific culture.

 

In conducting cross-linguistic research, finding collaborators is the first step. For pragmatic purposes, you need someone to recruit participants and co-ordinate the data collection. You also need someone who knows the language in question: Even if you are working with an amazing, high-quality corpus, you need someone to check your stimuli and remove any items that may be inappropriate for whatever reason (e.g., years ago, I heard a story about a non-English native speaker running a study with English-speaking children who had to get the “pseudoword” C*NT removed from her list of stimuli). You need to check if your instructions have been translated correctly, and if they even make sense. And, importantly, involving speakers of the language in question will allow you to identify aspects of the language that are of interest, but that may be so different from the features of your own language that you are not aware that they exist (Schmalz et al., 2024).

 

So, how does one go about finding collaborators? It is easy if the language is sufficiently well represented in your research community that you can approach people at conferences, or email researchers who have already published studies on your topic of interest in their respective language. However, the less well-represented the language is, the more difficult it becomes. I don’t want to pretend to know the best solution, but rather want to summarise the challenges that I have been facing, and some completely subjective thoughts about how to approach certain situations. Of course, international researchers are as diverse as the languages that they speak, and so are the cultures within which they live and work. Thus, the challenges that I list do not apply across the board, and other challenges may appear in different contexts. But as far as my experience goes, these are some considerations that I’ve come across:

 

1) Cognitive science is not established as a science everywhere. I study how children learn to read, what makes it challenging for them to read, and how reading works in adults. If I rattle off this elevator pitch, everyone, no matter their background, gets some idea of what I’m doing. However, the study of reading is not considered a science everywhere: In many places, the topic falls under “humanities”, and the attitude towards understanding how reading works may differ from our approach, which involves cognitive theories, computational models, and rigorous empirical testing. This may lead to some confusion about what it is that I’m doing exactly, and differences in our ideas about how to do experiments. I don’t have a solution to this, but have concluded that it is important to gauge in advance to what extent a collaborator is open towards a cognitive approach. After all, while there is value in more qualitative approaches, this just isn’t my expertise or research focus. Nevertheless, it is important to bear in mind that there will always be some differences in the scientific approach: After all, differences tend to increase with geographical distance, and unless your proposed collaborator has spent some time in the same lab as you, they will have different ideas about best practice in research. Incorporating fresh insights from their side will take your research to the next level.

 

2) There are cultural differences in communication. These go beyond the stereotypes that one may think of: when I started a large-scale international collaboration, I found myself wondering whether I would need to give different deadlines to different countries to make sure that everyone would submit their work by the actual deadline; however, the speed at which the collaborators completed the task was not at all related to any stereotypes. Instead, one striking example of cultural differences was in providing feedback. Some cultures are more direct in providing feedback (e.g., “This is wrong, you made a mistake” vs. “I’m sure I’m missing something, but I’m wondering if you have considered the possibility…”). In addition, in some cultures, people might not want to express any criticism at all, whether it is because they assume that you know what you are doing, or that it’s your responsibility to take the consequences for your own mistakes (i.e., that you’re an idiot, but that’s none of their business), or because they see you as someone whose authority should not be challenged. In other cultures, people cannot stand watching someone else doing something that they consider wrong without providing unwanted advice or a not very diplomatic commentary (I’m guilty of this myself, and I blame my German half for this).

 

Then there are more subtle differences that may come across as rude or inconsiderate, without us even being able to put our finger on them. The way we address people when writing emails is very variable, with many personal pet peeves and cultural differences. Some people may automatically move an email to the spam folder if it addresses them without mentioning their name; in some cultures, starting an email with “Dear colleague” is considered very polite. Perhaps you have received emails from international students with unconventional formulations – I strongly encourage everyone to look past their personal pet peeves and potential spelling mistakes in their own name, and to respond to each email with the time and respect that they would show any other colleague. After all, a student enquiring about the possibility of doing a research thesis with you may be your international collaborator tomorrow, regardless of whether you are able to help them at the time.

 

3) Bureaucracy. Communicating with cross-linguistic collaborators has been stimulating, insightful, and fun. I can absolutely not say this about my local administration. If you have some funding for cross-linguistic research, you need to consider the bureaucracy that goes into transferring that money abroad to your collaborators. In my case, my local university’s administration stalled this process for over two years, because the relevant department is chronically understaffed. Maybe you are lucky and things are different at your department. But in any case, it may be worth doing some research about the relevant procedures in advance, and building a very, very generous buffer into your planning. None of my collaborators’ universities have taken as long as my own institution to process the paperwork, but even there, processing times have varied, especially if any action was required at a time when most people in the country were on holidays (obviously, the timing and duration of the university holidays vary).

 

4) Language is more than just language. You might be super enthusiastic about a language that you are about to examine. But the native speakers of this language are very likely to have a deeper attachment to the language. For example, maybe their language is a part of a cultural identity that was historically repressed. Maybe I am stating the obvious (though I call myself a psycholinguist, my background and education are in psychology, not in linguistics). Nevertheless, it is important to treat each language with respect and to be mindful of people’s potential attachment to their language.

 

An example is a recent experience that I had in a non-academic context. When it comes to reading in Arabic, I know that there is some research showing the effect of diglossia: As most people in the Arabic world speak a dialect that diverges to a varying degree from Modern Standard Arabic, they learn to read in a language that is different from the one they speak at home. I mentioned this to an Arabic speaker, who started explaining to me why his dialect is the closest to Modern Standard Arabic. As it turns out, there seems to be (at least for some people) prestige associated with a dialect being closer to MSA, probably also for religious reasons. Being mindful of the importance that people (yes, researchers are people, too!) attach to their languages is important. At the same time, to avoid looking like you have a hidden agenda, it might be worth emphasising that you are neutral about certain aspects, and have a reason for studying a language that does not aim to either support or dispute a contentious claim.

 

5) People might be self-conscious about some things that you are not aware of. The previous point relates to attitudes that researchers may have towards their languages, but there may also be other beliefs and attitudes that may affect your communication with an international collaborator, or their willingness to collaborate with you. There may be some issues that have not even crossed your mind but that affect how a potential collaborator will evaluate you and your research proposal. If you are of European descent, people might be a priori suspicious about your coming in and pushing your own research idea. As an example, I once wanted to start a research project in collaboration with a country where multilingualism is the norm, and for reasons that have to do with colonialism, most people grow up with a language of instruction that is different from their home language. Unfortunately, that collaboration didn’t work out. In retrospect, I’m afraid that the reason for this is as follows: The way that I presented the project may have come across as wanting to show that it’s problematic that the people in that country study in a different language than they speak at home, or that people speak a different language than what is used at school and university. This was not my intention, as I genuinely believe that multilingualism brings nothing but benefits. It simply did not occur to me, at the time, that my project idea may be construed that way.

 

6) Political issues. Your research may be completely unpolitical, but unfortunately, political issues may affect if and how you can do cross-linguistic studies. For example, my funder no longer allows its money to be used in a way that involves exchanging data with researchers based in Russia. Such sanctions affect collaborations on a formal level, even if all researchers involved share the same values as you, and might even be keen to build connections to escape an oppressive regime. If a project involves a collaboration with researchers in numerous countries, there may also be sanctions between the respective countries, and some may explicitly prohibit a researcher from Country X from collaborating with any researcher based in Country Y. If this is the case, you might end up with a dilemma: Do I exclude the researcher from Country X, the researcher from Country Y, or do I salami-slice the project and make two separate publications out of it? The restrictions may be formal, issued by a funding body or university, but they may also be more subtle. Some people may be very nervous about being in contact with colleagues of a certain nationality, even in the absence of any official sanctions. From the outside, we cannot judge the extent to which this nervousness is justified. I see our role as trusting our collaborators, asking questions, when necessary, so that we understand the limitations and boundary conditions, showing our moral support, and – above all – ensuring that we do not put collaborators into unpleasant or even dangerous situations.

 

On the personal level, my experience has been exclusively positive: Even when I’ve been working together with researchers whose home countries don’t get along at all, the individuals have been very respectful and friendly towards each other: as always, it is important not to assume that the actions of a government reflect the attitudes of the people.

 

The bottom line. All around the world, children start off with the same broad cognitive structures. The way that these structures deal with the different scripts and orthographies is a fascinating question, which we are only beginning to investigate systematically. There are certainly many reasons why the science of reading is focussed on English and its European relatives. Are researchers studying reading in English and other European orthographies reaching out to researchers abroad? My suspicion is that the answer to this question is “no”. A lack of experience with people from other cultures may be a major reason. In the past few years, I have worked with people from all continents aside from Antarctica, which has been a very enriching but humbling experience. Despite having started off as someone from a bicultural family and having lived on three different continents, I continue to learn from my international collaborators, both about how to be a better colleague and a better researcher. This is why I, despite being far from an expert on cross-cultural collaboration, have decided to write up my experiences. I hope that my experience report will encourage cross-cultural collaboration, increased awareness of things to think about when approaching or communicating with potential collaborators, and discussions about how to act in a culturally sensitive and open-minded way.

 

References

Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (2022). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences.

Huettig, F., & Ferreira, F. (2022). The Myth of Normal Reading. Perspectives on Psychological Science, 17456916221127226.

Schmalz, X., Breuer, J., Haim, M., Hildebrandt, A., Knöpfle, P., Leung, A. Y., & Roettger, T. B. (2024). Let’s talk about language—and its role for replicability. https://osf.io/preprints/metaarxiv/4sb7c

Share, D. L. (2021). Is the science of reading just the science of reading English? Reading Research Quarterly, 56, S391-S402.

Siegelman, N., Schroeder, S., Acartürk, C., Ahn, H.-D., Alexeeva, S., Amenta, S., Bertram, R., Bonandrini, R., Brysbaert, M., & Chernova, D. (2022). Expanding horizons of cross-linguistic research on reading: The Multilingual Eye-movement Corpus (MECO). Behavior Research Methods, 1-21.

Vaid, J. (2022). Biscriptality: a neglected construct in the study of bilingualism. Journal of Cultural Cognitive Science, 6(2), 135-149.

 

Friday, April 19, 2024

Is Open Science passé?

I recently got a grant rejection - but I swear this blog post won't be whinging about rejections. Rather, I'd like to start by sharing a reviewer's comment that surprised me: The proposal is not novel, they wrote, because it's all about replicability and open science and blah blah blah, and we already know all about that since the Ioannidis 2005 paper, published almost 20 years ago now! Yes, I replied to the reviewer in my head. But have we actually solved this issue?

In a way, I understand where the reviewer is coming from. The other day, I opened the latest issue of the German academic magazine "Forschung und Lehre". On the first page was an article about the p-value, and how it doesn't mean what many researchers seem to think it means. "But we've been talking about that for decades now, surely everyone already knows this!", I thought, and skipped to the next page.

Today, I taught a workshop on Open Science for a masters programme. I've been doing similar courses for similar audiences for a number of years now. Every year, I show a slide with the results of the Open Science Collaboration (2015) replication efforts. "Who has heard of this study before?" I ask. When I started teaching in about 2016, I found that most students, including bachelor students, were familiar with the study and its provocative results. Today, what I was presenting seemed to be new to many students. On the one hand, that's good for me - I was able to tell the students something new, rather than repeating things they already knew anyway. On the other hand, I wondered, do people not care about replicability any more?

The Open Science community, at the beginning, was a close-knit group on twitter. My reputation in academia (such as it is) is largely thanks to this community: from the beginning, I was active by tweeting and writing blog posts about Open Science, and within the community, such posts were spread widely. However, long before this community was scattered across various alternative platforms such as Mastodon and BlueSky, it had split into factions that spent a lot of their time fighting each other. Fashions come and go - I learned that in my teenage years, after which I made the conscious decision to ignore all clothing trends. So maybe Open Science is just not cool anymore.

This raises the question: Has the open science movement failed in its mission to improve science? Or, on the contrary, did it solve the issues so efficiently that it is no longer needed? The first scenario is, unfortunately, more likely. I myself am guilty of having been too dogmatic and of over-simplifying, in my mind, the ways in which Open Science can, and should, improve science. But has Open Science really unleashed its full potential to improve science? I sincerely believe that it has not. I feel like the discussions about how Open Science works and, indeed, what outcome we want to achieve are only just starting to take shape. Many questions remain, such as: What is important for good research? Via what mechanisms do Open Science practices affect research quality (positively or negatively)?

This blog post, again, has more open questions than answers. So, dear Reviewer 2, if you're reading this: When you review proposals involving reproducibility and Open Science, please don't reject them on the basis that we already know everything.

Wednesday, March 13, 2024

Why working as a postdoc under WissZeitVG is not compatible with family: An experience report

I had to force my hand to sign my last work contract. A work contract for 12 more months, plus an additional document justifying that my activities will contribute to my further qualification, so that the contract can fall under the Wissenschaftszeitvertragsgesetz - a blatant lie, as I've already finished my habilitation, the highest qualification one can achieve. I bit my tongue, knowing that any cynical comment from my side would achieve nothing but ruin the day of the admin lady. I left her office without the happiness I had previously felt whenever I signed a contract that enabled me to get paid for doing what I love. My main thought was that I'd drawn yet another line in a perverse game of hangman.

The Wissenschaftszeitvertragsgesetz - WissZeitVG - limits the amount of time one can work as a postdoc: one has six years to either get a professorship or quit academia. Of course, it's not easy to get a professorship at all, let alone one that doesn't require uprooting and moving the whole family. Having a realistic chance at a professorship comes with a lot of pressure to publish and to get grants. Additionally, in my case, all of my salary - and that of my PhD students - is paid from grants, which creates existential pressure to win even more grants. At the same time, of course, one mustn't neglect the publications, which are needed not only to get a position, but also to get more grants. In short, a vicious cycle.

Even without the additional factor of family, being in a senior postdoc position seems incompatible with doing high-quality research: I have to write strong, innovative grant proposals, but I don't have the time to make them strong enough that even I find them convincing. On top of that, I have to work on my ongoing projects and publish as much as possible from them. Add to that the standard admin tasks, and one ends up with a bunch of half-finished projects and very little time to drive any of them forward.

I used to take pride in my ability to work efficiently. I didn't think that having a child would make a huge difference to this ability, but it does. I returned from parental leave after only 4 months and gradually increased my working hours to 80%. In reality, I work more than that, but I still get less done than I would have BC (before child). I think what I miss most is having large stretches of time. It is no longer an option to spend all of Saturday working on a paper, or to stay in the office till 8pm to finish writing a grant section. It just isn't. And that seems to make the difference between working in a demanding but fulfilling job, and constantly feeling like one is failing at juggling raw eggs.

On the surface, these may seem to be unrelated problems - working in academia is hard, and being a working mum is hard. Academia has never been a walk in the park, but I don't make choices in life because I want things to be easy. The reason I came to the conclusion that family life and working under the WissZeitVG are incompatible for me is that my productivity took a large hit - precisely at the time when I have to work harder than I ever did to have any chance of climbing to the top. I'd love for someone to tell me: "While you have a small child, you *won't* be as productive as you were previously - and that's OK!" But it's not OK - because by the time he's grown up and I'm able to return to my previous levels of efficiency, the WissZeitVG will long since have kicked me out of the system.

I made a decision after I'd signed my last work contract: I will never sign another contract under the WissZeitVG. And if that means I have to leave academia, then so be it. The alternative would be to stick around for yet another year, again and again, applying for more grants and hoping that something comes along. Maybe I'd win the lottery, but maybe I wouldn't. And in the meantime, I wouldn't have any time to do what I love anyway, which is producing research of a quality that I'm happy with.

Like a love-sick teenager, I can't help but wonder: Can this really be it? How can it be over when it was so nice while it lasted? Would I give academia another chance if it wanted me back? Of course, if I got a permanent position, the equation would change. But at some stage, the conclusion that some things are not worth it becomes inevitable.