Thursday, August 8, 2019

On grant proposal writing

The year 2018 was very successful for me in terms of grants: My success rate skyrocketed from close to 0% to 100%. It’s a never-ending story, though, so now I find myself writing even more grant proposals, which led me to procrastinate and write a blog post about grant proposal writing. Given my recent successes, I could frame this blog post as advice for other aspiring grant writers. However, frankly, I have no idea why my success rate changed so abruptly. Also, I don’t really want to sound like this guy.

Nevertheless, I got a lot of advice from different people about grant writing over the years. Maybe it can be useful to other people. It will also allow me to organise my own thoughts about what I should consider while writing my proposals. So, here goes:

Advice #1: Be lucky. Even if your proposal is amazing, the success rates tend to be low, and many factors aside from the grant quality will affect whether it is successful or not. You may want to repress this thought while writing the proposal. Otherwise the motivation to invest weeks and months into planning and getting excited about a project will plummet. However, as soon as I submit the proposal, I will try to assume an unsuccessful outcome. First, it will motivate me to think about back-up plans, and second, it will weaken the bitterness of the disappointment if the funding is not granted.

One aspect where luck plays a large role is the reviewers. In most schemes that I have applied for, the reviewer may be the biggest expert in the field, but they may also be a researcher working on a completely different topic in a vaguely related area. So a good grant proposal needs to be specific, to convince the biggest expert that you have excellent knowledge of the literature, that you have not missed any issues that could compromise the quality of your project, and that every single detail of your project is well thought through. At the same time, the proposal needs to be general, so that a non-expert reviewer can understand what exactly you are trying to do, and why the project matters for your topic. Oh, and, on top of that, the proposal has to stay within the page limit.

Over the years, I have received a lot of very useful advice about grant writing, and now that I’m trying to summarise it all, I realise how conflicting it sometimes is. I have asked many different people for advice, most of whom are regularly involved in evaluating grant proposals. This is one demonstration of how important luck is: Maybe you will get a grant reviewer who expects a short and sexy introduction explaining how your project will contribute to the bigger picture of some important, global social problem (e.g., cancer, global warming). Maybe you will get a reviewer who will be extremely annoyed by an introduction that overblows the significance of the project.

Advice #2: Think about your audience. When I search for possible reasons for my abrupt change in success rate, this is a possible candidate. The advice to think about one’s audience applies to all writing, and it is widely known. However, for a beginning grant writer it is sometimes difficult to visualise the grant reviewer. Also, as I noted above, the reviewer may be the biggest expert in the field, or someone who doesn’t know very much about it. Thus, it is important to find the right balance in the amount of detail you put into the proposal: don’t bore the reviewer, but provide enough detail to be convincing. The prior probability of the reviewer being the biggest expert is rather low: non-experts are much more common than people with very specialised knowledge of your specific topic. Thus, when in doubt, explain things, and avoid acronyms, even if you think they are assumed knowledge for people in the field.

Reviewers are, in most cases, academics. This means that they are likely to be busy: make the proposal as easy to read as possible. Put in lots of colourful pictures: explaining as many things as possible in figures can also help to cut the word count.

This also means that they are likely to be elderly men. This realisation has brought up a very vivid image in my mind: if the proposal is ‘good’, the reviewer should come home to his wife, and, while she passes him his daily glass of evening brandy, he will tell her (read this in a posh British accent, or translate it in your head into untainted Hochdeutsch): “My dear, I learned the most interesting thing about dyslexia today…!”

Advice #3: Get as much feedback as possible. Feedback is always good: I try to incorporate everything anyone tells me, even if in some cases I don’t agree with it. Thoughts such as “Clearly, the person giving the feedback didn’t read the proposal thoroughly enough, otherwise they wouldn’t be confused about X!” are not very helpful: if someone giving you feedback stumbles over something, chances are that a reviewer will, too. Sometimes, advice from two different people will conflict. If at all possible, try to find a way to incorporate both points of view. Otherwise, use your best judgement.

Most universities have an office which helps with proposal writing: they are very helpful in giving advice from an administrative perspective. Different funding agencies have different requirements about structure and the like (which is also why I’m trying to keep the advice summarised here as general as possible). Grant offices are likely to give you good advice about the specific scheme you are applying for. They may also let you read previous successful applications: this can help you get a better idea of how to structure the proposal, how to lay out the administrative section, and other issues that you may have missed.

Colleagues can give feedback about the content: they will point out if something is more controversial than you thought, if there are problems with some approaches that you have not thought about, and provide any important references that you may have missed. Ask colleagues with different backgrounds and theoretical ‘convictions’. Friends and relatives can help to make sure that the proposal is readable for a non-expert reviewer, and that the story, as a whole, makes sense.

In some ways, submitting a grant proposal is a lot like buying a lottery ticket that costs a lot of time and on which your career may depend. However, it is also the daily bread of anyone striving for an academic career, so it is important to try to make the best of it. In an attempt to end this on a positive note (so I feel motivated to get back to my proposal): Applying for ‘your own’ project may give you the flexibility to work on something that you really care about. It takes a lot of time, but that time is also spent thinking through the project, which will make its execution run more smoothly afterwards.

The advice above is not comprehensive, and it comes from my own biased viewpoint. I would be very happy to read corrections or further advice from readers in the comments section.

Friday, August 2, 2019

Getting a precise RT estimate for single items in a reading aloud task

For Registered Reports, grant applications, ethics applications, and similar documents, researchers are expected to provide a power calculation. From my own experience, and from talking with colleagues in many different contexts, this is often a hurdle. Calculating power requires an effect size estimate. Sometimes, we try new things and have no idea what the size of the effect will be: even if we have some pilot data, we know that the observed effect size is variable when the sample size is small (Whitehead et al., 2016). We might have data from a previous study, but we also know that the presence of publication bias and questionable research practices leads to systematic over-estimation of the true effect size (Vasishth et al., 2018). The design of our study might be complex, and we don't really know which boxes to tick in G*Power. We might not even be sure what kind of effects we're looking for: if our study is more exploratory in nature, we will not know which statistical tests we will conduct, and a formal power analysis would not make much sense anyway (Nosek & Lakens, 2014). Still, we need to find some way to justify our sample size to the reviewers.

In justifying our sample size, an alternative to a power analysis is to plan for a certain degree of precision (e.g., Kelley et al., 2003). For estimating precision, we use our a priori expectation of the standard deviation to calculate a confidence interval that guarantees that, in the long run, our observed estimate will fall within an acceptable bound. Again, we have freedom in deciding the width of the confidence interval (e.g., 80%, 90%, 95%), and we need to have an estimate of the standard deviation.

In the current blog post, I'd like to answer a question that is relevant to me at the moment: When we do a reading aloud study, a number of participants see a number of words, and are asked to read each one aloud as accurately and quickly as possible. The variable which is analysed is often the Reaction Time (RT): the number of milliseconds between the appearance of the item and the onset of the vocal response. The items are generally chosen to vary in some linguistic characteristic, and subsequent statistical analyses are conducted to see whether these linguistic characteristics affect the RT.

In most cases, the data would be analysed using a Linear Mixed Effect model, where item- and participant-level characteristics can be included as predictor variables. More information about calculating power and required sample sizes for Linear Mixed Effect models can be found in Brysbaert and Stevens (2018) and Westfall et al. (2014); and a corresponding app can be found here. Here, I ask a different question: If we look at a single item, how many participants do we need to obtain stable estimates?

On the surface, the logic behind this question is very simple. For each item, we can calculate the average RT across N participants. As N increases, the observed average should approach a hypothetical true value. If we want to see which item-level characteristics affect RTs, we should take care to have as precise an estimate as possible. If we have only a few participants responding to each item, the observed average RT is likely to change substantially if we ask a couple more participants to read aloud the same items.

As a complicating factor, the assumption that there is a single true value for an item's average RT is unreasonable. For example, familiarity with a given word will vary across participants: a psychology student is likely to respond faster to words that they encounter in their daily life, such as "depression", "diagnosis", "comorbidity", than someone who does not encounter these words on a regular basis (e.g., an economics student). Thus, the true RT is more likely to be a distribution than a single point.

Leaving this important caveat aside for a minute, we return to the basic principle that a larger number of observations should result in a more stable RT estimate. In a set of simulations, I decided to see what the trajectory of a given observed average RT is likely to look like, when we base it on the characteristics that we find, for various words, in the large-scale Lexicon projects. The English Lexicon Project (Balota et al., 2007) has responses for thousands of items, with up to 35 responses per item. In a first simulation, I focussed on the word "vanishes", which has 35 responses and an average reading aloud RT of 743.4 ms (SD = 345.3), including only the correct responses. Based on the mean and SD, we can simulate the likely trajectories of the observed average RTs at different values of N. Using the item's mean and SD, we simulate a normal distribution, and draw a single value from it: We have an RT for N = 1. Then we draw the next value and calculate the average of these first and second values. We have an average RT for N = 2. We can repeat this procedure, while always plotting the observed average RT for each N. Here, I did this for 35 participants: this gives a single "walk", where the average RT approaches the RT which we specified as a parameter for our normal distribution. Then, we repeat the whole procedure, to simulate more "walks". The figure below shows 100 such "walks".
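(The original simulations were presumably done in R; the scripts are linked below. As an illustration, the same procedure can be sketched in Python, using the mean and SD for "vanishes" reported above. The variable names are my own.)

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Mean and SD for "vanishes" from the English Lexicon Project (see text)
mean_rt, sd_rt = 743.4, 345.3
n_participants = 35  # maximum number of responses per item
n_walks = 100

# Each "walk" draws RTs one at a time; dividing the cumulative sum by N
# gives the running average after each simulated participant
rts = rng.normal(mean_rt, sd_rt, size=(n_walks, n_participants))
walks = np.cumsum(rts, axis=1) / np.arange(1, n_participants + 1)

# The spread across walks shrinks as N grows
print(walks[:, 0].std())   # spread at N = 1 (roughly the item SD)
print(walks[:, -1].std())  # much smaller spread at N = 35
```

(Drawing from a normal distribution is a simplification: empirical RT distributions are right-skewed, so this sketch only illustrates the convergence of the running mean.)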

As expected, the initial average RTs tend to be all over the place: if we were to stop our simulated data collection at N = 5, we might be unlucky enough to get an estimate of 400 ms, or an estimate of 1,200 ms. As the simulated data collection progresses, the variability between the "walks" diminishes, and at N = 30 we would expect the observed average RT to lie somewhere between 600 ms and 1,000 ms.

Analytically, the variability at different values of N can be quantified with confidence intervals: the range within which, in the long run, we expect the observed average RT to fall a given proportion of the time. The width of the confidence interval depends on (1) the confidence level that we'd like to have (fixed here at 95%), (2) the population standard deviation (σ), and (3) the number of participants. Now, we don't really know what σ is, but we can get a plausible range of σ-values by looking at the data from the English Lexicon Project. I first removed all RTs < 250 ms, which are likely to be miscoded. Then I generated a box-plot of the SDs for all items:

The SDs are not normally distributed, with quite a lot of very large values. However, we can calculate a median, which happens to be SDmedian ≈ 200; a 20% quantile, SDlower ≈ 130; an 80% quantile, SDupper ≈ 350; and a pessimistic estimate, taken from the location of the upper whisker in the boxplot above, SDpessimistic ≈ 600. For each of these SD estimates, we can calculate the 95% confidence interval for different values of N, with the formula: CIupper = 1.96*(σ/sqrt(N)); CIlower = CIupper * (-1). To calculate the expected range of average RTs, we would add these values to the average RT. However, here we are more interested in deviations from any hypothetical mean, so we can simply focus on the upper bound; the expected range of deviation is therefore CIupper * 2.
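As a sketch (in Python, using the rounded SD values above), the expected range of deviation for a given N follows directly from this formula:

```python
import math

# Rounded SD estimates from the English Lexicon Project (see text)
sd_estimates = {"low": 130, "median": 200, "upper": 350, "pessimistic": 600}

def expected_deviation(sd, n, z=1.96):
    """Expected range of deviation around the mean: CIupper * 2 = 2 * z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

for n in (50, 100):
    for label, sd in sd_estimates.items():
        print(f"N = {n:3d}, {label:11s}: {expected_deviation(sd, n):5.1f} ms")
```

Up to rounding, these values reproduce the figures quoted in the next paragraph.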

Next, I plotted CIupper as a function of N for the different SD estimates (low, median, high, and pessimistic):

So, if we have 50 participants, the expected range of deviation (CIupper * 2) is 72 ms for the low estimate, 110 ms for the median estimate, 194 ms for the upper estimate, and 332 ms for the pessimistic estimate. For 100 participants, the range reduces to 50 ms, 78 ms, 137 ms, and 235 ms, respectively.

What does all of this mean? Well, at the end of this blog post we are still left with the situation that the researcher needs to decide on an acceptable range of deviation. This is likely to be a trade-off between the precision one wants to achieve and practical considerations. However, the simulations and calculations should give a feeling of what number of observations is typically needed to achieve what level of precision, when we look at the average RTs of single items. The general take-home messages can be summarised as: (1) It could be fruitful to consider precision when planning psycholinguistic experiments, and (2) the more observations, the more stable the average RT estimate, i.e., the less likely it is to vary across samples.
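One practical corollary: once a researcher has decided on an acceptable range of deviation, the formula above can be inverted to plan for precision. This hypothetical helper (my own, not part of the original analysis) returns the smallest N whose expected range of deviation stays within a target width:

```python
import math

def required_n(sd, target_range, z=1.96):
    """Smallest N such that the expected range of deviation, 2*z*sd/sqrt(N), is <= target_range."""
    return math.ceil((2 * z * sd / target_range) ** 2)

# e.g., to keep the expected deviation within 100 ms under the median SD estimate
print(required_n(sd=200, target_range=100))  # 62
```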


Link to the analyses and simulations:


Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., ... & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445-459.

Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1).

Kelley, K., Maxwell, S. E., & Rausch, J. R. (2003). Obtaining power or obtaining precision: Delineating methods of sample-size planning. Evaluation & the Health Professions, 26(3), 258-287.

Nosek, B. A., & Lakens, D. (2014). Registered Reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137-141.

Vasishth, S., Mertzen, D., Jäger, L. A., & Gelman, A. (2018). The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103, 151-175.

Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020.

Whitehead, A. L., Julious, S. A., Cooper, C. L., & Campbell, M. J. (2016). Estimating the sample size for a pilot randomised trial to minimise the overall trial sample size for the external pilot and main trial for a continuous outcome variable. Statistical Methods in Medical Research, 25(3), 1057-1073.

Friday, May 24, 2019

The perfect article

Last year, I went to an R Ladies event. This event took place at the Süddeutsche Zeitung, one of the biggest and most serious newspapers in Germany. The workshop was presented by two R Ladies from the newspaper’s data-driven journalism department. The event was extremely interesting: as it turns out, the job of a data-driven journalist is to collect or find data, and present it to readers in an understandable way. One of the projects analysed transcripts from Bundestag meetings and presented them in easy-to-digest graphs. Another presented new data on the socially relevant question of housing prices in Germany.

Throughout the event, I kept thinking: They are much further in terms of open communication than we are. As an essential part of their job, data-driven journalists need to present often complex data in a way that any interested reader can interpret it. At the same time, the R Ladies at the event kept emphasising that the data and R/RMarkdown scripts were publicly available, for anyone who doubted their conclusions or wanted to try out things for themselves.

This brings me to the idea of what the perfect article would look like. I guess you know where this is going, but before I go there, to avoid disappointment, I will add that, in this blog post, I will not give any advice on how to actually write such a perfect article, nor how to achieve a research world where such articles will be the norm. I will just provide a dreamer’s description of a utopian world, and finish off with some questions that I have no answer for.

The perfect article would have a pyramidal structure. At the top layer would be a description of the study, written at a level that a high school student could understand. The data could be presented in an interactive shiny app, and there would be easy-to-read explanations of the research question, its importance, how the data should be interpreted to answer this research question, and any limitations that may affect the interpretation of the data.

Undergraduate students in the field of study (or very interested readers) would be directed to a second layer, which describes the research methods in more detail. Here, the statistical analyses and the theoretical relevance would be explained, and a more thorough description of methodological limitations would be provided.

The next level would be aimed at researchers in the field of study. Here, the study would need to be placed in relation to previous work on this topic, and a very thorough discussion of the theoretical implications would be needed.

The final level would include all the data, all the materials, and all the analysis scripts. This level would be aimed at researchers who plan to build on this work. It would allow them to double-check that the results are robust and that there are no mistakes in the data analysis. They would also be able to obtain the materials, allowing them to build as closely as possible on previous work.

Even in an ideal world, this format would not be suitable for all fields. For example, in theoretical mathematics, it would probably be very difficult to come up with a project that could be explained to a lay audience through a shiny app. More applied mathematics could, however, be presented as the deeper layers of a project where these methods are applied.

Many practical concerns jump out of my perfect article proposal. Most obviously, an article of this form would be unsuitable for a paper format. It would, however, be relatively straightforward to implement in online journals. This would require expertise that not all academic authors have. (In fact, I would guess: an expertise that most academic authors don’t have.) Even for those who do have the skills, it would require much more time, and as we all know, time is something that we don’t have, because we need to publish in large quantities if we want to have a job. Another issue with this format: many studies are incremental, and they would not be at all interesting to a general audience. So why spend time on creating the upper layers of the pyramid?

A solution to the last issue would be to completely re-think the role that papers have in the academic process. Instead of publishing papers, the mentality could switch to publishing projects. Often, a researcher or lab is concerned with a broader research question. Perhaps what would be, in our current system, ten separate publications could be combined to make a more general point about such a broad research question, which would be of interest to the general public. Such a switch in mindset would also give researchers a greater sense of purpose, as they would need to keep this broad research question in the back of their minds while they conduct separate studies.

Another question would fall out of this proposal to publish projects rather than individual studies: What would happen with authorship? If five different PhD students conducted the individual studies, some of them would need to give up their first authorship if their work is combined into a single project. Here, the solution would be to move away from an authorship model, and instead list each researcher’s contribution along with the project’s content. And, as part of the team, one could also find a programmer (or data-driven journalist), who would be able to contribute to the technical side of presenting the content, and to making sure that the upper layers of the presentation are really understandable to the intended audience.

The problem would remain that PhD students would go without first authorship. But, in an ideal world, this would not matter, because their contributions to the project would be clearly acknowledged, and potential employers could actually judge them based on the quality, not the quantity of their work. In an ideal world…

Thursday, May 16, 2019

Why I stopped signing my reviews

Since the beginning of this year, I stopped signing my peer reviews. I had systematically signed my reviews for a few years: I think I started this at the beginning of my first post-doc, back in 2015. My reasons for signing were the following: (1) Science should be about an open exchange of ideas. I have previously benefitted from signed reviews, because I could contact the reviewer with follow-up questions, which has resulted in very fruitful discussion. (2) Something ideological about open science (I don’t remember the details). (3) As an early career researcher, one is still very unknown. Signing reviews might help colleagues to associate your name with your work. As for the drawbacks, there is the often-cited concern that authors may want to take revenge if they receive a negative review, and even in the absence of any bad intentions, they may develop implicit biases against you. I weighed this disadvantage against the advantages listed above, and decided that it was worth the risk.

So then, why did I stop? There was a specific review that made me change my mind, because I realised that by signing reviews, one might get into all kinds of unanticipated awkward situations. I will recount this particular experience, of course removing all details to protect the authors’ identity (which, by the way, I don’t know, though with enough detail others might be able to guess it).

A few months ago, I was asked to review a paper about an effect that I had not found in one of my own previous studies. The paper under review reported a significant effect. I could not find anything wrong with the methods or analyses, but the introduction was rather biased, in the sense that it cited only studies that did show this effect, and did not cite mine. I asked the authors to cite my study. I also asked them to provide a scatterplot of their data.

The next version of this manuscript that I received included the scatterplot, as I’d asked, and a citation of my study. Except, my study was cited in the following context (of course, fully paraphrased): “The effect was found in a previous study (citation). Schmalz et al. did not find the effect, but their study sucks.” At the same time, I noticed something very strange about the scatterplot. After asking several stats-savvy colleagues to verify that this strange thing was, indeed, very strange, I wrote in my review that I don’t believe the results, because the authors must have made a coding error during data processing.

I really did not like sending this review, because I was afraid that it would look (both to the editor and to the authors) like I had picked out a reason to dismiss the study because they had criticised my paper. However, I had signed my previous review, and whether or not I would sign during this round, it would be clear to the authors that it was me.

In general, I still think that signing reviews has a lot of advantages. Whether the disadvantages outweigh the benefits depends on each reviewer’s preference. For myself, the additional drawback that there may be unexpected awkward situations that one really doesn’t want to get into as an early career researcher tipped the balance, but it’s still a close call.

Thursday, April 4, 2019

On being happy in academia

tl;dr: Don’t take your research too seriously.

I like reading blog posts with advice about how to survive a PhD, things one wished one had known before one started a PhD, and other similar topics. Here goes my own attempt at writing such a blog post. I’m not a PhD student anymore, so I can’t talk about my current PhD experiences, nor am I a professor who can look back and list all of the personal mistakes and successes that have led to “making it” in academia. It has been a bit over 4 years since I finished my PhD and started working as a post-doc, and comparing myself now and then, I realise that I’m happier working in academia now. This is not to say that I was ever unhappy during my time in academia, but some changes in attitude have led to – let’s say – a healthier relationship with my research. This is what I would like to write this blog post about.

Don’t let your research define you
In the end, all of the points below can be summarised as: Don’t take your research too seriously. Research inevitably involves successes and failures; everybody produces some good research and some bad research, and it’s not always easy for the researcher to decide which it is at the time. So there will always be criticism, some of it justified, some of it reflecting the bad luck of meeting Reviewer 2 on a bad day.

Receiving criticism has become infinitely easier for me over the years: after getting an article rejected, it used to take at least one evening of moping and a bottle of wine to recover, while now I only shrug. It’s difficult to identify exactly why my reaction to rejection changed over time, but I think it has something to do with seeing my research less as an integral part of my identity. I sometimes produce bad research, but this doesn’t make me a bad person. This way, even if a reviewer rightfully tears my paper to shreds, my ego remains intact.

Picking a research topic
Following up from the very abstract point above, I’ll try to isolate some more concrete suggestions that, in my case, may or may not have contributed to my changed mindset. The first one is about picking a research topic. At the beginning of my PhD, I wanted to pick a topic that is of personal relevance, such as bilingualism or reading in different orthographies. Then, becoming more and more cynical about the research literature, I started following up on topics where I’d read a paper and think: “That’s gotta be bullshit!”

Now, I’ve moved away from both approaches. On the one hand, picking a topic that one is too passionate about can, in my view, lead to a personal involvement which can (a) negatively impact one’s ability to view the research from an objective perspective, and (b) become an unhealthy obsession. To take a hypothetical example: if I had followed up on my interest in bilingualism, it is – just theoretically – possible that I would consistently find that being bilingual comes with some cognitive disadvantages. As someone who strongly believes in the benefit of a multilingual society, it would be difficult for me to objectively interpret and report my findings.

On the other hand, focussing on bad research can result in existential crises, anger at poor researchers, a permanently bad mood, and, from a practical perspective, annoying some high-status people while having a relatively small impact on improving the state of the literature.

My conclusion has been that it’s good to choose topics that I find interesting, where there is good ground work, and where I know that, no matter what the outcome of my research, I will be comfortable to report it.

Working 9-to-5
My shift in mindset coincides with having met my husband (during my first post-doc in Italy). As a result, I started spending less time working outside of office hours. Coming home at a reasonable time, trying out some new hobbies (cross-country skiing, hiking, cycling), and spending weekends together or catching up with my old hobbies (music, reading) distracts from research, in a good way. When I get to work, I can approach my research with a fresh mind and potentially from a new perspective.

Having said this, I’ve always been good at not working too hard, which is probably the reason why I’ve always been pretty happy during my time in academia. (Having strong Australian and Russian cultural ties, I have both the “she’ll be right” and the “авось повезёт” attitudes. Contrary to popular belief, a relaxed attitude towards work is also compatible with a German mindset: in Germany, people tend to work hard during the day, but switch off as soon as they leave the office.) At the beginning of my PhD, one of the best pieces of advice that I received was to travel as much as possible. I tried to combine my trips with lab or conference visits, but I also spent a lot of time discovering new places and not thinking about research at all. During my PhD in Sydney, I also pursued old and new hobbies: I joined a book club, an orchestra, a French conversation group, took karate lessons, and thereby met lots of great people and have many good memories from my time in Sydney.

Stick to your principles
For me, this point is especially relevant from an Open Science perspective. Perhaps, if I spent less time on doing research in a way that I find acceptable, I’d have twice as many publications. This could, of course, be extremely advantageous on the job market. On the flip side, more and more researchers value quality over quantity: a job application and CV with lots of shoddy publications may impress some professors, but may be immediately trashed by others who are more on board with the open science movement.

The moral of this story is: One can’t make everyone happy, so it’s best to stick to one’s own principles, which also has the side effect that you’ll be valued by researchers who share your principles. 

A project always takes longer than one initially thinks
Writing a research proposal of any kind involves writing a timeline. In my experience, the actual project will always take much longer than anticipated, often due to circumstances beyond your control (e.g., recruitment takes longer than expected, collaborators take a long time to read drafts). For planning purposes, it’s good to add a couple of months to account for this. And if you notice that you can’t keep up with your timeline: that’s perfectly normal.

Have a backup plan
For a long time, I saw the prospect of leaving academia as the ultimate personal failure. This changed when I made the decision that my priority is to work within commutable distance of my husband, which, in the case of an academic couple, may very well involve one or both leaving academia at some stage. It helped to get a more concrete idea of what leaving academia would actually mean. It is ideal if there is a “real world” profession where one’s research experience would be an advantage. In my case, I decided to learn more about statistics and data science. In addition to opening job prospects that sound very interesting and involve a higher salary than the one I would get in academia, it gave me an opportunity to learn things that helped take my research to a different level.

Choosing a mentor
From observing colleagues, I have concluded that the PhD supervisor controls at least 90% of a student’s PhD experience. For prospective PhD students, my advice would be to be very careful in choosing a supervisor. One of the biggest warning signs (from observing colleagues’ experiences) is a supervisor who reacts negatively when a (female) PhD student or post-doc decides to start a family. If you get the chance to talk to your future colleagues before starting a PhD, ask them about their family life and how easy they find it to combine family with their PhD or post-doc work. If you’re stuck in a toxic lab, my advice would be: Get out as soon as you can. Graduate as soon as possible and get a post-doc in a better lab; start a new PhD in a better lab, even if it means losing a few years; or leave academia altogether. I’ve seen friends and colleagues develop long-lasting physical and psychological health problems because of a toxic research environment: nothing is worth going through this.

Having a backup plan, as per the point above, could be particularly helpful in getting away from a toxic research environment. One would probably be much less willing to put up with an abusive supervisor if one were confident that there are alternatives out there.

Choosing collaborators
Collaborators are very helpful when it comes to providing feedback about aspects that you may not have thought about. One should bear in mind, though, that they have projects of their own: chances are, they will not be as enthusiastic about your project as you are, and may not have time to contribute as much as you expect. This is good to take into account when planning a project: assuming that you will need to do most of the work yourself will reduce misunderstandings and the stress of perceiving collaborators as not working hard enough on the project.

Be aware of the impostor syndrome
During my PhD, there were several compulsory administrative events that, at the time, I thought were a waste of time. Among other things, at one such event we were told about the impostor syndrome (it was also where a recently graduated PhD student gave the advice to travel as much as possible). Only relatively recently did I discover that many other early-career researchers have never heard of the impostor syndrome, and often feel inadequate, guilty, and exhausted by their research. Putting a label on this syndrome may help researchers become more aware that most people in academia often feel like impostors, and to take this feeling less seriously.

Thursday, March 14, 2019

Why I plan to move away from statistical learning research

Statistical learning is a hot topic, with papers about a link between statistical learning ability and reading and/or dyslexia mushrooming all over the place. In this blog post, I am very sceptical about statistical learning, but before I continue, I should make it clear that it is, in principle, an interesting topic, and there are a lot of studies which I like very much.

I’ve published two papers on statistical learning and reading/dyslexia. My main research interest is in cross-linguistic differences in skilled reading, reading acquisition, and dyslexia, which was also the topic of my PhD. The reason why I became interested in the statistical learning literature during my first post-doc was, in retrospect, exactly the reason why I should have stayed away from it: it seemed relevant to everything I was doing.

From the perspective of cross-linguistic reading research, statistical learning seemed to be integral to understanding cross-linguistic differences. This is because the statistical distributions underlying the print-to-speech correspondences differ across orthographies: in orthographies such as English, children need to extract statistical regularities such as a being often pronounced as /ɔ/ when it follows a w (e.g., in “swan”). The degree to which these statistical regularities provide reliable cues differs across orthographies: for example, in Finnish, letter-phoneme correspondences are reliable, such that children don’t need to extract a large number of subtle regularities in order to be able to read accurately.

From a completely different perspective, I became interested in the role of letter bigram frequency during reading. One can count how often a given letter pair co-occurs in a given orthography. The question is whether the average (or summed) frequency of the bigrams within a word affects the speed with which that word is processed. This is relevant to psycholinguistic experiments from a methodological perspective: if letter bigram frequency affects reading efficiency, it’s a factor that needs to be controlled when selecting items for an experiment. Learning the frequency of letter combinations can be thought of as a sort of statistical learning task, because it involves the conditional probability of one letter given another.
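The counting idea behind bigram frequency is simple enough to sketch in a few lines. This is a toy illustration with a made-up five-word corpus (real studies would use a large frequency-weighted corpus, and the function names here are my own):

```python
from collections import Counter

def bigram_counts(corpus):
    """Count how often each adjacent letter pair occurs across a word list."""
    counts = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            counts[a + b] += 1
    return counts

def mean_bigram_frequency(word, counts):
    """Average corpus frequency of the bigrams within a word."""
    bigrams = [a + b for a, b in zip(word, word[1:])]
    return sum(counts[bg] for bg in bigrams) / len(bigrams)

corpus = ["swan", "swim", "wand", "want", "and"]  # toy word list
counts = bigram_counts(corpus)
print(counts["an"])                        # "an" occurs in swan, wand, want, and -> 4
print(mean_bigram_frequency("swan", counts))  # mean of sw=2, wa=3, an=4 -> 3.0
```

To control for bigram frequency in an item set, one would compute this measure for each candidate word and match it across conditions.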

The relevance of statistical learning to everything should have been a warning sign, because, as we know from Karl Popper, something that explains everything actually explains nothing. This becomes clearer when we ask the first question that a researcher should ask: What is statistical learning? I don’t want to claim that there is no answer to this question, nor do I want to provide an extensive literature review of the studies that do provide a precise definition. Suffice it to say: Some papers define statistical learning extremely broadly, which is why it is often used as a hand-wavy term denoting a mechanism that explains everything. This is an example of a one-word explanation, a term coined by Gerd Gigerenzer in his paper “Surrogates for theories” (one of my favourite papers). Other papers provide more specific definitions, for example, by defining statistical learning based on a specific task that is supposed to measure it. However, I have found no consensus among these definitions, and given that different researchers use the same terminology with different definitions, the resulting theoretical and empirical work is (in my view) a huge mess.

In addition to these theoretical issues, there is also a big methodological mess when it comes to the literature on statistical learning and reading or dyslexia. I’ve written about this in more detail in our two papers (linked above), but here I will list the methodological issues in a more compact manner: First, when we’re looking at individual differences (for example, by correlating reading ability and statistical learning ability), the lack of a task with good psychometric properties becomes a huge problem. This issue has been discussed in a number of publications by Noam Siegelman and colleagues, who even developed a task with good psychometric properties for adults (e.g., here and here). However, as far as I’ve seen, there are still no published studies on reading ability or dyslexia using improved tasks. Furthermore, recent evidence suggests that a statistical learning task which works well with adults still has very poor psychometric properties when applied to children.

Second, the statistical learning and reading literature is a good illustration of all the issues that are associated with the replication crisis. Some of these are discussed in our systematic review about statistical learning and dyslexia (linked above). The publication bias in this area (selective publication of significant results) became even clearer to me when I presented our study on statistical learning and reading ability – where we obtained a null result – at the SSSR conference in Brighton (2018). There were several proponents of the statistical learning theory (if we can call it that) of reading and dyslexia, but none of them came to my poster to discuss this null result. Conversely, a number of people dropped by to let me know that they’ve conducted similar studies and also gotten null results.

Papers on statistical learning and reading/dyslexia continue to be published, and at some point, I was close to being convinced that maybe, visual statistical learning is related to learning to read in visually complex orthographies. But then, some major methodological or statistical issue always jumps out at me when I read a paper closely enough. The literature reviews of these papers tend to be biased, often listing studies with null results as evidence for the presence of an effect, or else picking out all the flaws of papers with null results while treating the studies with positive results as a holy grail. I have stopped reading such papers, because it does not feel like a productive use of my time.

I have also stopped accepting invitations to review papers about statistical learning and reading/dyslexia, because I have started to doubt my ability to give an objective review. By now, I have a strong prior that there is no link between domain-general statistical learning ability and reading/dyslexia. I could be convinced otherwise, but it would require very strong evidence (i.e., a number of large-scale pre-registered studies from independent labs with psychometrically well-established tasks). While I strongly believe that such evidence is required, I realise that it is unreasonable to expect such studies from most researchers who conduct this type of research, who are mainly early-career researchers basing their methodology on previous studies.

I also stopped doing or planning any studies on domain-general statistical learning. The amount of energy necessary to refute bullshit is an order of magnitude bigger than to produce it, as Alberto Brandolini famously tweeted. This is not to say that everything to do with statistical learning and reading/dyslexia is bullshit, but – well, some of it definitely is. I hope that good research will continue to be done in this area, and that the state of the literature will become clearer because of this. In the meantime, I have made the personal decision to move away from this line of research. I have received good advice from one of my PhD supervisors: not to get hung up on research that I think is bad, but to pick an area where I think there is good work and to build on that. Sticking to this advice definitely makes the research process more fun (for me). Statistical learning studies are likely to yield null results, which end up uninterpretable because of the psychometric issues with statistical learning tasks. Trying to publish this kind of work is not a pleasant experience.

Why did I write this blog post? Partly, just to vent. I wrote it as a blog post and not as a theoretical paper because it lacks the objectivity and systematic approach that would be required for a scientifically sound piece of writing. If I were to write a scientifically sound paper, I would need to break my resolution to stop doing research on statistical learning, so a blog post it is. Some of the issues above have been discussed in our systematic review about statistical learning and dyslexia, but I thought it would be good to summarise these arguments in a more concise form. Perhaps some beginning PhD student who is thinking about doing their project on statistical learning and reading will come across this post. In this case, my advice would be: pick a different topic.

Sunday, March 10, 2019

What’s next for Registered Reports? Selective summary of a meeting (7.3.2019)

Last week, I attended a meeting about Registered Reports. It was a great opportunity, not only to discuss Registered Reports, but also to meet some people whom I had previously only known from Twitter, over a glass of whiskey close to London Bridge.

The meeting felt very productive, and I took away a lot of new information, about the Registered Report Format in general, and also some specific things that will be useful to me when I submit my next Registered Report. Here, I don’t aim to summarise everything that was discussed, but to focus on those aspects that could be of practical importance to individual researchers.

What’s stopping researchers from submitting Registered Reports?
We dedicated the entire morning to discussing how to increase the submission rate of Registered Reports. Before the meeting, I had done an informal survey among colleagues and on Twitter to see what reasons people had for not submitting Registered Reports. The response rate was pretty low, suggesting that a lack of interest may be a leading factor (due either to apathy or scepticism – from my informal survey, I can’t tell). Among those who did respond, the main reason was time: younger researchers in particular are often on short-term contracts (1-3 years) and are pressured for various reasons to start data collection as soon as possible. Among such reasons, people mentioned grants: funders often expect strict adherence to a timeline. And, unfortunately, such timing pressures disproportionately affect early-career researchers, exactly the demographic which is most open to trying out a new way of conducting and publishing research.

Submitting a Registered Report may take a while – there is no point sugar-coating this. In contrast to standard studies, authors of Registered Reports need to spend more time planning the study, because writing the report involves planning in detail; there may be several rounds of review before in-principle acceptance, and addressing reviewers’ comments may involve collecting pilot data. Given my limited experience, I would estimate that about 6-9 months need to be added to the study timeline before one can count on in-principle acceptance and data collection can start.

Of course, the extra time that you spend before conducting the experiment will substantially improve the quality of the paper. A Registered Report is also very likely to cut a lot of time at the end of the research cycle: when you realise how long it may take to get in-principle acceptance, bear in mind the painstakingly slow and frustrating alternative of submitting a paper to one journal after another, accumulating piles of reviews varying in constructiveness and politeness, being told about methodological flaws that you can no longer fix and about how your results should have been different, and eventually, unceremoniously, throwing the study which you started with such great enthusiasm into the file drawer.

Long-term benefits aside, the issue of time unfortunately remains for researchers on short-term contracts and with grant pressures. We could not think of any quick fix to this problem. In the long term, solutions may involve planning this time in when you write your next grant application. One possibility could be to write that you plan to conduct a systematic review during the time that you wait for in-principle acceptance. In my recently approved grant from the Deutsche Forschungsgemeinschaft, I proposed two studies: for the first study, I optimistically included a period of three months for “pre-registration and set-up”, and for the second study a period of twelve months (because this would happen in parallel with data collection for the first study). This somewhat backfired: the grant was approved, but six months were cut from my proposed timeline, because the reviewers considered twelve months way too long for “pre-registration and set-up”. So, the strategy of planning for Registered Reports in grant applications may work, but bear in mind that it’s not risk-free.

A new thing that I learned about during the meeting is Registered Report Research Grants: here, journals pair up with funding agencies, and review of the Registered Report happens in parallel with the review of the funding proposal. This way, once in-principle acceptance is in, the funding is released and data collection can start. This sounds like an amazingly efficient win-win-win solution, and I sincerely hope that funding agencies will routinely offer such grants.

How to encourage researchers to publish Registered Reports?
Here, I’ll list a few bits and pieces that were suggested as solutions. Some of these are aimed at the individual researcher, though many would require some top-down changes. The demographic most happy to try out a new publication system, as mentioned above, are likely to be early-career researchers, especially PhD students.

Members at the meeting reported positive experiences with department-driven working groups, such as the ReproducibiliTea initiative or Open Science Cafés. In some departments, such working groups have led to PhD students taking the initiative and proposing to their advisors that they would like to do their next study as a Registered Report. We discussed that encouraging PhD students to publish one study as a Registered Report could be a good recommendation. For departments which have formal requirements about the number of publications that are needed in order to graduate, a Registered Report could count more than a standard publication: let’s say, they either need to publish three standard papers, or one standard paper and a Registered Report (or two Registered Reports).

Deciding to publish a Registered Report is like jumping into cold water: the format requires some pretty big changes in the way that a study is conducted, and one is unsure whether it will really impress practically important people (such as potential employers or grant reviewers) more than pumping out standard publications. Taking a step back, taking a deep breath and thinking about the pros and cons, I would say that, in many cases, the advantages outweigh the disadvantages. Yes, the planning stage may take longer, but you will cut time at the end, during the publication process, with a much higher chance that the study will be published. A fun fact I learned during the meeting: At the journal Cortex, once a Registered Report gets past the editorial desk (i.e., the editors established that the paper fits the scope of the journal), the rejection rate is only 10% (which is why we need more journals adopting the Registered Report format: this way, any paper, including those of interest to a specialised audience, will be able to find a good home). And, once you have in-principle acceptance, you can list the paper on your CV, which is (to many professors) much more impressive than a list of "in preparation"/"submitted" publications. If the Stage 1 review process takes unusually long and you're running out of time in your contract, you can withdraw the Registered Report, incorporate the comments to date, and conduct the experiment as a Preregistered Study.

Some of the suggestions listed above are aimed at individual researchers. The meeting was encouraging and helpful in terms of getting some suggestions that could be applied here and now. It also made it clear that top-down changes are required: the Registered Report format involves a different timeline compared to standard submissions, so university expectations (e.g., in terms of the required number of publications for PhD students, short-term post-doc contracts) and funding structures need to be changed.