Xenia Schmalz's blog: August 2019

The year 2018 was very successful for me in terms of grants: My success rate skyrocketed from close to 0% to 100%. It’s a never-ending story, though, so now I’m finding myself writing even more grant proposals, which led me to procrastinate and write a blog post about grant proposal writing. Given my recent successes, I could frame this blog post as a set of advices for other aspiring grant writers. However, frankly, I have no idea why my success rate changed so abruptly. Also, I don’t really want to sound like this guy.

Nevertheless, I got a lot of advice from different people about grant writing over the years. Maybe it can be useful to other people. It will also allow me to organise my own thoughts about what I should consider while writing my proposals. So, here goes:

Advice # 1: Be lucky. Even if your proposal is amazing, the success rates tend to be low, and many factors aside from the grant quality will affect whether it is successful or not. You may want to repress this thought while writing the proposal. Otherwise the motivation to invest weeks and months into planning and getting excited about a project will plummet. However, as soon as I submit the proposal, I will try to assume an unsuccessful outcome. First, it will motivate me to think about back-up plans, and second, it will weaken the bitterness of the disappointment if the funding is not granted.

One aspect where luck plays a large role is that a lot depends on the reviewers. In most schemes that I have applied for, the reviewer may be the biggest expert in the field, but they may also be a researcher on a completely different topic in a vaguely related area. So a good grant proposal needs to be specific, to convince the biggest expert that you have excellent knowledge of the literature, that you have not missed any issues that could compromise the quality of your project, and that every single detail of your project is well-thought-through. At the same time, the proposal needs to be general, so a non-expert reviewer will be able to understand what exactly you are trying to do, and the importance of the project to your topic. Oh, and, on top of that, the proposal has to stay in the page limit.

Throughout the last years, I have received a lot of very useful advice about grant writing, and now that I’m trying to summarise it all, I realise how conflicting the advice sometimes is. I have asked many different people for advice, but most of them are regularly involved in evaluating grant proposals. This is one demonstration of how important luck is: Maybe you will get a grant reviewer who expects a short and sexy introduction which explains how your project will contribute to the bigger picture of some important, global social problem (e.g., cancer, global warming). Maybe you will get a reviewer who will get extremely annoyed at an introduction which overblows the significance of the project.

Advice #2: Think about your audience. When I search for possible reasons for my abrupt change in success rate, this is a possible candidate. The advice to think about one’s audience applies to everything, and it is widely known. However, for a beginning grant writer it is sometimes difficult to visualise the grant reviewer. Also, as I noted above, a reviewer may be the biggest expert in the field, or it could be someone who doesn’t know very much about it. Thus, in terms of the amount of detailed explanations that you put into the proposal, it is important to find the right balance: not to bore the reviewer with details, but provide enough details to be convincing. The prior probability of the reviewer being the biggest expert is rather low, if we consider that non-experts are much more common than people who have very specialised knowledge about your specific topic. Thus, when in doubt, try to explain things, and avoid acronyms, even if you think that it’s assumed knowledge for people in the field.

Reviewers are, in most cases, academics. This means that they are likely to be busy: make the proposal as easy-to-read as possible. Put in lots of colourful pictures: explaining as many things as possible in figures can also help to cut the word count.

This also means that they are likely to be elderly men. This realisation has brought up a very vivid image in my mind: if the proposal is ‘good’, the reviewer is should come home to his wife, and, while she passes him his daily glass of evening brandy, he will tell her (read this in a posh British accent, or translate in your head to untainted Hochdeutsch): “My dear, I learned the most interesting thing about dyslexia today…!”

Advice #3: Get as much feedback as possible. Feedback is always good: I try to incorporate everything anyone tells me, even if in some cases I don’t agree with it. Thoughts such as “Clearly, the person giving the feedback didn’t read the proposal thoroughly enough, otherwise they wouldn’t be confused about X!” are not very helpful: if someone giving you feedback stumbles over something, chances are that the reviewer will, too. Sometimes, the advice you get from two different people will conflict with each other. If at all possible, try to find a way to incorporate both points of view. Otherwise, use your best judgement.

Most universities have an office which helps with proposal writing: they are very helpful in giving advice from an administrative perspective. Different funding agencies have different requirements about the structure and the like (which is also why I’m trying to keep the advice I summarise here as general as possible). Grant offices are likely to give you good advice about the specific scheme you are applying for. They may also allow you to read through previous successful applications: this can be helpful in getting a better idea about how to structure the proposal, how to lay-out the administrative section, and some other issues that maybe you missed.

Colleagues can give feedback about the content: they will point out if something is more controversial than you thought, if there are problems with some approaches than you have not thought about, and provide any important references that you may have missed. Ask colleagues with different backgrounds and theoretical ‘convictions’. Friends and relatives can help to make sure that the proposal is readable to a non-expert reviewer, and that the story, as a whole, makes sense.

Conclusion

In some ways, submitting a grant proposal is a lot like buying a lottery ticket that costs a lot of time and your career probably depends on it. However, it is also the daily bread of someone striving for an academic career, so it is important to try to make the best of it. In an attempt to end this on a positive note (so I feel motivated to get back to my proposal): Applying for ‘your own’ project may give you the flexibility to work on something that you really care about. It takes a lot of time, but this time is also spent on thinking through a project, which will make its execution run more smoothly afterwards.

The advice above is not comprehensive, and from my own biased view. I would be very happy to read any corrections or any other advice from the readers in the comments section.

For Registered Reports, grant applications, ethics applications, and similar documents, researchers are expected to provide a power calculation. From my own experience, and from talking with colleagues in many different contexts, this is often a hurdle. Calculating power requires an effect size estimate. Sometimes, we try new things and have no idea what the size of the effect will be: even if we have some pilot data, we know that the observed effect size is variable when the sample size is small (Whitehead et al., 2016). We might have data from a previous study, but we also know that the presence of publication bias and questionable research practices leads to systematic over-estimation of the true effect size (Vasishth et al., 2018). The design of our study might be complex, and we don't really know which boxes to tick in G*Power. We might not even be sure what kind of effects we're looking for: if our study is more exploratory in nature, we will not know which statistical tests we will conduct, and calculating a formal power analysis would not make much sense, anyway (Nosek & Lakens, 2014). Still, we need to find some way to justify our sample size to the reviewers.

In justifying our sample size, an alternative to a power analysis is to plan for a certain degree of precision (e.g., Kelley et al., 2003). For estimating precision, we use our a priori expectation of the standard deviation to calculate a confidence interval that guarantees that, in the long run, our observed estimate is within an acceptable bound. Again, we have a freedom in deciding the width of the confidence interval (e.g., 80%, 90%, 95%), and we need to have an estimate of the standard deviation.

In the current blog post, I'd like to answer a question that is relevant to me at the moment: When we do a reading aloud study, a number of participants see a number of words, and are asked to read it aloud as accurately and quickly as possible. The variable which is analysed is often the Reaction Time (RT): the number of milliseconds between the appearance of the item and the onset of the vocal response. The items are generally chosen to vary in some linguistic characteristic, and subsequent statistical analyses would be conducted to see if the linguistic characteristics affect the RT.

In most cases, the data would be analysed using a Linear Mixed Effect model, where item- and participant-level characteristics can be included as predictor variables. More information about calculating power and required sample sizes for Linear Mixed Effect models can be found in Brysbaert and Stevens (2018) and Westfall et al. (2014); and a corresponding app can be found here. Here, I ask a different question: If we look at a single items, how many participants do we need to obtain stable estimates?

On the surface, the logic behind this question is very simple. For each item, we can calculate the average RT, across N participants. As N increases, the observed average should approach a hypothetical true value. If we want to see which item-level characteristics affect RTs, we should take care to have as precise an estimate as possible. If we have only a few participants responding to each item, the average observed RT is likely to vary extensively if we ask a couple of more participants to read aloud the same items.

As a complicating factor, the assumption that there is a true value for the average RTs is unreasonable. For example, familiarity with a given word will vary across participants: a psychology student is likely to respond faster to words that they encounter in their daily life, such as "depression", "diagnosis", "comorbidity", than someone who does not encounter these words on a regular basis (e.g., an economics student). Thus, the true RT is more likely to be a distribution rather than a single point.

Leaving this important caveat aside for a minute, we return to the basic principle that a larger number of observations should result in a more stable RT estimate. In a set of simulations, I decided to see what the trajectory of a given observed average RT is likely to look like, when we base it on the characteristics that we find, for various words, in the large-scale Lexicon projects. The English Lexicon Project (Balota et al., 2006) has responses for thousands of items, with up to 35 responses per item. In a first simulation, I focussed on the word "vanishes", which has 35 responses, and an average reading aloud RT of 743.4 ms (SD = 345.3), including only the correct responses. Based on the mean and SD, we can simulate the likely trajectories of the observed average RTs at different values of N. Using the item's mean and SD, we simulate a normal distribution, and draw a single value from it: We have an RT for N = 1. Then we draw the next value and calculate the average between this first and second values. We have an average RT for N = 2. We can repeat this procedure, while always plotting the observed average RT for each N. Here, I did this for 35 participants: this gives a single "walk", where the average RT approaches the RT which we specified as a parameter for our normal distribution. Then, we repeat the whole procedure, to simulate more "walks". The figure below shows 100 such "walks".

As expected, the initial average RTs tend to be all over the place: if we were to stop our simulated data collection at N = 5, we might be unlucky enough to get an estimate of 400 ms, or an estimate or 1200 ms. As the simulated data collection progresses, the variability between the "walks" diminishes, and at N = 30 we would expect the observed average RT to lie somewhere between 600 ms and 1,000 ms.

Analytically, the variability at different values of N can be quantified as confidence intervals: the proportion of times that we expect the average RT to exceed the interval, in the long run. The width of the confidence intervals depends (1) on the confidence level that we'd like to have (fixed here at 95%), (2) the population standard deviation (σ), and (3) the number of participants. Now, we don't really know what σ is, but we can get some kind of plausible range of σ-values, by looking at the data from the English Lexicon Project. I first removed all RTs < 250 ms, which are likely to be miscoded. Then I generated a box-plot of the SDs for all items:

The SDs are not normally distributed, with quite a lot of very large values. However, we can calculate a median, which happens to be SDmedian ≈ 200; a 20% quantile, SDlower ≈ 130; 80% quantile, SDupper ≈350, and a pessimistic estimate by taking the location of the upper bar in the boxplot above, SDpessimistic ≈ 600. For each of these SD estimates, we can calculate the 95% confidence interval for different values of N, with the formula: CIupper = 1.96*(σ/sqrt(N)); CIlower = CIupper * (-1). To calculate the expected range of average RTs, we would add these values to the average RTs. However, here we are more interested in the deviations from any hypothetical mean, therefore we can simply focus on the upper bound; the expected deviation is therefore CIupper * 2.

Next, I plotted CIupper as a function of N for the different SD estimates (low, median, high, and pessimistic):

So, if we have 50 participants, the expected range of deviation (CIupper * 2) is 72 ms for the low estimate, 110 ms for the median estimate, 194 ms for the upper estimate, and 332 ms for the pessimistic estimate. For 100 participants, the range reduces to 50 ms, 78 ms, 137 ms, and 235 ms, respectively.

What does all of this mean? Well, at the end of this blog post we are still left with the situation that the researcher needs to decide on an acceptable range of deviation. This is likely to be a trade-off between the precision one wants to achieve and practical considerations. However, the simulations and calculations should give a feeling of what number of observations is typically needed to achieve what level of precision, when we look at the average RTs of single items. The general take-home messages can be summarised as: (1) It could be fruitful to consider precision when planning psycholinguistic experiments, and (2) the more observations, the more stable the average RT estimate, i.e., the less likely it is to vary across samples.

--------------------

Link to the analyses and simulations: https://osf.io/mrnzj/

References

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., ... & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445-459.

Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1).

Kelley, K., Maxwell, S. E., & Rausch, J. R. (2003). Obtaining power or obtaining precision: Delineating methods of sample-size planning. Evaluation & the Health Professions, 26(3), 258-287.

Nosek, B. A., & Lakens, D. (2014). Registered Reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137-141.

Vasishth, S., Mertzen, D., Jäger, L. A., & Gelman, A. (2018). The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103, 151-175.

Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020.

Whitehead, A. L., Julious, S. A., Cooper, C. L., & Campbell, M. J. (2016). Estimating the sample size for a pilot randomised trial to minimise the overall trial sample size for the external pilot and main trial for a continuous outcome variable. Statistical Methods in Medical Research, 25(3), 1057-1073.

Xenia Schmalz's blog

Thursday, August 8, 2019

On grant proposal writing

Friday, August 2, 2019

Getting a precise RT estimate for single items in a reading aloud task

Blog Archive