Tuesday, September 17, 2019

How to make your full texts openly available

TL;DR: Please spend some time making sure that the full texts of your articles are freely available, and remind your colleagues to do the same.

As it turns out, I'm a huge hypocrite. I regularly do talks about how to make your research workflow open. “Start with little steps”, I say. “The first thing you can do is to make your papers openly available by posting pre-prints”, I say. “It's easy!”, I say. “You do a good thing for the world, and people will read and cite your paper more. It's a win-win!”

I'm not at all an expert on open access publishing. I've been to several talks and workshops which provided an introduction to open access, and my take-home message is generally that it would be good to spend more time to really understand the legal and technical issues. So this blog post does not aim to give professional advice, but rather contains my notes about issues I came across in trying to make my own papers open-access.

There are multiple ways to find legal full-text versions of academic papers. Of course, there is also sci-hub, which – let's just say – when you average the legal and the moral aspects of it, is in a grey zone. In an ideal world, all of our research outputs would be available legally, and the good news is that it's in the hands of the authors to make full texts available to anyone from anywhere. Self-archiving papers that have been published elsewhere is called green open access, and it's a good way to be open even if you are forced (by the incentive system) to publish in closed journals.

Many people, even those who are not in the open science scene, use ResearchGate to upload full texts. I created a ResearchGate account around the time I started publishing, and I have conscientiously uploaded every single article's full text right after it was accepted by a journal. Problem solved, I thought.

Then, I learned about the Open Access Button (openaccessbutton.org) and Unpaywall (unpaywall.org/). You can install both as browser add-ons, and when you've opened the link to a paper, you can click them to get the full text. Below is a screenshot that shows what these buttons look like (circled in red); clicking them should get you right to the legal PDF:

That is, if a legal, open-access version is available. In the screenshot above, the lock on the Unpaywall button is grey and locked, as opposed to green and open. If you click on the Open Access Button in the top right corner, it takes you to a page saying that the full text is not available. This is despite the full text being available on ResearchGate.

Then, I decided to have a look at my record on Google Scholar. When one searches for a paper, a link to any open access version appears next to the journal link. The screenshot below makes me look really bad:

Though, in my defense, when we scroll down, it looks better:

Strangely, some of my full texts are linked via ResearchGate, and others are not, even though all full texts have been uploaded. The Collabra and Frontiers journals are open access by default: I did not need to do anything to make the full text freely accessible to everyone. The paper at the bottom is available through the OSF: I'd uploaded a pre-print at some stage when I'd given up trying to publish it.*

Still, when I go to the journal's link to my OSF pre-print paper, I cannot access the full text:

When you press the Open Access Button (the orange lock in the top right corner), you can request a full text from the authors. Alternatively, if it's your own paper, you can indicate this, and it will take you to a website where you can either link to a full text, or upload it. I tried uploading the full text to a couple of my papers. Open Access Button uploads the papers to Zenodo:

But, unfortunately, there seem to be some technical issues:

What seems to work, though, is the following:
  1. Uploading the paper as a pre-print on OSF, and
  2. Instead of uploading the pre-print through the Open Access Button, linking to the OSF pre-print.

An academic paper is our blood, sweat and tears. We want people to read it. We don't do our work with the intention of hiding it behind a paywall so that nobody can ever access it. I sometimes try to find full texts through my institution's library, and it often happens that I don't have access to papers. And I'm at a so-called “elite university” in Germany! Imagine how many people are blocked from accessing a publication if no open-access full text is available. And then ask yourself: What is the purpose of my work? Your work can certainly not achieve the impact that you hope for unless people can read about it.

So uploading pre-prints is definitely the right thing to do. After realising my own shortcomings, I have become less impatient with authors when I come across a paywall when trying to read their papers. Making your work open access and findable is a bit trickier than simply uploading the full text to ResearchGate. As a course of action, for each individual researcher, I would recommend the following:

  1. Check whether your publications have freely available full texts which are findable through Google Scholar, the Open Access Button, and/or the Unpaywall button. This is a good job for a Friday afternoon, when you've finished one task but don't really have time to start something new. Or anytime, really. Making sure that people can read about your research is at least as important as conducting this research in the first place. It's part of our jobs.
  2. When you can't find a full text, email the corresponding author. The Open Access Button makes this easy. All you have to do is give a quick reason, and the author will receive the following email:

I believe you need an account to request an email to be sent to the authors on your behalf. Of course, you're also free (and strongly encouraged by me) to send an email yourself: the journal's link to the paper will have the corresponding author's email address. All you have to do is take the following template that I created: https://osf.io/fh73t/, fill in the blanks, and send it to the corresponding author.

* I like to use this paper as a success story about posting pre-prints: The manuscript had been rejected by numerous journals, so I thought it would never be published. As I didn't want the work that I'd put into it to go completely to waste, I uploaded the pre-print on the OSF. A few weeks (or even days) later, the pre-print appeared on Google Scholar; a few days after that, I got emails from two colleagues with suggestions of journals where I could try to submit this paper. I tried one of these journals, and a few months later, the paper was officially published, after only one round of minor revisions.

Thursday, September 12, 2019

Bayes Factors 101: Justifying prior parameters in JASP

Update (27.12.2021): The content of this blog post has been extended and published in the following article:

Schmalz, X., Biurrun Manresa, J., & Zhang, L. (2021). What is a Bayes factor? Psychological Methods. Preprint: https://osf.io/5geqt/.


TL;DR: Do you need a justification for your prior parameters in JASP? Scroll down to find fill-in-the-blank sentences which you can use, and a table where you can pick a range of effect sizes which you expect and the corresponding prior parameters.

With many psychologists turning Bayes-curious, software packages are appearing that make it easy to calculate Bayes Factors. JASP (Love et al., 2015) has a similar layout to SPSS, and allows the user to perform Bayesian analyses which are equivalent to a series of popular frequentist tests. Here, I will describe the priors which are implemented in JASP for two frequently used tests: the t-test and Pearson's correlation. I will also explain what it means when we change them. The aim is to provide the basis for a better understanding of what priors mean, and how we can justify our choice of prior parameters.

Both frequentist and Bayesian statistics rely on a series of underlying assumptions and calculations, which are important to understand in order to interpret the value that the software spits out (i.e., a p-value or a Bayes Factor). Given that very few psychologists have been schooled in Bayesian statistics, the assumptions underlying the Bayes Factor are often not intuitive.

One important difference between Bayesian and frequentist data analyses is the use of a prior. The prior represents the beliefs or knowledge that we have about our effect of interest, before we consider the data which we aim to analyse. The prior is a distribution which can be specified by the experimenter. This distribution becomes updated, once we have data, to give a posterior distribution. For calculating a Bayes Factor, we have two priors: one that describes one hypothesis (e.g., a null hypothesis: no difference between groups, or no correlation between two variables), and one that describes a different hypothesis. JASP then computes the probability of the observed data under each of these hypotheses, and divides one by the other to obtain the Bayes Factor: the degree to which the data is compatible with one hypothesis over the other.
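The marginal-likelihood logic behind a Bayes Factor can be seen in a much simpler setting than JASP's t-test. The Python sketch below is my own toy example (the function name is mine, and this is not the computation JASP performs): a binomial test where the probability of the data under each hypothesis has a closed form, so the ratio can be computed in a few lines.

```python
from math import comb

# Toy Bayes factor for k successes in n coin flips (an illustration of the
# principle only -- NOT the computation JASP performs for its t-test).
# H0: theta = 0.5 (point null); H1: theta ~ Uniform(0, 1) (the prior).
def bayes_factor_10(k, n):
    # Probability of the data under H0: a plain binomial likelihood
    p_data_h0 = comb(n, k) * 0.5 ** n
    # Probability of the data under H1: the binomial likelihood averaged
    # over the uniform prior, which integrates to 1 / (n + 1) for any k
    p_data_h1 = 1.0 / (n + 1)
    # The Bayes factor: the ratio of the two marginal likelihoods
    return p_data_h1 / p_data_h0

print(bayes_factor_10(70, 100))  # 70/100 heads: the data favour H1
print(bayes_factor_10(50, 100))  # 50/100 heads: the data favour H0
```

Note that the same data enter both hypotheses; only the prior differs between them. Replacing the uniform prior under H1 with a narrower distribution would change the Bayes Factor, which is exactly why the choice of prior needs justifying.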

To some extent, then, the inference depends on the prior. The degree to which the prior matters depends on how much data one has: when there is a lot of data, it “overrides” the prior, and the Bayes Factor becomes very similar across a wide range of plausible priors. Choosing an appropriate prior becomes more important, though, when (1) we do not have a lot of data, (2) when we need to justify why we use a particular prior (e.g., for a Registered Report or grant proposal), or (3) when we would just like to get a better idea of how the Bayes Factor is calculated. The aim of the current blog post is to provide an introduction to the default parameters of JASP, and what it means when we change them around, while assuming very little knowledge of probability and statistics from the reader.

Let's start with t-tests. JASP has the option to do a Bayesian independent samples t-test. It also provides some toy data: here, I'm using the data set “Kitchen Rolls”. Perhaps we want to see if age differs as a function of sex (which makes no sense, theoretically, but we need one dichotomous and one continuous variable for the t-test). Below the fields where you specify the variables, you can adjust two prior parameters: (1) The hypothesis (two-tailed or directional), and (2) the prior (Cauchy prior width). Let's start with the Cauchy. The default parameter is set to 0.707. Contrary to what is often believed, this does not represent the size of the effect that we expect. To understand what it represents, we need to take a step back to explain what a Cauchy is.

A Cauchy is a probability distribution. (Wikipedia is a very good source for finding information about the properties of all kinds of distributions.) Probability distributions describe the probability of possible occurrences in an experiment. Each type of distribution takes a set of parameters, with which we can infer the exact shape of the distribution. The shape of our well-familiar normal distribution, for example, depends both on the mean and on the variance: if you look up the normal distribution on Wikipedia, you will indeed see in the box on the right that the two parameters for this distribution are μ and σ2. On the Wikipedia distribution pages, the top figure in the box shows how the shape of the distribution changes if we change around the parameters. Visually, the Cauchy distribution is similar to the normal distribution: it is also symmetrical and kind-of bell-shaped, but it has thicker tails. It also takes two parameters: the location parameter and a scale parameter. The location parameter determines where the mode of the distribution is. The scale parameter determines its width. The latter is what we're after: in the context of Cauchy priors, it is also often called the width parameter.
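To see the “thicker tails” concretely, we can evaluate the two densities directly. The Python sketch below writes both densities out from their textbook formulas (mirroring R's dcauchy and dnorm; the function names follow the R convention): near the centre the two curves are similar, but a few units out, the standard Cauchy carries far more density than the standard normal.

```python
from math import exp, pi, sqrt

# Cauchy density with location x0 and scale (width) parameter gamma
def dcauchy(x, x0=0.0, gamma=1.0):
    return 1.0 / (pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))

# Normal density with mean mu and standard deviation sigma
def dnorm(x, mu=0.0, sigma=1.0):
    return exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * sqrt(2.0 * pi))

# Near the centre the two curves look similar...
print(dcauchy(0), dnorm(0))   # ~0.32 vs ~0.40
# ...but out in the tails the Cauchy keeps far more density:
print(dcauchy(4) / dnorm(4))  # the ratio is already in the hundreds
```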

Back to JASP: when we change the Cauchy prior width, we don't change the mode of our distribution, but its width (i.e., the scale parameter): we are not saying that we are considering certain values to be more or less likely, but that we consider the range of likely effect sizes to be more or less narrow. The Cauchy, in JASP, is by default centred on zero, which gives us a bidirectional test. Overall, small effect sizes are considered to be more likely than large effect sizes (as shown by the general upside-down-U shape of the distribution). If we have a directional hypothesis, rather than shifting the location parameter, JASP allows us to pick which group we expect to have higher values (Group 1 > Group 2, or Group 1 < Group 2). This simply cuts the distribution in half. We can try this with our Kitchen Rolls data: If, under the section “Plots”, we tick “Prior and posterior”, we will see a figure, in addition to the Bayes Factor, which shows the prior for the alternative hypothesis, as well as the posterior (which we will ignore in the current blog post). The default settings show the following plot (note the symmetrical prior distribution):

When we anticipate that Group 1 will have higher values than Group 2, half of the prior distribution is cut:

And when we anticipate that Group 2 will have higher values than Group 1:

So, what do you do when you plan to use the Bayes Factor t-test for inference and the reviewer of the Registered Report asks you to justify your prior? What the Cauchy can tell us is how confident we are that the effect lies within a certain range. We might write something like:

“The prior is described by a Cauchy distribution centred around zero and with a width parameter of x. This corresponds to a probability of P% that the effect size lies between -y and y. [Some literature to support that this is a reasonable expectation of the effect size.]”

So, how do you determine x, P, and y? For P, that's a matter of preference. For a Registered Report of mine, I chose 80%, but this is rather arbitrary. The y you pick in such a way that it describes what you believe about your effect size: if you think it cannot possibly be bigger than Cohen's d = 0.5, that could be your y. And once you've picked your y, you can calculate the x. This is the tricky part, though it can be done relatively easily in R. We want to find the parameter x where we have an 80% probability of obtaining values between -y and y. To do this, we use the cumulative distribution function, which measures the area under the curve of a probability distribution (i.e., the cumulative probability of a range of values). The R function pcauchy takes the value of y, assuming a location parameter and a scale parameter, and returns the probability that an observation randomly drawn from this distribution is smaller than y. To get the probability that an observation randomly drawn from this distribution lies between -y and y, we type:

pcauchy(2,0,0.707) - pcauchy(-2,0,0.707)

This is for the default settings of JASP (location parameter = 0, scale parameter = 0.707). This gives us the following probability:

[1] 0.7836833

Thus, if we use the default JASP parameters, we could write (rounding the output of 0.78 up to 80%):
“The prior is described by a Cauchy distribution centred around zero and with a width parameter of 0.707. This corresponds to a probability of 80% that the effect size lies between -2 and 2. [Some literature to support that this is a reasonable expectation of the effect size.]”

An effect size of 2 is rather large for most psychology studies: we might be sure that we're looking for smaller effects than this. To check how we would need to change the scale parameter to obtain an 80% probability (or any other value of P) for your expected effect size range, you can copy-and-paste the code above into R, change the effect size range (2 and -2) to your desired ys, and play around with the scale parameters until you get the output you like. Or, if you would like to stick with the 80% interval, you can pick the scale parameter for a set of effect size ranges from the table below (the percentages and the scale parameters are rounded):

Range of effect sizes (non-directional) | Range of effect sizes (directional) | Scale parameter required for 80% probability
-2 to 2     | 0 to 2 or -2 to 0     | 0.71 (default)
-1.5 to 1.5 | 0 to 1.5 or -1.5 to 0 | 0.49
-1.3 to 1.3 | 0 to 1.3 or -1.3 to 0 | 0.42
-1.1 to 1.1 | 0 to 1.1 or -1.1 to 0 | 0.36
-0.9 to 0.9 | 0 to 0.9 or -0.9 to 0 | 0.29
-0.7 to 0.7 | 0 to 0.7 or -0.7 to 0 | 0.23
-0.5 to 0.5 | 0 to 0.5 or -0.5 to 0 | 0.16
-0.3 to 0.3 | 0 to 0.3 or -0.3 to 0 | 0.10
The middle column shows what happens when we have a directional hypothesis. Basically, the probability of finding a range between 0 and y under the cut-in-half Cauchy is the same as the probability of finding a range between -y and y in the full Cauchy. I explain in footnote 1 why this is the case.
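If you'd rather skip the trial and error: because the central Cauchy has a closed-form CDF, the scale parameter can be solved for directly. The Python sketch below is a translation of the pcauchy logic above (the function names are mine); it reproduces the R output and computes the scale parameter for any choice of P and y.

```python
from math import atan, pi, tan

# P(-y < X < y) for X ~ Cauchy(0, scale); this mirrors the R call
# pcauchy(y, 0, scale) - pcauchy(-y, 0, scale)
def prob_between(y, scale):
    return 2.0 * atan(y / scale) / pi

# Inverting that formula gives the scale parameter directly:
# solve (2 / pi) * atan(y / s) = P for s
def scale_for(y, p):
    return y / tan(p * pi / 2.0)

print(round(prob_between(2, 0.707), 4))  # 0.7837, matching the R output above
print(round(scale_for(0.3, 0.80), 2))    # 0.1, the value used below for the -0.3 to 0.3 range
print(round(scale_for(1.5, 0.80), 2))    # 0.49
```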

How does the choice of prior affect the results? In JASP, after you have collected your data, you can check this by ticking the “Bayes factor robustness check” box under “Plots”. Below is what this plot looks like for our age as a function of sex example. The grey dot marks the Bayes Factor value for the prior which we chose: here, I took the scale parameter of 0.1, corresponding to an 80% chance of effect sizes between -0.3 and 0.3.

After having played around with different parameters in R and doing the calculations above, E.J. Wagenmakers drew my attention to the fact that, when we choose the range width to be 50%, not 80%, the width parameter is equal to the range of values that we expect. So, if we are less confident about how big we expect the effect to be (and less keen to mess around with the different parameter values in R), we can simply write (below, I assume the default prior; if you have different expectations about the effect size, replace all mentions of the value “0.707” with your preferred effect size):

“The prior is described by a Cauchy distribution centred around zero and with a width parameter of 0.707. This corresponds to a probability of 50% that the effect size lies between -0.707 and 0.707. [Some literature to support that this is a reasonable expectation of the effect size.]”
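This 50% property is easy to verify: the probability of falling within one width parameter of the centre is 2·arctan(1)/π = 0.5, regardless of the width. A quick Python check (mirroring the pcauchy computation above; the function name is mine):

```python
from math import atan, pi

# P(-y < X < y) for X ~ Cauchy(0, scale), via the closed-form CDF
def prob_between(y, scale):
    return 2.0 * atan(y / scale) / pi

# Setting y equal to the scale parameter gives 2 * atan(1) / pi = 0.5,
# whatever the width: the width parameter always marks the 50% interval
for s in (0.5, 0.707, 1.0):
    print(prob_between(s, s))  # 0.5 each time
```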

After having written most of the above, I also realised that I had not updated JASP for a while, and the newer version allows us to change the location parameter of the Cauchy, as well as its width. Thus, it is possible to change the mode of the distribution to the effect size that you consider the most likely. Then, you can calculate the new effect size range by taking the values from the table above, and adding the location parameter to the upper and lower bound, for example:
“The prior is described by a Cauchy distribution centred around 0.707 and with a width parameter of 0.707. This corresponds to a probability of 50% that the effect size lies between 0 and 1.414. [Some literature to support that this is a reasonable expectation of the effect size.]”

You can find more information about location parameter shifting in Gronau, Q. F., Ly, A., & Wagenmakers, E.-J. (in press). Informed Bayesian t-tests. The American Statistician. https://arxiv.org/abs/1704.02479. For step-by-step instructions, or in order to get hands-on experience with constructing your own prior parameters, I also recommend going through this blogpost by Jeff Rouder: http://jeffrouder.blogspot.com/2016/01/what-priors-should-i-use-part-i.html.

Now, let's move on to correlations. Again, our goal is to make the statement:

“The prior is described by a beta-distribution centred around zero and with a width parameter of x. This corresponds to a probability of P% that the correlation coefficient lies between -y and y. [Some literature to support that this is a reasonable expectation of the effect size.]”

When you generate a Bayesian correlation matrix in JASP, it gives you two things: The Pearson's correlation coefficient (r) that we all know and love, and the Bayes Factor, which quantifies the degree to which the observed r is compatible with the presence of a correlation over the absence of a correlation. The prior for the alternative hypothesis is now described by a beta-distribution, not by a Cauchy. More details about the beta-distribution can be found in footnote 2. For the less maths-inclined people, suffice it to say that the statistical parameters of the distribution do not directly translate into the parameters that you input in JASP, but never fear: the table and text below explain how you can easily jump from one to the other, if you want to play around with the different parameters yourself.

The default parameter for the correlation alternative prior is 1. This corresponds to a flat line, and is identical to a so-called uniform distribution. Beware that describing this distribution as “All possible values of r are equally likely” will trigger anything from a long lecture to a condescending snort from maths nerds: as we're dealing with a continuous distribution, a single value does not have a probability associated with it. The mathematically correct way to put it is: “If we take any two intervals (A and B) of the same length from the continuous uniform distribution, the probability of the observation falling into interval A will equal the probability of the observation falling into interval B.” Basically, if you have no idea what the correlation coefficient will be like, you can keep the prior as it is. As with the t-test, you can test directional hypotheses (r > 0 or r < 0).

Changing the parameter will either make the prior convex (U-shaped) or concave (upside-down-U-shaped). In the former case, you consider values closer to -1 and 1 (i.e., very strong correlations) to be more likely. Perhaps this could be useful, for example, if you want to show that a test has a very high test-retest correlation. In the latter case, you consider smaller correlation coefficients to be more likely. This is probably closer to the type of data that, as a psychologist, you'll be dealing with.

So, without further ado, here is the table from which you can pick the prior (first column), based on the effect size range (possible correlation coefficients) that you expect with 80% certainty:

JASP parameter (A) | Range of effect sizes (r) | Statistical parameters (a, b) | Statistical inputs (R)
1 (default) | -0.8 to 0.8   | 1, 1 | 0.1 to 0.9
0.33        | -0.5 to 0.5   | 3, 3 | 0.25 to 0.75
0.14        | -0.25 to 0.25 | 7, 7 | 0.375 to 0.625
Update (18.11.2020): After a comment by Katrin pointed out that the numbers in the table above don't add up, I had a look and indeed found some discrepancies, was not able to follow exactly how I got them, and it also took me a while to understand the text I'd written below. I changed the numbers in the table above, and rewrote the section below to make this more clear. 
The R code (if you want to play around with the parameters and ranges), for the first row, is:
pbeta(0.9,1,1) - pbeta(0.1,1,1)
If you have a range of correlations, centered on zero, that you are X% confident about, you can do the following: 
Step 1: Convert the effect sizes from your desired range (rl, ru) to the statistical inputs in the fourth column (Rl, Ru) with the formula: R = 0.5 + 0.5*r.
Step 2: Insert the values (Rl, Ru) into the code above, for the first parameter in the two pbeta-commands. The output should be a number between 0 and 1: This is your confidence level.
Step 3: The second and third parameters in the pbeta-command (a, b) should be identical (i.e., a = b): these are the parameters of the beta-distribution, which need to be the same to ensure that the distribution is symmetrical (as written above, JASP, at least in its current implementation, only contains symmetric priors, and hence there is only one number that you can input to determine the prior width). The statistical parameters for the R-function (a, b; third column) are not the same as the JASP parameter (A; first column), but can be obtained with the simple formula: A = 1/a. There may be a more elegant way to do this, but you can simply change the beta-parameter around until the output matches the confidence level you desire.
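The steps above can be sketched in code. For the integer-valued beta parameters used here, R's pbeta has a closed form as a binomial sum, so the following Python sketch (the function names are mine) reproduces the computation without any special libraries, and automates the “change the parameter around” step with a small search:

```python
from math import comb

# CDF of a Beta(a, b) at x -- equivalent to R's pbeta(x, a, b) -- using the
# closed-form binomial sum that holds for integer parameters
def pbeta_int(x, a, b):
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))

# Step 1: convert a correlation r to the beta scale: R = 0.5 + 0.5 * r
def to_beta_scale(r):
    return 0.5 + 0.5 * r

# Step 2: confidence that r lies in (-r_u, r_u) under a symmetric Beta(a, a)
# prior; the corresponding JASP parameter is A = 1 / a (Step 3)
def confidence(r_u, a):
    return pbeta_int(to_beta_scale(r_u), a, a) - pbeta_int(to_beta_scale(-r_u), a, a)

# Automating the trial and error: the integer a whose confidence level
# is closest to the target
def find_a(r_u, target=0.80):
    return min(range(1, 200), key=lambda a: abs(confidence(r_u, a) - target))

print(round(confidence(0.8, 1), 2))  # 0.8 for the uniform prior (JASP A = 1)
print(round(confidence(0.5, 3), 2))  # ~0.79 for a = 3 (JASP A = 1/3)
```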
As with the t-test, when you chop the beta-distribution in half (i.e., when you test a directional hypothesis), the upper bound (y) should be identical to the upper bound in the non-directional prior.

This blogpost aims to provide the psychologist reader with a sufficiently deep understanding to justify the choice of prior, e.g., for a Registered Report. If you've worked your way through the text above (in which case: thank you for bearing with me!), you should now be able to choose a prior parameter in JASP in such a way that it translates directly to the expectations you have about possible effect size ranges.

To end the blogpost on a more general note, here are some random thoughts. The layout of JASP is based on SPSS, but unlike SPSS, JASP is open source and based on the programming language R. JASP aims to provide an easy way for researchers to switch from frequentist testing in SPSS to doing Bayesian analyses. Moving away from SPSS is always a good idea. However, due to the similar, easy-to-use layout, JASP inherits one of the problems of SPSS: it's possible to do analyses without actually understanding what the output means. Eric-Jan Wagenmakers (?) once wrote on Twitter (?) that JASP aims to provide “training wheels” for researchers moving away from frequentist statistics and SPSS, who will eventually move to more sophisticated analysis tools such as R. I hope that this blogpost will contribute a modest step to this goal, by giving a more thorough understanding of possible prior parameters in the context of Bayes Factor hypothesis testing.

I thank E.J. Wagenmakers for his comments on an earlier version of this blog post. Any remaining errors are my own.
Edit (17.9.2019): I changed the title of the blogpost, to mirror one that I wrote a few months ago: "P-values 101: An attempt at an intuitive but mathematically correct explanation".
1 If we simply cut the Cauchy distribution in half, we no longer have a probability distribution: a probability distribution, by definition, needs to integrate to 1 across the range of all possible values. If we think about a discrete distribution (e.g., the outcome of a die toss), it's intuitive that the sum of all possible outcomes should be 1: that's how we can infer that the probability of throwing a 6 is 1/6, given a cubical die (because we also know that we have 6 possible, equiprobable outcomes). For continuous distributions, we have an infinite range of values, so we can't really sum them: integrating is the continuous-distribution equivalent of summing. Anyhow: if we remove half of our Cauchy distribution, we end up with a distribution which integrates to 0.5 (across the range from 0 to infinity). To change this back to a probability distribution, we need to multiply the function by a constant, in this case by 2. If you look at the plots for the full Cauchy prior versus the directional priors, you will notice that, at x = 0, y ≈ 0.5 for the full Cauchy, and y ≈ 1 for the two truncated Cauchys. For calculating the probability of a certain range, this means that we need to multiply it by two. Which is easy in our case: we start off with a given range (-y to y) in our full Cauchy and cut off half (so we get the range 0 to y), which halves the area, so the probability of getting values in this range is divided by two. Then we multiply our function by 2 to turn it back into a probability distribution: this also multiplies the area between 0 and y by two, which gives us the same proportion that we started with in the first place.

2 A beta-distribution looks very different to a Cauchy or normal, and takes different parameters (on Wikipedia, denoted α and β). When both parameters are equal (α = β), the distribution is symmetrical. In JASP, you can only adjust one number: the same value is then used for both parameters, so the prior distribution is always symmetrical. The number which you can adjust in the JASP box (let's call it A) does not equal the true parameter that defines the beta-function (let's call it a), as it has an inverse relationship to it (A = 1/a). The other difference between the actual beta-distribution and the JASP prior is that the beta-distribution is defined for values between 0 and 1: the JASP prior is stretched between values of -1 and 1. Thus, when using the function to calculate the probabilities of different ranges of r under different parameters, we need to transform r to a value between 0 and 1 before we can make a statement about the size of the correlations. I hope to make this clearer when I present the table with parameters and effect size ranges.

Thursday, August 8, 2019

On grant proposal writing

The year 2018 was very successful for me in terms of grants: My success rate skyrocketed from close to 0% to 100%. It’s a never-ending story, though, so now I’m finding myself writing even more grant proposals, which led me to procrastinate and write a blog post about grant proposal writing. Given my recent successes, I could frame this blog post as a set of advice for other aspiring grant writers. However, frankly, I have no idea why my success rate changed so abruptly. Also, I don’t really want to sound like this guy.

Nevertheless, I got a lot of advice from different people about grant writing over the years. Maybe it can be useful to other people. It will also allow me to organise my own thoughts about what I should consider while writing my proposals. So, here goes:

Advice #1: Be lucky. Even if your proposal is amazing, the success rates tend to be low, and many factors aside from the grant quality will affect whether it is successful or not. You may want to repress this thought while writing the proposal. Otherwise the motivation to invest weeks and months into planning and getting excited about a project will plummet. However, as soon as I submit the proposal, I will try to assume an unsuccessful outcome. First, it will motivate me to think about back-up plans, and second, it will weaken the bitterness of the disappointment if the funding is not granted.

One aspect where luck plays a large role is that a lot depends on the reviewers. In most schemes that I have applied for, the reviewer may be the biggest expert in the field, but they may also be a researcher on a completely different topic in a vaguely related area. So a good grant proposal needs to be specific, to convince the biggest expert that you have excellent knowledge of the literature, that you have not missed any issues that could compromise the quality of your project, and that every single detail of your project is well-thought-through. At the same time, the proposal needs to be general, so a non-expert reviewer will be able to understand what exactly you are trying to do, and the importance of the project to your topic. Oh, and, on top of that, the proposal has to stay within the page limit.

Throughout the last years, I have received a lot of very useful advice about grant writing, and now that I’m trying to summarise it all, I realise how conflicting the advice sometimes is. I have asked many different people for advice, but most of them are regularly involved in evaluating grant proposals. This is one demonstration of how important luck is: Maybe you will get a grant reviewer who expects a short and sexy introduction which explains how your project will contribute to the bigger picture of some important, global social problem (e.g., cancer, global warming). Maybe you will get a reviewer who will get extremely annoyed at an introduction which overblows the significance of the project.

Advice #2: Think about your audience. When I search for possible reasons for my abrupt change in success rate, this is a possible candidate. The advice to think about one’s audience applies to everything, and it is widely known. However, for a beginning grant writer it is sometimes difficult to visualise the grant reviewer. Also, as I noted above, a reviewer may be the biggest expert in the field, or it could be someone who doesn’t know very much about it. Thus, in terms of the amount of detailed explanations that you put into the proposal, it is important to find the right balance: not to bore the reviewer with details, but provide enough details to be convincing. The prior probability of the reviewer being the biggest expert is rather low, if we consider that non-experts are much more common than people who have very specialised knowledge about your specific topic. Thus, when in doubt, try to explain things, and avoid acronyms, even if you think that it’s assumed knowledge for people in the field.

Reviewers are, in most cases, academics. This means that they are likely to be busy: make the proposal as easy to read as possible. Put in lots of colourful pictures: explaining as many things as possible in figures can also help to cut the word count.

This also means that they are likely to be elderly men. This realisation has brought up a very vivid image in my mind: if the proposal is ‘good’, the reviewer should come home to his wife, and, while she passes him his daily glass of evening brandy, he will tell her (read this in a posh British accent, or translate it in your head into untainted Hochdeutsch): “My dear, I learned the most interesting thing about dyslexia today…!”

Advice #3: Get as much feedback as possible. Feedback is always good: I try to incorporate everything anyone tells me, even if in some cases I don’t agree with it. Thoughts such as “Clearly, the person giving the feedback didn’t read the proposal thoroughly enough, otherwise they wouldn’t be confused about X!” are not very helpful: if someone giving you feedback stumbles over something, chances are that the reviewer will, too. Sometimes, the advice you get from two different people will conflict with each other. If at all possible, try to find a way to incorporate both points of view. Otherwise, use your best judgement.

Most universities have an office which helps with proposal writing: they are very helpful in giving advice from an administrative perspective. Different funding agencies have different requirements about the structure and the like (which is also why I’m trying to keep the advice I summarise here as general as possible). Grant offices are likely to give you good advice about the specific scheme you are applying for. They may also allow you to read through previous successful applications: this can be helpful in getting a better idea about how to structure the proposal, how to lay out the administrative section, and other issues that you may have missed.

Colleagues can give feedback about the content: they will point out if something is more controversial than you thought, if there are problems with some approaches that you have not thought about, and provide any important references that you may have missed. Ask colleagues with different backgrounds and theoretical ‘convictions’. Friends and relatives can help to make sure that the proposal is readable to a non-expert reviewer, and that the story, as a whole, makes sense.

In some ways, submitting a grant proposal is a lot like buying a lottery ticket that costs a lot of time, and on which your career probably depends. However, it is also the daily bread of someone striving for an academic career, so it is important to try to make the best of it. In an attempt to end this on a positive note (so I feel motivated to get back to my proposal): Applying for ‘your own’ project may give you the flexibility to work on something that you really care about. It takes a lot of time, but this time is also spent on thinking through a project, which will make its execution run more smoothly afterwards.

The advice above is not comprehensive, and from my own biased view. I would be very happy to read any corrections or any other advice from the readers in the comments section.

Friday, August 2, 2019

Getting a precise RT estimate for single items in a reading aloud task

For Registered Reports, grant applications, ethics applications, and similar documents, researchers are expected to provide a power calculation. From my own experience, and from talking with colleagues in many different contexts, this is often a hurdle. Calculating power requires an effect size estimate. Sometimes, we try new things and have no idea what the size of the effect will be: even if we have some pilot data, we know that the observed effect size is variable when the sample size is small (Whitehead et al., 2016). We might have data from a previous study, but we also know that the presence of publication bias and questionable research practices leads to systematic over-estimation of the true effect size (Vasishth et al., 2018). The design of our study might be complex, and we don't really know which boxes to tick in G*Power. We might not even be sure what kind of effects we're looking for: if our study is more exploratory in nature, we will not know which statistical tests we will conduct, and conducting a formal power analysis would not make much sense anyway (Nosek & Lakens, 2014). Still, we need to find some way to justify our sample size to the reviewers.

In justifying our sample size, an alternative to a power analysis is to plan for a certain degree of precision (e.g., Kelley et al., 2003). For estimating precision, we use our a priori expectation of the standard deviation to calculate a confidence interval which guarantees that, in the long run, our observed estimate will be within an acceptable bound. Again, we have freedom in deciding the width of the confidence interval (e.g., 80%, 90%, 95%), and we need an estimate of the standard deviation.
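To make this concrete, the required sample size for a target precision can be read off the normal-approximation confidence interval. Below is a minimal Python sketch (the SD and half-width values are purely hypothetical, and the function name is my own):

```python
import math

def n_for_precision(sd, half_width, z=1.96):
    """Participants needed so that the 95% CI half-width around a mean
    (z * sd / sqrt(N)) does not exceed `half_width` (both in ms)."""
    return math.ceil((z * sd / half_width) ** 2)

# Hypothetical example: expected SD = 200 ms, desired half-width = 25 ms
print(n_for_precision(200, 25))  # 246
```

The same arithmetic works for any confidence level by swapping the z-value (e.g., 1.64 for 90%).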

In the current blog post, I'd like to answer a question that is relevant to me at the moment: When we do a reading aloud study, a number of participants see a number of words, and are asked to read them aloud as accurately and quickly as possible. The variable which is analysed is often the reaction time (RT): the number of milliseconds between the appearance of the item and the onset of the vocal response. The items are generally chosen to vary in some linguistic characteristic, and subsequent statistical analyses would be conducted to see if the linguistic characteristics affect the RT.

In most cases, the data would be analysed using a Linear Mixed Effect model, where item- and participant-level characteristics can be included as predictor variables. More information about calculating power and required sample sizes for Linear Mixed Effect models can be found in Brysbaert and Stevens (2018) and Westfall et al. (2014); and a corresponding app can be found here. Here, I ask a different question: If we look at a single item, how many participants do we need to obtain a stable estimate?

On the surface, the logic behind this question is very simple. For each item, we can calculate the average RT, across N participants. As N increases, the observed average should approach a hypothetical true value. If we want to see which item-level characteristics affect RTs, we should take care to have as precise an estimate as possible. If we have only a few participants responding to each item, the average observed RT is likely to vary extensively if we ask a couple more participants to read aloud the same items.

As a complicating factor, the assumption that there is a true value for the average RTs is unreasonable. For example, familiarity with a given word will vary across participants: a psychology student is likely to respond faster to words that they encounter in their daily life, such as "depression", "diagnosis", "comorbidity", than someone who does not encounter these words on a regular basis (e.g., an economics student). Thus, the true RT is more likely to be a distribution rather than a single point.

Leaving this important caveat aside for a minute, we return to the basic principle that a larger number of observations should result in a more stable RT estimate. In a set of simulations, I decided to see what the trajectory of a given observed average RT is likely to look like, when we base it on the characteristics that we find, for various words, in the large-scale Lexicon projects. The English Lexicon Project (Balota et al., 2007) has responses for thousands of items, with up to 35 responses per item. In a first simulation, I focussed on the word "vanishes", which has 35 responses, and an average reading aloud RT of 743.4 ms (SD = 345.3), including only the correct responses. Based on the mean and SD, we can simulate the likely trajectories of the observed average RTs at different values of N. Using the item's mean and SD, we simulate a normal distribution, and draw a single value from it: We have an RT for N = 1. Then we draw the next value and calculate the average of the first and second values. We have an average RT for N = 2. We can repeat this procedure, while always plotting the observed average RT for each N. Here, I did this for 35 participants: this gives a single "walk", where the average RT approaches the RT which we specified as a parameter for our normal distribution. Then, we repeat the whole procedure, to simulate more "walks". The figure below shows 100 such "walks".
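The simulation procedure described above can be sketched in a few lines; this is a Python re-implementation for illustration, not the original analysis script (which is linked at the end of the post). The mean and SD are the ones reported for "vanishes":

```python
import numpy as np

rng = np.random.default_rng(1)

# Parameters for the item "vanishes" from the English Lexicon Project
mean_rt, sd_rt, n_max, n_walks = 743.4, 345.3, 35, 100

# Each row is one simulated "walk": draw 35 RTs from a normal
# distribution, then compute the running average for N = 1..35
samples = rng.normal(mean_rt, sd_rt, size=(n_walks, n_max))
walks = np.cumsum(samples, axis=1) / np.arange(1, n_max + 1)

# By N = 35, the walks should have converged towards the true mean
print(walks[:, -1].round(1))
```

Plotting each row of `walks` against N reproduces the funnel-shaped figure: wide scatter at small N, narrowing as N grows.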

As expected, the initial average RTs tend to be all over the place: if we were to stop our simulated data collection at N = 5, we might be unlucky enough to get an estimate of 400 ms, or an estimate of 1,200 ms. As the simulated data collection progresses, the variability between the "walks" diminishes, and at N = 30 we would expect the observed average RT to lie somewhere between 600 ms and 1,000 ms.

Analytically, the variability at different values of N can be quantified with confidence intervals: intervals constructed so that, in the long run, a fixed proportion of them will contain the true average RT. The width of the confidence interval depends on (1) the confidence level that we'd like to have (fixed here at 95%), (2) the population standard deviation (σ), and (3) the number of participants. Now, we don't really know what σ is, but we can get some kind of plausible range of σ-values by looking at the data from the English Lexicon Project. I first removed all RTs < 250 ms, which are likely to be miscoded. Then I generated a box-plot of the SDs for all items:

The SDs are not normally distributed, with quite a lot of very large values. However, we can calculate a median, which happens to be SDmedian ≈ 200; a 20% quantile, SDlower ≈ 130; an 80% quantile, SDupper ≈ 350; and a pessimistic estimate, taken from the location of the upper whisker in the box-plot above, SDpessimistic ≈ 600. For each of these SD estimates, we can calculate the 95% confidence interval for different values of N, with the formula CIupper = 1.96*(σ/sqrt(N)) and CIlower = −CIupper. To calculate the expected range of average RTs, we would add these values to the average RT. However, here we are more interested in the deviation from any hypothetical mean, so we can simply focus on the upper bound; the expected range of deviation is therefore CIupper * 2.
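The calculation can be sketched as follows; this is a Python illustration of the formula, not the original analysis code, and it reproduces the figures quoted in the text up to rounding:

```python
import math

def deviation_range(sd, n, z=1.96):
    """Expected range of deviation: twice the 95% CI half-width,
    i.e. 2 * z * sd / sqrt(n), in ms."""
    return 2 * z * sd / math.sqrt(n)

# Low, median, upper, and pessimistic SD estimates from the ELP data
for sd in (130, 200, 350, 600):
    print(sd, round(deviation_range(sd, 50)), round(deviation_range(sd, 100)))
```

Running this for N = 50 and N = 100 gives the deviation ranges reported in the next paragraph.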

Next, I plotted CIupper as a function of N for the different SD estimates (low, median, high, and pessimistic):

So, if we have 50 participants, the expected range of deviation (CIupper * 2) is 72 ms for the low estimate, 110 ms for the median estimate, 194 ms for the upper estimate, and 332 ms for the pessimistic estimate. For 100 participants, the range reduces to 50 ms, 78 ms, 137 ms, and 235 ms, respectively.

What does all of this mean? Well, at the end of this blog post we are still left with the situation that the researcher needs to decide on an acceptable range of deviation. This is likely to be a trade-off between the precision one wants to achieve and practical considerations. However, the simulations and calculations should give a feeling of what number of observations is typically needed to achieve what level of precision, when we look at the average RTs of single items. The general take-home messages can be summarised as: (1) It could be fruitful to consider precision when planning psycholinguistic experiments, and (2) the more observations, the more stable the average RT estimate, i.e., the less likely it is to vary across samples.


Link to the analyses and simulations: https://osf.io/mrnzj/


Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., ... & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445-459.

Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1(1).

Kelley, K., Maxwell, S. E., & Rausch, J. R. (2003). Obtaining power or obtaining precision: Delineating methods of sample-size planning. Evaluation & the Health Professions, 26(3), 258-287.

Nosek, B. A., & Lakens, D. (2014). Registered Reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137-141.

Vasishth, S., Mertzen, D., Jäger, L. A., & Gelman, A. (2018). The statistical significance filter leads to overoptimistic expectations of replicability. Journal of Memory and Language, 103, 151-175.

Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020.

Whitehead, A. L., Julious, S. A., Cooper, C. L., & Campbell, M. J. (2016). Estimating the sample size for a pilot randomised trial to minimise the overall trial sample size for the external pilot and main trial for a continuous outcome variable. Statistical Methods in Medical Research, 25(3), 1057-1073.

Friday, May 24, 2019

The perfect article

Last year, I went to an R Ladies event. This event took place at the Süddeutsche Zeitung, one of the biggest and most serious newspapers in Germany. The workshop was presented by two R Ladies from the data-driven journalism department of the newspaper. The event was extremely interesting: as it turns out, the job of a data-driven journalist is to collect or find data, and present it to the readers in an understandable way. One project which was presented included an analysis of the transcripts from the Bundestag meetings, presented in easy-to-digest graphs. Another project contained new data on the very socially relevant question of housing prices in Germany.

Throughout the event, I kept thinking: They are much further in terms of open communication than we are. As an essential part of their job, data-driven journalists need to present often complex data in a way that any interested reader can interpret it. At the same time, the R Ladies at the event kept emphasising that the data and R/RMarkdown scripts were publicly available, for anyone who doubted their conclusions or wanted to try out things for themselves.

This brings me to the idea of what the perfect article would look like. I guess you know where this is going, but before I go there, to avoid disappointment, I will add that, in this blog post, I will not give any advice on how to actually write such a perfect article, nor how to achieve a research world where such articles will be the norm. I will just provide a dreamer’s description of a utopian world, and finish off with some questions that I have no answer for.

The perfect article would have a pyramidal structure. The top layer would be a description of the study, written at a level that a high school student could understand. The data could be presented in an interactive shiny app, and there would be easy-to-read explanations of the research question, its importance, how the data should be interpreted to answer this research question, and any limitations that may affect the interpretation of the data.

Undergraduate students in the field of study (or very interested readers) would be navigated to a more detailed description of the study, which describes the research methods in more detail. Here, the statistical analyses and the theoretical relevance would need to be explained, and a more thorough description of methodological limitations should be provided.

The next level would be aimed at researchers in the field of study. Here, the study would need to be placed in relation to previous work on this topic, and a very thorough discussion of the theoretical implications would be needed.

The final level would include all the data, all the materials, and all the analysis scripts. This level would be aimed at researchers who plan to build on this work. It would allow them to double-check that the results are robust and that there are no mistakes in the data analysis. They would also be able to get the materials, allowing them to build as closely as possible on previous work.

Even in an ideal world, this format would not be suitable for all fields. For example, in theoretical mathematics, it would probably be very difficult to come up with a project that could be explained to a lay audience through a shiny app. More applied mathematics could, however, be presented as the deeper layers of a project where these methods are applied.

Many practical concerns jump out of my perfect-article proposal. Most obviously, an article of this form would be unsuitable for a paper format. It would, however, be relatively straightforward to implement in online journals. This would require expertise that not all academic authors have. (In fact, I would guess: expertise that most academic authors don’t have.) Even for those who do have the skills, it would require much more time, and as we all know, time is something that we don’t have, because we need to publish in large quantities if we want to have a job. Another issue with this format is that many studies are incremental, and would not be at all interesting to a general audience. So why spend time on creating the upper layers of the pyramid?

A solution to the last issue would be to completely re-think the role that papers have in the academic process. Instead of publishing papers, the mentality could switch to publishing projects. Often, a researcher or lab is concerned with a broader research question. Perhaps what would be, in our current system, ten separate publications could be combined to make a more general point about such a broad research question, which would be of interest to a general public. Such a switch in mind set would also give researchers a greater sense of purpose, as they would need to keep this broad research question in the back of their minds while they conduct separate studies.

Another question would fall out of this proposal to publish projects rather than individual studies: What would happen with authorship? If five different PhD students conducted the individual studies, some of them would need to give up their first authorship if their work is combined into a single project. Here, the solution would be to move away from an authorship model, and instead list each researcher’s contribution along with the project’s content. And, as part of the team, one could also find a programmer (or data-driven journalist), who would be able to contribute to the technical side of presenting the content, and to making sure that the upper layers of the presentation are really understandable to the intended audience.

The problem would remain that PhD students would go without first authorship. But, in an ideal world, this would not matter, because their contributions to the project would be clearly acknowledged, and potential employers could actually judge them based on the quality, not the quantity of their work. In an ideal world…

Thursday, May 16, 2019

Why I stopped signing my reviews

Since the beginning of this year, I stopped signing my peer reviews. I had systematically signed my reviews for a few years: I think I started this at the beginning of my first post-doc, back in 2015. My reasons for signing were the following: (1) Science should be about an open exchange of ideas. I have previously benefitted from signed reviews, because I could contact the reviewer with follow-up questions, which has resulted in very fruitful discussion. (2) Something ideological about open science (I don’t remember the details). (3) As an early career researcher, one is still very unknown. Signing reviews might help colleagues to associate your name with your work. As for the drawbacks, there is the often-cited concern that authors may want to take revenge if they receive a negative review, and even in the absence of any bad intentions, they may develop implicit biases against you. I weighed this disadvantage against the advantages listed above, and I decided that it’s worth the risk.

So then, why did I stop? There was a specific review that made me change my mind, because I realised that by signing reviews, one might get into all kinds of unanticipated awkward situations. I will recount this particular experience, of course, removing all details to protect the authors’ identity (which, by the way, I don’t know, but perhaps others might be able to guess with sufficient detail).

A few months ago, I was asked to review a paper about an effect, which I had not found in one of my previous studies. This study reported a significant effect. I could not find anything wrong with the methods or analyses, but the introduction was rather biased, in the sense that it cited only studies that did show this effect, and did not cite my study. I asked the authors to cite my study. I also asked them to provide a scatterplot of their data.

The next version of this manuscript that I received included the scatterplot, as I’d asked, and a citation of my study. Except, my study was cited in the following context (of course, fully paraphrased): “The effect was found in a previous study (citation). Schmalz et al. did not find the effect, but their study sucks.” At the same time, I noticed something very strange about the scatterplot. After asking several stats-savvy colleagues to verify that this strange thing was, indeed, very strange, I wrote in my review that I don’t believe the results, because the authors must have made a coding error during data processing.

I really did not like sending this review, because I was afraid that it would look (both to the editor and to the authors) like I had picked out a reason to dismiss the study because they had criticised my paper. However, I had signed my previous review, and whether or not I would sign during this round, it would be clear to the authors that it was me.

In general, I still think that signing reviews has a lot of advantages. Whether the disadvantages outweigh the benefits depends on each reviewer’s preference. For myself, the additional drawback that there may be unexpected awkward situations that one really doesn’t want to get into as an early career researcher tipped the balance, but it’s still a close call.

Thursday, April 4, 2019

On being happy in academia

tl;dr: Don’t take your research too seriously.

I like reading blog posts with advice about how to survive a PhD, things one wished one had known before one started a PhD, and other similar topics. Here goes my own attempt at writing such a blog post. I’m not a PhD student anymore, so I can’t talk about my current PhD experiences, nor am I a professor who can look back and list all of the personal mistakes and successes that have led to “making it” in academia. It has been a bit over 4 years since I finished my PhD and started working as a post-doc, and comparing myself now and then, I realise that I’m happier working in academia now. This is not to say that I was ever unhappy during my time in academia, but some changes in attitude have led to – let’s say – a healthier relationship with my research. This is what I would like to write this blog post about.

Don’t let your research define you
In the end, all of the points below can be summarised as: Don’t take your research too seriously. Research inevitably involves successes and failures; everybody produces some good research and some bad research, and it’s not always easy for the researcher to decide which it is at the time. So there will always be criticism, some of it justified, some of it reflecting the bad luck of meeting Reviewer 2 on a bad day.

Receiving criticism has become infinitely easier for me over the years: after getting an article rejected, it used to take at least one evening of moping and a bottle of wine to recover, while now I only shrug. It’s difficult to identify exactly why my reaction to rejection changed over time, but I think it has something to do with seeing my research less as an integral part of my identity. I sometimes produce bad research, but this doesn’t make me a bad person. This way, even if a reviewer rightfully tears my paper to shreds, my ego remains intact.

Picking a research topic
Following up from the very abstract point above, I’ll try to isolate some more concrete suggestions that, in my case, may or may not have contributed to my changed mindset. The first one is about picking a research topic. At the beginning of my PhD, I wanted to pick a topic that is of personal relevance, such as bilingualism or reading in different orthographies. Then, becoming more and more cynical about the research literature, I started following up on topics where I’d read a paper and think: “That’s gotta be bullshit!”

Now, I’ve moved away from both approaches. On the one hand, picking a topic that one is too passionate about can, in my view, lead to a personal involvement which can (a) negatively impact one’s ability to view the research from an objective perspective, and (b) become an unhealthy obsession. To take a hypothetical example: if I had followed up on my interest in bilingualism, it is – just theoretically – possible that I would consistently find that being bilingual comes with some cognitive disadvantages. As someone who strongly believes in the benefit of a multilingual society, it would be difficult for me to objectively interpret and report my findings.

On the other hand, focussing on bad research can result in existential crises, anger at sloppy researchers, a permanently bad mood, and, from a practical perspective, annoying some high-status people while having a relatively small impact on improving the state of the literature.

My conclusion has been that it’s good to choose topics that I find interesting, where there is good ground work, and where I know that, no matter what the outcome of my research, I will be comfortable to report it.

Working 9-to-5
My shift in mindset coincides with having met my husband (during my first post-doc in Italy). As a result, I started spending less time working outside of office hours. Coming home at a reasonable time, trying out some new hobbies (cross-country skiing, hiking, cycling), and spending weekends together or catching up with my old hobbies (music, reading) distracts from research, in a good way. When I get to work, I can approach my research with a fresh mind and potentially from a new perspective.

Having said this, I’ve always been good at not working too hard, which is probably the reason why I’ve always been pretty happy during my time in academia. (Having strong Australian and Russian cultural ties, I have both the “she’ll be right” and the “авось повезёт” attitudes. Contrary to popular belief, a relaxed attitude towards work is also compatible with a German mindset: in Germany, people tend to work hard during the day, but switch off as soon as they leave the office.) At the beginning of my PhD, one of the best pieces of advice that I received was to travel as much as possible. I tried to combine my trips with lab or conference visits, but I also spent a lot of time discovering new places and not thinking about research at all. During my PhD in Sydney, I also pursued old and new hobbies: I joined a book club, an orchestra, a French conversation group, took karate lessons, and thereby met lots of great people and have many good memories from my time in Sydney.

Stick to your principles
For me, this point is especially relevant from an Open Science perspective. Perhaps, if I spent less time on doing research in a way that is acceptable for me, I’d have double the amount of publications. This could, of course, be extremely advantageous on the job market. On the flip side, there are also more and more researchers who value quality over quantity: a job application and CV with lots of shoddy publications may be valued by some professors, but may be immediately trashed by others who are more onboard with the open science movement.

The moral of this story is: One can’t make everyone happy, so it’s best to stick to one’s own principles, which also has the side effect that you’ll be valued by researchers who share your principles. 

A project always takes longer than one initially thinks
Writing a research proposal of any kind involves writing a timeline. In my experience, the actual project will always take much longer than anticipated, often due to circumstances beyond your control (e.g., recruitment takes longer than expected, collaborators take a long time to read drafts). For planning purposes, it’s good to add a couple of months to account for this. And if you notice that you can’t keep up with your timeline: that’s perfectly normal.

Have a backup plan
For a long time, I saw the prospect of leaving academia as the ultimate personal failure. This changed when I made the decision that my priority is to work within commutable distance of my husband, which, in the case of an academic couple, may very well involve one or both leaving academia at some stage. It helped to get a more concrete idea of what leaving academia would actually mean. It is ideal if there is a “real world” profession where one’s research experience would be an advantage. In my case, I decided to learn more about statistics and data science. In addition to opening job prospects that sound very interesting and involve a higher salary than the one I would get in academia, it gave me an opportunity to learn things that helped take my research to a different level.

Choosing a mentor
From observing colleagues, I have concluded that the PhD supervisor controls at least 90% of a student’s PhD experience. For prospective PhD students, my advice would be to be very careful in choosing a supervisor. One of the biggest warning signs (from observing colleagues’ experiences) is a supervisor who reacts negatively when a (female) PhD student or post-doc decides to start a family. If you get the chance to talk to your future colleagues before starting a PhD, ask them about their family life, and how easy they find it to combine family with their PhD or post-doc work. If you’re stuck in a toxic lab, my advice would be: Get out as soon as you can. Graduate as soon as possible and get a post-doc in a better lab; start a new PhD in a better lab, even if it means losing a few years; or leave academia altogether. I’ve seen friends and colleagues getting long-lasting physical and psychological health problems because of a toxic research environment: nothing is worth going through this.

Having a backup plan, as per the point above, could be particularly helpful in getting away from a toxic research environment. Probably one would be much less willing to put up with an abusive supervisor if one is confident that there are alternatives out there.

Choosing collaborators
Collaborators are very helpful when it comes to providing feedback about aspects that you may not have thought about. One should bear in mind, though, that they have projects of their own: chances are, they will not be as enthusiastic about your project as you are, and may not have time to contribute as much as you expect. This is good to take into account when planning a project: assuming that you will need to do most of the work yourself will reduce misunderstandings and stress due to the perception of collaborators not working hard enough on this project.

Be aware of the impostor syndrome
During my PhD, there were several compulsory administrative events that, at the time, I thought were a waste of time. Among other things, we were told about the impostor syndrome at one such event (also, we were given the advice to travel as much as possible, by a recently graduated PhD student). It was relatively recently that I discovered that many other early-career researchers have never heard of the impostor syndrome before, and often feel inadequate, guilty, and tired from their research. Putting a label on this syndrome may help researchers to become more aware that most people often feel like an impostor in academia, and to take this feeling less seriously.