Update (27.12.2021): The content of this blog post has been extended and published in the following article:
Schmalz, X., Biurrun Manresa, J., & Zhang, L. (2021). What is a Bayes factor? Psychological Methods. Preprint: https://osf.io/5geqt/.
TL;DR: Do you
need a justification for your prior parameters in JASP? Scroll down
to find fill-in-the-blank sentences which you can use, and a table
where you can pick a range of effect sizes which you expect and
the corresponding prior parameters.
----------------------------
With many psychologists turning Bayes-curious, software packages are appearing that make it easy to calculate Bayes Factors.
JASP
(Love et al., 2015) has a similar layout to SPSS, and allows the user
to perform Bayesian analyses which are equivalent to a series of
popular frequentist tests. Here, I will describe the priors which are
implemented in JASP for two frequently used tests: the
t-test
and Pearson's correlation. I will also explain what it means when we change them. The aim is to provide the basis for a better
understanding of what priors mean, and how we can justify our choice
of prior parameters.
Both frequentist and
Bayesian statistics rely on a series of underlying assumptions and
calculations, which are important to understand in order to interpret
the value that the software spits out (i.e., a p-value or a
Bayes Factor). Given that very few psychologists have been schooled
in Bayesian statistics, the assumptions underlying the Bayes Factor
are often not intuitive.
One important
difference between Bayesian and frequentist data analyses is the use
of a prior. The prior represents the beliefs or knowledge that we
have about our effect of interest, before we consider the data which
we aim to analyse. The prior is a distribution which can be specified
by the experimenter. This distribution becomes updated, once we have
data, to give a posterior distribution. For calculating a Bayes
Factor, we have two priors: one that describes one hypothesis (e.g.,
a null hypothesis: no difference between groups, or no correlation
between two variables), and one that describes a different
hypothesis. JASP then computes the probability of the observed data
under each of these hypotheses, and divides one by the other to
obtain the Bayes Factor: the degree to which the data is compatible
with one hypothesis over the other.
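To make this concrete, here is a minimal sketch in R of how such a Bayes Factor can be computed for an independent-samples t-test: the probability of the data under each hypothesis is obtained by averaging the likelihood of the observed t-value over the prior on the effect size. This is only an illustration of the general logic, using the default prior described below; JASP's actual implementation differs in its details.

# Minimal sketch (not JASP's exact code): Bayes Factor for an
# independent-samples t-test with a Cauchy prior on the effect size.
bf10 <- function(t, n1, n2, scale = 0.707) {
  df  <- n1 + n2 - 2
  eff <- sqrt(n1 * n2 / (n1 + n2))  # converts effect size to non-centrality
  lik <- function(delta) dt(t, df = df, ncp = delta * eff)
  # probability of the data under H1: likelihood averaged over the prior
  # (for realistic t-values, effect sizes beyond +/-4 contribute essentially nothing)
  m1 <- integrate(function(d) lik(d) * dcauchy(d, 0, scale), -4, 4)$value
  # probability of the data under H0: effect size fixed at zero
  m0 <- lik(0)
  m1 / m0  # values > 1 favour H1, values < 1 favour H0
}
bf10(t = 2.2, n1 = 50, n2 = 52)  # example call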
To some extent, then,
the inference depends on the prior. The degree to which the prior
matters depends on how much data one has: when there is a lot of
data, it “overrides” the prior, and the Bayes Factor becomes very
similar across a wide range of plausible priors. Choosing an
appropriate prior becomes more important, though, when (1) we do not
have a lot of data, (2) when we need to justify why we use a
particular prior (e.g., for a Registered Report or grant proposal), or (3) when we
would just like to get a better idea of how the Bayes Factor is
calculated. The aim of the current blog post is to provide an
introduction to the default parameters of JASP, and what it means
when we change them around, while assuming very little knowledge of
probability and statistics from the reader.
T-tests
Let's start with
t-tests. JASP has the option to do a Bayesian independent
samples t-test. It also provides some toy data: here, I'm
using the data set “Kitchen Rolls”. Perhaps we want to see if age
differs as a function of sex (which makes no sense, theoretically,
but we need one dichotomous and one continuous variable for the
t-test). Below the fields where you specify the variables, you
can adjust two prior parameters: (1) The hypothesis (two-tailed or
directional), and (2) the prior (Cauchy prior width). Let's start
with the Cauchy. The default parameter is set to 0.707. Contrary to
what is often believed, this does not represent the size of the
effect that we expect. To understand what it represents, we need to
take a step back to explain what a Cauchy is.
A Cauchy is a
probability distribution. (Wikipedia
is a very good source for finding information about the properties of
all kinds of distributions.) Probability distributions describe the
probability of possible occurrences in an experiment. Each type of
distribution takes a set of parameters, with which we can infer the
exact shape of the distribution. The shape of our well-familiar
normal distribution, for example, depends both on the mean and on the
variance: if you look up the normal distribution on
Wikipedia,
you will indeed see in the box on the right that the two parameters
for this distribution are μ and σ². On the Wikipedia
distribution pages, the top figure in the box shows how the shape of
the distribution changes if we change around the parameters.
Visually, the Cauchy distribution is similar to the normal
distribution: it is also symmetrical and kind-of bell-shaped, but it
has thicker tails. It also takes two parameters: the location
parameter and a scale parameter. The location parameter determines
where the mode of the distribution is. The scale parameter determines
its width. The latter is what we're after: in the context of Cauchy
priors, it is also often called the width parameter.
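If you would like to see this for yourself, a few lines of R are enough to plot a normal distribution next to Cauchys of different widths (an illustration of the curve shapes, not JASP output):

x <- seq(-4, 4, length.out = 200)
plot(x, dnorm(x), type = "l", ylim = c(0, 0.5), ylab = "density")  # standard normal, for comparison
lines(x, dcauchy(x, location = 0, scale = 0.707), lty = 2)  # Cauchy with JASP's default width
lines(x, dcauchy(x, location = 0, scale = 0.35), lty = 3)   # a narrower Cauchy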
Back to JASP: when we
change the Cauchy prior width, we don't change the mode of our
distribution, but its width (i.e., the scale parameter): we are not
saying that we are considering certain values to be more or less
likely, but that we consider the range of likely effect sizes to be
more or less narrow. The Cauchy, in JASP, is by default centred on
zero, which gives us a bidirectional test. Overall, small effect
sizes are considered to be more likely than large effect sizes (as
shown by the general upside-down-U shape of the distribution). If we
have a directional hypothesis, rather than shifting the location
parameter, JASP allows us to pick which group we expect to have
higher values (Group 1 > Group 2, or Group 1 < Group 2). This
simply cuts the distribution in half. We can try this with our
Kitchen Rolls data: If, under the section “Plots”, we tick “Prior
and posterior”, we will see a figure, in addition to the Bayes
Factor, which shows the prior for the alternative hypothesis, as well
as the posterior (which we will ignore in the current blog post). The
default settings show the following plot (note the symmetrical prior
distribution):
When we anticipate that
Group 1 will have higher values than Group 2, half of the prior
distribution is cut:
And when we anticipate
that Group 2 will have higher values than Group 1:
So, what do you do when
you plan to use the Bayes Factor t-test
for inference and the reviewer of the Registered Report asks you to
justify your prior? What the Cauchy can tell us is how confident we
are that the effect lies within a certain range. We might write
something like:
“The prior is
described by a Cauchy distribution centred around zero and with a
width parameter of x. This corresponds to a probability of P% that
the effect size lies between -y and y. [Some literature to support
that this is a reasonable expectation of the effect size.]”
So,
how do you determine x,
P, and y?
For P, that's a matter of preference. For a Registered Report of mine, I chose 80%, but this is rather arbitrary. You pick y in such a way that it describes what you believe about your effect size: if you think it cannot possibly be bigger than Cohen's d = 0.5, that could be your y. And once you've picked your y, you can calculate the x. This is the tricky part, though it can be done relatively easily in R. We want to find the parameter x for which we have an 80% probability of obtaining values between -y and y.
To do this, we use the cumulative distribution function, which
measures the area under the curve of a probability distribution
(i.e., the cumulative probability of a range of values). The R
function pcauchy takes a value y, a location parameter, and a scale parameter, and returns the probability that an observation randomly drawn from this distribution is smaller than or equal to y. To get the probability that an observation randomly drawn from this distribution lies between -y and y, we subtract the cumulative probability at -y from the cumulative probability at y:
pcauchy(2, 0, 0.707) - pcauchy(-2, 0, 0.707)
This
is for the default settings of JASP (location parameter = 0, scale
parameter = 0.707). This gives us the following probability:
[1]
0.7836833
Thus, if we use the default JASP parameters, we could write (rounding the output of 0.78 up to 80%):
“The prior is
described by a Cauchy distribution centred around zero and with a
width parameter of 0.707. This corresponds to a probability of 80%
that the effect size lies between -2 and 2. [Some literature to
support that this is a reasonable expectation of the effect size.]”
An effect size of 2 is rather large for most psychology studies: we might be sure that we're looking for smaller effects than this. To find the scale parameter that corresponds to an 80% probability (or any other value of P) for the effect sizes you expect, you can copy-and-paste the code above into R, change the effect size range (2 and -2) to your desired ys, and play around with the scale parameter until you get the output you like (or solve for it numerically, as in the sketch after the table). Or, if you would like to stick with the 80% interval, you can pick the scale parameter for a set of effect size ranges from the table below (the percentage and the scale parameter are rounded):
Range of effect sizes (non-directional) | Range of effect sizes (directional) | Scale parameter required for 80% probability
-2 to 2 | 0 to 2 or -2 to 0 | 0.71 (default)
-1.5 to 1.5 | 0 to 1.5 or -1.5 to 0 | 0.47
-1.3 to 1.3 | 0 to 1.3 or -1.3 to 0 | 0.41
-1.1 to 1.1 | 0 to 1.1 or -1.1 to 0 | 0.35
-0.9 to 0.9 | 0 to 0.9 or -0.9 to 0 | 0.3
-0.7 to 0.7 | 0 to 0.7 or -0.7 to 0 | 0.22
-0.5 to 0.5 | 0 to 0.5 or -0.5 to 0 | 0.16
-0.3 to 0.3 | 0 to 0.3 or -0.3 to 0 | 0.1
The middle column shows what happens when we have a directional hypothesis. Basically, the probability that the effect size lies between 0 and y under the cut-in-half Cauchy is the same as the probability that it lies between -y and y under the full Cauchy. I explain in footnote [1] why this is the case.
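If the range you expect is not in the table, you can also solve for the scale parameter numerically. A minimal sketch (scale_for is a hypothetical helper, not a built-in function):

# find the scale parameter s for which the Cauchy puts probability P on (-y, y)
scale_for <- function(y, P = 0.80) {
  f <- function(s) pcauchy(y, 0, s) - pcauchy(-y, 0, s) - P
  uniroot(f, interval = c(0.001, 100))$root
}
scale_for(0.5)  # approx. 0.16, matching the table above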
How
does the choice of prior affect the results? In JASP, after you have
collected your data, you can check this by ticking the “Bayes
factor robustness check” box under “Plots”. Below is what this
plot looks like for our age as a function of sex example. The grey
dot marks the Bayes Factor value for the prior which we chose: here,
I took the scale parameter of 0.1, corresponding to an 80% chance of
effect sizes between -0.3 and 0.3.
After I had played around with different parameters in R and done the calculations above, E.J. Wagenmakers drew my attention to the fact that, when we choose P to be 50% rather than 80%, the width parameter is equal to the bound y of the range of effect sizes that we expect. So, if we are less confident about how big we expect the effect to be (and less keen to mess around with different parameter values in R), we can simply write (below, I assume the default prior; if you have different expectations about the effect size, replace all mentions of the value “0.707” with your preferred effect size):
“The prior is
described by a Cauchy distribution centred around zero and with a
width parameter of 0.707. This corresponds to a probability of 50%
that the effect size lies between -0.707 and 0.707. [Some literature
to support that this is a reasonable expectation of the effect
size.]”
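You can verify this property with the same R code as before: with the default width of 0.707, half of the prior probability lies between -0.707 and 0.707.

pcauchy(0.707, 0, 0.707) - pcauchy(-0.707, 0, 0.707)  # returns 0.5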
After having written most of the above, I also realised that I had not updated JASP for a while: the newer version allows us to change the location parameter of the Cauchy, as well as its width. Thus, it is possible to shift the mode of the distribution to the effect size that you consider most likely. You can then calculate the new effect size range by taking the values from the table above and adding the location parameter to the upper and lower bound, for example:
“The prior is described by a Cauchy distribution centred around 0.707 and with a width parameter of 0.707. This corresponds to a probability of 50% that the effect size lies between 0 and 1.414. [Some literature to support that this is a reasonable expectation of the effect size.]”
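Again, this can be checked with pcauchy, now with a non-zero location parameter:

pcauchy(1.414, 0.707, 0.707) - pcauchy(0, 0.707, 0.707)  # returns 0.5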
Correlations
Now, let's move on to
correlations. Again, our goal is to make the statement:
“The prior is
described by a beta-distribution centred around zero and with a width
parameter of x. This corresponds to a probability of P% that the
correlation coefficient lies between -y and y. [Some literature to
support that this is a reasonable expectation of the effect size.]”
When you generate
a Bayesian correlation matrix in JASP, it gives you two things: The
Pearson's correlation coefficient (r)
that we all know and love, and the Bayes Factor, which quantifies the
degree to which the observed r
is compatible with the presence of a correlation over the absence of
a correlation. The
prior for the alternative hypothesis is now described by a
beta-distribution, not by a Cauchy. More details about the
beta-distribution can be found in footnote [2].
For the less maths-inclined people, suffice it to say that the
statistical parameters of the distribution do not directly translate
into the parameters that you input in JASP, but never fear: the table
and text below explain how you can easily jump from one to the other,
if you want to play around with the different parameters yourself.
The default
parameter for the correlation alternative prior is 1. This
corresponds to a flat line, and is identical to a so-called uniform
distribution. Beware that describing this distribution as “All
possible values of r
are equally likely” will trigger anything from a long lecture to a
condescending snort from maths nerds: as we're dealing with a
continuous distribution, a single value does not have a probability
associated with it. The mathematically correct way to put it is: “If we take any two intervals (A and B) of the same length from the continuous uniform distribution, the probability of the observation falling into interval A will equal the probability of the observation falling into interval B.”
Basically, if you have no idea what the correlation
coefficient will be like, you can keep the prior as it is. As with
the t-test, you can test
directional hypotheses (r
> 0 or r < 0).
Changing the parameter
will either make the prior convex (U-shaped) or concave
(upside-down-U-shaped). In the former case, you consider values
closer to -1 and 1 (i.e., very strong correlations) to be more
likely. Perhaps this could be useful, for example, if you want to
show that a test has a very high test-retest correlation. In the
latter case, you consider smaller correlation coefficients to be more
likely. This is probably closer to the type of data that, as a
psychologist, you'll be dealing with.
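To get a feel for these shapes, we can plot the prior on r for different JASP parameter values, using the relationship described in footnote [2] (a beta-distribution with parameters a = b = 1/A, stretched from the interval (0, 1) to (-1, 1)); this is an illustration, not JASP's own plotting code:

r <- seq(-0.99, 0.99, length.out = 200)
prior_r <- function(r, A) dbeta((r + 1) / 2, 1 / A, 1 / A) / 2  # stretch beta(1/A, 1/A) to (-1, 1)
plot(r, prior_r(r, 1), type = "l", ylim = c(0, 3), ylab = "density")  # default (A = 1): flat
lines(r, prior_r(r, 3), lty = 2)    # A > 1: convex, strong correlations more likely
lines(r, prior_r(r, 1/3), lty = 3)  # A < 1: concave, small correlations more likely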
So, without further
ado, here is the table from which you can pick the prior (first
column), based on the effect size range (possible correlation
coefficients) that you expect with 80% certainty:
JASP parameter (A) | Range of effect sizes (r) | Statistical parameters (a, b) | Statistical inputs (R)
1 | -0.8 to 0.8 | 1, 1 | 0.1 to 0.9
1/3 | -0.5 to 0.5 | 3, 3 | 0.25 to 0.75
1/13 | -0.25 to 0.25 | 13, 13 | 0.375 to 0.625
--------------
Update (18.11.2020): After a comment by Katrin pointed out that the numbers in the table above don't add up, I had a look and indeed found some discrepancies, was not able to follow exactly how I got them, and it also took me a while to understand the text I'd written below. I changed the numbers in the table above, and rewrote the section below to make this more clear.
-------------
The R code (if you want
to play around with the parameters and ranges), for the first row,
is:
pbeta(0.9, 1, 1) - pbeta(0.1, 1, 1)
If you have a range of correlations, centred on zero, that you are X% confident about, you can do the following:
Step 1: Convert the effect sizes from your desired range (rl, ru) to the statistical inputs in the fourth column (Rl, Ru) with the formula: R = 0.5 + 0.5*r.
Step 2: Insert the values (Rl, Ru) into the code above, as the first parameter of the two pbeta-commands. The output should be a number between 0 and 1: this is your confidence level.
Step 3: The second and third parameters in the pbeta-command (a, b) should be identical (i.e., a = b): these are the parameters of the beta-distribution, which need to be the same to ensure that the distribution is symmetrical (as written above, JASP, at least in its current implementation, only contains symmetric priors, and hence there is only one number that you can input to determine the prior width). The statistical parameters for the R function (a, b; third column) are not the same as the JASP parameter (A; first column), but can be obtained with the simple formula: A = 1/a. There must be a more elegant way to do this, but you can simply change the beta-parameters around until you reach the confidence level you desire.
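The three steps can be wrapped into a few lines of R (a sketch; conf_for is a hypothetical helper function):

conf_for <- function(r_u, a) {
  # Step 1: rescale the symmetric range (-r_u, r_u) of correlations to the (0, 1) scale
  Rl <- 0.5 - 0.5 * r_u
  Ru <- 0.5 + 0.5 * r_u
  # Step 2: prior probability of this range under a symmetric beta(a, a)
  pbeta(Ru, a, a) - pbeta(Rl, a, a)
}
conf_for(0.25, 13)  # approx. 0.80, i.e. the last row of the table (JASP parameter A = 1/13)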
As with the
t-test, when you chop
the beta-distribution in half (i.e., when you test a directional
hypothesis), the upper bound (y)
should be identical to the upper bound in the non-directional prior.
Conclusion
This blogpost aims to provide the psychologist reader with a sufficiently deep understanding to justify the choice of prior, e.g., for a Registered Report. If you've worked your way through the text above (in which case: thank you for bearing with me!), you should now be able to choose a prior parameter in JASP in such a way that it translates directly to the expectations you have about possible effect size ranges.
To
end the blogpost on a more general note, here are some random thoughts. The layout of JASP is based on SPSS, but unlike SPSS,
JASP is open source and based on the programming language R. JASP
aims to provide an easy way for researchers to switch from
frequentist testing in SPSS to doing Bayesian analyses. Moving away
from SPSS is always a good idea. However, due to the similar,
easy-to-use layout, JASP inherits one of the problems of SPSS: it's
possible to do analyses without actually understanding what the
output means. Eric-Jan Wagenmakers once wrote (on Twitter, if I remember correctly) that
JASP aims to provide “training wheels” for researchers moving
away from frequentist statistics and SPSS, who will eventually move
to more sophisticated analysis tools such as R. I hope that this
blogpost will contribute a modest step to this goal, by giving a more
thorough understanding of possible prior parameters in the context of
Bayes Factor hypothesis testing.
-----------------------------
[1] If we simply cut the Cauchy distribution in half, we no longer have a
probability distribution: a probability distribution, by definition,
needs to integrate to 1 across the range of all possible values. If
we think about a discrete distribution (e.g., the outcome of a die
toss), it's intuitive that the probabilities of all possible outcomes should sum to
1: that's how we can infer that the probability of throwing a 6 is
1/6, given a cubical die (because we also know that we have 6
possible, equiprobable outcomes). For continuous distributions, we
have an infinite range of values, so we can't really sum them.
Integrating is therefore the continuous-distribution-equivalent to
summing. Anyhow: if we remove half of our Cauchy distribution, we end up with a distribution which integrates to 0.5 (across the range from 0 to infinity). To change this back to a probability distribution, we need to multiply the function by a constant, in this case, by 2. If you look at the plots for the full Cauchy prior versus the directional priors, you will notice that, at x = 0, y ≈ 0.5 for the full Cauchy, and y ≈ 1 for the two truncated Cauchys. For calculating the probability of a certain range, this means that we need to multiply it by two, which is easy in our case: we start off with a given range (-y to y) in our full Cauchy and cut off half (so we get the range 0 to y), which halves the probability of getting values in this range. Then we multiply the function by 2, to turn it back into a probability distribution: this also doubles the area between 0 and y, which gives us the same proportion that we started with in the first place.
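A quick numerical check of this argument in R:

y <- 2; s <- 0.707
2 * (pcauchy(y, 0, s) - pcauchy(0, 0, s))  # probability of (0, y) under the renormalised half-Cauchy
pcauchy(y, 0, s) - pcauchy(-y, 0, s)       # probability of (-y, y) under the full Cauchy: identical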
[2] A beta-distribution
looks very different to a Cauchy or normal, and takes different
parameters (on Wikipedia, denoted α
and β). When both of
the parameters are equal (α
= β), the
distribution is symmetrical. In JASP, you can only adjust one number:
the same value is then used for both parameters, so the prior
distribution is always symmetrical. The number which you can adjust
in the JASP box (let's call it A)
does not equal the true parameter that defines the beta-function
(let's call it a), as
it has an inverse relationship to it (A
= 1/a).
The other difference between the actual beta-distribution and the
JASP prior is that the beta-distribution is defined for values
between 0 and 1: the JASP prior is stretched between values of -1 and
1. Thus, when using the function to calculate the probabilities of
different ranges of r
under different parameters, we need to transform r
to a value between 0 and 1 before we can make a statement about the size of the correlations. I hope the table with parameters and effect size ranges in the main text makes this clearer.