Wednesday, September 21, 2016

Some thoughts on methodological terrorism

Yesterday, I woke up to a shitstorm on Twitter, caused by an editorial-in-press by social psychologist Susan Fiske (who wrote my undergraduate Social Psych course textbook). The full text of the editorial, along with a superb commentary from Andrew Gelman, can be found here. This editorial, which launches an attack against so-called methodological terrorists who have the audacity to criticise their colleagues in public, has already inspired blog posts such as this one by Sam Schwarzkopf and this one by Dorothy Bishop, which broke the time-space continuum.

However, I would like to write about one aspect of Susan Fiske’s commentary, which also emerged in a subsequent discussion with her at the congress of the German Society for Psychology (which, alas, I followed only on Twitter). In the editorial, Fiske states that psychological scientists at all stages of their careers are being bullied; she seems especially worried about graduate students who are leaving academia. In the subsequent discussion, as cited by Malte Elson, she specified that >30 graduate students wrote to her in fear of cyberbullies.*

Being an early career researcher myself, I can try to imagine myself in a position where I would be scared of “methodological terrorists”. I can’t speak for all ECRs, but for what it’s worth, I don’t see any reason to stifle public debate. Of course, there is internet harassment which is completely inexcusable and should be punished (as covered by John Oliver in this video). But I have never seen, nor heard of, a scientific debate which dropped to the level of threats of violence, rape, or death.

So, what is the worst thing that can happen in academia? Someone finds a mistake in your work (or thinks they have found a mistake), and makes it public, either through the internet (Twitter, a blog), a peer-reviewed paper, or by screaming it out at an international conference after your talk. Of course, on a personal level, it is preferable that, before or instead of making it public, the critic approaches you privately. On the other hand, the critic is not obliged to do this – as others build on your work, it is only fair that the public should be informed about a potential mistake. It is therefore, in practice, up to the critic to decide whether they will approach you first, or whether they think that a public approach would be more effective in getting an error fixed. Similarly, it would be nice of the critic to adopt a kind, constructive tone. It would probably make the experience more pleasant (or less unpleasant) for both parties, and be more effective in convincing the person being criticised to consider the critic’s point and to decide rationally whether or not it is valid. But again, the critic is not obliged to be nice – someone who stands up at a conference to publicly destroy an early career researcher’s work is an a-hole, but not a criminal. (Though I can even imagine scenarios where such behaviour would be justified – for example, if the criticised researcher has been unresponsive to private expressions of concern about the work.)

As an early career researcher, it can be very daunting to face an audience of potential critics. It is even worse if someone accuses you of having done something wrong (whether it’s a methodological shortcoming of your experiment, or a possibly intentional error in your analysis script). I have received some criticism throughout my five-year academic career; some of it was not fair, though most of it was (even if I would sometimes deny it in the initial stages). Furthermore, there are cultural differences in how researchers express their concern with some aspect of somebody’s work: in English-speaking countries (Australia, UK, US), much softer words seem to be used for criticising than in many mainland European countries (Italy, Germany). When I spent six months in Germany during my PhD, I was shocked at some of the conversations I overheard between other PhD students and their supervisors – being used to the Australian style of conversation, it seemed to me that German supervisors could be straight-out mean. Someone who is used to being told about a mistake with the phrase “This is good, but you might want to consider…” is likely to be shocked and offended if they go to an international conference and someone tells them straight out: “This is wrong.” This could lead to some people feeling personally attacked due to what is more or less a cultural misunderstanding.

In any event, it is inevitable that one makes mistakes from time to time, and that someone will find something to criticise about your work. Indeed, this is how science progresses. We make mistakes, and we learn from them. We learn from others’ mistakes. Learning is what science is all about. Someone who doesn’t want to learn cannot be a scientist. And if nobody ever tells you that you made a mistake, you cannot learn from it. Yes, criticism stings, and some people are more sensitive than others. However, responding to criticism in a constructive way, and being aware of potential cultural differences in how criticism is conveyed, is part of the job description of an academic. Somebody who reacts explosively or defensively to criticism cannot be a scientist, just like someone who is afraid of water cannot be an Olympic swimmer.

In response to this, Daniël Lakens wrote, in a series of tweets (I can’t phrase it better): “100+ students told me they think of quitting because science is no longer about science. [… They are the] ones you want to stay in science, because they are not afraid, they know what to do, they just doubt if a career in science is worth it.”

Monday, June 27, 2016

What happens when you try to publish a failure to replicate in 2015/2016

Anyone who has talked to me in the last year would have heard me complain about my 8-times-failure-to-replicate which nobody wants to publish. The preprint, raw data and analysis scripts are available here, so anyone can judge for themselves if they think the rejections to date are justified. In fact, if anyone can show me that my conclusions are wrong – that the data are either inconclusive, or that they actually support an opposite view – I will buy them a bottle of drink of their choice*. So far, this has not happened.

I promise to stop complaining about this after I publish this blog post. I think it is important to be aware of the current situation, but I am, by now, just getting tired of debates which go in circles (and I’m sure many others feel the same way). Therefore, I pledge that from now on I will stop writing whining blog posts, and I will only write happy ones – which have at least one constructive comment or suggestion about how we could improve things.

So, here goes my last ever complaining post. I should stress that the sentiments and opinions I describe here are entirely my own; although I’ve had lots of input from my wonderful co-authors in preparing the manuscript of my unfortunate paper, they would probably not agree with many of the things I am writing here.

Why is it important to publish failures to replicate?

People who haven’t been convinced by the arguments put forward to date will not be convinced by a puny little blogpost. In fact, they will probably not even read this. Therefore, I will not go into details about why it is important to publish failures to replicate. Suffice it to say that this is not my opinion – it’s a truism. If we combine a low average experimental power with selective publishing of positive results, we – to use Daniel Lakens’ words – get “a literature that is about as representative of real science as porn movies are representative of real sex”. We get over-inflated effect sizes across experiments, even if an effect is non-existent; or, in the words of Michael Inzlicht, “meta-analyses are fucked”.

Our study

The interested reader can look up further details of our study in the OSF folder linked above. The study is about the Psycholinguistic Grain Size Theory (Ziegler & Goswami, 2005)**. If you type the name of this theory into Google – or some other popular search terms, such as “dyslexia theory”, “reading across languages”, or “reading development theory” – you will see this paper on the first page. It has 1,650 citations at the time of writing of this blogpost. In other words, this theory is huge. People rely on it to interpret their data, and to guide their experimental designs and theories across diverse topics in reading and dyslexia.

The evidence for the Psycholinguistic Grain Size Theory is summarised in the preprint linked above; the reader can decide for themselves if they find it convincing. During my PhD, I decided to do some follow-up experiments on the body-N effect (Ziegler & Perry, 1998; Ziegler et al., 2001; Ziegler et al., 2003). Why? Not because I wanted to build my career on the ruins of someone else’s work (which is apparently what some people think of replicators), but because I found the theory genuinely interesting, and I wanted to do further work to specify the locus of this effect. So I did study after study after study – blaming myself for the messy results – until I realised: I had conducted eight experiments, and the effect just isn’t there. So I conducted a meta-analysis on all of our data, plus an unpublished study by a colleague with whom I’d talked about this effect, wrote it up and submitted it.

Surely, in our day and age, journals should welcome null-results as much as positive results? And any rejections would be based on flaws in the study?

Well, here is what happened:

Submission 1: Relatively high-impact journal for cognitive psychology

Here is a section directly copied-and-pasted from a review:

“Although the paper is well-written and the analyses are quite substantial, I find the whole approach rather irritating for the following reasons:

1. Typically meta-analyses are done one [sic] published data that meet the standards for publishing in international peer-reviewed journals. In the present analyses, the only two published studies that reported significant effects of body-N and were published in Cognition and Psychological Science were excluded (because the trial-by-trial data were no longer available) and the authors focus on a bunch of unpublished studies from a dissertation and a colleague who is not even an author of the present paper. There is no way of knowing whether these unpublished experiments meet the standards to be published in high-quality journals.”

Of course, I picked the most extreme statement. Other reviewers had some cogent points – however, nothing that would compromise the conclusions. The paper was rejected because “the manuscript is probably too far from what we are looking for”.

Submission 2: Very high-impact psychology journal

As a very ambitious second plan, we submitted the paper to one of the top journals in psychology. It’s a journal which “publishes evaluative and integrative research reviews and interpretations of issues in scientific psychology. Both qualitative (narrative) and quantitative (meta-analytic) reviews will be considered, depending on the nature of the database under consideration for review” (from their website). They have even announced a special issue on Replicability and Reproducibility, because their “primary mission […] is to contribute a cohesive, authoritative, theory-based, and complete synthesis of scientific evidence in the field of psychology” (again, from their website). In fact, they published the original theoretical paper, so surely they would at least consider a paper which argues against this theory? As in, send it out for review? And reject it based on flaws, rather than the standard explanation of it being uninteresting to a broad audience? Given that they published the original theoretical article, and all? Right?

Wrong, on all points.

Submission 3: A well-respected, but not huge impact factor journal in cognitive psychology

I agreed to submit this paper to a non-open-access journal again, but only under the condition that at least one of my co-authors would have a bet with me: if it got rejected, I would get a bottle of good whiskey. Spoiler alert: I am now the proud owner of a 10-year aged bottle of Bushmills.

To be fair, this round of reviews brought some cogent and interesting comments. The first reviewer provided some insightful remarks, but their main concern was that “The main message here seems to be a negative one.” Furthermore, the reviewer “found the theoretical rationale [for the choice of paradigm] to be rather simplistic”. Your words, not mine! However, for a failure to replicate, this is irrelevant. As many researchers rely on what may or may not be a simplistic theoretical framework which is based on the original studies, we need to know whether the evidence put forward by the original studies is reliable.

I could not quite make sense of all of the second reviewer’s comments, but somehow they argued that the paper was “overkill”. (It is very long and dense, to be fair, but I do have a lot of data to analyse. I suspect most readers will skip from the introduction to the discussion anyway – but anyone who wants the juicy details of the analyses should have easy access to them.)

Next step: Open-access journal

I like the idea of open-access journals. However, when I submitted previous versions of the manuscript, I was somewhat swayed by the argument that going open access would decrease the visibility and credibility of the paper. This is probably true, but without any doubt, the next step will be to submit the paper to an open-access journal. Preferably one with open review. I would like to see a reviewer calling a paper “irritating” in a public forum.

At least in this case, traditional journals have shown – well, let’s just say that we still have a long way to go in improving replicability in the psychological sciences. For now, I have uploaded a pre-print of the paper on OSF and on ResearchGate. On ResearchGate, the article has over 200 views, suggesting that there is some interest in this theory; the finding that the key study is not replicable seems relevant to researchers. Nevertheless, I wonder if the failure to provide support for this theory will ever gain as much visibility as the original study – how many researchers will put their trust in a theory that they might be more sceptical about if they knew the key study is not as robust as it may seem?

In the meantime, my offer of a bottle of beverage for anyone who can show that the analyses or data are fundamentally flawed, still stands.


* Beer, wine, whiskey, brandy: You name it. Limited only by my post-doc budget.
** The full references of all papers cited throughout the blogpost can be found in the preprint of our paper.


Edit 30/6: Thanks all for the comments so far, I'll have a closer look at how I can implement your helpful suggestions when I get the chance!

Please note that I will delete comments from spammers and trolls. If you feel the urge to threaten physical violence, please see your local counsellor or psychologist.

Thursday, June 16, 2016

Naming, not shaming: Criticising a weak result is not the same as launching a personal attack

You are working on a theoretical paper about the proposed relationship between X and Y. A two-experiment study has previously shown that X and Y are correlated, and you are trying to explain the cognitive mechanisms that drive this correlation. This previous study makes conclusions based on partial correlations which take into account a moderator that has not been postulated a priori; raw correlations are not reported. The p-values for each of the two partial correlations are < 0.05, but > 0.04. In a theoretical paper, you stress that although it makes theoretical sense that there would be a correlation between these variables, we cannot be sure about this link.

In a different paradigm, several studies have found a group difference in a certain task. In most studies, this group difference has a Cohen’s d of around 0.2. However, three studies which all come from the same lab report Cohen’s ds ranging between 0.8 and 1.1. You calculate that it is very unlikely to obtain three huge effects such as these by chance alone (probability < 1%). 

For a different project, you fail to find an effect which has been reported by a previously published experiment. The authors of this previous study have published their raw data a few years after the original paper came out. You take a close look at this raw data, and find some discrepancies with the means reported in the paper. When you analyse the raw data, the effect disappears.

What would you do in each of the scenarios above? I would be very happy to hear about it in the comments!

From each of these scenarios, I would draw two conclusions: (1) The evidence reported by these studies is not strong, to say the least, and (2) it is likely that the authors used what we now call questionable research practices to obtain significant results. The question is what we can conclude in our hypothetical paper, where the presence or absence of the effect is critical. Throwing around accusations of p-hacking can turn ugly. First, we cannot be absolutely sure that there is something fishy. Even if you calculate that the likelihood of obtaining a certain result is minimal, it is still greater than zero – you can never be completely sure that there really is something questionable going on. Second, criticising someone else’s work is always a hairy issue. Feelings may get hurt, and the desire for revenge may arise; careers can get destroyed. Especially as an early-career researcher, one wants to stay clear of close-range combat.
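For what it’s worth, the kind of likelihood calculation described in the second scenario can be sketched in a few lines. This is only a back-of-the-envelope version: the per-group sample size and the true effect size below are my own illustrative assumptions, not values from any real study.

```python
from scipy.stats import norm

# Illustrative assumptions: a true effect of d = 0.2 and 30 participants
# per group in each of the three studies
true_d = 0.2
n_per_group = 30

# Approximate sampling SD of an observed Cohen's d in a two-group design
se_d = (2 / n_per_group) ** 0.5

# Probability that a single study observes d >= 0.8 by chance
p_single = norm.sf(0.8, loc=true_d, scale=se_d)

# Probability that three independent studies all do
p_three = p_single ** 3

print(f"one study: {p_single:.4f}; all three: {p_three:.2e}")
```

Under these assumptions, each individual study has only around a 1% chance of showing d ≥ 0.8, and the chance of all three doing so is on the order of one in a million – which is why a run of huge effects from a paradigm that normally yields d ≈ 0.2 raises eyebrows.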

Yet, if your work rests on these results, you need to make something of them. One could just ignore them – not cite these papers, pretend they don’t exist. It is difficult to draw conclusions from studies with questionable research practices, so they may as well not be there. But ignoring relevant published work would be childish and unscientific. Any reader of your paper who is interested in the topic will notice this omission. Therefore, one needs to at least explain why one thinks the results of these studies may not be reliable.

One can’t explain why one doesn’t trust a study without citing it – a general phrase such as: “Previous work has shown this effect, but future research is needed to confirm its stability” will not do. We could remain general in our accusations: “Previous work has shown this effect (Lemmon & Matthau, 2000), but future research is needed to confirm its stability”. This, again, does not sound very convincing.

There are therefore two possibilities: either we drop the topic altogether, or we write down exactly why the results of the published studies would need to be replicated before we would trust them, kind of like what I did in the examples at the top of the page. This, of course, could be misconstrued as a personal attack. Describing such studies in my own papers is an exercise involving very careful phrasing and proofreading for diplomacy by very nice colleagues. Unfortunately, this often leads to the watering down of arguments, and tip-toeing around the real issue, which is the believability of a specific result. And when we think about it, this is what we are criticising – not the original researchers. Knowledge about questionable research practices is spreading gradually; many researchers are still in the process of realising that they can really damage a research area. Therefore, judging researchers for what they have done in the past would be neither productive, nor wise.

Should we judge a scientist for having used questionable research practices? In general, I don’t think so. I am convinced that the majority of researchers don’t intend to cheat, but are convinced that they have legitimately maximised their chance of finding a very small and subtle effect. It is, of course, the responsibility of the criticiser to make it clear that the problem is with the study, not with the researcher who conducted it. But the researchers whose work is being criticised should also consider whether the criticism is fair, and respond accordingly. If they are prepared to correct any mistakes – publishing file-drawer studies, releasing untrimmed data, conducting a replication, or, in more extreme cases, publishing a correction or even retracting a paper – it is unlikely that they will be judged negatively by the scientific community; quite the contrary.

But there are a few hypothetical scenarios where my opinion of the researcher would decrease: (1) If the questionable research practice was data fabrication rather than something more benign such as creative outlier removal, (2) if the researchers use any means possible to suppress studies which criticise or fail to replicate their work, or (3) if the researchers continue to engage in questionable research practices, even after they learn that it increases their false-positive rate. This last point bears further consideration, because pleading ignorance is becoming less and less defensible. By now, a researcher would need to live under a rock if they have not even heard about the replication crisis. And a good, curious researcher should follow up on hearing such rumours, to check whether issues in replicability could also apply to them.

In summary, criticising existing studies is essential for scientific progress. Identifying potential issues with experiments will save time, as researchers won’t go off on a wild-goose chase for an effect that doesn’t exist; it will help us to narrow down which studies need to be replicated before we consider them to be backed up by evidence. The criticism of a study, however, should not be conflated with criticism of the researcher – either by the criticiser or by the person being criticised. A strong distinction between criticism of a study and criticism of a researcher would result in a climate where discussions about the reproducibility of specific studies lead to scientific progress rather than to a battlefield.

Saturday, May 21, 2016

What would the ideal research world look like?

Recently, I was asked: “What made you interested in research methods?” I’m afraid I didn’t give a good answer, but instead started complaining about my eight-times failure to replicate that nobody wants to publish. I have been thinking about this question some more, and realised that my interest in research methods and good science is driven by predominantly selfish reasons. This gave me the idea to write a blog post: I think it is important to realise that striving towards good science is, in the long run, beneficial to a researcher. So let’s ignore the “how” for the time being (there are already many articles and blog posts on this issue; see, for example, entries for an essay contest by The Winnower) – let’s focus on the “why”.

The world as it should be
Let’s imagine the research world as it should (or could) be. Presumably, we all went into research because we wanted to learn more about the world – and we wanted to actively contribute to discovering new knowledge. Imagine that we live in a world where we can trust the existing literature. Theories are based on experiments that are sound and replicable. The job of a researcher is to keep up to date on this literature, find gaps, and design experiments that can fill these gaps, thus providing a more complete picture of the phenomenon they are studying.

The world as it is
The research world as it is provides two sources of frustration (at least, for me): (1) playing Russian Roulette when it comes to conducting experiments, and (2) sifting through a literature which consists of an unknown ratio of manure to pearls, and trying to find the pearls.

Russian Roulette
I have conducted numerous experiments during my PhD and post-doc so far, and a majority of them “didn’t work”. By “didn’t work”, I mean they showed non-significant p-values when I expected an effect, showed different results from published experiments (again, my eight-times failure to replicate), or, occasionally, were just not designed very well, so that I would get floor or ceiling effects. I attributed this to my own lack of experience and competence. I looked at my colleagues, who had many published experiments, and considered alternative career paths. In the last year of my PhD, I came to a realisation: even professors have the same problem.

In the research world as it is, a researcher may come up with an idea for an experiment. It can be a great idea, based on a careful evaluation of theories and models. The experiment can be well-designed and neat, providing a pertinent test of the researcher’s hypothesis. Then the data is collected and analysed – and it is discovered that the experiment “didn’t work”. Shoulders are shrugged – the researcher moves on. Occasionally, one experiment will “work” and can be published.

How is it possible, I asked myself, that so much good research goes to waste, just because an experiment “didn’t work”? Is it really necessary to completely discard a promising question or theory, just because a first attempt at getting an answer “didn’t work”? How many labs conduct experiments that “don’t work”, not knowing that other labs have already tried and failed with the same approach? These are, as of now, rhetorical questions, but I firmly believe that learning more about research methods and how these can be used to produce sound and efficient experiments can answer them.

Sifting through manure
Some theories are intuitively appealing, apparently elegant, and elicit a lot of enthusiasm in a lot of people. New PhD students want to “do something with this theory”, and try to do follow-up studies, only to find that their follow-up experiments “don’t work”, replications of the experiments that support the theory “don’t work”, and the theory doesn’t even make sense when you really think about it. *

Scientists stand on the shoulders of giants. Science cannot be done without relying on existing knowledge at least to some extent. In an ideal world, our experiments and theories should build on previous work. However, I often get the feeling that I am building on manure instead of a sound foundation.

So, in order to try and understand whether I can trust an effect, I sift through the papers on it. I look for evidence of publication bias, dodgy-sounding post-hoc moderators or trimming decisions, and statistical and logical errors (such as concluding that the difference between two groups is significant because one is significantly above chance while the other is not); I check whether studies with larger sample sizes tend to give negative results, while positive results are predominantly supported by studies with small samples. It’s a thankless job. I criticise and question the work of colleagues, who are often in senior positions and may well one day make decisions that affect my livelihood. At the same time, I lack the time to conduct experiments to test and develop my own ideas. But what else should I do? Close my eyes to these issues and just work on my own line of research? Spending less or no time scrutinising the existing literature would mean that I don’t know whether I am building my research agenda on pearls or manure. It would mean that I could waste months or years on a question that I should have known to be a dead end from the very beginning.
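The “one group is significant, the other is not” fallacy mentioned above is easy to demonstrate with made-up numbers. In the sketch below, the means, SDs and sample sizes are entirely my own invention: each group is compared against chance performance of 50%, one group comes out significant and the other does not, and yet the direct comparison between the two groups is nowhere near significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def make_group(mean, sd, n):
    """Simulate scores, then rescale so the sample has exactly this mean and SD."""
    x = rng.normal(size=n)
    return (x - x.mean()) / x.std(ddof=1) * sd + mean

# Hypothetical accuracy scores; chance level is 0.5
group_a = make_group(mean=0.55, sd=0.10, n=20)
group_b = make_group(mean=0.53, sd=0.10, n=20)

_, p_a = stats.ttest_1samp(group_a, 0.5)     # ~.04: "significantly above chance"
_, p_b = stats.ttest_1samp(group_b, 0.5)     # ~.19: "not above chance"
_, p_ab = stats.ttest_ind(group_a, group_b)  # ~.53: but the groups do NOT differ

print(f"A vs chance: p = {p_a:.3f}; B vs chance: p = {p_b:.3f}; A vs B: p = {p_ab:.3f}")
```

The only legitimate test of a group difference is the direct comparison; the difference between “significant” and “not significant” is not itself significant.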

So, why am I interested in research methods? Because it will make research more efficient, for me personally. It is difficult to conduct a good study, but in the long run, it should be no more difficult than running a number of crappy studies and publishing the one that “worked”. It should also be much less frustrating, much more rewarding, and in the end, we will do what we (presumably) love: contribute to discovering new knowledge about how the world works.


* This example is fictional. Any resemblance to real persons or events is purely coincidental.

Wednesday, April 27, 2016

The power is in collaboration: Developing international networks to increase the reproducibility of science

This is an essay that I wrote for the Winnower's essay contest: "How do we ensure that research is reproducible?" 

The field of psychological science is in pandemonium. With failures to replicate well-established effects, evidence for a skewed picture of science in the published literature, and a media hype about the replication crisis – what is left for us to believe in these days?

Luckily, researchers have done what they do best – research – to try to establish the causes of, and possible solutions to, this replication crisis. A coherent picture has emerged. Three key factors seem to have led to the replication crisis: (1) underpowered studies, (2) publication bias, and (3) questionable research practices. Studies in psychology often test a small number of participants. As effects tend to be small and measures noisy, larger samples are required to reliably detect an effect. An underpowered study, trying to find a small effect with a small sample, runs a high probability of not finding the effect, even if it is real (Button et al., 2013; Cohen, 1962; Gelman & Weakliem, 2009).
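To put a rough number on this, here is a normal-approximation power calculation for a two-sided, two-sample comparison. The effect size and sample size are illustrative assumptions on my part, not figures taken from any particular study.

```python
from scipy.stats import norm

d = 0.3       # assumed true effect size (Cohen's d)
n = 20        # assumed participants per group
alpha = 0.05  # conventional significance threshold

# Normal approximation: under the alternative hypothesis, the standardised
# group difference is roughly normal with mean d * sqrt(n/2) and unit SD
z_crit = norm.ppf(1 - alpha / 2)
ncp = d * (n / 2) ** 0.5
power = norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

print(f"power = {power:.2f}")
```

With these numbers, the power comes out at roughly 0.16: such a study would miss a real effect about five times out of six.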

By itself, this would not be a problem, because a series of underpowered studies can be, in principle, combined in a meta-analysis to provide a more precise effect size estimate. However, there is also publication bias, as journals tend to prefer publishing articles which show positive results. Authors often do not even bother trying to submit papers with non-significant results, leading to a file-drawer problem (Rosenthal, 1979). As the majority of research papers are underpowered, the studies that do show a significant effect capture the outliers of a normal distribution around a true effect size (Ioannidis, 2005; Schmidt, 1992, 1996). This creates a biased literature: even if an effect is small or non-existent, a number of published studies can provide apparently consistent evidence for a large effect size.
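This inflation is easy to see in a toy simulation. Assuming (my numbers, for illustration only) a true effect of d = 0.2 and 20 participants per group, the subset of studies that happen to cross p < .05 reports effect sizes several times larger than the truth.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

true_d = 0.2       # assumed true effect size
n = 20             # assumed participants per group
n_studies = 10_000

observed_d, significant = [], []
for _ in range(n_studies):
    a = rng.normal(true_d, 1.0, n)  # "treatment" group
    b = rng.normal(0.0, 1.0, n)     # control group
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    observed_d.append((a.mean() - b.mean()) / pooled_sd)
    significant.append(stats.ttest_ind(a, b).pvalue < 0.05)

observed_d = np.array(observed_d)
significant = np.array(significant)

print(f"true d:                   {true_d}")
print(f"mean d, all studies:      {observed_d.mean():.2f}")
print(f"mean d, significant only: {observed_d[significant].mean():.2f}")
```

In this setup only a small minority of studies come out significant, and their average observed d is roughly three times the true value – exactly the distorted picture a file-drawer-filtered literature presents.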

The problems of low power and publication bias are further exacerbated by questionable research practices, where researchers – often unaware that they are doing something wrong – use little tricks to get their effects above a significance threshold, such as removing outliers until the threshold is reached, or including post-hoc covariates (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011).

A lot of research and discussion already exists on how to fix publication bias and questionable research practices – fixes which mostly require a top-down change to the incentive structure. Here, I focus on the issue of underpowered studies, as this can be addressed by individual researchers. Increasing power is in everyone’s best interests: it strengthens science, but it also gives the researcher a better chance of providing a meaningful answer to their question of interest.

On the surface, the solution to the problem of underpowered studies is very simple: we just have to run bigger studies. The simplicity is probably why this issue is not discussed very much. However, the solution is only simple if you have the resources to increase your sample sizes. Running participants takes time and money. Therefore, this simple solution poses another problem: the possibility of creating a Matthew effect, where the rich get richer by producing large quantities of high-quality research, while researchers with fewer resources can produce either very few good studies, or numerous underpowered experiments for which they will get little recognition*.

On the surface, the key to avoiding the Matthew effect is also simple: if the rich collaborate with the poor, even researchers with few resources can produce high-powered studies. However, in practice, there are few perceived incentives for the rich to reach out to the poor. There are also practical obstacles for the poor in approaching the rich. These issues can be addressed, and it takes very little effort from an average researcher to do so. Below, I describe why it is important to promote collaborations in order to improve replicability in social sciences, and how this could be achieved.

In order to make a large-scale collaboration network feasible, it would be necessary to highlight the incentives for reaching out to the poor. Collecting data for someone with fewer resources may look like charity; I argue that it is a win-win situation. Receivers are likely to reciprocate. If they cannot collect a large amount of data for you, perhaps they can help you in other ways: they could provide advice on a project you got stuck on and abandoned years ago; they could score the data you never got around to looking at; or they could simply discuss new ideas, which might give you fresh insight into your topic. Even a small amount of additional data improves a dataset. In the case of international collaborations, you would also be able to recruit a culturally diverse sample, which would help ensure that our view of psychological processes generalises beyond a specific population (Henrich, Heine, & Norenzayan, 2010).

There are numerous ways in which researchers can reach out to each other. Perhaps one could create an online platform for this purpose, where anyone can post an entry for their study at any stage: it could be just an idea, or a quasi-finished project which needs only some additional analyses or tweaks before publication.

Anyone can browse the proposed projects by topic and contact the author if they find something interesting. The two researchers can then discuss further arrangements: whether they can execute the project together, whether the contribution of the second researcher will be sufficient for co-authorship, or whether the first will reciprocate by helping out with another project.

Similarly, if someone is conducting a large-scale study, and if they have time to spare in the experimental sessions, they could announce this in a complementary forum. They would provide a brief description of their participants, and offer to attach another task or two for anyone interested in studying this population.

To reach a wider audience, we could rely on social media – perhaps a Twitter hashtag such as #LOOC (“LOOking for Collaborator”)**. One could tweet: “Testing 100 children, 6-10 yo. Could include another task up to 15 minutes. #LOOC”. Or: “Need more participants for a study on statistical learning and dyslexia. #LOOC”, and attach a screenshot or link with more information.

In summary, increasing sample sizes would break one of the three pillars of the replication crisis: large studies are more informative than underpowered ones, as they yield less noisy and more precise effect size estimates. This can be achieved through collaboration, though only if researchers with resources are prepared to take on some additional work by offering to help others out. While this may be perceived as a sacrifice, in the long run it should benefit all parties. It will make it easier to diversify one's sample, and it will help researchers who study small, specific populations (e.g., a rare disorder) to collaborate with others and recruit enough participants to draw meaningful conclusions. It will make it possible to connect with researchers from all over the world who have similar interests and possibly complementary expertise. And it will lead to an average increase in sample sizes, and to reported effects which replicate across labs.

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafo, M. R. (2013). Confidence and precision increase with high statistical power. Nature Reviews Neuroscience, 14(8). doi:10.1038/nrn3475-c4
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145-153.
Gelman, A., & Weakliem, D. (2009). Of beauty, sex and power: Too little attention has been paid to the statistical challenges in estimating small effects. American Scientist, 97(4), 310-316.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.
Ioannidis, J. P. A. (2005). Why most published research findings are false. Plos Medicine, 2(8), 696-701. doi:10.1371/journal.pmed.0020124
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532.
Rosenthal, R. (1979). The "File Drawer Problem" and Tolerance for Null Results. Psychological Bulletin, 86(3), 638-641.
Schmidt, F. L. (1992). What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. American Psychologist, 47(10), 1173.
Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115-129. doi:10.1037//1082-989x.1.2.115
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.

* One may or may not consider this a problem – after all, the issue of the replicability crisis is solved.

** Urban dictionary tells me that “looc” means “Lame. Stupid. Wack. The opposite of cool. (Pronounced the same as Luke.)”