Science is the collaborative attempt to understand ourselves and the world around us better by gathering and evaluating evidence. Ironically enough, we are pretty bad at evaluating evidence. Luckily, others rejoice in pointing out our flaws. It is this reciprocal corrective process which is at the core of science, and the reason why it works so well. Working collaboratively helps us catch and correct each other’s mistakes.
So, are psychology researchers collaborating? Sure, they will say. They organise scientific conferences together, come up with interesting hypotheses over dinner and drinks, meticulously plan and execute experimental studies, and jointly write and publish papers. In this sense, asking whether researchers collaborate is silly; of course they do. But these types of collaborations are usually engaged in by only a handful of researchers with limited resources.
In this JEPS Bulletin post, we discuss how Crowdsourced Science could accelerate and scale collaborations. Crowdsourced science pools the resources of individual researchers together in order to achieve goals more efficiently (McCarthy, 2017). We interview Randy McCarthy and Christopher Chartier, two psychology researchers who recently started StudySwap, an online platform to facilitate crowdsourced research, and the Psychological Science Accelerator, a large, distributed network of psychology labs, putting collaboration on a much larger scale.
How do these projects work? How did Randy and Chris first come into contact with crowdsourced science, and what are its benefits? What are the difficulties in conducting crowdsourced research, and how can you get started yourself? Enjoy the read!
Randy, you recently wrote an excellent blog post on the benefits of crowdsourced science. For our readers here, can you briefly recap them? How did you two first come into contact with the idea of crowdsourced science?
Randy: Sure. Probably the first thing people notice about multi-site projects such as the Many Labs projects or Registered Replication Reports is that the overall sample sizes are large compared to our typical psychology studies. That’s what caught my eye initially. To the extent these studies successfully created the necessary conditions to test their hypotheses, these large samples are extremely informative.
While the large samples and high statistical power are obvious benefits, crowdsourced research offers several others that are less apparent at first blush. Here are two.
First, when you have several researchers contributing to these projects, each sample, recruitment procedure, physical environment of the lab, etc. slightly differs from one another. Thus, multi-site projects test the effects across a range of slight permutations of these variables, which is arguably a more robust test of the effect than the traditional “single sample-single site” approach. Second, the resulting manuscripts are a very efficient way to communicate a lot of information. Editors, peer-reviewers, and readers all appreciate fewer-but-more-informative manuscripts.
I first started hearing about these projects at the SPSP conference about five or six years ago when the results from Many Labs 1 were presented. The concept of multiple researchers coordinating their efforts into a joint project just made so much sense. And the more I thought about it, the more sense it made. I’ve contributed a little bit of my research time to these projects ever since. I’ve met a lot of great people from a very diverse set of universities and colleges. It’s been a lot of fun.
Chris: I was asked by an old friend from my time at Miami University (Charlie Ebersole) if I’d like to get involved in ManyLabs 3 and, subsequently, the Reproducibility Project: Psychology. I was amazed at the size of the contributions we were able to make to the field by combining forces into these large, coordinated projects. What immediately struck me was the potential for massive projects to get done with relatively little investment from individual labs. While there is a heavy coordination and planning burden on the leading team, the data collection burden can be quite low at each individual site, and yet when combined, the overall data sets are hugely informative.
You created StudySwap, an online platform to facilitate crowdsourced research. How does it work, and how can researchers join in? What have you planned for the future of StudySwap?
Randy: StudySwap is an online platform to support collaborative research. Researchers can find collaborators by posting descriptions of resources that are available for others to use (“Haves”), by posting descriptions of resources they need that another researcher may have (“Needs”), or by coordinating collaborative research projects across several research teams. People who are interested should check us out on social media. In addition to contributing to research projects, spreading the word is a great way to help.
Chris: We hope more and more researchers start to use StudySwap to find fruitful collaborations. By decoupling ideas from the means of testing them, the site can help to involve more researchers in the process of evidence gathering in psychological science. I think that one of its big benefits is in providing a place where researchers can “offer up” their own findings for independent replication in another lab. It could turn replication attempts into a much more collaborative, and less adversarial, endeavor.
Chris, in August this year you proposed building a large, distributed network of psychology labs. To date, you already have more than 170 labs signed up to the Psychological Science Accelerator. That’s amazing! What will those labs be doing? How is their collaboration structured?
Chris: I, semi-jokingly, suggested that the time seemed right to build something of a “CERN for Psychology.” I thought there would be a lot of value in developing a standing collection of labs that are interested in large-scale collaborations. The Many Labs projects did an amazing job of gathering together many labs (duh!), but it looked like the effort to recruit and organize all these labs anew for each project was inefficient. I have been amazed at the huge immediate response my call for action received. Now that we have quickly developed a relatively large and broadly distributed data collection network, we are hard at work planning data collection for 2018. In fact, we just selected our first project for the Accelerator, an attempted extension of Oosterhof and Todorov’s (2008) valence-dominance model of face perception, originally found in the US, across all world regions.
We welcome submissions from any psychologists, whether or not they are involved in the Accelerator, and our Study Selection Committee and various Advisory Committees evaluate their promise for large-scale data collection before we make our final decisions. At that point, we attempt to match labs in the network with studies based on lab capacity and study requirements. Submissions can be novel studies (even exploratory ones) or replication attempts.
Silberzahn and Uhlmann asked 29 research teams whether dark-skinned soccer players are more likely to receive red cards than light-skinned players. The answers varied starkly between the teams. Is this apparent subjectivity a bad thing? If so, how do you avoid it?
Randy: The Silberzahn and Uhlmann study examined how decisions in the data-analysis process may affect the conclusions you draw from a study. Like you said, in that particular study, the conclusions varied quite a bit from analyst-to-analyst. I’d imagine the extent to which the conclusions varied would depend on the particular effect, the dataset, the specificity of the hypothesis, etc.
I don’t think this subjectivity, or flexibility in the data analysis process, is inherently bad. We are studying complex phenomena and there are several reasonable and justifiable ways to examine these phenomena. But the Silberzahn and Uhlmann study is a good reminder that our conclusions are the result of a series of human decisions in the analysis process. For this reason, it is good to put safeguards in place to minimize the extent to which our biases contaminate the decisions leading up to our conclusions.
One obvious safeguard is to pre-register your hypotheses. This communicates to readers that your data analysis decisions were made independently of the results obtained. Other good practices are to present “robustness” analyses wherein you examine how the results change when you make other, justifiable data analysis decisions (e.g., different transformation approaches, different inclusion/exclusion criteria, “controlling” for different variables, etc.) and to make your data publicly available whenever possible.
Finally, the Silberzahn and Uhlmann study highlighted the problems with vague hypotheses. I believe a lot of our theories are phrased in ways that don’t make specific, falsifiable, and risky predictions, which makes them difficult to test properly.
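The “robustness analysis” Randy describes can be made concrete with a small sketch. The snippet below runs the same correlation under every combination of two justifiable-but-different analysis decisions (outcome transformation and exclusion rule), in the spirit of a multiverse analysis; all variable names, cutoffs, and the simulated data are illustrative and not drawn from any actual study.

```python
# A minimal sketch of a robustness ("multiverse") analysis on simulated data.
# Every variable, cutoff, and transformation below is illustrative only.
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)                           # predictor
y = 0.2 * x + rng.normal(size=n)                 # outcome with a small true effect
rt = rng.lognormal(mean=6.0, sigma=0.5, size=n)  # reaction times, for exclusion rules

# Two justifiable-but-different decisions per step: how to transform the
# outcome, and which participants to exclude.
transforms = {"raw": lambda v: v,
              "log": lambda v: np.log(v - v.min() + 1.0)}
exclusions = {"none": np.ones(n, dtype=bool),
              "rt<2000ms": rt < 2000}

# Report the estimated effect under every combination of decisions.
for t_name, transform in transforms.items():
    for e_name, keep in exclusions.items():
        r = np.corrcoef(x[keep], transform(y[keep]))[0, 1]
        print(f"transform={t_name:3s} exclusions={e_name:9s} r={r:+.3f}")
```

If the estimates agree across the grid, the conclusion is robust to those decisions; if they diverge, the divergence itself is informative, which is exactly the point of reporting such analyses alongside the preregistered one.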
Chris, as you mentioned, you were part of the Many Labs 3 project, investigating whether the quality of data changes throughout the semester. What did you find out, and what did you learn personally by being part of that project?
Chris: Overall, we found little evidence for time-of-semester effects. Our ability to draw strong conclusions was quite hampered, though, by the fact that few of the studies included in the investigation actually replicated successfully. In large part, we learned once again that the field needs to take very seriously issues of replicability and robustness of published evidence.
Personally, I learned a lot about project coordination. It is amazing how much subtle, implicit knowledge about administering experimental procedures exists within labs. It can be quite challenging to make all procedural details explicit and transparent from the outset. This hard work up front has the potential to pay huge dividends in the long run, as it should make the job of future replicators much easier.
What are the limitations of crowdsourced science?
Randy: I wouldn’t call these limitations per se, but I’ll provide two concerns that I often think about. First, it can be difficult to maintain fidelity across labs for some studies. Lack of fidelity across labs is probably not too much of a concern with, for example, a basic Stroop task or a simple anchoring-and-adjustment heuristic. But some studies probably are more difficult to “get right” (however you want to define that) across several different labs. I’ve heard this “lack of fidelity” argument raised when discussing several replication attempts and several crowdsourced projects. Although I haven’t been persuaded by this argument with any of the already-completed projects, it also should not be dismissed out of hand.
Second, I think psychologists are going to have to learn how to appropriately calibrate our resources to the effects we want to study. Historically, we could use the simple heuristic that more participants are better and, when starting with N = 50 and trying to detect a small-to-medium sized effect, we would almost always be correct. However, now that crowdsourced research projects routinely attain large overall sample sizes (N > 1,000), researchers have to be careful not to be inefficient by collecting more data, or using more of participants’ time, than is necessary. This is going to be a good topic for meta-scientists after we get more of these projects published.
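The back-of-the-envelope arithmetic behind this kind of calibration can be sketched briefly. The function below uses the standard normal-approximation formula for a two-group comparison, with 1.96 and 0.84 being the familiar normal quantiles for α = .05 (two-sided) and 80% power; the function name and the effect sizes looped over are our own illustration, not anything from the interview.

```python
# A rough sketch of calibrating sample size to an expected effect size,
# using the normal-approximation power formula for a two-group design.
# The defaults (alpha = .05 two-sided, 80% power) are conventional choices.
import math

def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Approximate participants per group needed to detect a standardized
    mean difference d with the given normal quantiles."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Cohen's conventional small, medium, and large benchmarks:
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: ~{n_per_group(d)} participants per group")
```

The required N grows with the inverse square of the effect size, which is why a network that can routinely collect thousands of participants is well matched to small effects but arguably overkill for large ones.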
Chris: I would just add that the increasing prevalence of crowdsourced research in psychological science is going to force a serious reconsideration of our current incentive structure and how we assign “credit” for scholarly contributions. It is currently unclear to what extent a hiring committee, or a promotion and tenure committee, will value a researcher appearing as the 173rd author on a massive collaboration. We need to ensure that the incentive structure of the field aligns with our goals of producing reliable and generalizable evidence about human behavior and mental processes. I would personally value a 2% contribution to a massive, reliable, informative study more than a 60% contribution to a small, preliminary, tentative study conducted in an isolated lab. I’m not sure how many in the field agree with me.
Is there a possibility for students to get involved? If not, are you planning anything along those lines?
Randy: There are so many different types of projects that it is difficult to make a general statement, but I believe several of the crowdsourced projects are well-suited for students to contribute to. In fact, it has been exciting that so many young people are interested in contributing to these research projects because this is the future generation of researchers.
I should say that there is probably a notion that, because students are less experienced researchers, they are somehow “less good” researchers. This is not necessarily true, especially for projects where the stimuli and methods are typically decided on by other, presumably more experienced, researchers. Testing this notion would itself be a great meta-science project, yet another by-product of all of these crowdsourced research projects.
Also, Psi Chi is supporting crowdsourced research with their NICE project (https://osf.io/juupx/). Undergraduate students should definitely check that out.
Chris: I actually think that crowdsourced projects are particularly well-suited to student involvement. The current lineup of the Psychological Science Accelerator is rife with early career researchers, a fact that makes me quite optimistic about the direction of psychological science! Personally, I’ve seen that my undergraduate research assistants have derived great pleasure from being small-scale contributors to a massive scientific project. Such data collection endeavors seem a perfect fit for talented and well-trained undergraduates.
Daniel Kahneman, who first described the planning fallacy, estimated that finishing Thinking, Fast and Slow would take three years; all in all, it took about seven. That is to say, it’s hard enough to get a project done according to plan when working alone. International collaborations can only exacerbate these issues, I imagine. Do you have any tips for people who want to start a collaborative project? What are some pitfalls, and how does one avoid falling into them?
Randy: In my experience, although crowdsourced projects have unique features that can add a little bit of work, these projects don’t seem to take much more time than the traditional research projects I’ve been involved with. The biggest time commitments in crowdsourced research are still conceptualizing appropriate methods, analyzing the data, writing the manuscript, and so on. And many of these time commitments grow much more slowly than the size of the project.
Chris: I would challenge the notion that large-scale collaborations are inherently slower than conducting and disseminating studies conducted in individual labs. It’s an interesting question, but in my experience, it’s amazing how quickly work gets done when the effort is distributed across a large number of researchers. Lastly, all I can say is stay tuned to the Accelerator. Measure us not by the pace at which we produce individual empirical reports, but by the pace at which we contribute reliable and generalizable evidence on important questions in psychological science. My sense is that when assessed at this macro-level of analysis, we’ll find that the project does indeed constitute an Acceleration.