# Crowdsource your research with style

Would you like to collect data quickly and efficiently? Would you like to have a sample that generalizes beyond western, educated, industrialized, rich and democratic participants? While you acknowledge social media as a powerful means to distribute your studies, do you feel that there must be a “better way”? Then this practical introduction to crowdsourcing is exactly what you need. I will show you how to use Crowdflower, a crowdsourcing platform, to attract participants from all over the world to take part in your experiments. However, before we get too excited, let’s quickly go through the relevant terminology.

### What is Crowdsourcing?

The term “crowdsourcing” is a blend of the words “crowd” and “outsourcing”, first mentioned in a WIRED article in 2006 (Howe, 2006). The Merriam-Webster dictionary defines it as

“The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers.”

Following the definition, we already engage in crowdsourcing when posting experiments on social media platforms, but this approach has several major drawbacks:

• limited and biased pool of participants
• it can take some time to reach the targeted sample size
• repeated participants and multiple submissions
• no incentive structure
• no pre-screening
• no longitudinal studies possible
• excessively annoying

The problems listed above already hint at their solution: a platform or service that allows researchers to specify what needs to be specified (e.g. participation criteria) and that provides the necessary infrastructure to reward participants. The first platform that did exactly this was Amazon Mechanical Turk. It was created in 2005 and sparked the interest of researchers around 2010. To give a good overview of its mechanics and purpose, let me quote Paolacci, Chandler, and Ipeirotis (2010) at length. They have put it splendidly:

“Mechanical Turk is a crowdsourcing web service that coordinates the supply and the demand of tasks that require human intelligence to complete. Mechanical Turk is named after an 18th century chess playing “automaton” that was in fact operated by a concealed person. It is an online labor market where employees (called workers) are recruited by employers (called requesters) for the execution of tasks (called HITs, acronym for Human Intelligence Tasks) in exchange for a wage (called a reward). Both workers and requesters are anonymous although responses by a unique worker can be linked through an ID provided by Amazon. Requesters post HITs that are visible only to workers who meet predefined criteria (e.g., country of residence or accuracy in previously completed tasks). When workers access the website, they find a list of tasks sortable according to various criteria, including size of the reward and maximum time allotted for the completion. Workers can read brief descriptions and see previews of the tasks before accepting to work on them.”
– Paolacci et al. (2010, pp. 411–412)

Unfortunately for us, Amazon MTurk is only available to American citizens. However, there exists another platform called Crowdflower, which basically provides the same service[1], while also allowing requests from outside the US.

`<cml:text label="Text" data-validates-regex="^7E93$" validates="required regex"/>`

For clarification, see the image below. Of course, if you have used a different code in your questionnaire, you need to change the line accordingly.

As you might have noticed, the left panel lists all the steps you need to take before you can post the task. Under “Preview” you can test whether the code validation works. This is a crucial step, since we want our participants to be compensated for the trouble they go through!

We have finished designing our job. Now we have to “manage quality”. Under “Contributors” on the panel you can pre-screen participants. For example, you can filter participants based on the country they live in, which is especially interesting for cross-country studies. Additionally, if your test material is in a language other than English, you can restrict your sample to participants who live in the countries where that language is spoken. Since most research is done using Amazon MTurk, which is exclusively American, you might want to target only the USA so as to increase comparability of data.

Workers can be ranked according to their accuracy, i.e. how often they finish a job they have started. It is suggested to only target workers with the highest accuracy (Performance Level 3). In the “Behavior Setting”, set the maximum judgments per contributor to 1 in order to avoid multiple submissions.

Under “Job settings” on the left side, you can set the payment under “Tasks” and the number of participants under “Judgments”. As you might have noticed, there are many more advanced settings. However, these are not important with respect to our experiment.

### Results

85% of 142 undergraduates gave the wrong answer to the Linda problem in the Transparent Test, i.e. when only two options are available (Tversky & Kahneman, 1983). Let’s see how our crowdsourcing population does! I have set the number of participants to 30 and the payment to 0.10\$. In choosing the payment, I try to be as ethical as my wallet permits me to be, striving for an hourly wage of at least 6\$, depending on the task. While the average pay is about 2.30\$, 6\$ is still quite low. The ethics of crowdsourcing is an important topic, and the reader is referred to Fort, Adda, and Cohen (2011) and Williamson (2014) for more information. Let’s launch now!
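The payment reasoning above comes down to simple arithmetic. The following Python sketch converts a per-task reward into an hourly wage and back. The function names are made up for illustration, and the one-minute completion time is an assumption of mine (the text does not state how long the task takes).

```python
def hourly_wage(reward_usd: float, minutes_per_task: float) -> float:
    """Effective hourly wage implied by a per-task reward."""
    return reward_usd * 60.0 / minutes_per_task


def reward_for_wage(target_hourly_usd: float, minutes_per_task: float) -> float:
    """Per-task reward needed to reach a target hourly wage."""
    return target_hourly_usd * minutes_per_task / 60.0


# Assumption (not from the text): the two-option Linda task takes
# roughly one minute to complete.
ASSUMED_MINUTES = 1.0

wage = hourly_wage(0.10, ASSUMED_MINUTES)
print(f"0.10 USD per task -> {wage:.2f} USD per hour")

# Hypothetical longer study: what would a 5-minute task have to pay
# to reach the same hourly target?
print(f"5-minute task -> {reward_for_wage(6.0, 5.0):.2f} USD per task")
```

If your questionnaire takes longer than expected in a pilot run, recompute the reward before launching rather than afterwards.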

Alas! 9 minutes and 40 seconds later we have our data. Only 5 people gave the right answer, which means that about 83% were subject to the conjunction fallacy[3].
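To see where the 83% comes from, and how much uncertainty a sample of 30 leaves, here is a minimal Python sketch. The Wilson score interval is my own choice for illustration, not an analysis reported in this post; only the counts (30 participants, 5 correct answers) come from the text.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


n_participants = 30   # from the text
n_correct = 5         # from the text
n_fallacy = n_participants - n_correct

proportion = n_fallacy / n_participants
low, high = wilson_interval(n_fallacy, n_participants)

print(f"Conjunction fallacy: {proportion:.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

With only 30 participants the interval is wide, and it covers the 85% reported by Tversky and Kahneman (1983), so this small sample is at least compatible with the classic result.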

### Isn’t this too good to be true?

It is exciting, that’s what it is. Importantly, you can use it in your own studies without worrying too much about using methods outside of mainstream research. It is more or less becoming the standard in social psychology (if the experiment is not too involved, of course). More domains of psychology are going down this road as well. In fact, a recent study replicated classic experiments in cognitive psychology (Crump, McDonnell, & Gureckis, 2013). This is important because these experiments demand millisecond timing, high precision and environmental control.

In software engineering there is this law – Atwood’s Law – which states that any application that can be written in JavaScript will eventually be written in JavaScript. Let a new law be heard! Any experiment that can be conducted on MTurk (or a similar platform) will also be conducted on MTurk (or a similar platform). Why? Because, just as the Web as a platform is superior to the desktop for a large range of applications, MTurk is superior to the lab for a large set of experimental paradigms.

We have seen that simple questionnaire studies are easy to implement and crowdsource. What about experiments that depend on precise reaction times? Do they work online?

Yes – with some precautionary additions. The most extensive study showing this is Crump et al. (2013). They replicated a wide variety of cognitive psychology paradigms (e.g. Stroop, Flanker, Inhibition of Return), arguing for “empirical validity”: if known effects can be replicated online, the platform is suitable for similar experiments. In a recent paper, Reimers and Stewart (2014) review the precision of online experiments that require millisecond timing. The most pertinent problem is that hardware differs between users. In within-subject designs, this is not a problem, because each participant is compared with themselves on the same machine. For correlational and longitudinal designs, however, these differences in the hardware can bias the results. If you want to run these kinds of studies, I suggest you read both papers cited above.
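The point about hardware can be illustrated with a toy example in Python. It assumes that each participant's machine adds a constant delay to every measured reaction time; real hardware also adds trial-to-trial jitter, which this sketch ignores. All numbers are hypothetical.

```python
from statistics import mean

# Hypothetical "true" reaction times, identical for every participant.
TRUE_CONGRUENT_MS = 600
TRUE_INCONGRUENT_MS = 680

# Assumption: each machine adds a fixed delay to every measurement.
hardware_delay_ms = {"p1": 5, "p2": 40, "p3": 90}


def measured(true_rt_ms: int, delay_ms: int) -> int:
    """Reaction time as recorded on a machine with a constant delay."""
    return true_rt_ms + delay_ms


within_effects = {}
mean_rts = {}
for pid, delay in hardware_delay_ms.items():
    congruent = measured(TRUE_CONGRUENT_MS, delay)
    incongruent = measured(TRUE_INCONGRUENT_MS, delay)
    # Within-subject contrast: the constant delay cancels out.
    within_effects[pid] = incongruent - congruent
    # Per-participant mean RT: the delay stays in the measurement.
    mean_rts[pid] = mean([congruent, incongruent])

print("Within-subject effects (ms):", within_effects)
print("Mean RT per participant (ms):", mean_rts)
```

Every participant shows the same within-subject effect, while their mean reaction times differ purely because of the hardware. Any analysis that correlates mean reaction times with another variable would pick up that difference.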

Running interactive experiments requires substantial knowledge of JavaScript programming as well as knowledge of server-side technologies (a server-side programming language, databases etc.).

### Consider crowdsourcing for your own research!

Before concluding, let me emphasize the following: almost every questionnaire that you usually post on Facebook can also be run using the approach outlined above, which not only ensures a diverse pool of participants, but also tremendously speeds up your data collection. As a side note, data from online studies are generally high in quality and more diverse than data from traditional lab experiments (Buhrmester, Kwang, & Gosling, 2011; Gosling, Sandy, John, & Potter, 2010; Gosling, Vazire, Srivastava, & John, 2004). Additionally – and with some programming experience – you can run complex experimental procedures as well. Although the method is a little harder, it makes your research better, faster and stronger.

### Some final thoughts

Crowdsourcing your research on platforms like Amazon MTurk or Crowdflower has many advantages, among them the possibility to efficiently collect data from a large and diverse sample. Importantly, this method also allows for much bigger samples, offering a way to mitigate the notoriously low statistical power of studies in psychology (Fraley & Vazire, 2014)[4].
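To get a feel for how sample size drives power, here is a rough Python sketch. It uses a normal approximation for a two-sided comparison of two independent groups and ignores the negligible probability of rejecting in the wrong direction. The effect size of d = 0.5 and the group sizes are hypothetical values chosen for illustration, and for real planning a dedicated power analysis tool is the better choice.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    noncentrality = effect_size_d * sqrt(n_per_group / 2)
    return std_normal.cdf(noncentrality - z_crit)


# Hypothetical medium effect (d = 0.5) at three group sizes.
for n in (20, 64, 200):
    print(f"n per group = {n:3d}: power ~ {approx_power(0.5, n):.2f}")
```

Under these assumptions, a group size that is routine on a crowdsourcing platform gives far more power than the small groups that are practical in many lab studies.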

In addition to ethical (how much should I pay the “workers”?) and technical (how do I program an experiment for the web?) challenges, there is a whole new set of issues that need to be addressed (Chandler, Mueller, & Paolacci, 2014; for a short review, see Paolacci & Chandler, 2014). Whatever comes next, crowdsourcing will be a useful tool to have in our pockets.

Suggested readings

Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a Participant Pool. Current Directions in Psychological Science, 23(3), 184–188.

Crump, M. J., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PloS One, 8(3), e57410.

http://experimentalturk.wordpress.com/resources/

References

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5.

Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130.

Crump, M. J., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PloS One, 8(3), e57410.

Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon Mechanical Turk: Gold mine or coal mine? Computational Linguistics, 37(2), 413–420.

Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power. PloS One, 9(10), e109019.

Gosling, S. D., Sandy, C. J., John, O. P., & Potter, J. (2010). Wired but not WEIRD: The promise of the Internet in reaching more diverse samples. Behavioral and Brain Sciences, 33(2-3), 94–95.

Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. American Psychologist, 59(2), 93–104.

Howe, J. (2006, June). The Rise of Crowdsourcing. Wired. Retrieved from http://archive.wired.com/wired/archive/14.06/crowds.html

Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a Participant Pool. Current Directions in Psychological Science, 23(3), 184–188.

Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411–419.

Reimers, S., & Stewart, N. (2014). Presentation and response timing accuracy in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods, 1–19.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293–315.

Williamson, V. (2014). On the Ethics of Crowd-sourced Research. Retrieved from http://scholar.harvard.edu/files/williamson/files/mturk_ps_081014.pdf

[1]     However, studies assessing data quality have all been conducted using MTurk. One used to be able to target the MTurk population via Crowdflower, but this is not possible anymore. This might be a problem – at least it makes me feel a little uneasy.

[2]     CML is the Crowdflower Markup Language, a dialect of HTML.

[3]     I am sure you’re dying to know the demographics of the population. Ha! Follow the tutorial and run a simple survey yourself ;)

[4]     For more and important information about this, see Simonsohn’s talk at SPSP 2014 and this blog post.

### Fabian Dablander

Fabian Dablander just finished his Masters in Cognitive Science at the University of Tübingen. He is interested in innovative ways of data collection, Bayesian statistics, and open science. You can find him on Twitter @fdabl.