Would you like to collect data quickly and efficiently? Would you like a sample that generalizes beyond Western, educated, industrialized, rich and democratic (WEIRD) participants? You acknowledge that social media is a powerful means of distributing your studies, but you feel there must be a “better way”? Then this practical introduction to crowdsourcing is exactly what you need. I will show you how to use Crowdflower, a crowdsourcing platform, to attract participants from all over the world to take part in your experiments. But before we get too excited, let’s quickly go through the relevant terminology.
What is Crowdsourcing?
The term “crowdsourcing” is a blend of the words “crowd” and “outsourcing”, coined in a 2006 WIRED article (Howe, 2006). The Merriam-Webster dictionary defines it as
“The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers.”
By this definition, we already engage in crowdsourcing when we post experiments on social media platforms, but this approach has several major drawbacks:
- a limited and biased pool of participants
- it can take a long time to reach the target sample size
- repeated participation and multiple submissions
- no incentive structure
- no pre-screening of participants
- longitudinal studies are not possible
- it is excessively annoying
The problems listed above already hint at their solution: a platform or service that lets researchers specify participation criteria and provides the infrastructure to reward participants. The first platform to do exactly this was Amazon Mechanical Turk. It was created in 2005 and sparked the interest of researchers around 2010. To give a good overview of its mechanics and purpose, let me quote Paolacci, Chandler, and Ipeirotis (2010) at length. They have put it splendidly:
“Mechanical Turk is a crowdsourcing web service that coordinates the supply and the demand of tasks that require human intelligence to complete. Mechanical Turk is named after an 18th century chess playing “automaton” that was in fact operated by a concealed person. It is an online labor market where employees (called workers) are recruited by employers (called requesters) for the execution of tasks (called HITs, acronym for Human Intelligence Tasks) in exchange for a wage (called a reward). Both workers and requesters are anonymous although responses by a unique worker can be linked through an ID provided by Amazon. Requesters post HITs that are visible only to workers who meet predefined criteria (e.g., country of residence or accuracy in previously completed tasks). When workers access the website, they find a list of tasks sortable according to various criteria, including size of the reward and maximum time allotted for the completion. Workers can read brief descriptions and see previews of the tasks before accepting to work on them.”
– Paolacci et al. (2010, pp. 411–412)
Unfortunately for us, Amazon MTurk only accepts requesters based in the United States. However, there exists another platform called Crowdflower, which provides basically the same service[1] while also accepting requesters from outside the US.
Let’s do it!
Note on financials
Since you have to pay the workers who participate in your experiment, you need to add funds to your account (credit card only). Making matters worse, Crowdflower requires you to deposit at least $10. If you don’t want to do this, you can still follow the steps below, except for actually posting the experiment. Once you have a real research question and the funds available, you can always revisit this tutorial.
The experiment
Our goal in this tutorial will be to replicate a classic finding in psychology, the conjunction fallacy (Tversky & Kahneman, 1983). Although Crowdflower provides a lot of templates for our jobs, we will create an external questionnaire and link to it. This method provides the most flexibility, since you are basically free to build your experiment with whatever tool you want (e.g. LimeSurvey, Qualtrics, or programmed interactive experiments). How do we know that a participant has completed the task? You add a completion code at the end of the experiment, which participants then copy and paste into Crowdflower – but more on that later. Before we proceed, have a look at the problem I crowdsourced yourself and make sure you know the right answer.
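If you build the final page of your questionnaire yourself (tools like LimeSurvey or Qualtrics have their own end-of-survey messages for this), displaying the code takes only a few lines. Here is a minimal sketch in plain JavaScript; the element id completion-code is an assumption about your own final page, and 7E93 is the code validated later in the Crowdflower job.

```javascript
// Minimal sketch: display a fixed completion code on the questionnaire's last page.
// The element id "completion-code" is an assumption about your own final page;
// "7E93" must match the validation regex configured in the Crowdflower job (^7E93$).
document.addEventListener("DOMContentLoaded", function () {
  var code = "7E93";
  var target = document.getElementById("completion-code");
  if (target) {
    target.textContent = "Thank you! Your completion code is " + code +
      ". Please copy it back into the Crowdflower task.";
  }
});
```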
Connecting Crowdflower with the experiment
After creating your experiment, visit Crowdflower and create a new “requester” account. The free trial ends once you have used 5000 data records; one data record equals one completed experiment. After that, go to your account page and add some funds (at least $10). Now we need to create a “Job”, which will be our experiment. Click on “Create Job”. You can browse all the templates Crowdflower provides; however, to stay flexible, we will “Start from Scratch” with a “Survey Job”. We have to provide a title, instructions and a question. Since the graphical user interface is limited, click on the button “Switch to CML Editor” to use a text-based interface[2]. Fill in the title and the instructions, and don’t forget to insert a hyperlink to your external questionnaire! In the CML section, we need to add the code validation question:
<cml:text label="Text" data-validates-regex="^7E93$" validates="required regex"/>
For clarification, see the image below.
Of course, if you have used a different code in your questionnaire, you need to change this line accordingly. As you might have noticed, the left panel shows all the steps you need to take before you can post the task. Under “Preview” you can test whether the code validation works. This is a crucial step, since we want our participants to be compensated for the trouble they go through!
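If you want to be sure what the pattern accepts before relying on the Preview, you can try the regular expression locally. This only illustrates the regex itself, not Crowdflower’s internal validation (whether it trims whitespace, for instance, is something to check in the Preview):

```javascript
// Illustration of the pattern from data-validates-regex; nothing Crowdflower-specific.
var pattern = /^7E93$/;

console.log(pattern.test("7E93"));   // true  - the exact code
console.log(pattern.test("7e93"));   // false - the pattern is case-sensitive
console.log(pattern.test(" 7E93 ")); // false - leading/trailing whitespace fails
```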
We have finished designing our job; now we have to “manage quality”. Under “Contributors” in the panel you can pre-screen participants. For example, you can filter participants based on the country they live in, which is especially interesting for cross-country studies. Additionally, if your test material is in a language other than English, you can restrict your sample to participants living in countries where that language is spoken. Since most research of this kind is done using Amazon MTurk, whose workers are predominantly American, you might also want to target only the USA to increase the comparability of your data.
Workers can also be ranked according to their accuracy, i.e. their track record on previously completed jobs. It is suggested to target only workers with the highest accuracy (Performance Level 3). In the “Behavior Setting”, set the maximum number of judgments per contributor to 1 in order to avoid multiple submissions. Under “Job settings” on the left side, you can set the payment under “Tasks” and the number of participants under “Judgments”. As you might have noticed, there are many more advanced settings; however, these are not important for our experiment.
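For reference, the complete configuration of this job can be summarized as follows. It is written as a plain JavaScript object purely for readability: Crowdflower is configured through its web interface, the field names are my own shorthand rather than an actual API, and the payment and number of judgments are the values I will use in the next section.

```javascript
// My own shorthand summary of the job settings - not a Crowdflower API.
var jobSettings = {
  contributors: {
    countries: ["USA"],           // comparability with MTurk-based studies
    performanceLevel: 3           // target only the highest-accuracy workers
  },
  behavior: {
    maxJudgmentsPerContributor: 1 // avoid multiple submissions
  },
  payment: 0.10,                  // dollars per judgment ("Tasks")
  judgments: 30                   // number of participants ("Judgments")
};
```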
Results
In the original study, 85% of 142 undergraduates gave the wrong answer to the Linda problem in the transparent test, i.e. when only two options are available (Tversky & Kahneman, 1983). Let’s see how our crowdsourcing population does! I have set the number of participants to 30 and the payment to $0.10. In choosing the payment, I try to be as ethical as my wallet permits, striving for an hourly wage of at least $6, depending on the task. While the average pay on these platforms is about $2.30 per hour, $6 is still quite low. The ethics of crowdsourcing is an important topic; the reader is referred to Fort, Adda, and Cohen (2011) and Williamson (2014) for more information. Let’s launch!
Lo and behold: nine minutes and 40 seconds later we have our data. Only 5 people gave the right answer, which means that about 83% were subject to the conjunction fallacy[3].
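To make the numbers explicit (assuming the task takes roughly a minute; the actual duration will of course vary):

```javascript
// Back-of-the-envelope numbers for this job; the task duration is an assumption.
var payment = 0.10;      // dollars per completed judgment
var minutesPerTask = 1;  // rough estimate for the Linda problem
var hourlyWage = payment * (60 / minutesPerTask); // 6 dollars per hour

var participants = 30;
var correct = 5;
var fallacyRate = (participants - correct) / participants; // 25/30 ≈ 0.83

console.log(hourlyWage.toFixed(2));          // "6.00"
console.log((fallacyRate * 100).toFixed(1)); // "83.3"
```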
Isn’t this too good to be true?
It is exciting, that’s what it is. Importantly, you can use crowdsourcing in your own studies without worrying too much that you are stepping outside mainstream research methods: it is more or less becoming the standard in social psychology (if the experiment is not too involved, of course), and more domains of psychology are going down this road as well. In fact, a recent study replicated classic experiments in cognitive psychology (Crump, McDonnell, & Gureckis, 2013). This is important because these experiments demand millisecond timing, high precision and environmental control.
In software engineering there is Atwood’s Law, which states that any application that can be written in JavaScript will eventually be written in JavaScript. Let a new law be heard! Any experiment that can be conducted on MTurk (or a similar platform) will eventually be conducted on MTurk (or a similar platform). Why? Because, just as the Web as a platform is superior to the desktop for a large class of applications, MTurk is superior to the lab for a large set of experimental paradigms.
What about Donders?
We have seen that simple questionnaire studies are easy to implement and crowdsource. But what about reaction-time-critical experiments? Do they work online?
Yes – with some precautions. The most extensive study showing this is Crump et al. (2013), who replicated a wide variety of cognitive psychology paradigms (e.g. Stroop, Flanker, Inhibition of Return), thereby arguing for the platform’s “empirical validity”: if known effects can be replicated online, the platform is suitable for similar experiments. In a more recent paper, Reimers and Stewart (2014) review the timing precision of online experiments that require millisecond accuracy. The most pertinent problem is the variability of participants’ hardware and software. In within-subject designs this is not much of a problem, because each participant’s setup affects all conditions equally. In correlational and longitudinal designs, however, these hardware differences can bias the results. If you want to run these kinds of studies, I suggest you read both papers cited above.
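To see why within-subject comparisons are robust to this, consider a toy simulation (all numbers are invented purely for illustration): each participant’s device adds its own constant offset to every measured reaction time, which cancels out when you take the difference between two conditions for the same person, but adds noise to the raw scores that correlational designs rely on.

```javascript
// Toy simulation: a per-participant hardware offset cancels in within-subject differences.
// All numbers are invented for illustration.
function simulate(nSubjects) {
  var trueEffect = 50; // ms difference between condition B and condition A
  var diffs = [];
  for (var i = 0; i < nSubjects; i++) {
    var hardwareOffset = Math.random() * 100; // 0-100 ms, differs between devices
    var rtA = 400 + hardwareOffset + (Math.random() - 0.5) * 20;
    var rtB = 400 + trueEffect + hardwareOffset + (Math.random() - 0.5) * 20;
    diffs.push(rtB - rtA); // the offset cancels in the difference
  }
  return diffs.reduce(function (a, b) { return a + b; }, 0) / diffs.length;
}

console.log(simulate(1000).toFixed(1)); // close to 50.0 despite heterogeneous hardware
```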
Running interactive experiments requires substantial knowledge of JavaScript programming, as well as some familiarity with server-side technologies (a server-side programming language, databases, etc.).
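To give a flavour of what that means in practice, here is a minimal sketch of measuring a reaction time in the browser and sending it to a server. The endpoint /save-response and the element id stimulus are placeholders, not a real API; you would have to implement the server side (and the data storage) yourself.

```javascript
// Minimal sketch of browser-side reaction time measurement.
// "/save-response" and the element id "stimulus" are placeholders, not a real API.
var stimulusShownAt = null;

function showStimulus() {
  document.getElementById("stimulus").style.visibility = "visible";
  stimulusShownAt = performance.now(); // high-resolution timestamp in milliseconds
}

document.addEventListener("keydown", function (event) {
  if (stimulusShownAt === null) return; // ignore key presses before the stimulus
  var reactionTime = performance.now() - stimulusShownAt;
  stimulusShownAt = null;

  // Send the measurement to your own server for storage.
  fetch("/save-response", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ key: event.key, rt: reactionTime })
  });
});
```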
Consider crowdsourcing for your own research!
Before concluding, let me emphasize the following: almost every questionnaire that you would usually post on Facebook can also be conducted using the approach outlined above, which not only gives you a more diverse pool of participants but also tremendously speeds up data collection. As a side note, data from online studies are generally of high quality and more diverse than data from traditional lab samples (Buhrmester, Kwang, & Gosling, 2011; Gosling, Sandy, John, & Potter, 2010; Gosling, Vazire, Srivastava, & John, 2004). Additionally – with some programming experience – you can run complex experimental procedures as well. Although the method is a little harder to learn, it makes your research better, faster and stronger.
Some final thoughts
Crowdsourcing your research on platforms like Amazon MTurk or Crowdflower has many advantages, among them the ability to collect data efficiently from a large and diverse sample. Importantly, this method also allows for much bigger samples, offering a way to mitigate the notoriously low statistical power of studies in psychology (Fraley & Vazire, 2014)[4].
In addition to ethical questions (how much should I pay the “workers”?) and technical challenges (how do I program an experiment for the web?), there is a whole new set of issues that needs to be addressed (Chandler, Mueller, & Paolacci, 2014; for a short review, see Paolacci & Chandler, 2014). Whatever comes next, crowdsourcing will be a useful tool to have in our pockets.
Suggested Readings:
Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a Participant Pool. Current Directions in Psychological Science, 23(3), 184–188.
Crump, M. J., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PloS One, 8(3), e57410.
Suggested Links:
http://experimentalturk.wordpress.com/resources/
References
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk a new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5.
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130.
Crump, M. J., McDonnell, J. V., & Gureckis, T. M. (2013). Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research. PloS One, 8(3), e57410.
Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon mechanical turk: Gold mine or coal mine? Computational Linguistics, 37(2), 413–420.
Fraley, R. C., & Vazire, S. (2014). The N-Pact Factor: Evaluating the Quality of Empirical Journals with Respect to Sample Size and Statistical Power. PloS One, 9(10), e109019.
Gosling, S. D., Sandy, C. J., John, O. P., & Potter, J. (2010). Wired but not WEIRD: The promise of the Internet in reaching more diverse samples. Behavioral and Brain Sciences, 33(2-3), 94–95.
Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust web-based studies? A comparative analysis of six preconceptions about internet questionnaires. American Psychologist, 59(2), 93.
Howe, J. (2006, June). The rise of crowdsourcing. Wired. Retrieved from http://archive.wired.com/wired/archive/14.06/crowds.html
Paolacci, G., & Chandler, J. (2014). Inside the Turk: Understanding Mechanical Turk as a Participant Pool. Current Directions in Psychological Science, 23(3), 184–188.
Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on amazon mechanical turk. Judgment and Decision Making, 5(5), 411–419.
Reimers, S., & Stewart, N. (2014). Presentation and response timing accuracy in Adobe Flash and HTML5/JavaScript Web experiments. Behavior Research Methods, 1–19.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 293.
Williamson, V. (2014). On the Ethics of Crowd-sourced Research. Retrieved from http://scholar.harvard.edu/files/williamson/files/mturk_ps_081014.pdf
[1] However, studies assessing data quality have all been conducted using MTurk. One used to be able to target the MTurk population via Crowdflower, but this is not possible anymore. This might be a problem – at least it makes me feel a little uneasy.