Category Archives: Research Methodology

Not solely about that Bayes: Interview with Prof. Eric-Jan Wagenmakers

Last summer saw the publication of the most important work in psychology in decades: the Reproducibility Project (Open Science Collaboration, 2015; see here and here for context). It stirred up the community, resulting in many constructive discussions but also in verbally violent disagreement. What unites all parties, however, is the call for more transparency and openness in research.

Eric-Jan “EJ” Wagenmakers has argued for pre-registration of research (Wagenmakers et al., 2012; see also here) and direct replications (e.g., Boekel et al., 2015; Wagenmakers et al., 2015), for a clearer demarcation of exploratory and confirmatory research (de Groot, 1954/2013), and for a change in the way we analyze our data (Wagenmakers et al., 2011; Wagenmakers et al., in press).

Concerning the latter point, EJ is a staunch advocate of Bayesian statistics. With his many collaborators, he writes the clearest and wittiest expositions of the topic (e.g., Wagenmakers et al., 2016; Wagenmakers et al., 2010). Crucially, he is also a key player in opening Bayesian inference up to social and behavioral scientists more generally; in fact, the software JASP is EJ’s brainchild (see also our previous interview).


In sum, psychology is changing rapidly, both in how researchers communicate and do science, but increasingly also in how they analyze their data. This makes it nearly impossible for university curricula to keep up; courses in psychology are often years, if not decades, behind. Statistics classes in particular are usually boringly cookbook oriented and often fraught with misconceptions (Wagenmakers, 2014). At the University of Amsterdam, Wagenmakers succeeds in doing things differently. He has previously taught a class called “Good Science, Bad Science”, discussing novel developments in methodology and supervising students in preparing and conducting direct replications of recent research findings (cf. Frank & Saxe, 2012).

Now, at the end of the day, testing undirected hypotheses using p values or Bayes factors only gets you so far – even if you preregister the heck out of it. To move the field forward, we need formal models that instantiate theories and make precise quantitative predictions. Together with Michael Lee, Eric-Jan Wagenmakers has written an amazing practical cognitive modeling book, harnessing the power of computational Bayesian methods to estimate arbitrarily complex models (for an overview, see Lee, submitted). More recently, he has co-edited a book on model-based cognitive neuroscience on how formal models can help bridge the gap between brain measurements and cognitive processes (Forstmann & Wagenmakers, 2015).

Long-term readers of the JEPS bulletin will note that topics such as openness of research, pre-registration and replication, research methodology, and Bayesian statistics are recurring themes. It was thus only a matter of time before we interviewed Eric-Jan Wagenmakers to ask him questions concerning all the areas above. In addition, we ask: how does he stay so immensely productive? What tips does he have for students interested in an academic career? And what can instructors learn from “Good Science, Bad Science”? Enjoy the ride!


Bobby Fischer, the famous chess player, once said that he does not believe in psychology. You actually switched from playing chess to pursuing a career in psychology; tell us how this came about. Was it a good move?

It was an excellent move, but I have to be painfully honest: I simply did not have the talent and the predisposition to make a living out of playing chess. Several of my close friends did have that talent and went on to become international grandmasters; they play chess professionally. But I was actually lucky. For players outside of the world top-50, professional chess is a career trap. The pay is poor, the work is insanely competitive, and the life is lonely. And society has little appreciation for professional chess players. In terms of creativity, hard work, and intellectual effort, an international chess grandmaster easily outdoes the average tenured professor. People who do not play chess themselves do not realize this.

Your list of publications gets updated so frequently, it should have its own RSS feed! How do you grow and cultivate such an impressive network of collaborators? Do you have specific tips for early career researchers?

At the start of my career I did not publish much. For instance, when I finished my four years of grad studies I think I had two papers. My current publication rate is higher, and part of that is due to an increase in expertise. It is just easier to write papers when you know (or think you know) what you’re talking about. But the current productivity is mainly due to the quality of my collaborators. First, at the psychology department of the University of Amsterdam we have a fantastic research master program. Many of my graduate students come from this program, having been tried and tested in the lab as RAs. When you have, say, four excellent graduate students, and each publishes one article a year, that obviously helps productivity. Second, the field of Mathematical Psychology has several exceptional researchers that I have somehow managed to collaborate with. In the early stages I was a graduate student with Jeroen Raaijmakers, and this made it easy to start work with Rich Shiffrin and Roger Ratcliff. So I was privileged and I took the opportunities that were given. But I also work hard, of course.

There is a lot of advice that I could give to early career researchers but I will have to keep it short. First, in order to excel in whatever area of life, commitment is key. What this usually means is that you have to enjoy what you are doing. Your drive and your enthusiasm will act as a magnet for collaborators. Second, you have to take initiative. So read broadly, follow the latest articles (I remain up to date through Twitter and Google Scholar), get involved with scientific organizations, coordinate a colloquium series, set up a reading group, offer to review papers together with your advisor, attend summer schools, etc. For example, when I started my career I had seen a new book on memory and asked the editor of Acta Psychologica whether I could review it for them. Another example is Erik-Jan van Kesteren, an undergraduate student from a different university who had attended one of my talks about JASP. He later approached me and asked whether he could help out with JASP. He is now a valuable member of the JASP team. Third, it helps if you are methodologically strong. When you are methodologically strong –in statistics, mathematics, or programming– you have something concrete to offer in a collaboration.

Considering all projects you are involved in, JASP is probably the one that will have most impact on psychology, or the social and behavioral sciences in general. How did it all start?

In 2005 I had a conversation with Mark Steyvers. I had just shown him a first draft of a paper that summarized the statistical drawbacks of p-values. Mark told me “it is not enough to critique p-values. You should also offer a concrete alternative”. I agreed and added a section about BIC (the Bayesian Information Criterion). However, the BIC is only a rough approximation to the Bayesian hypothesis test. Later I became convinced that social scientists will only use Bayesian tests when these are readily available in a user-friendly software package. About 5 years ago I submitted an ERC grant proposal “Bayes or Bust! Sensible hypothesis tests for social scientists” that contained the development of JASP (or “Bayesian SPSS” as I called it in the proposal) as a core activity. I received the grant and then we were on our way.

I should acknowledge that many of the Bayesian computations in JASP depend on the R BayesFactor package developed by Richard Morey and Jeff Rouder. I should also emphasize the contribution by JASP’s first software engineer, Jonathon Love, who suggested that JASP ought to feature classical statistics as well. In the end we agreed that by including classical statistics, JASP could act as a Trojan horse and boost the adoption of Bayesian procedures. So the project started as “Bayesian SPSS”, but the scope was quickly broadened to include p-values.

JASP is already game-changing software, but it is under continuous development and improvement. More concretely, what do you plan to add in the near future? What do you hope to achieve in the long-term?

In terms of the software, we will shortly include several standard procedures that are still missing, such as logistic regression and chi-square tests. We also want to upgrade the popular Bayesian procedures we have already implemented, and we are going to create new modules. Before too long we hope to offer a variable views menu and a data-editing facility. When all this is done it would be great if we could make it easier for other researchers to add their own modules to JASP.

One of my tasks in the next years is to write a JASP manual and JASP books. In the long run, the goal is to have JASP be financially independent of government grants and university support. I am grateful for the support that the psychology department at the University of Amsterdam offers now, and for the support they will continue to offer in the future. However, the aim of JASP is to conquer the world, and this requires that we continue to develop the program “at break-neck speed”. We will soon be exploring alternative sources of funding. JASP will remain free and open-source, of course.

You are a leading advocate of Bayesian statistics. What do researchers gain by changing the way they analyze their data?

They gain intellectual hygiene, and coherent answers to questions that make scientific sense. A more elaborate answer is outlined in a paper that is currently submitted to a special issue of Psychonomic Bulletin & Review: https://osf.io/m6bi8/ (Part I).

The Reproducibility Project used different metrics to quantify the success of a replication – none of them really satisfactory. How can a Bayesian perspective help illuminate the “crisis of replication”?

As a theory of knowledge updating, Bayesian statistics is ideally suited to address questions of replication. However, the question “did the effect replicate?” is underspecified. Are the effect sizes comparable? Does the replication provide independent support for the presence of the effect? Does the replication provide support for the position of the proponents versus the skeptics? All these questions are slightly different, but each receives the appropriate answer within the Bayesian framework. Together with Josine Verhagen, I have explored a method –the replication Bayes factor– in which the prior distribution for the replication test is the posterior distribution obtained from the original experiment (e.g., Verhagen & Wagenmakers, 2014). We have applied this intuitive procedure to a series of recent experiments, including the multi-lab Registered Replication Report of Fritz Strack’s Facial Feedback hypothesis. In Strack’s original experiment, participants who held a pen with their teeth (causing a smile) judged cartoons to be funnier than participants who held a pen with their lips (causing a pout). I am not allowed to tell you the result of this massive replication effort, but the paper will be out soon.

You have recently co-edited a book on model-based cognitive neuroscience. What is the main idea here, and what developments in this area are most exciting to you?

The main idea is that much of experimental psychology, mathematical psychology, and the neurosciences pursue a common goal: to learn more about human cognition. So ultimately the interest is in latent constructs such as intelligence, confidence, memory strength, inhibition, and attention. The models that have been developed in mathematical psychology are able to link these latent constructs to specific model parameters. These parameters may in turn be estimated by behavioral data, by neural data, or by both data sets jointly. Brandon Turner is one of the early career mathematical psychologists who has made great progress in this area. So the mathematical models are a vehicle to achieve an integration of data from different sources. Moreover, insights from neuroscience can provide important constraints that help inform mathematical modeling. The relation is therefore mutually beneficial. This is summarized in the following paper: http://www.ejwagenmakers.com/2011/ForstmannEtAl2011TICS.pdf

One thing that distinguishes science from sophistry is replication; yet it is not standard practice. In “Good Science, Bad Science”, you had students prepare a registered replication plan. What was your experience teaching this class? What did you learn from the students?

This was a great class to teach. The students were highly motivated and oftentimes it felt more like a lab meeting than a class. The idea was to develop four Registered Report submissions. Some time has passed, but the students and I still intend to submit the proposals for publication.

The most important lesson this class has taught me is that our research master students want to learn relevant skills and conduct real research. In the next semester I will teach a related course, “Good Research Practices”, and I hope to attain the same high levels of student involvement. For the new course, I plan to have students read a classic methods paper that identifies a fallacy; next the students will conduct a literature search to assess the current prevalence of the fallacy. I have done several similar projects, but never with master students (e.g., http://www.ejwagenmakers.com/2011/NieuwenhuisEtAl2011.pdf and http://link.springer.com/article/10.3758/s13423-015-0913-5).

What tips and tricks can you share with instructors planning to teach a similar class?

The first tip is to set your aims high. For a research master class, the goal should be publication. Of course this may not always be realized, but it should be the goal. It helps if you can involve colleagues or graduate students. If you set your aims high, the students know that you take them seriously, and that their work matters. The second tip is to arrange the teaching so that the students do most of the work. The students need to develop a sense of ownership about their projects, and they need to learn. This will not happen if you treat the students as passive receptacles. I am reminded of a course that I took as an undergraduate. In this course I had to read chapters, deliver presentations, and prepare questions. It was one of the most enjoyable and inspiring courses I had ever taken, and it took me decades to realize that the professor who taught the course actually did not have to do much at all.

Many scholarly discussions these days take place on social media and blogs. You joined Twitter yourself over a year ago. How do you navigate the social media jungle, and what resources can you recommend to our readers?

I am completely addicted to Twitter, but I also feel it makes me a better scientist. When you are new to Twitter, I recommend that you start by following a few people that have interesting things to say. Coming from a Bayesian perspective, I recommend Alexander Etz (@AlxEtz) and Richard Morey (@richarddmorey). And of course it is essential to follow JASP (@JASPStats). As is the case for all social media, the most valuable resource you have is the “mute” option. Exercise it ruthlessly to prevent yourself from being swamped by holiday pictures.


Replicability and Registered Reports

Last summer saw the publication of a monumental piece of work: the reproducibility project (Open Science Collaboration, 2015). In a huge community effort, over 250 researchers directly replicated 100 experiments initially conducted in 2008. Only 39% of the replications were significant at the 5% level. Average effect size estimates were halved. The study design itself—conducting direct replications on a large scale—as well as its outcome are game-changing to the way we view our discipline, but students might wonder: what game were we playing before, and how did we get here?

In this blog post, I provide a selective account of what has been dubbed the “reproducibility crisis”, discussing its potential causes and possible remedies. Concretely, I will argue that adopting Registered Reports, a new publishing format recently also implemented in JEPS (King et al., 2016; see also here), increases scientific rigor, transparency, and thus replicability of research. Wherever possible, I have linked to additional resources and further reading, which should help you contextualize current developments within psychological science and the social and behavioral sciences more generally.

How did we get here?

In 2005, Ioannidis made an intriguing argument. Because the prior probability of any given hypothesis being true is low, because researchers continuously run low-powered experiments, and because the current publishing system is biased toward significant results, most published research findings are false. Within this context, spectacular fraud cases like Diederik Stapel (see here) and the publication of a curious paper about people “feeling the future” (Bem, 2011) made 2011 a “year of horrors” (Wagenmakers, 2012), and toppled psychology into a “crisis of confidence” (Pashler & Wagenmakers, 2012). As argued below, Stapel and Bem are emblematic of two highly interconnected problems of scientific research in general.

Publication bias

Stapel, who faked results of more than 55 papers, is the reductio ad absurdum of the current “publish or perish” culture[1]. Still, the gold standard to merit publication, certainly in a high impact journal, is p < .05, which results in publication bias (Sterling, 1959) and file-drawers full of nonsignificant results (Rosenthal, 1979; see Lane et al., 2016, for a brave opening; and #BringOutYerNulls). This leads to a biased view of nature, distorting any conclusion we draw from the published literature. In combination with low-powered studies (Cohen, 1962; Button et al., 2013; Fraley & Vazire, 2014), effect size estimates are seriously inflated and can easily point in the wrong direction (Yarkoni, 2009; Gelman & Carlin, 2014). A curious consequence is what Lehrer has titled “the truth wears off” (Lehrer, 2010). Initially high estimates of effect size attenuate over time, until nothing is left of them. Just recently, Kaplan and Lirvin reported that the proportion of positive effects in large clinical trials shrank from 57% before 2000 to 8% after 2000 (Kaplan & Lirvin, 2015). Even a powerful tool like meta-analysis cannot clear the view of a landscape filled with inflated and biased results (van Elk et al., 2015). For example, while meta-analyses concluded that there is a strong effect of ego depletion of Cohen’s d = .63, recent replications failed to find an effect (Lurquin et al., 2016; Sripada et al., in press)[2].
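To see how the significance filter produces this inflation, consider a small simulation; the true effect size, sample size, and number of studies below are illustrative choices, not taken from any study cited above. Under a modest true effect and low power, the studies that happen to reach p < .05 report effect sizes several times larger than the truth:

```python
import random
import statistics

random.seed(1)

TRUE_D = 0.2   # hypothetical small true effect
N = 20         # per-group sample size of a typical low-powered study
SIMS = 20_000

all_d, significant_d = [], []
for _ in range(SIMS):
    a = [random.gauss(TRUE_D, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    d = (statistics.mean(a) - statistics.mean(b)) / pooled_sd  # observed Cohen's d
    all_d.append(d)
    # two-sided t-test via the t statistic: t = d * sqrt(N / 2);
    # |t| > 2.024 corresponds to p < .05 at df = 38
    if abs(d * (N / 2) ** 0.5) > 2.024:
        significant_d.append(d)

print(f"mean observed d, all studies:      {statistics.mean(all_d):.2f}")
print(f"mean observed d, significant only: {statistics.mean(significant_d):.2f}")
```

Averaged over all simulated studies the estimate is unbiased, but conditioning on significance inflates it severalfold; this selection effect is the mechanism behind Gelman and Carlin’s (2014) “Type M” (magnitude) errors.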

Garden of forking paths

In 2011, Daryl Bem reported nine experiments on people being able to “feel the future” in the Journal of Personality and Social Psychology, the flagship journal of its field (Bem, 2011). Eight of them yielded statistical significance, p < .05. We could dismissively say that extraordinary claims require extraordinary evidence, and try to sail away as quickly as possible from this research area, but Bem would be quick to steal our thunder.

A recent meta-analysis of 90 experiments on precognition yielded overwhelming evidence in favor of an effect (Bem et al., 2015). Alan Turing, discussing research on psi-related phenomena, famously stated:

“These disturbing phenomena seem to deny all our usual scientific ideas. How we should like to discredit them! Unfortunately, the statistical evidence, at least of telepathy, is overwhelming.” (Turing, 1950, p. 453; cf. Wagenmakers et al., 2015)

How is this possible? It’s simple: Not all evidence is created equal. Research on psi provides us with a mirror of “questionable research practices” (John, Loewenstein, & Prelec, 2012) and researchers’ degrees of freedom (Simmons, Nelson, & Simonsohn, 2011), obscuring the evidential value of individual experiments as well as whole research areas[3]. However, it would be foolish to dismiss this as being a unique property of obscure research areas like psi. The problem is much more subtle.

The main issue is that there is a one-to-many mapping from scientific to statistical hypotheses[4]. When doing research, there are many parameters one must set; for example, should observations be excluded? Which control variables should be measured? How to code participants’ responses? What dependent variables should be analyzed? By varying only a small number of these, Simmons et al. (2011) found that the nominal false positive rate of 5% skyrocketed to over 60%. They conclude that the “increased flexibility allows researchers to present anything as significant.” These issues are exacerbated by providing insufficient methodological detail in research articles, by a low percentage of researchers sharing their data (Wicherts et al., 2006; Wicherts, Bakker, & Molenaar, 2011), and in fields that require complicated preprocessing steps like neuroimaging (Carp, 2012; Cohen, 2016; Luck and Gaspelin, in press).
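A toy simulation makes this mechanism concrete. The specific degrees of freedom below (a second correlated dependent variable, an averaged DV, and one extra round of data collection after peeking) are illustrative choices, not Simmons et al.’s exact design, yet they already push the false positive rate well past the nominal 5% even though no effect exists anywhere:

```python
import random
import statistics

random.seed(0)

def t_stat(x, y):
    # Student's t for two independent, equal-sized groups
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    return (statistics.mean(x) - statistics.mean(y)) / (pooled_var * 2 / n) ** 0.5

def draw(n):
    # participants with two DVs correlated at ~.5 -- and NO true effect anywhere
    out = []
    for _ in range(n):
        shared = random.gauss(0, 1)
        out.append((0.7 * shared + 0.7 * random.gauss(0, 1),
                    0.7 * shared + 0.7 * random.gauss(0, 1)))
    return out

def any_significant(a, b):
    # flexibility: test DV1, DV2, or their average, and report whichever "works";
    # |t| > 2.02 approximates p < .05 at these sample sizes
    for f in (lambda p: p[0], lambda p: p[1], lambda p: (p[0] + p[1]) / 2):
        if abs(t_stat([f(p) for p in a], [f(p) for p in b])) > 2.02:
            return True
    return False

SIMS = 4_000
hits = 0
for _ in range(SIMS):
    a, b = draw(20), draw(20)
    if not any_significant(a, b):
        # flexibility: peek at the data, then run 10 more participants per group
        a, b = a + draw(10), b + draw(10)
    if any_significant(a, b):
        hits += 1

print(f"false positive rate under flexible analysis: {hits / SIMS:.2f}")
```

With just these few contingent choices the error rate ends up far above 5%; Simmons et al. (2011) report that a handful of such choices suffices to exceed 60%.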

An important amendment is that researchers need not be aware of this flexibility; a p value might be misleading even when there is no “p-hacking”, and the hypothesis was posited ahead of time (i.e., was not changed after the fact—HARKing; Kerr, 1998). When decisions that are contingent on the data are made in an environment in which different data would lead to different decisions, even when these decisions “just make sense,” there is a hidden multiple comparison problem lurking (Gelman & Loken, 2014). Usually, when conducting N statistical tests, we control for the number of tests in order to keep the false positive rate at, say, 5%. However, in the aforementioned setting, it is not clear what N should be exactly. Thus, results of statistical tests lose their meaning and carry little evidential value in such exploratory settings; they only do so in confirmatory settings (de Groot, 1954/2014; Wagenmakers et al., 2012). This distinction is at the heart of the problem, and it gets obscured because many results in the literature are reported as confirmatory when in fact they may very well be exploratory—most frequently, because of the way scientific reporting is currently done, there is no way for us to tell the difference.

To get a feeling for the many choices possible in statistical analysis, consider a recent paper in which data analysis was crowdsourced from 29 teams (Silberzahn et al., submitted). The question posited to them was whether dark-skinned soccer players are red-carded more frequently. The estimated effect size across teams ranged from .83 to 2.93 (odds ratios). Nineteen different analysis strategies were used in total, with 21 unique combinations of covariates; 69% found a significant relationship, while 31% did not.

A reanalysis of Berkowitz et al. (2016) by Michael Frank (2016; blog here) is another, more subtle example. Berkowitz and colleagues report a randomized controlled trial, claiming that solving short numerical problems increases children’s math achievement across the school year. The intervention was well designed and well conducted, but still, Frank found that, as he put it, “the results differ by analytic strategy, suggesting the importance of preregistration.”

Frequently, the issue is with measurement. Malte Elson—whose twitter is highly germane to our topic—has created a daunting website that lists how researchers use the Competitive Reaction Time Task (CRTT), one of the most commonly used tools to measure aggressive behavior. It states that there are 120 publications using the CRTT, which in total analyze the data in 147 different ways!

This increased awareness of researchers’ degrees of freedom and the garden of forking paths is mostly a product of this century, although some authors have expressed this much earlier (e.g., de Groot, 1954/2014; Meehl, 1985; see also Gelman’s comments here). The next point considers an issue much older (e.g., Berkson, 1938), but which nonetheless bears repeating.

Statistical inference

In psychology and much of the social and behavioral sciences in general, researchers overly rely on null hypothesis significance testing and p values to draw inferences from data. However, the statistical community has long known that p values overestimate the evidence against H0 (Berger & Delampady, 1987; Wagenmakers, 2007; Nuzzo, 2014). Just recently, the American Statistical Association released a statement drawing attention to this fact (Wasserstein & Lazar, 2016); that is, in addition to it being easy to obtain p < .05 (Simmons, Nelson, & Simonsohn, 2011), it is also quite a weak standard of evidence overall.
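A quick calculation illustrates just how weak this standard is. Sellke, Bayarri, and Berger (2001) derived an upper bound on the Bayes factor in favor of H1 that a given p value can possibly justify, namely -1 / (e * p * ln p) for p < 1/e. A minimal sketch:

```python
import math

def max_bayes_factor(p):
    """Upper bound on the Bayes factor in favor of H1 implied by a p value:
    -1 / (e * p * ln p), valid for 0 < p < 1/e (Sellke, Bayarri, & Berger, 2001)."""
    assert 0 < p < 1 / math.e
    return -1 / (math.e * p * math.log(p))

for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} -> evidence for H1 is at most {max_bayes_factor(p):.1f} : 1")
```

Even in the best possible case for H1 under this bound, p = .05 corresponds to evidence of no more than roughly 2.5 to 1, far short of what “95% confidence” intuitively suggests.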

The last point is quite pertinent because the statement that 39% of replications in the reproducibility project were “successful” is misleading. A recent Bayesian reanalysis concluded that the original studies themselves found weak evidence in support of an effect (Etz & Vandekerckhove, 2016), reinforcing all points I have made so far.

Notwithstanding the above, p < .05 is still the gold standard in psychology, and is so for intricate historical reasons (cf., Gigerenzer, 1993). At JEPS, we certainly do not want to echo calls or actions to ban p values (Trafimow & Marks, 2015), but we urge students and their instructors to bring more nuance to their use (cf., Gigerenzer, 2004).

Procedures based on classical statistics provide different answers from what most researchers and students expect (Oakes, 1986; Haller & Krauss, 2002; Hoekstra et al., 2014). To be sure, p values have their place in model checking (e.g., Gelman, 2006—are the data consistent with the null hypothesis?), but they are poorly equipped to measure the relative evidence for H1 or H0 brought about by the data; for this, researchers need to use Bayesian inference (Wagenmakers et al., in press). Because university curricula often lag behind current developments, students reading this are encouraged to advance their methodological toolbox by browsing through Etz et al. (submitted) and playing with JASP[5].

Teaching the exciting history of statistics (cf. Gigerenzer et al., 1989; McGrayne, 2012), or at least contextualizing the developments of currently dominating statistical ideas, is a first step away from their cookbook oriented application.

Registered reports to the rescue

While we can only point to the latter, statistical issue, we can actually eradicate the issues of publication bias and the garden of forking paths by introducing a new publishing format called Registered Reports. This format was initially introduced to the journal Cortex by Chris Chambers (Chambers, 2013), and it is now offered by more than two dozen journals in the fields of psychology, neuroscience, psychiatry, and medicine (link). Recently, we have also introduced this publishing format at JEPS (see King et al., 2016).

Specifically, researchers submit a document including the introduction, theoretical motivation, experimental design, data preprocessing steps (e.g., outlier removal criteria), and the planned statistical analyses prior to data collection. Peer review only focuses on the merit of the proposed study and the adequacy of the statistical analyses[6]. If there is sufficient merit to the planned study, the authors are guaranteed in-principle acceptance (Nosek & Lakens, 2014). Upon receiving this acceptance, researchers subsequently carry out the experiment, and submit the final manuscript. Deviations from the first submissions must be discussed, and additional statistical analyses are labeled exploratory.

In sum, by publishing regardless of the outcome of the statistical analysis, registered reports eliminate publication bias; by specifying the hypotheses and analysis plan beforehand, they make apparent the distinction between exploratory and confirmatory studies (de Groot 1954/2014), avoid the garden of forking paths (Gelman & Loken, 2014), and guard against post-hoc theorizing (Kerr, 1998).

Even though Registered Reports are commonly associated with high power (80-95%), this is unfeasible for student research. However, note that a single study cannot be decisive in any case. Reporting sound, hypothesis-driven, not-cherry-picked research can be important fuel for future meta-analyses (for an example, see Scheibehenne, Jamil, & Wagenmakers, in press).

To avoid possible confusion, note that preregistration is different from Registered Reports: The former is the act of specifying the methodology before data collection, while the latter is a publishing format. You can preregister your study on several platforms such as the Open Science Framework or AsPredicted. Registered reports include preregistration but go further and have the additional benefits such as peer review prior to data collection and in-principle acceptance.

Conclusion

In sum, there are several issues impeding progress in psychological science, most pressingly the failure to distinguish between exploratory and confirmatory research, and publication bias. A new publishing format, Registered Reports, provides a powerful means to address them both, and, to borrow a phrase from Daniel Lakens, enable us to “sail away from the seas of chaos into a corridor of stability” (Lakens & Evers, 2014).

Suggested Readings

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
  • Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638.
  • Gelman, A., & Loken, E. (2014). The Statistical Crisis in Science. American Scientist, 102(6), 460-465.
  • King, M., Dablander, F., Jakob, L., Agan, M., Huber, F., Haslbeck, J., & Brecht, K. (2016). Registered Reports for Student Research. Journal of European Psychology Students, 7(1), 20-23
  • Twitter (or you might miss out)

Footnotes

[1] Incidentally, Diederik Stapel published a book about his fraud. See here for more.

[2] Baumeister (2016) is a perfect example of how not to respond to such a result. Michael Inzlicht shows how to respond adequately here.

[3] For a discussion of these issues with respect to the precognition meta-analysis, see Lakens (2015) and Gelman (2014).

[4] Another related, crucial point is the lack of theory in psychology. However, as this depends on whether you read the Journal of Mathematical Psychology or, say, Psychological Science, it is not addressed further. For more on this point, see for example Meehl (1978), Gigerenzer (1998), and a class by Paul Meehl which has been kindly converted to mp3 by Uri Simonsohn.

[5] However, it would be premature to put too much blame on p. More pressingly, the misunderstandings and misuse of this little fellow point towards a catastrophic failure in undergraduate teaching of statistics and methods classes (for the latter, see Richard Morey’s recent blog post). Statistics classes in psychology are often boringly cookbook oriented, and so students just learn the cookbook. If you are an instructor, I urge you to have a look at “Statistical Rethinking” by Richard McElreath. In general, however, statistics is hard, and there are many issues transcending the frequentist versus Bayesian debate (for examples, see Judd, Westfall, and Kenny, 2012; Westfall & Yarkoni, 2016).

[6] Note that JEPS already publishes research regardless of whether p < .05. However, this does not discourage us from drawing attention to this benefit of Registered Reports, especially because most other journals have a different policy.

This post was edited by Altan Orhon.


Structural equation modeling: What is it, what does it have in common with hippie music, and why does it eat cake to get rid of measurement error?

Do you want a statistics tool that is powerful and easy to learn, that allows you to model complex data structures, that combines the t-test, analysis of variance, and multiple regression, and that puts even more on top? Here it is! Statistics courses in psychology today often cover structural equation modeling (SEM), a statistical tool that allows one to go beyond classical statistical models by combining them and adding more. Let’s explore what this means, what SEM really is, and SEM’s surprising parallels with the hippie culture! Continue reading
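The claim that SEM subsumes the classical models can be made concrete even before diving into the full framework: a two-sample t-test is just a linear regression with a dummy-coded group predictor. A minimal sketch in Python (NumPy only; the data are simulated and the variable names are ours, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
y0 = rng.normal(0.0, 1.0, 30)  # control group scores (simulated)
y1 = rng.normal(0.5, 1.0, 30)  # treatment group scores (simulated)

# Classical two-sample t-test with pooled variance
n0, n1 = len(y0), len(y1)
sp2 = ((n0 - 1) * y0.var(ddof=1) + (n1 - 1) * y1.var(ddof=1)) / (n0 + n1 - 2)
t_test = (y1.mean() - y0.mean()) / np.sqrt(sp2 * (1 / n0 + 1 / n1))

# The same test as a regression: y = b0 + b1 * group, with group dummy-coded 0/1
x = np.concatenate([np.zeros(n0), np.ones(n1)])
y = np.concatenate([y0, y1])
b1 = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)  # OLS slope = mean difference
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s2 = resid @ resid / (len(y) - 2)                # residual variance, n - 2 df
se_b1 = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())
t_reg = b1 / se_b1

print(round(t_test, 6), round(t_reg, 6))  # the two t statistics coincide
```

SEM then goes further by also modeling latent variables and measurement error, but this shared regression core is what lets it absorb the classical tests.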


Introducing JASP: A free and intuitive statistics software that might finally replace SPSS

Are you tired of SPSS’s confusing menus and of the ugly tables it generates? Are you annoyed by having statistical software only at university computers? Would you like to use advanced techniques such as Bayesian statistics, but you lack the time to learn a programming language (like R or Python) because you prefer to focus on your research?

While there was no real solution to this problem for a long time, there is now good news for you! A group of researchers at the University of Amsterdam are developing JASP, a free open-source statistics package that includes both standard and more advanced techniques and puts major emphasis on providing an intuitive user interface.

The current version already supports a large array of analyses, including the ones typically used by researchers in the field of psychology (e.g. ANOVA, t-tests, multiple regression).

In addition to being open source, freely available for all platforms, and providing a considerable number of analyses, JASP also comes with several neat, distinctive features, such as real-time computation and display of all results. For example, if you decide that you want not only the mean but also the median in the table, you can tick “Median” to have the medians appear immediately in the results table. For comparison, think of how this works in SPSS: first, you must navigate a forest of menus (or edit the syntax); then you execute the new syntax, a new window appears, and you get a new (ugly) table.


In JASP, you get better-looking tables in no time. Click here to see a short demonstration of this feature. But it gets even better—the tables are already in APA format and you can copy and paste them into Word. Sounds too good to be true, doesn’t it? It does, but it works!

Interview with lead developer Jonathon Love

Where is this software project coming from? Who pays for all of this? And what plans are there for the future? Nobody could answer these questions better than the lead developer of JASP, Jonathon Love, who kindly agreed to answer a few questions.

How did development on JASP start? How did you get involved in the project?

All through my undergraduate program, we used SPSS, and it struck me just how suboptimal it was. As a software designer, I find poorly designed software somewhat distressing to use, and so SPSS was something of a thorn in my mind for four years. I was always thinking things like, “Oh, what? I have to completely re-run the analysis, because I forgot X?,” “Why can’t I just click on the output to see what options were used?,” “Why do I have to read this awful syntax?,” or “Why have they done this like this? Surely they should do this like that!”

At the same time, I was working for Andrew Heathcote, writing software for analyzing response time data. We were using the R programming language and so I was exposed to this vast trove of statistical packages that R provides. On one hand, as a programmer, I was excited to gain access to all these statistical techniques. On the other hand, as someone who wants to empower as many people as possible, I was disappointed by the difficulty of using R and by the very limited options to provide a good user interface with it.

So I saw that there was a real need for both of these things—software providing an attractive, free, and open statistics package to replace SPSS, and a platform for methodologists to publish their analyses with rich, accessible user interfaces. However, the project was far too ambitious to consider without funding, and so I couldn’t see any way to do it.

Then I met E.J. Wagenmakers, who had just received a European Research Council grant to develop an SPSS-like software package to provide Bayesian methods, and he offered me the position to develop it. I didn’t know a lot about Bayesian methods at the time, but I did see that our goals had a lot of overlap.

So I said, “Of course, we would have to implement classical statistics as well,” and E.J.’s immediate response was, “Nooooooooooo!” But he quickly saw how significant this would be. If we can liberate the underlying platform that scientists use, then scientists (including ourselves) can provide whatever analyses we like.

And so that was how the JASP project was born, and how the three goals came together:

  • to provide a liberated (free and open) alternative to SPSS
  • to provide Bayesian analyses in an accessible way
  • to provide a universal platform for publishing analyses with accessible user interfaces


What are the biggest challenges for you as a lead developer of JASP?

Remaining focused. There are hundreds of goals, and hundreds of features that we want to implement, but we must prioritize ruthlessly. When will we implement factor analysis? When will we finish the SEM module? When will data entry, editing, and restructuring arrive? Outlier exclusion? Computing of variables? These are all such excellent, necessary features; it can be really hard to decide what should come next. Sometimes it can feel a bit overwhelming too. There’s so much to do! I have to keep reminding myself how much progress we’re making.

Maintaining a consistent user experience is a big deal too. The JASP team is really large. To give you an idea, in addition to myself there’s:

  • Ravi Selker, developing the frequentist analyses
  • Maarten Marsman, developing the Bayesian ANOVAs and Bayesian linear regression
  • Tahira Jamil, developing the classical and Bayesian contingency tables
  • Damian Dropmann, developing the file save/load functionality and the annotation system
  • Alexander Ly, developing the Bayesian correlation
  • Quentin Gronau, developing the Bayesian plots and the classical linear regression
  • Dora Matzke, developing the help system
  • Patrick Knight, developing the SPSS importer
  • Eric-Jan Wagenmakers, coming up with new Bayesian techniques and visualizations

With such a large team, developing the software and all the analyses in a consistent and coherent way can be really challenging. It’s so easy for analyses to end up a mess of features, and for every subsequent analysis we add to look nothing like the last. Of course, providing as elegant and consistent a user experience as possible is one of our highest priorities, so we put a lot of effort into this.


How do you imagine JASP five years from now?

JASP will provide the same, silky, sexy user experience that it does now. However, by then it will have full data entry, editing, cleaning, and restructuring facilities. It will provide all the common analyses used through undergraduate and postgraduate psychology programs. It will provide comprehensive help documentation, an abundance of examples, and a number of online courses. There will be textbooks available. It will have a growing community of methodologists publishing the analyses they are developing as additional JASP modules, and applied researchers will have access to the latest cutting-edge analyses in a way that they can understand and master. More students will like statistics than ever before.


How can JASP stay up to date with state-of-the-art statistical methods? Even when you borrow implementations written in R and the like, these still have to be integrated into JASP by your team. Is there a solution to this problem?

Well, if SPSS has taught us anything, you really don’t need to stay up to date to be a successful statistical product, ha-ha! The plan is to provide tools for methodologists to write add-on modules for JASP—tools for creating user interfaces and tools to connect these user interfaces to their underlying analyses. Once an add-on module is developed, it can appear in a directory, or a sort of “App Store,” and people will be able to rate the software for different things: stability, user-friendliness, attractiveness of output, and so forth. In this way, we hope to incentivize a good user experience as much as possible.

Some people think this will never work—that methodologists will never put in all that effort to create nice, useable software (because it does take substantial effort). But I think that once methodologists grasp the importance of making their work accessible to as wide an audience as possible, it will become a priority for them. For example, consider the following scenario: Alice provides a certain analysis with a nice user interface. Bob develops an analysis that is much better than Alice’s analysis, but everyone uses Alice’s, because hers is so easy and convenient to use. Bob is upset because everyone uses Alice’s instead of his. Bob then realizes that he has to provide a nice, accessible user experience for people to use his analysis.

I hope that we can create an arms race in which methodologists will strive to provide as good a user experience as possible. If you develop a new method and nobody can use it, have you really developed a new method? Of course, this sort of add-on facility isn’t ready yet, but I don’t think it will be too far away.


You mention on your website that many more methods will be included, such as structural equation modeling (SEM) or tools for data manipulation. How can you offer such a large number of features without cluttering the user interface in the future?

Currently, JASP uses a ribbon arrangement; we have a “File” tab for file operations, and we have a “Common” tab that provides common analyses. As we add more analyses (and as other people begin providing additional modules), these will be provided as additional tabs. The user will be able to toggle on or off which tabs they are interested in. You can see this in the current version of JASP: we have a proof-of-concept SEM module that you can toggle on or off on the options page. JASP thus provides you only with what you actually need, and the user interface can be kept as simple as you like.


Students who are considering switching to JASP might want to know whether the future of JASP development is secured or dependent on getting new grants. What can you tell us about this?

JASP is currently funded by a European Research Council (ERC) grant, and we’ve also received some support from the Centre for Open Science. Additionally, the University of Amsterdam has committed to providing us a software developer on an ongoing basis, and we’ve just run our first annual Bayesian Statistics in JASP workshop. The money we charge for these workshops is plowed straight back into JASP’s development.

We’re also developing a number of additional strategies to increase the funding that the JASP project receives. Firstly, we’re planning to provide technical support to universities and businesses that make use of JASP, for a fee. Additionally, we’re thinking of simply asking universities to contribute the cost of a single SPSS license to the JASP project. It would represent an excellent investment; it would allow us to accelerate development, achieve feature parity with SPSS sooner, and allow universities to abandon SPSS and its costs sooner. So I don’t worry about securing JASP’s future; I’m thinking about how we can expand it.

Of course, all of this depends on people actually using JASP, and that will come down to the extent that the scientific community decides to use and get behind the JASP project. Indeed, the easiest way that people can support the JASP project is by simply using and citing it. The more users and the more citations we have, the easier it is for us to obtain funding.

Having said all that, I’m less worried about JASP’s future development than I’m worried about SPSS’s! There’s almost no evidence that any development work is being done on it at all! Perhaps we should pass the hat around for IBM.


What is the best way to get started with JASP? Are there tutorials and reproducible examples?

For classical statistics, if you’ve used SPSS, or if you have a book on statistics in SPSS, I don’t think you’ll have any difficulty using JASP. It’s designed to be familiar to users of SPSS, and our experience is that most people have no difficulty moving from SPSS to JASP. We also have a video on our website that demonstrates some basic analyses, and we’re planning to create a whole series of these.

As for the Bayesian statistics, that’s a little more challenging. Most of our effort has been going into getting the software ready, so we don’t have as many resources for learning Bayesian statistics ready as we would like. This is something we’ll be looking at addressing in the next six to twelve months. E.J. has at least one (maybe three) books planned.

That said, there are a number of resources available now, such as:

  • Alexander Etz’s blog
  • E.J.’s website provides a number of papers on Bayesian statistics (his website also serves as a reminder of what the internet looked like in the ’80s)
  • Zoltan Dienes’s book is a great resource for Bayesian statistics as well

However, the best way to learn Bayesian statistics is to come to one of our Bayesian Statistics with JASP workshops. We’ve run two so far and they’ve been very well received. Some people have been reluctant to attend—because JASP is so easy to use, they didn’t see the point of coming and learning it. Of course, that’s the whole point! JASP is so easy to use, you don’t need to learn the software, and you can completely concentrate on learning the Bayesian concepts. So keep an eye out on the JASP website for the next workshop. Bayes is only going to get more important in the future. Don’t be left behind!



Bayesian Statistics: Why and How


Bayesian statistics is what all the cool kids are talking about these days. Upon closer inspection, this does not come as a surprise. In contrast to classical statistics, Bayesian inference is principled, coherent, unbiased, and addresses an important question in science: which of my hypotheses should I believe in, and how strongly, given the collected data? Continue reading
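The question the teaser poses — how strongly should I believe each hypothesis, given the data? — is exactly what Bayes’ rule answers. A toy numerical sketch (the prior and likelihood values below are invented for illustration, not taken from the post):

```python
# Bayes' rule for two competing hypotheses: P(H | D) is proportional to P(D | H) * P(H)
p_h1, p_h2 = 0.5, 0.5        # prior: both hypotheses equally plausible
lik_h1, lik_h2 = 0.2, 0.05   # probability of the observed data under each hypothesis

# Posterior probability of H1, normalizing over both hypotheses
post_h1 = lik_h1 * p_h1 / (lik_h1 * p_h1 + lik_h2 * p_h2)
print(round(post_h1, 2))  # prints 0.8: the data shift belief toward H1
```

The same logic, applied to continuous parameters and real likelihood functions, is what underlies the Bayesian analyses the post goes on to discuss.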


Of Elephants and Effect Sizes – Interview with Geoff Cumming

We all know those crucial moments while analysing our hard-earned data – the moment of truth – is there a star above the small p? Maybe even two? Can you write a nice and simple paper, or do you have to bend over backwards to explain why people, surprisingly, do not behave the way you thought they would? It all depends on those little stars: below or above .05, significant or not, black or white. Continue reading


A Psychologist’s Guide to Reading a Neuroimaging Paper

Psychological research is benefiting from advances in neuroimaging techniques. This has been achieved through the validation and falsification of established hypotheses in psychological science (Cacioppo, Berntson, & Nusbaum, 2008). It has also helped nurture links with neuroscience, leading to more comprehensive explanations of established theories. Positron emission tomography (PET), functional MRI (fMRI), structural MRI (sMRI), electroencephalography (EEG), diffusion tensor imaging (DTI), and numerous other lesser-known neuroimaging techniques can provide information complementary to behavioural data (Wager, 2006). With these modalities of research becoming more prevalent, ranging from investigating the neural effects of mindfulness training to neurodegeneration, it is worth taking a moment to highlight some points to help discern what may be good or poor research. Like any other methodology, neuroimaging is a great tool that can be used poorly. As with all areas of science, one must exercise a good degree of caution when reading neuroimaging papers. Continue reading


Bayesian Statistics: What is it and Why do we Need it?

There is a revolution in statistics happening: the Bayesian revolution. Psychology students who are interested in research methods (which I hope everyone is!) should know what this revolution is about. Gaining this knowledge now instead of later might spare you lots of misconceptions about statistics as it is usually taught in psychology, and it might help you gain a deeper understanding of the foundations of statistics. To make sure that you can try out everything you learn immediately, I conducted the analyses in the free statistics software R (www.r-project.org; click HERE for a tutorial on how to get started with R, and install RStudio for an enhanced R experience), and I provide the syntax for the analyses directly in the article so you can easily try them out. So let’s jump in: What is “Bayesian statistics”, and why do we need it? Continue reading


Ethics – The Science of Morals, Rules and Behaviour


Ethical boards are in place to evaluate the ethical feasibility of a study by weighing its possible negative effects against the possible positive effects of the research project (Barret, 2006). When designing your research project, you may need to apply for ethical approval. This is a challenging task, as there are strict guidelines to abide by when drawing up a proposal. This is where your supervisor can help: with their experience, they have a clear idea of what would be accepted for someone applying for ethical approval for an undergraduate or master’s study. Abiding by ethics, in research and in practice, is of great importance: it ensures that care is taken for the participant, the researcher, and wider society, and it creates a filter for a good standard of research with as little harm done as possible. Continue reading


Bias in Conducting Research: Guidelines for Young Researchers Regarding Gender Differences

Every scientific discipline is shaped by its object of measurement and by the selection of appropriate methods of data collection and statistical analysis. Faulty methodology can produce incorrect results without the researcher being aware of it, and building further research on such incorrect knowledge has far-reaching negative consequences. One error present, to some degree, in every single study is bias. It is particularly dangerous because it usually goes undetected by the researcher, but if you are aware of its threat, there are ways to avoid it. In research, bias occurs when systematic error is introduced into sampling or testing by selecting or encouraging one outcome or answer over others. It comes in many forms. The rest of this post will focus on causes of bias in the field of gender studies.

Continue reading
