Blog Contributors

Ivan Flis

Ivan Flis is a PhD student in History and Philosophy of Science at the Descartes Centre, Utrecht University. His research focuses on quantitative methodology in psychology, its history and application, and its relation to theory construction in psychological research. He has been an editor of JEPS for three years in the previous mandates.

What Do Whigs Have To Do With History of Psychology? on Thursday, January 30th, 2014

The state of Open Access in Europe – Horizon 2020 on Monday, October 1st, 2012

The State of Open Access in Europe – Finch Report on Wednesday, August 1st, 2012

Self-archiving and psychology journals on Sunday, June 10th, 2012

Research oriented social networking? on Thursday, May 10th, 2012

Podcast with Nick Shockey: Open Access and psychology students on Tuesday, May 1st, 2012

The implications of bite-size science on Tuesday, March 20th, 2012

ACTA, SOPA, PIPA, RWA and what do they have to do with psychologists on Monday, February 20th, 2012

What happens to studies that accept the null hypothesis? on Sunday, January 1st, 2012

Who publishes the most reputable journals in psychology? on Thursday, November 10th, 2011

Martin Vasilev

Martin Vasilev is an Editor at JEPS. He is a final-year undergraduate student of Psychology at the University of Sofia, Bulgaria, and the author of some of the most popular posts at the JEPS Bulletin (see, for example, his post on writing literature reviews, which was reprinted in the MBA Edge, a magazine for Malaysian prospective postgraduate students).

What Are The Most Common APA Style Mistakes Done By Students? on Tuesday, January 15th, 2013

How to write a good title for journal articles on Saturday, September 1st, 2012

APA Style: Abbreviations on Saturday, March 10th, 2012

How to critically evaluate internet-based sources? on Saturday, August 20th, 2011

How to write a good literature review article? on Wednesday, July 20th, 2011

Zorana Zupan

University of Belgrade, Serbia and Montenegro
Graduated from University of Belgrade, Dipl. in Psychology
MSc Research in Psychology
Research interests: Developmental Psychology, Developmental psychopathology, Cognitive Psychology

Perfect references in no time: An introduction to free referencing software on Wednesday, February 1st, 2012

How to critically evaluate the quality of a research article? on Monday, August 1st, 2011

Deirdre Walsh

Deirdre Walsh is a doctoral student in Counselling Psychology at Trinity College, Dublin. She is a recent MSc Clinical Child Psychology graduate of Anglia Ruskin University, United Kingdom. She has experience conducting psychology studies using qualitative research methods. She aspires to be a counselling psychologist, and hopes to put the knowledge gained from her research experience into practice.

The journey towards discovering people: Why I love qualitative research on Thursday, September 20th, 2012

Zoey Hudson

Zoey Hudson graduated from Anglia Ruskin University with a BSc (Hons) Psychology in 2012. She is currently working with Dr. Richard Piech, conducting research on reciprocity in trust. When Zoey is not at work, she volunteers for a charity, helping them to find sources of funding. Rationality is a particular interest of Zoey’s and one in which she would like to pursue further research. As applying for funding is necessary to conduct research, Zoey hopes that her work and volunteering experiences will be useful in her career in psychology research.

Life is a box of chocolates on Saturday, October 20th, 2012

Maris Vainre

Managing and organising literature on Friday, April 20th, 2012

Tips for effective literature search on Tuesday, April 10th, 2012

APA style: How to format the references list? on Friday, January 20th, 2012

How to format headings in APA Style? on Tuesday, December 20th, 2011

Common mistakes made in APA style on Sunday, November 20th, 2011

What makes a presentation good? on Saturday, October 1st, 2011

Tomatoes against procrastination on Saturday, September 10th, 2011

Lost in translating? on Wednesday, August 10th, 2011

Can you find an article in 5 sec? The world of DOIs on Friday, June 10th, 2011

How to make (scientific) texts sound professional? on Wednesday, June 1st, 2011

Chris Noone

Chris Noone is a PhD student at the School of Psychology at the National University of Ireland, Galway. His research focuses on the effects of mood on higher-order cognition. He is the Member Representative Coordinator on the Board of Management of EFPSA.

Student Action for Open Access on Wednesday, October 30th, 2013

Replication Studies: It’s Time to Clean Up Your Act, Psychologists! on Wednesday, January 30th, 2013

The state of Open Access in Europe – Right to Research Coalition on Monday, August 20th, 2012

Yee Row Liew

Yee Row Liew is an Editor of the JEPS Bulletin whose research background and experience range from plant genetics to psychology. Having recently completed her postgraduate studies in Psychological Research Methods at Anglia Ruskin University, United Kingdom, she is now working as a research assistant at the Global Sustainability Institute. She hopes to gain further knowledge in the study of emotion, cognition, and motivation, in pursuit of her love for scientific research.

Confessions of a Research Blog Editor on Monday, April 15th, 2013

Say again?: Scientific writing and publishing in non-English speaking countries on Sunday, December 30th, 2012

What makes a good research question? on Monday, September 10th, 2012

Sina Scherer

As part of EFPSA's JEPS team, Sina Scherer works as an editor of the JEPS Bulletin and is currently enrolled in the last year of her Master's programme in Work and Organizational Psychology at the Westfälische Wilhelmsuniversität Münster. Her fields of interest cover Intercultural Psychology, Personality and Organizational Psychology, as well as Health Psychology.

How to Collect Data Effectively? An Overview of the Best Online Survey Providers on Friday, November 15th, 2013

The structure of an APA research paper on Thursday, November 15th, 2012

The transformation of science on Friday, August 10th, 2012

Bias in psychology: Bring in all significant results on Friday, June 1st, 2012

Scaring European developments threaten Open Access on Sunday, April 1st, 2012

A revolution in scientific publishing? on Friday, February 10th, 2012

Journals in Psychology on Tuesday, January 10th, 2012

How to search for literature? on Saturday, December 10th, 2011

Lessons from a published fake study on Tuesday, November 1st, 2011

Written by the hands of a ghost on Tuesday, September 20th, 2011

Pedro Almeida

Pedro Almeida is a graduate student at the University of Coimbra, Portugal. His main research interests are group development and intergroup relations. He is an Editor and Webmaster for the Journal of European Psychology Students (JEPS).

The Best of JEPS Bulletin in 2013 on Thursday, December 26th, 2013

Looking for New Contributors on Monday, September 23rd, 2013

Why We Publish: The Past, Present, and Future of Science Communication on Tuesday, April 30th, 2013

The origins of APA style (and why there are so many rules) on Tuesday, July 10th, 2012

Peter Edelsbrunner

Peter Edelsbrunner is a PhD student at the Institute for Behavioural Sciences at the ETH Zurich. He completed his Master's degree in Psychology at the University of Graz. He is interested in conceptual change, reasoning processes, and structural equation modelling. With his strong methodological background, he hopes to combine both cognitive theory and psychometrics in his future research pursuits.

Structural equation modeling: What is it, what does it have in common with hippie music, and why does it eat cake to get rid of measurement error? on Monday, December 14th, 2015

Bayesian Statistics: What is it and Why do we Need it? on Monday, November 17th, 2014

Advice for the Next Generation of Researchers in Psychology from an Experienced Editor on Friday, November 30th, 2012

Research as an international project on Thursday, December 1st, 2011

Julia Ouzia

Julia Ouzia is a German national who has lived in the United Kingdom for over seven years. In that time she has completed a Bachelor's degree in Psychology and a Master's degree in Clinical Child Psychology. Julia is currently doing a PhD in Brain and Cognition at Anglia Ruskin University, where her interests are bilingual learning and cognition. She has also been part of the Executive Board and the Board of Management of EFPSA.

Keep calm and be creative: Use mixed methods! on Wednesday, October 10th, 2012

How to be an academic rock star via poster presentation on Friday, July 20th, 2012

In the shoes of a peer-reviewer on Thursday, March 1st, 2012

Not solely about that Bayes: Interview with Prof. Eric-Jan Wagenmakers

Last summer saw the publication of the most important work in psychology in decades: the Reproducibility Project (Open Science Collaboration, 2015; see here and here for context). It stirred up the community, resulting in many constructive discussions but also in verbally violent disagreement. What unites all parties, however, is the call for more transparency and openness in research.

Eric-Jan “EJ” Wagenmakers has argued for pre-registration of research (Wagenmakers et al., 2012; see also here) and direct replications (e.g., Boekel et al., 2015; Wagenmakers et al., 2015), for a clearer demarcation of exploratory and confirmatory research (de Groot, 1954/2014), and for a change in the way we analyze our data (Wagenmakers et al., 2011; Wagenmakers et al., in press).

Concerning the latter point, EJ is a staunch advocate of Bayesian statistics. With his many collaborators, he writes the clearest and wittiest expositions of the topic (e.g., Wagenmakers et al., 2016; Wagenmakers et al., 2010). Crucially, he is also a key player in opening Bayesian inference up to social and behavioral scientists more generally; in fact, the software JASP is EJ’s brainchild (see also our previous interview).

In sum, psychology is changing rapidly, not only in how researchers communicate and do science, but increasingly also in how they analyze their data. This makes it nearly impossible for university curricula to keep up; courses in psychology are often years, if not decades, behind. Statistics classes in particular are usually boringly cookbook oriented and often fraught with misconceptions (Wagenmakers, 2014). At the University of Amsterdam, Wagenmakers succeeds in doing things differently. He has previously taught a class called “Good Science, Bad Science”, in which he discussed novel developments in methodology and supervised students in preparing and conducting direct replications of recent research findings (cf. Frank & Saxe, 2012).

Now, at the end of the day, testing undirected hypotheses using p values or Bayes factors only gets you so far – even if you preregister the heck out of it. To move the field forward, we need formal models that instantiate theories and make precise quantitative predictions. Together with Michael Lee, Eric-Jan Wagenmakers has written an amazing practical cognitive modeling book, harnessing the power of computational Bayesian methods to estimate arbitrarily complex models (for an overview, see Lee, submitted). More recently, he has co-edited a book on model-based cognitive neuroscience, which shows how formal models can help bridge the gap between brain measurements and cognitive processes (Forstmann & Wagenmakers, 2015).

Long-term readers of the JEPS Bulletin will note that openness of research, pre-registration and replication, research methodology, and Bayesian statistics are recurring themes. It was thus only a matter of time before we interviewed Eric-Jan Wagenmakers and asked him questions concerning all of these areas. In addition, we ask: How does he stay so immensely productive? What tips does he have for students interested in an academic career? And what can instructors learn from “Good Science, Bad Science”? Enjoy the ride!


Bobby Fischer, the famous chess player, once said that he does not believe in psychology. You actually switched from playing chess to pursuing a career in psychology; tell us how this came about. Was it a good move?

It was an excellent move, but I have to be painfully honest: I simply did not have the talent and the predisposition to make a living out of playing chess. Several of my close friends did have that talent and went on to become international grandmasters; they play chess professionally. But I was actually lucky. For players outside of the world top-50, professional chess is a career trap. The pay is poor, the work insanely competitive, and the life is lonely. And society has little appreciation for professional chess players. In terms of creativity, hard work, and intellectual effort, an international chess grandmaster easily outdoes the average tenured professor. People who do not play chess themselves do not realize this.

Your list of publications gets updated so frequently, it should have its own RSS feed! How do you grow and cultivate such an impressive network of collaborators? Do you have specific tips for early career researchers?

At the start of my career I did not publish much. For instance, when I finished my four years of grad studies I think I had two papers. My current publication rate is higher, and part of that is due to an increase in expertise. It is just easier to write papers when you know (or think you know) what you’re talking about. But the current productivity is mainly due to the quality of my collaborators. First, at the psychology department of the University of Amsterdam we have a fantastic research master program. Many of my graduate students come from this program, having been tried and tested in the lab as RAs. When you have, say, four excellent graduate students, and each publishes one article a year, that obviously helps productivity. Second, the field of Mathematical Psychology has several exceptional researchers that I have somehow managed to collaborate with. In the early stages I was a graduate student with Jeroen Raaijmakers, and this made it easy to start work with Rich Shiffrin and Roger Ratcliff. So I was privileged and I took the opportunities that were given. But I also work hard, of course.

There is a lot of advice that I could give to early career researchers but I will have to keep it short. First, in order to excel in whatever area of life, commitment is key. What this usually means is that you have to enjoy what you are doing. Your drive and your enthusiasm will act as a magnet for collaborators. Second, you have to take initiative. So read broadly, follow the latest articles (I remain up to date through Twitter and Google Scholar), get involved with scientific organizations, coordinate a colloquium series, set up a reading group, offer to review papers together with your advisor, attend summer schools, etc. For example, when I started my career I had seen a new book on memory and asked the editor of Acta Psychologica whether I could review it for them. Another example is Erik-Jan van Kesteren, an undergraduate student from a different university who had attended one of my talks about JASP. He later approached me and asked whether he could help out with JASP. He is now a valuable member of the JASP team. Third, it helps if you are methodologically strong. When you are methodologically strong –in statistics, mathematics, or programming– you have something concrete to offer in a collaboration.

Considering all projects you are involved in, JASP is probably the one that will have most impact on psychology, or the social and behavioral sciences in general. How did it all start?

In 2005 I had a conversation with Mark Steyvers. I had just shown him a first draft of a paper that summarized the statistical drawbacks of p-values. Mark told me “it is not enough to critique p-values. You should also offer a concrete alternative”. I agreed and added a section about BIC (the Bayesian Information Criterion). However, the BIC is only a rough approximation to the Bayesian hypothesis test. Later I became convinced that social scientists will only use Bayesian tests when these are readily available in a user-friendly software package. About 5 years ago I submitted an ERC grant proposal “Bayes or Bust! Sensible hypothesis tests for social scientists” that contained the development of JASP (or “Bayesian SPSS” as I called it in the proposal) as a core activity. I received the grant and then we were on our way.

I should acknowledge that many of the Bayesian computations in JASP depend on the R BayesFactor package developed by Richard Morey and Jeff Rouder. I should also emphasize the contribution by JASP's first software engineer, Jonathon Love, who suggested that JASP ought to feature classical statistics as well. In the end we agreed that by including classical statistics, JASP could act as a Trojan horse and boost the adoption of Bayesian procedures. So the project started as “Bayesian SPSS”, but the scope was quickly broadened to include p-values.

JASP is already game-changing software, but it is under continuous development and improvement. More concretely, what do you plan to add in the near future? What do you hope to achieve in the long-term?

In terms of the software, we will shortly include several standard procedures that are still missing, such as logistic regression and chi-square tests. We also want to upgrade the popular Bayesian procedures we have already implemented, and we are going to create new modules. Before too long we hope to offer a variable views menu and a data-editing facility. When all this is done it would be great if we could make it easier for other researchers to add their own modules to JASP.

One of my tasks in the next years is to write a JASP manual and JASP books. In the long run, the goal is to have JASP be financially independent of government grants and university support. I am grateful for the support that the psychology department at the University of Amsterdam offers now, and for the support they will continue to offer in the future. However, the aim of JASP is to conquer the world, and this requires that we continue to develop the program “at break-neck speed”. We will soon be exploring alternative sources of funding. JASP will remain free and open-source, of course.

You are a leading advocate of Bayesian statistics. What do researchers gain by changing the way they analyze their data?

They gain intellectual hygiene, and a coherent answer to questions that makes scientific sense. A more elaborate answer is outlined in a paper that is currently submitted to a special issue for Psychonomic Bulletin & Review: https://osf.io/m6bi8/ (Part I).

The Reproducibility Project used different metrics to quantify the success of a replication – none of them really satisfactory. How can a Bayesian perspective help illuminate the “crisis of replication”?

As a theory of knowledge updating, Bayesian statistics is ideally suited to address questions of replication. However, the question “did the effect replicate?” is underspecified. Are the effect sizes comparable? Does the replication provide independent support for the presence of the effect? Does the replication provide support for the position of the proponents versus the skeptics? All these questions are slightly different, but each receives the appropriate answer within the Bayesian framework. Together with Josine Verhagen, I have explored a method –the replication Bayes factor– in which the prior distribution for the replication test is the posterior distribution obtained from the original experiment (e.g., Verhagen & Wagenmakers, 2014). We have applied this intuitive procedure to a series of recent experiments, including the multi-lab Registered Replication Report of Fritz Strack’s Facial Feedback hypothesis. In Strack’s original experiment, participants who held a pen with their teeth (causing a smile) judged cartoons to be funnier than participants who held a pen with their lips (causing a pout). I am not allowed to tell you the result of this massive replication effort, but the paper will be out soon.
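
As an illustration of the idea in this answer, here is a minimal sketch of a replication Bayes factor under a normal approximation. The approximation is an assumption made for brevity and is not the procedure of Verhagen and Wagenmakers (2014); the function names, the flat prior on the original effect, and the example numbers are all hypothetical. The posterior from the original study, Normal(d_orig, se_orig^2), serves as the prior under the replication hypothesis, and the replication estimate is compared against a point null of zero.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def replication_bayes_factor(d_orig, se_orig, d_rep, se_rep):
    """Evidence for 'the original effect is real' versus 'the effect is zero'.

    Assumption: effect estimates are normally distributed with known standard
    errors, and the original study started from a flat prior, so its posterior
    is Normal(d_orig, se_orig^2).
    """
    # Marginal likelihood of the replication estimate under H_r: the prior
    # uncertainty (se_orig) and the sampling uncertainty (se_rep) add up.
    marginal_hr = normal_pdf(d_rep, d_orig, math.sqrt(se_orig**2 + se_rep**2))
    # Likelihood of the replication estimate under H_0 (effect exactly zero).
    marginal_h0 = normal_pdf(d_rep, 0.0, se_rep)
    return marginal_hr / marginal_h0


if __name__ == "__main__":
    # Hypothetical numbers: the original study found d = 0.5 (SE 0.2).
    print(replication_bayes_factor(0.5, 0.2, d_rep=0.5, se_rep=0.2))
    print(replication_bayes_factor(0.5, 0.2, d_rep=0.0, se_rep=0.1))
```

Values above 1 favor the replication hypothesis and values below 1 favor the null, so a replication estimate close to the original effect yields a large Bayes factor, while a precise estimate near zero yields a small one.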

You have recently co-edited a book on model-based cognitive neuroscience. What is the main idea here, and what developments in this area are most exciting to you?

The main idea is that much of experimental psychology, mathematical psychology, and the neurosciences pursue a common goal: to learn more about human cognition. So ultimately the interest is in latent constructs such as intelligence, confidence, memory strength, inhibition, and attention. The models that have been developed in mathematical psychology are able to link these latent constructs to specific model parameters. These parameters may in turn be estimated by behavioral data, by neural data, or by both data sets jointly. Brandon Turner is one of the early career mathematical psychologists who has made great progress in this area. So the mathematical models are a vehicle to achieve an integration of data from different sources. Moreover, insights from neuroscience can provide important constraints that help inform mathematical modeling. The relation is therefore mutually beneficial. This is summarized in the following paper: http://www.ejwagenmakers.com/2011/ForstmannEtAl2011TICS.pdf

One thing that distinguishes science from sophistry is replication; yet it is not standard practice. In “Good Science, Bad Science”, you had students prepare a registered replication plan. What was your experience teaching this class? What did you learn from the students?

This was a great class to teach. The students were highly motivated and oftentimes it felt more like lab-meeting than like a class. The idea was to develop four Registered Report submissions. Some time has passed, but the students and I still intend to submit the proposals for publication.

The most important lesson this class has taught me is that our research master students want to learn relevant skills and conduct real research. In the next semester I will teach a related course, “Good Research Practices”, and I hope to attain the same high levels of student involvement. For the new course, I plan to have students read a classic methods paper that identifies a fallacy; next the students will conduct a literature search to assess the current prevalence of the fallacy. I have done several similar projects, but never with master students (e.g., http://www.ejwagenmakers.com/2011/NieuwenhuisEtAl2011.pdf and http://link.springer.com/article/10.3758/s13423-015-0913-5).

What tips and tricks can you share with instructors planning to teach a similar class?

The first tip is to set your aims high. For a research master class, the goal should be publication. Of course this may not always be realized, but it should be the goal. It helps if you can involve colleagues or graduate students. If you set your aims high, the students know that you take them seriously, and that their work matters. The second tip is to arrange the teaching so that the students do most of the work. The students need to develop a sense of ownership about their projects, and they need to learn. This will not happen if you treat the students as passive receptacles. I am reminded of a course that I took as an undergraduate. In this course I had to read chapters, deliver presentations, and prepare questions. It was one of the most enjoyable and inspiring courses I had ever taken, and it took me decades to realize that the professor who taught the course actually did not have to do much at all.

Many scholarly discussions these days take place on social media and blogs. You joined Twitter yourself over a year ago. How do you navigate the social media jungle, and what resources can you recommend to our readers?

I am completely addicted to Twitter, but I also feel it makes me a better scientist. When you are new to Twitter, I recommend that you start by following a few people that have interesting things to say. Coming from a Bayesian perspective, I recommend Alexander Etz (@AlxEtz) and Richard Morey (@richarddmorey). And of course it is essential to follow JASP (@JASPStats). As is the case for all social media, the most valuable resource you have is the “mute” option. Prevent yourself from being swamped by holiday pictures and exercise it ruthlessly.

Replicability and Registered Reports

Last summer saw the publication of a monumental piece of work: the Reproducibility Project (Open Science Collaboration, 2015). In a huge community effort, over 250 researchers directly replicated 100 experiments initially conducted in 2008. Only 36% of the replications were significant at the 5% level. Average effect size estimates were halved. The study design itself (conducting direct replications on a large scale) and its outcome are game-changing for the way we view our discipline, but students might wonder: what game were we playing before, and how did we get here?

In this blog post, I provide a selective account of what has been dubbed the “reproducibility crisis”, discussing its potential causes and possible remedies. Concretely, I will argue that adopting Registered Reports, a new publishing format recently also implemented in JEPS (King et al., 2016; see also here), increases scientific rigor, transparency, and thus replicability of research. Wherever possible, I have linked to additional resources and further reading, which should help you contextualize current developments within psychological science and the social and behavioral sciences more generally.

How did we get here?

In 2005, Ioannidis made an intriguing argument. Because the prior probability of any hypothesis being true is low, because researchers continuously run low-powered experiments, and because the current publishing system is biased toward significant results, most published research findings are false. Within this context, spectacular fraud cases like that of Diederik Stapel (see here) and the publication of a curious paper about people “feeling the future” (Bem, 2011) made 2011 a “year of horrors” (Wagenmakers, 2012), and toppled psychology into a “crisis of confidence” (Pashler & Wagenmakers, 2012). As argued below, Stapel and Bem are emblematic of two highly interconnected problems of scientific research in general.
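
The core of this argument can be sketched numerically. The snippet below computes the probability that a significant finding reflects a true effect from the prior probability of the hypothesis, the statistical power, and the significance level. It is a simplified version of the reasoning that leaves out any explicit term for bias, and the function name and example values are illustrative assumptions rather than figures from Ioannidis (2005).

```python
def positive_predictive_value(prior, power, alpha=0.05):
    """Probability that a significant result reflects a true effect.

    prior: probability that a tested hypothesis is true before the study
    power: probability of a significant result when the effect is real
    alpha: probability of a significant result when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical scenarios, from favorable to unfavorable.
    for prior, power in [(0.5, 0.8), (0.1, 0.8), (0.1, 0.2)]:
        ppv = positive_predictive_value(prior, power)
        print(f"prior={prior:.2f}, power={power:.2f} -> PPV={ppv:.2f}")
```

With a prior of 0.1 and power of 0.2, fewer than half of the significant findings are true in this sketch, even before publication bias or flexible analyses enter the picture.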

Publication bias

Stapel, who faked the results of more than 55 papers, is the reductio ad absurdum of the current “publish or perish” culture[1]. Still, the gold standard to merit publication, certainly in a high impact journal, is p < .05, which results in publication bias (Sterling, 1959) and file-drawers full of nonsignificant results (Rosenthal, 1979; see Lane et al., 2016, for a brave opening; and #BringOutYerNulls). This leads to a biased view of nature, distorting any conclusion we draw from the published literature. In combination with low-powered studies (Cohen, 1962; Button et al., 2013; Fraley & Vazire, 2014), effect size estimates are seriously inflated and can easily point in the wrong direction (Yarkoni, 2009; Gelman & Carlin, 2014). A curious consequence is what Lehrer has titled “the truth wears off” (Lehrer, 2010). Initially high estimates of effect size attenuate over time, until nothing is left of them. Just recently, Kaplan and Irvin reported that the proportion of positive effects in large clinical trials shrank from 57% before 2000 to 8% after 2000 (Kaplan & Irvin, 2015). Even a powerful tool like meta-analysis cannot clear the view of a landscape filled with inflated and biased results (van Elk et al., 2015). For example, while meta-analyses concluded that there is a strong ego-depletion effect of Cohen’s d = .63, recent replications failed to find an effect (Lurquin et al., 2016; Sripada et al., in press)[2].
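
How a significance filter combined with low power inflates effect sizes can be illustrated with a small simulation. The sketch below rests on stated assumptions: a true standardized effect of 0.2, 20 participants per group, unit variance treated as known (so a z-test is used in place of a t-test), and a literature in which only significant studies are "published". All names and numbers are hypothetical.

```python
import random
import statistics


def simulate_published_effects(true_d=0.2, n_per_group=20,
                               n_studies=5000, seed=1):
    """Simulate two-group studies and keep those with |z| > 1.96."""
    rng = random.Random(seed)
    # Standard error of a difference between two means with unit variance.
    se = (2.0 / n_per_group) ** 0.5
    all_estimates, published = [], []
    for _ in range(n_studies):
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treatment) - statistics.fmean(control)
        all_estimates.append(diff)
        if abs(diff / se) > 1.96:  # only "significant" studies are published
            published.append(diff)
    return all_estimates, published


if __name__ == "__main__":
    all_estimates, published = simulate_published_effects()
    wrong_sign = sum(d < 0 for d in published) / len(published)
    print(f"mean of all estimates:       {statistics.fmean(all_estimates):.2f}")
    print(f"mean of published estimates: {statistics.fmean(published):.2f}")
    print(f"published with wrong sign:   {wrong_sign:.1%}")
```

Averaged over all simulated studies, the estimate recovers the true effect. Averaged over the published subset only, it is several times too large, because a study with 20 participants per group can only reach significance if it happens to observe a large difference.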

Garden of forking paths

In 2011, Daryl Bem reported nine experiments on people being able to “feel the future” in the Journal of Personality and Social Psychology, the flagship journal of its field (Bem, 2011). Eight of them yielded statistical significance, p < .05. We could dismissively say that extraordinary claims require extraordinary evidence, and try to sail away as quickly as possible from this research area, but Bem would be quick to steal our thunder.

A recent meta-analysis of 90 experiments on precognition yielded overwhelming evidence in favor of an effect (Bem et al., 2015). Alan Turing, discussing research on psi related phenomena, famously stated that

“These disturbing phenomena seem to deny all our usual scientific ideas. How we should like to discredit them! Unfortunately, the statistical evidence, at least of telepathy, is overwhelming.” (Turing, 1950, p. 453; cf. Wagenmakers et al., 2015)

How is this possible? It’s simple: Not all evidence is created equal. Research on psi provides us with a mirror of “questionable research practices” (John, Loewenstein, & Prelec, 2012) and researchers’ degrees of freedom (Simmons, Nelson, & Simonsohn, 2011), obscuring the evidential value of individual experiments as well as whole research areas[3]. However, it would be foolish to dismiss this as being a unique property of obscure research areas like psi. The problem is much more subtle.

The main issue is that there is a one-to-many mapping from scientific to statistical hypotheses[4]. When doing research, there are many parameters one must set; for example, should observations be excluded? Which control variables should be measured? How should participants’ responses be coded? What dependent variables should be analyzed? By varying only a small number of these, Simmons et al. (2011) found that the nominal false positive rate of 5% skyrocketed to over 60%. They conclude that the “increased flexibility allows researchers to present anything as significant.” These issues are exacerbated by insufficient methodological detail in research articles, by the low percentage of researchers who share their data (Wicherts et al., 2006; Wicherts, Bakker, & Molenaar, 2011), and, in fields like neuroimaging, by complicated preprocessing steps (Carp, 2012; Cohen, 2016; Luck & Gaspelin, in press).
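
A single degree of freedom is enough to see the mechanism at work. The sketch below simulates a researcher who measures several correlated dependent variables under a true null effect and reports a finding whenever any one of them is significant. It is not a reconstruction of the simulations in Simmons et al. (2011): the test statistics are drawn directly as correlated z-values instead of being computed from raw data, and the function name, the correlation of .5, and the number of simulations are assumptions.

```python
import random


def false_positive_rate(n_dvs, r=0.5, n_sims=100_000, z_crit=1.96, seed=7):
    """Share of null studies in which at least one DV reaches |z| > z_crit.

    Each study yields n_dvs z-statistics that correlate r with each other,
    built from one shared component and one unique component per DV.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        z_values = [
            r ** 0.5 * shared + (1.0 - r) ** 0.5 * rng.gauss(0.0, 1.0)
            for _ in range(n_dvs)
        ]
        if any(abs(z) > z_crit for z in z_values):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n_dvs in (1, 2, 4):
        print(f"{n_dvs} DV(s): false positive rate = {false_positive_rate(n_dvs):.3f}")
```

With one dependent variable the rate stays near the nominal 5%; every additional variable that may be reported instead pushes it higher, and combining several such choices compounds the problem.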

An important amendment is that researchers need not be aware of this flexibility; a p value might be misleading even when there is no “p-hacking”, and even when the hypothesis was posited ahead of time (i.e., was not changed after the fact, a practice known as HARKing; Kerr, 1998). When analytic decisions are contingent on the data, so that different data would have led to different decisions, there is a hidden multiple comparison problem lurking, even when these decisions “just make sense” (Gelman & Loken, 2014). Usually, when conducting N statistical tests, we control for the number of tests in order to keep the false positive rate at, say, 5%. However, in the aforementioned setting, it is not clear what N should be exactly. Thus, results of statistical tests lose their meaning and carry little evidential value in such exploratory settings; they carry evidential value only in confirmatory settings (de Groot, 1954/2014; Wagenmakers et al., 2012). This distinction is at the heart of the problem, and it gets obscured because many results in the literature are reported as confirmatory when in fact they may very well be exploratory. Because of the way scientific reporting is currently done, there is usually no way for us to tell the difference.
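
For the ordinary case in which N is known, the correction is simple arithmetic. The sketch below assumes independent tests and uses the Bonferroni correction as one common choice (the text above does not name a specific method); the function names are illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_tests = 10
    corrected = bonferroni_alpha(n_tests)
    print(f"uncorrected: {familywise_error_rate(n_tests):.3f}")
    print(f"corrected:   {familywise_error_rate(n_tests, corrected):.3f}")
```

Both functions need N as an input. In the garden of forking paths, the tests that would have been run had the data looked different are never written down, so there is no N to plug in.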

To get a feeling for the many choices possible in statistical analysis, consider a recent paper in which data analysis was crowdsourced from 29 teams (Silberzahn et al., submitted). The question posited to them was whether dark-skinned soccer players are red-carded more frequently. The estimated effect size across teams ranged from .83 to 2.93 (odds ratios). Nineteen different analysis strategies were used in total, with 21 unique combinations of covariates; 69% found a significant relationship, while 31% did not.

A reanalysis of Berkowitz et al. (2016) by Michael Frank (2016; blog here) is another, more subtle example. Berkowitz and colleagues report a randomized controlled trial, claiming that solving short numerical problems increases children’s math achievement across the school year. The intervention was well designed and well conducted, but still, Frank found that, as he put it, “the results differ by analytic strategy, suggesting the importance of preregistration.”

Frequently, the issue is with measurement. Malte Elson, whose Twitter is highly germane to our topic, has created a daunting website that lists how researchers use the Competitive Reaction Time Task (CRTT), one of the most commonly used tools to measure aggressive behavior. It states that there are 120 publications using the CRTT, which in total analyze the data in 147 different ways!

This increased awareness of researchers’ degrees of freedom and the garden of forking paths is mostly a product of this century, although some authors expressed these concerns much earlier (e.g., de Groot, 1954/2014; Meehl, 1985; see also Gelman’s comments here). The next point considers an issue much older (e.g., Berkson, 1938), but which nonetheless bears repeating.

Statistical inference

In psychology and much of the social and behavioral sciences in general, researchers overly rely on null hypothesis significance testing and p values to draw inferences from data. However, the statistical community has long known that p values overestimate the evidence against H0 (Berger & Delampady, 1987; Wagenmakers, 2007; Nuzzo, 2014). Just recently, the American Statistical Association released a statement drawing attention to this fact (Wasserstein & Lazar, 2016); that is, in addition to it being easy to obtain p < .05 (Simmons, Nelson, & Simonsohn, 2011), it is also quite a weak standard of evidence overall.
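
How weak is p < .05 as a standard of evidence? One way to put a number on it, which the post itself does not use, is the calibration proposed by Sellke, Bayarri, and Berger (2001): for p < 1/e, the Bayes factor in favor of H1 can be at most 1 / (-e · p · ln p) under a broad class of alternatives. The sketch below simply evaluates that bound; the function name is an assumption, and the bound is an upper limit, not an estimate of the evidence in any particular study.

```python
import math


def max_bayes_factor_for_h1(p):
    """Upper bound on the Bayes factor for H1 over H0 implied by a p value.

    Uses the -e * p * ln(p) calibration, which only applies for p < 1/e.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("the bound only applies for 0 < p < 1/e")
    return 1 / (-math.e * p * math.log(p))


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(max_bayes_factor_for_h1(p), 2))
```

For p = .05 the bound is roughly 2.5, meaning that even in the most favorable case the data are only about two and a half times more likely under H1 than under H0.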

The last point is quite pertinent because the statement that 39% of replications in the reproducibility project were “successful” is misleading. A recent Bayesian reanalysis concluded that the original studies themselves found weak evidence in support of an effect (Etz & Vandekerckhove, 2016), reinforcing all points I have made so far.

Notwithstanding the above, p < .05 is still the gold standard in psychology, and is so for intricate historical reasons (cf. Gigerenzer, 1993). At JEPS, we certainly do not want to echo calls or actions to ban p values (Trafimow & Marks, 2015), but we urge students and their instructors to bring more nuance to their use (cf. Gigerenzer, 2004).

Procedures based on classical statistics provide different answers from what most researchers and students expect (Oakes, 1986; Haller & Krauss, 2002; Hoekstra et al., 2014). To be sure, p values have their place in model checking (e.g., Gelman, 2006: are the data consistent with the null hypothesis?), but they are poorly equipped to measure the relative evidence for H1 or H0 brought about by the data; for this, researchers need to use Bayesian inference (Wagenmakers et al., in press). Because university curricula often lag behind current developments, students reading this are encouraged to expand their methodological toolbox by browsing through Etz et al. (submitted) and playing with JASP[5].

Teaching the exciting history of statistics (cf. Gigerenzer et al., 1989; McGrayne, 2012), or at least contextualizing the development of the currently dominant statistical ideas, is a first step away from their cookbook-oriented application.

Registered reports to the rescue

While we can only draw attention to the latter, statistical issue, we can actually eradicate the issue of publication bias and the garden of forking paths by introducing a new publishing format called Registered Reports. This format was initially introduced to the journal Cortex by Chris Chambers (Chambers, 2013), and it is now offered by more than two dozen journals in the fields of psychology, neuroscience, psychiatry, and medicine. Recently, we have also introduced this publishing format at JEPS (see King et al., 2016).

Specifically, researchers submit a document including the introduction, theoretical motivation, experimental design, data preprocessing steps (e.g., outlier removal criteria), and the planned statistical analyses prior to data collection. Peer review only focuses on the merit of the proposed study and the adequacy of the statistical analyses[6]. If there is sufficient merit to the planned study, the authors are guaranteed in-principle acceptance (Nosek & Lakens, 2014). Upon receiving this acceptance, researchers subsequently carry out the experiment, and submit the final manuscript. Deviations from the first submission must be discussed, and additional statistical analyses are labeled exploratory.

In sum, by publishing regardless of the outcome of the statistical analysis, registered reports eliminate publication bias; by specifying the hypotheses and analysis plan beforehand, they make apparent the distinction between exploratory and confirmatory studies (de Groot, 1954/2014), avoid the garden of forking paths (Gelman & Loken, 2014), and guard against post-hoc theorizing (Kerr, 1998).

Even though registered reports are commonly associated with high power (80-95%), achieving such power is often unfeasible in student research. However, note that a single study cannot be decisive in any case. Reporting sound, hypothesis-driven, not-cherry-picked research can be important fuel for future meta-analysis (for an example, see Scheibehenne, Jamil, & Wagenmakers, in press).
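
To see why such power levels are hard to reach with student resources, the following sketch computes the approximate sample size per group for a two-sided, two-group comparison. It relies on the standard normal approximation, which slightly underestimates what an exact t-test calculation gives; the function name and the medium effect size of d = 0.5 are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power, alpha=0.05):
    """Approximate participants per group for a two-sided two-group test.

    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2.
    """
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return ceil(n)


if __name__ == "__main__":
    for power in (0.80, 0.95):
        print(power, n_per_group(effect_size=0.5, power=power))
```

Under these assumptions, 80% power requires about 63 participants per group and 95% power about 104, and the required sample grows quickly as the expected effect gets smaller.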

To avoid possible confusion, note that preregistration is different from Registered Reports: The former is the act of specifying the methodology before data collection, while the latter is a publishing format. You can preregister your study on several platforms such as the Open Science Framework or AsPredicted. Registered reports include preregistration but go further and have additional benefits such as peer review prior to data collection and in-principle acceptance.

Conclusion

In sum, there are several issues impeding progress in psychological science, most pressingly the failure to distinguish between exploratory and confirmatory research, and publication bias. A new publishing format, Registered Reports, provides a powerful means to address them both, and, to borrow a phrase from Daniel Lakens, enables us to “sail away from the seas of chaos into a corridor of stability” (Lakens & Evers, 2014).

Suggested Readings

  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
  • Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632-638.
  • Gelman, A., & Loken, E. (2014). The Statistical Crisis in Science. American Scientist, 102(6), 460-465.
  • King, M., Dablander, F., Jakob, L., Agan, M., Huber, F., Haslbeck, J., & Brecht, K. (2016). Registered Reports for Student Research. Journal of European Psychology Students, 7(1), 20-23.
  • Twitter (or you might miss out)

Footnotes

[1] Incidentally, Diederik Stapel published a book about his fraud. See here for more.

[2] Baumeister (2016) is a perfect example of how not to respond to such a result. Michael Inzlicht shows how to respond adequately here.

[3] For a discussion of these issues with respect to the precognition meta-analysis, see Lakens (2015) and Gelman (2014).

[4] Another related, crucial point is the lack of theory in psychology. However, as this depends on whether you read the Journal of Mathematical Psychology or, say, Psychological Science, it is not addressed further. For more on this point, see for example Meehl (1978), Gigerenzer (1998), and a class by Paul Meehl which has been kindly converted to mp3 by Uri Simonsohn.

[5] However, it would be premature to put too much blame on p. More pressingly, the misunderstandings and misuse of this little fellow point towards a catastrophic failure in undergraduate teaching of statistics and methods classes (for the latter, see Richard Morey’s recent blog post). Statistics classes in psychology are often boringly cookbook-oriented, and so students just learn the cookbook. If you are an instructor, I urge you to have a look at “Statistical Rethinking” by Richard McElreath. In general, however, statistics is hard, and there are many issues transcending the frequentist versus Bayesian debate (for examples, see Judd, Westfall, & Kenny, 2012; Westfall & Yarkoni, 2016).

[6] Note that JEPS already publishes research regardless of whether p < .05. However, this does not discourage us from drawing attention to this benefit of Registered Reports, especially because most other journals have a different policy.

This post was edited by Altan Orhon.

Bayesian Statistics: Why and How

Bayesian statistics is what all the cool kids are talking about these days. Upon closer inspection, this does not come as a surprise. In contrast to classical statistics, Bayesian inference is principled, coherent, unbiased, and addresses an important question in science: in which of my hypotheses should I believe, and how strongly, given the collected data? (more…)

Crowdsource your research with style

Would you like to collect data quickly and efficiently? Would you like to have a sample that generalizes beyond western, educated, industrialized, rich and democratic participants? While you acknowledge social media as a powerful means to distribute your studies, you feel that there must be a “better way”? Then this practical introduction to crowdsourcing is exactly what you need. I will show you how to use Crowdflower, a crowdsourcing platform, to attract participants from all over the world to take part in your experiments. However, before we get too excited, let’s quickly go through the relevant terminology. (more…)
